M3 core counts and performance

Z5 reviews are out today and it's a bit hilarious seeing people losing their minds over them (in the AT forum thread, for example).

I'd say I've never seen so many bad hot takes, and so much heat and noise with so little light, but... internet. Still, it's on the high end.

As far as I can see, it's early days and they may have a lot of runway on which to iterate over time (Z5+ maybe, or 9700XT, or whatever). Or just new AGESA. But as it stands, despite whining gamers, it seems pretty solid.

It's still not competitive with likely M4 chips- though of course you can buy it now.
 
I bolded the operative phrase. I don't think it was ever anything much more than fan wishcrafting.

The developer of y-cruncher still says he sees 30% gains in integer IPC and talks as though the rumored 40% SPECint IPC gain is still reasonable. That obviously doesn't track with what we see in the Anandtech review. He does caveat that there were no memory bottlenecks in his test code, but even so his code compilation gain was 20%, which I suppose is not too far off SPECint's gcc iso-clock gain of 15%, though SPECint as a whole is more like 11% when comparing the 9700X to the 7700X. Then again, that difference is a lot bigger than the one between the HX370 and the 7845 (I think it was), which Anandtech themselves note in the review. So maybe Zen 5 continues to improve over Zen 4 as clocks increase? Maybe? Thus the 9900X will be that much better (still not 40%, obviously) than the 7900X? We'll just have to wait for that review. The developer of y-cruncher clearly has access to these chips, as he is holding those sections until the release of the full desktop chips, so he has data that we don't have.

I do find the gnashing of teeth a little odd (haven't seen the forum, but just some of the reviews like Gamers Nexus). There are architectural performance improvements in Zen 5 and it's more efficient. Reviewers kept mentioning how power hungry AMD and especially Intel chips were getting and how that was bad on multiple levels, with both chip makers but especially Intel suffering problems. Now that AMD is focusing on efficiency, that is also bad ... I mean, I get that having the chip maker do both at once would be best, but chip designers are only human and maybe it's the otherworldly expectations rather than reality that should be checked? The incredible progression from Bulldozer through Zen 3 had to end at some point, people understand that, right? The stability problems despite mature platforms are more of an issue, though presumably those will be worked out over time. I hate to tell people to settle for mediocrity but ... for those who need/want to upgrade now, especially in the 9700X product segment, which is not for enthusiast builds, is a quieter, cooler machine a bad thing?

Anyway, the y-cruncher post is actually about how Zen 5's AVX512 implementation works compared to Intel/Zen 4, which is super interesting and deserves its own discussion, but I saw this integer result as a point of interest to bring up here. EDIT: I posted about the AVX512 stuff in the SME thread here.
 
There are a lot of people out there who echo-chambered overly optimistic rumors about Zen 5 to the point that a letdown and backlash was inevitable, even if AMD doesn't really deserve it.
 
What do you think about the idea that Zen 5 continues to improve over Zen 4 with higher clocks? I mean not to the point of the more outlandish rumors, but it is interesting that the new 9700X did better against the 7700X than the new mobile HX part did against its older counterpart. Further, the y-cruncher guy was getting even better results in his compilation test than the SPECint compilation test. A different thing being compiled with a different compiler probably explains more of it, I’ll grant you, but he also probably has the 9900X, I’m guessing.
 
I'm a little dubious about the general idea, but there is a link between CPU speed and memory speed which has significant effects. There are going to be sweet spots for clocks vs. memory timings. (This is well documented but I don't remember the details so I'm handwaving.)
 
Haven't followed it closely enough to be completely sure what the theory is about.
 
What do you think about the idea that Zen 5 continues to improve over Zen 4 with higher clocks? I mean not to the point of the more outlandish rumors, but it is interesting that the new 9700 did better against the [7]700 than the new mobile HX part did against its older counterpart. Further the y-cruncher guy was getting even better results in his compilation test than the SPECint compilation - different thing being compiled with different compiler probably explains more of it I’ll grant you but he also probably has the 9900x I’m guessing.
And the answer to that question was a big fat nope!


And breaking things down to the individual scores, things once again look similar to what we saw with the 9700X last week.

Can say that again, 9950X has the same clock speed as the 7950X and the same ~11% SPECInt Iso-clock improvement as the 9700X over the 7700X.

So unclear why the mobile part didn't seem to get any per-clock ST SpecINT improvement, but these desktop parts do (though smaller than obviously anticipated by the "leaks").

EDIT: http://www.numberworld.org/blogs/2024_8_7_zen5_avx512_teardown/ claims that 5.7 GHz wasn't an achievable clock speed on the 7950X whereas it is on the 9950X, but I think that's for certain AVX workloads, and I am not sure whether that impacts the integer results seen above.
 

Might have to clarify exactly what comparisons we're looking at. With the 9950X and 7950X, you've got a pretty "standard" comparison for generation vs generation. 5nm to 4nm, same target performance profile, etc.

With the HX 370 vs 7940HS, we have a newer chip at about 20% lower TDP, with a lower boost clock. The 7940HS also happens to be Zen 4 on 4nm, versus the 5nm of the desktop Zen 4 chips. There are simply more confounding variables, it seems. Makes me wonder if some of the issue is simply that the boost on Int comes more from changes going to the newer process, which got rolled into Zen 5? I.e., Zen 4 mobile is more of a half-step between Zen 4 desktop and Zen 5 desktop?
 
Interesting idea ... but it doesn't appear to be the case as far as I can see: the 7940HS's and the HX370's ST SPECint scores are both pretty low, around 7.0 vs 9.88/10.95 for the desktop parts. That is a vastly bigger gap than the clock speed difference: it's a 41% increase in SPECint going from the 7940HS to the 7950X with less than a 14% increase in clocks. If anything the desktop core looks like the more advanced one! And by a lot.
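
To make that arithmetic concrete, here is a minimal Python sketch using only the figures quoted in the post (7.0 for the 7940HS, 9.88 for the 7950X). The post gives the clock difference only as "less than 14%", so 14% is treated as an upper bound rather than a measured value, and the variable names are just for illustration.

```python
# Single-thread SPECint scores as quoted in the post above.
score_7940hs = 7.0
score_7950x = 9.88

# The post only says clocks rose by "less than 14%", so this is an
# assumed upper bound, not a measured clock ratio.
max_clock_gain = 0.14

# Raw score gain going from the mobile part to the desktop part.
score_gain = score_7950x / score_7940hs - 1.0

# If clocks rose by at most 14%, the per-clock gain is at least this much.
min_per_clock_gain = (1.0 + score_gain) / (1.0 + max_clock_gain) - 1.0

print(f"score gain: {score_gain:.1%}")                            # 41.1%
print(f"per-clock gain (lower bound): {min_per_clock_gain:.1%}")  # 23.8%
```

So even after crediting the desktop part with the full 14% clock advantage, roughly a quarter more performance per clock is left unexplained, which is why the two sets of numbers look suspicious side by side.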

This is so weird.

Edit: Ahhhhh ... one issue, possibly a major culprit here: despite the two reviews being published only about 10 days apart, Anandtech dramatically changed which version of clang they were using to compile SPEC:

From their mobile review:

clang version 10.0.0
clang version 7.0.1 (ssh://git@github.com/flang-compiler/flang-driver.git 24bd54da5c41af04838bbe7b68f830840d47fc03)

-Ofast -fomit-frame-pointer
-march=x86-64
-mtune=core-avx2
-mfma -mavx -mavx2

Our compiler flags are straightforward, with basic -Ofast and relevant ISA switches to allow for AVX2 instructions.

From their desktop reviews 1 and 2:

clang version 18.1.8
gfortran version 14.2.0

-Ofast -fomit-frame-pointer
-march=[x86-64-3 or x86-64-4, depending on chip's supported ISA]

Our compiler flags are straightforward, with basic -Ofast and relevant ISA switches. Because it's not possible to build a single set of binaries that offer AVX-512 support while still gracefully falling back to AVX2 on platforms that lack the feature, we're technically running two sets of binaries on x86 platforms. AVX-512 processors get binaries compiled with the -march=x86-64-4 flag, while all other x86 platforms get -march=x86-64-3. And note that while scores are similar overall, the results from these new binaries are not comparable to our previous binaries, due to the significant compiler changes in the last few years.
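
As a rough illustration of the "two sets of binaries" rule described in that quote, here is a small Python sketch that picks the -march level from a CPU's feature flags. This is not Anandtech's build script; the function names are hypothetical and the flag names assume the Linux /proc/cpuinfo spelling. Note that the quote writes the levels as x86-64-3 and x86-64-4, while clang and gcc spell them x86-64-v3 and x86-64-v4, which is what the sketch uses. It also assumes, as the quote does, that every non-AVX-512 machine under test supports the v3 level.

```python
# AVX-512 features that the x86-64-v4 level requires on top of v3.
AVX512_V4_FEATURES = {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}


def pick_march(cpu_flags):
    """Return the -march switch for a CPU, given its feature-flag names.

    Hypothetical helper: mirrors the rule in the quote (AVX-512 chips get
    v4 binaries, everything else gets v3 binaries).
    """
    flags = {flag.lower() for flag in cpu_flags}
    if AVX512_V4_FEATURES <= flags:
        return "-march=x86-64-v4"
    return "-march=x86-64-v3"


def build_cflags(cpu_flags):
    """Assemble the full flag list quoted in the desktop reviews."""
    return ["-Ofast", "-fomit-frame-pointer", pick_march(cpu_flags)]


# Example: an AVX2-only chip versus a chip with full AVX-512 support.
print(build_cflags({"avx2", "fma", "bmi2"}))
print(build_cflags(AVX512_V4_FEATURES | {"avx2", "fma", "bmi2"}))
```

The point for this thread is that the mobile review's binaries (clang 10, -march=x86-64 with AVX2 switches) and the desktop reviews' binaries (clang 18, v3 or v4) come from different toolchains and different targets, so their scores should not be mixed.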

Without knowing how that affects things, these results are not comparable. I'll admit I did read this in their review last week but I assumed that they had meant previous reviews from last year or two years ago ... and not the review from the prior week.
 
Came across this comment on r/hardware.
"There you can see that Lunar Lake beats everything single core (but the M3 but its score is heavily inflated due to other measures, but I digress) while using a fraction of the power AMD or Snapdragon use, due to it being able to optimise the process on the core clusters and disabling what it doesn’t need."


How true is it? I don't believe it, because the M3 has slower RAM and a lower clock, so the difference can only be due to the M3 having higher IPC. The benchmark here is CB 2024.
 
I’m working on a new CB 2024 chart from notebookcheck ;) - and no, Lunar Lake is not that impressive in ST. It’s a huge improvement over Meteor Lake and better than AMD’s Zen 5, but Apple’s M2 and M3 are better, and so is Oryon.
 
A revisualization of Notebookcheck's Cinebench R24 performance and efficiency data.



Details: These are results from only one benchmark, Cinebench, which has gone from being one of Apple's worst performing benchmarks in R23 to one of its best in R24. As I am interested in getting as close as possible to the efficiency of the chip itself, the power measurements above subtract idle power, which NotebookCheck does not do when calculating the efficiency of the device. With the release of Lunar Lake, an N3B chip, I've added the M3, Apple's counterpart to Lunar Lake, and estimated the M3 Pro's efficiency based on its power usage in R23 and the base M3 CPU's power/performance in R23/R24 (NotebookCheck did not have R24 power data for the M3 Pro). I feel this maybe overestimates the M3 Pro's efficiency a little, but not by enough to matter given the gulf between it and every other chip. The M3 was in the Air; given that Cinebench is an endurance benchmark, its score and power usage will both likely be higher in the 14" MacBook Pro. I did also have the Snapdragon X1E-84-100 but removed it since it was clutter and didn't add much. The Ultra 7 258V is one of the upper level Lunar Lake chips, but not the top bin. The 288V might improve efficiency/performance somewhat by having better silicon, but the effect will be small relative to the patterns we see.
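
To show what subtracting idle power does to the efficiency figure, here is a minimal Python sketch. The function name and the numbers in the example are hypothetical placeholders, not NotebookCheck measurements; only the method (score divided by load power minus idle power, versus score divided by load power) comes from the description above.

```python
def efficiency(score, load_watts, idle_watts=0.0):
    """Points per watt, optionally with idle power subtracted.

    With idle_watts=0 this is the device-level figure (score / load power).
    With idle_watts set, it approximates the chip's own efficiency.
    """
    net_watts = load_watts - idle_watts
    if net_watts <= 0:
        raise ValueError("load power must exceed idle power")
    return score / net_watts


# Hypothetical laptop: 120 points at 15 W under load, 5 W at idle.
device_eff = efficiency(120, 15.0)               # 8.0 points/W
chip_eff = efficiency(120, 15.0, idle_watts=5.0)  # 12.0 points/W
print(device_eff, chip_eff)
```

The lower the load power, the bigger the share that idle draw (screen, SSD, and so on) takes of it, so this adjustment matters most for exactly the low-power ST results being compared here.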

So what do we see? First off, Lunar Lake has great single core performance and efficiency ... for an x86 chip, helped perhaps by being on a slightly better node, N3B, than the M2 Pro (N5P), HX 370 (N4), and Snapdragon (N4). Despite this advantage, the Snapdragons on N4 and the M2 Pro on N5P are still superior in ST performance and efficiency. The Ultra 9 288V might increase performance to match or slightly beat the Elite 84 (not pictured), but it would be at the cost of even more power. That said, the efficiency and performance improvements here are enough to make x86 potentially competitive with this first generation of Snapdragons, at least enough that, with Qualcomm's compatibility issues, Intel can claim wins over Qualcomm and begin to lessen Qualcomm's appeal.

However, this has come at a cost in MT performance. The prevailing narrative is that without SMT2/HT, Intel struggles to compete against AMD and Qualcomm. To a certain extent that's true, but with only 4 P-cores and 4 E-cores in a design optimized for low power settings, it was never going to compete anyway. The review mentions it gets great battery life and decent performance on "everyday" tasks, in stark contrast to the full-tilt performance represented by Cinebench R24, and that is after all what this chip is for: thin and lights.

The closest non-Apple Lunar Lake analog is the 8c/8T Snapdragon Plus 42, whose ST performance is a little lower than the 258V's, but with much, much greater efficiency, and whose MT performance and efficiency are much better than the Lunar Lake chip's. However, the Snapdragon Plus 42 has a significantly cut down GPU, which was already the weakest part of the processor. I'm not saying it can't provide a compelling product, especially if priced well, but given the compatibility issues it's a tougher sell for Qualcomm than it would've been last year.

As for AMD, there is no current analog to Lunar Lake in AMD's lineup. Sure, a down clocked HX 370 gets fantastic performance/efficiency at 18W ... but that's to be expected from a 12c/24T design which would frankly be cramped inside thin and lights. It's not really meant for that kind of device. It's an Mx Pro level chip at heart and should be compared to the upcoming Intel Arrow Lake mobile processors. AMD's smaller Kraken Point is supposedly coming out next year with a more similar CPU, but again is rumored to cut the GPU, and according to the notebookcheck review, the Intel iGPU in Lunar Lake is already competitive with, if not better than, the AMD iGPU in the larger Strix Point. It's fascinating how AMD and Qualcomm both designed more workload-oriented, CPU-heavy designs while Intel has basically designed Lunar Lake to be like the base M3, more well rounded.

But that brings us to the M3 and the comparisons here are pretty ugly for all of its competitors. Again, Apple tends to do very well in CB R24, so we shouldn't extrapolate from this one benchmark that it will be quite this superior to AMD, Intel, and Qualcomm in every benchmark. With that caveat aside ... damn. The ST performance and efficiency are out of this world and simply blow the other N3B chip, the Lunar Lake 258V, away with both a large performance gap and an even larger efficiency gap, nearly 3x. Even the M2 Pro and Snapdragons are just not that close to it. Sure in MT a down clocked Strix Point can match the base M3's performance profile at 18W, but that is a massive CPU by comparison and the comparable Apple chip to the HX 370, the M3 Pro, is leagues better than anything else in this chart, including the HX 370. I have to admit: while Apple adopted the 6 E-core design for the base M4, if the M4 Pro doesn't have its own bespoke SOC design and is a chop of the Max, then, depending on how Apple structures the upcoming M4 Max/Pro SOC, it'll be a shame to see Apple lose a product at this performance/efficiency point. The M3 Pro is rather unique. Also, its 6+6 design really highlights how improved the E-cores (and P-cores) were moving from the M2 to the M3, especially in this workload.

Meanwhile the two chips of comparable CPU design to the base M3, the Plus 42 and the 258V, are simply not a match for the base M3 in MT, requiring double or more power to match its performance or otherwise offering significantly reduced performance at the same power level. Intel claimed to match/beat the M3 in a variety of MT tasks in its marketing material, but aside from specially compiled SPEC benchmarks, you can see how much power it takes for it to actually do that. Basically Apple can offer a high level of performance (for the form factor) in a fan-less design and its competitors, including the N3B Lunar Lake, simply cannot. Like Lunar Lake, Apple also went for a balanced design here, opting for powerful-for-its-class iGPUs to be paired with its CPUs (though obviously some of these chips, especially the Strix Point, can also be paired with mobile dGPUs). There is a point to be made about the base MacBook Pro 14", which is quite expensive, has a fan, and still has the base M3 with a low base memory/storage option. But even so, as we can see above, the base Apple chip is not without its merits at that price point/form factor. To reference @casperes1996, now that both Intel and Apple are on N3B ... I guess we figured out who orders pizza the best. ;)

Oh and ... this is the performance/efficiency gulf of the newest generation of AMD, Intel, and Qualcomm processors with the M3 ... with the M4 Macs about to come out. 😬

Published analysis of die shots comparing Qualcomm Elite, Strix Point, and Apple M4:


If anyone has annotated Apple M2 Pro die shots that they'd like to make a comparison with, that'd be great!
 
Huh … so that’s why reviewers were sent the 258V instead of the 288V …


Similar situation in terms of availability with Qualcomm’s top chips (the 84 and 00 are only available in one model each so far) though I am unaware of a BIOS bug with Qualcomm 🙃.
Whoops.
 
“We also noticed a new type of behavior with the Ultra 200V series — the Skymont E-cores run at higher frequencies than the Lion Cove P-cores during heavily threaded work, an expected but new behavior that indicates an exceptional amount of performance and efficiency from the new E-cores.“

What? How do they reach that conclusion from that fact?
 
Because otherwise why would Intel do that? ;)

To be fair, under the tests run the new E cores are pretty good, but yes it’s an odd choice.
 