M3 core counts and performance

Sarah Kerrigan (pretty sure that's not her real name) was a really good contributor to the forum. It's not exactly clear why, but she seems to have deactivated her account there. This came after a lot of arguing about Apple's chips vs AMD's, among other things.
That’s a pity.
 
Are we sure she didn’t leave because of the ads and constant reloads? Don’t get me wrong, I’m sure it’s the forum culture over there, but damn those forums are impossible to navigate on my iPhone. Constantly blocking text, jumping around, crashing. @leman you’re on those Anandtech forums too right? Are you hitman? I can’t remember.
 
I am curious about the FP improvements in Zen5, these seem to be considerable. Do we know more about that? Can't be AVX-512 since Anandtech compiled SPEC without it. I remember there was mention of reduced FADD latency, did that have such a large impact?
 
@leman you’re on those Anandtech forums too right? Are you hitman? I can’t remember.

I am not a regular on anandtech forums, and I am afraid I don't remember SK...

There are some nice and passionate people there, but most of the time the discussions are fairly biased. Not as bad as MR though :)
 
Fantastic writeup. I was far too lazy to do that myself but I think you hit just about every point I wanted to make. It is very frustrating that we can't count on Anandtech to do this sort of thing any more. :-(
Thanks man, that really means a lot. I'm sitting here icing down my hernia surgery and popping pain meds so this is a good distraction for me. TBF, I think the Anandtech article is pretty good; I just wish they had done power measurements on the M3 MacBook Pro using their same process. They did it on the Intel and older Ryzen machines, and it's CB R24, so I'm not sure what the issue was there. They said in the comments that they are still waiting on being sampled Qualcomm processors, which is why they haven't got a review for that yet; they do expect to get one, it's just really delayed. It's sad that Anandtech is no longer considered a priority for people to get machines to. Anyway, I really like NotebookCheck's power measurements, though I wish they'd do both wall and software measurements as a sanity check on each other. Of course that doubles review time, so I understand why they don't. Given the two articles, especially NotebookCheck's, I'm sticking with Strix Point's 28W TDP setting being close to the M3 Pro's power draw rather than the base M3's; that seems the most likely.

I especially want to call attention to your analysis of AMD vs. QC. This sounds VERY scary for QC, and while pricing is critical (as I've pointed out when comparing M3/M4 to SXE), I am dubious about whether or not QC can make up the differences there.

In particular, the iGPU issue seems critical. QC is taking it on the chin for both performance and compatibility, and even if they fix the latter, they're getting their clocks cleaned on the former.

It now seems to me even clearer that they've made a grievous error not pushing low power and fanless designs. They are not going to be able to compete with AMD at the high end, I think, but they might be able to do a fanless design. It would absolutely suck compared to an M3 (much less future M4), but it could fill a significant niche that AMD and Intel currently can't (or at least aren't).

The big question I haven't seen answered yet is, how big is the silicon? And how big are the 5 and 5c cores? As you say, this feeds into cost, which is likely to matter this time around.
A lot is riding on v2 for Qualcomm. I think the big problem is that they were very obviously late, and the problems with software and drivers might've been even worse earlier. However, had they released last year, even in the current state, they would've been much, much better off (though not necessarily if software/drivers were even worse). But yeah, I totally agree: they need E-cores, at least two SOCs to cover their use cases, and a true fanless design. I also agree about silicon size being super important for QC right now. Someone said Geekerwan might be doing a silicon size comparison soon; I don't know what's happening with that though.

I am curious about the FP improvements in Zen5, these seem to be considerable. Do we know more about that? Can't be AVX-512 since Anandtech compiled SPEC without it. I remember there was mention of reduced FADD latency, did that have such a large impact?

Comparing M3 and Zen 5 to M1 and Zen 2/3 in Spec2017 is really interesting:
Screenshot 2024-07-31 at 10.03.55 AM.png

Screenshot 2024-07-31 at 10.02.34 AM.png

Fascinating that the Int score for the HX 370 P is worse than the 5950X while the FP score is massively improved! It can't all be clocks, because the 5950X clocked lower than the HX 370P, at least theoretically. I see people in the forums @NotEntirelyConfused linked to saying that the unit Anandtech was delivered doesn't hit its purported max clocks but the one NotebookCheck got did (better cooling in the latter or something?). Regardless, at best the ISO-clock performance in SPECInt 2017 is probably going to be similar between Zen 3 and Zen 5, no improvement, but an absolutely massive FP performance improvement. Meanwhile the Apple improvements have been more on the Integer side, though the FP has still gone up quite nicely ... though that seems to be largely clock speed? Again, that depends on how often Apple actually hits its max clocks in the M3. We had some discussions about that as well, I remember. If it does, 4.05/3.2*10.37 = 13.12 is the expected FP score from clocks alone. Meanwhile 8.4 would've been the expected Int score.
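
For anyone who wants to redo the "expected score from clocks alone" arithmetic, here is a minimal Python sketch. It assumes performance scales linearly with clock speed and nothing else changes, which is only a rough first approximation. The function name is made up for illustration, and the inputs (10.37, 3.2 GHz, 4.05 GHz, and the 3.86 GHz figure from the Geekerwan chart) are just the numbers quoted in this thread.

```python
def clock_scaled_score(base_score: float, base_ghz: float, new_ghz: float) -> float:
    """Score expected if the only change were clock speed (linear scaling assumed)."""
    return base_score * new_ghz / base_ghz


# FP score of 10.37 at 3.2 GHz, scaled to the M3's nominal 4.05 GHz max clock.
expected_fp = clock_scaled_score(10.37, 3.2, 4.05)
print(f"Expected FP at 4.05 GHz: {expected_fp:.2f}")  # 13.12

# Same baseline, scaled to 3.86 GHz instead, in case the M3 does not
# actually reach its max clock during the FP run.
expected_fp_386 = clock_scaled_score(10.37, 3.2, 3.86)
print(f"Expected FP at 3.86 GHz: {expected_fp_386:.2f}")
```

Any measured score above the clock-scaled number would point to a per-clock improvement; a score at or below it suggests the gain is mostly frequency.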

Notably Geekerwan seems to be saying that the Apple M3 did not hit max clocks in SPEC FP workloads, unless I'm misunderstanding something:

Screenshot 2024-07-31 at 10.27.14 AM.png


I'm assuming 3.86GHz is the clock of the chip while running the test, but the M3 is very much the oddity here, with the only substantial difference between FP and Int clocks - maybe a mistake? A weird one to make though.

Actually, scratch that: when they do their simulated downclock, look at the frequency of the M4 in FP ...


Screenshot 2024-07-31 at 10.38.51 AM.png


Anyway ... lots to chew on ...

Here is Geekerwan's chart for ISO-clock SPEC improvement for the Apple Silicon chips over the generations:

Screenshot 2024-07-31 at 10.44.06 AM.png
 
Regardless, at best the ISO-clock performance in SPECInt 2017 is probably going to be similar between Zen 3 and Zen 5, no improvement
That runs contrary to all the reports about massive IPC uplift in Z5. And yes, much of that may come from special-purpose instructions, as did a lot of the GB6 uplift in M4 due to SME, but I was under the strong impression that there was still a pretty decent general improvement. Think that's wrong? I also recall that Z4 had a decent improvement in IPC compared to Z3, which is confirmed by a very quick web search, though the specifics escape me.
 
The SPECFP 2017 uplift of Zen 5 over Zen 3 is amazing though and shouldn’t be affected by AVX512 based on how Anandtech compiles it (AVX2 only). I’d have to go through the subtests and compare. So yeah I dunno if I’m misunderstanding something but it looks like no SPECInt 2017 improvement at all which I agree makes no sense! Again maybe if I go through the subtests it’ll make sense. I’m really not sure what’s going on here. I feel like I must be missing something obvious.

EDIT: Anandtech noted little to no improvement in Int from Zen 4 to Zen 5, but comparing charts with their older ones would appear to indicate no real SPECInt improvement from Zen 3 to Zen 5.

In its highest performance configuration, AMD is touting a 16% average IPC uplift for the Zen 5 architecture. But this is for the full-fat configuration with 512-bit wide SIMDs for single-cycle AVX-512 support. What Strix Point offers is a bit less, with just a 256-bit wide SIMD requiring execution over multiple cycles. The architecture still benefits from the AVX-512 instructions, but it doesn't gain the data throughput benefits. So mobile Zen 5 already starts off with a smaller potential performance uplift. Coupled with that, Ryzen AI 9 HX 370 has a peak clockspeed of 5.1GHz, versus 5.2GHz for the Ryzen 9 7940HS, so there is a slight regression here in terms of clockspeeds for the mobile parts.

All of which is to say, that these trade-offs erode some of the single-threaded performance gains the Zen 5 architecture otherwise offers, and SPEC CPU 2017's integer benchmarks seem rather unfazed. Here the HX 370 only barely edges out the 7940HS by the very slightest amount – 0.01 points – and this is not a workload where the chips' TDP differences should matter. So our first test does not find significant gains for AMD's new architecture. Still, we're treating mobile as more of a preview of things than the final word, as the desktop release should be far more enlightening thanks to the ability to better ensure platform parity, as well as throwing TDP concerns out the window altogether.

Again it's possible that the Zen 5 part wasn't hitting its max clock speed, but even so ... very small to no ISO-clock integer improvement. Absolutely massive FP improvement though. Should try to compare results with GB Integer/FP.

=======

Btw @Eric did you add the option to resize picture attachments? I don’t remember that before but now that I can fiddle with the sizes it looks much nicer on both mobile and desktop. Maybe I was just blind before!
 
Yeah, I saw that in AT too, but it was early days and I ... forgot. Doh.

So, seriously, WTF?? There are all these under-the-hood improvements - bigger ROB, more execution units, better branch prediction, dual fetch/decode, etc. How does that add up to... nothing? Where is the 30% integer IPC benefit people were fantasizing about before release, or (much more to the point) the ~16% I've seen people talking about since then?

I agree the FP improvements are great, and I'm not minimizing those. But there is something seriously fishy here.
 
So, seriously, WTF?? There are all these under-the-hood improvements - bigger ROB, more execution units, better branch prediction, dual fetch/decode, etc. How does that add up to... nothing? Where is the 30% integer IPC benefit people were fantasizing about before release, or (much more to the point) the ~16% I've seen people talking about since then?
I bolded the operative phrase. I don't think it was ever anything much more than fan wishcrafting.
 
Zen5 runs slightly lower clocks to improve power consumption. My impression is that the IPC improvements went there. They can raise the clocks again once they adopt a smaller node.
 

In fairness, just judging by eyeball mind you, I think GB5 might tell a slightly different story:

Zen 5:


Zen 3:


While Geekbench 6 does split by Int/FP (the combined score is 65% Int and 35% FP), they don't officially report the scores separately anymore (I'm not 100% sure which subtests are which), so I just searched Geekbench 5. While the ISO-clock increase for integer isn't fantastic, I am seeing single-digit improvement (again by eye). To get clocks you have to add .gb5 to a score's page. Geekbench reports a range of frequencies that it measures during its tests, which @leman and I assume are sampled clock frequencies during the single-core tests, but I gotta admit I'm not 100% sure what they actually are. In the violin ISO-clock plots we take the mean of them as the clock frequency of the device.

Example:


Average clocks: 4.93984 GHz
Int Score: 1655


Average clocks: 4.77945 GHz
Int Score: 1510

1655/1510 * 4.77945/4.93984 = 1.06

Now part of the reason @leman did violin plots was because of the extreme variance in Geekbench runs which almost certainly carries over here, but as a single example it supports that AMD has only moderately improved Integer ISO-clock performance since Zen 3 ... though that is still better than the seemingly 0 improvement that SPECInt says it is!
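
The ISO-clock comparison above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the reported frequencies really are clocks sampled during the single-core run (which, as noted, isn't certain). The function names are hypothetical, and the sample list passed to `mean_clock_ghz` is a made-up placeholder, not real Geekbench data; only the two score/clock pairs come from the example above.

```python
from statistics import mean


def mean_clock_ghz(samples_mhz: list[float]) -> float:
    """Average a list of sampled frequencies (in MHz) and return GHz."""
    return mean(samples_mhz) / 1000.0


def iso_clock_ratio(score_a: float, ghz_a: float, score_b: float, ghz_b: float) -> float:
    """Ratio of score-per-GHz for result A relative to result B."""
    return (score_a / ghz_a) / (score_b / ghz_b)


# Placeholder samples purely to show the averaging step (NOT real data).
print(mean_clock_ghz([4900.0, 5000.0]))

# The two results from the example: (Int score, average clock in GHz).
ratio = iso_clock_ratio(1655, 4.93984, 1510, 4.77945)
print(f"Per-clock Int ratio: {ratio:.2f}")  # 1.06
```

Since a single pair of runs is noisy, the same ratio would ideally be computed over many results per chip, which is what the violin plots are for.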

Zen5 runs slightly lower clock to improve power consumption. my impression is that the IPC improvements went there. They can raise the clocks again once they adopt a smaller node.

Right relative to Zen 4 they slightly decreased clocks but I'm not sure what you mean by "the IPC improvements went there".
 

What I mean is that they use the IPC improvements to achieve same or better product performance despite lowering clocks.
 
Oh okay - although for Integer workloads the IPC improvements seem pretty meager since Zen 3 (though again SPECInt seems particularly harsh saying it is basically 0 while GB 5 gives some improvement). But for FP, yeah huge improvements and no doubt clocks will increase in desktop parts. BTW do you know when Geekbench reports these frequencies:

Screenshot 2024-08-01 at 12.36.34 AM.png


what are they exactly? Some processors have more entries, some have fewer. Do you know when these samples are taken or what they are meant to represent? Are they taken while running the subtests? At what interval? I can't find any info on this.
 
what are they exactly? Some processors have more entries, some have fewer. Do you know when these samples are taken or what they are meant to represent? Are they taken while running the subtests? At what interval? I can't find any info on this.

I haven't found any information on this either. It looks like these are real-time frequency estimates, but it is not clear when these are taken. Could be during the tests, could be in a warm-up phase...
 
I think I'm going to switch over to the iGPU stuff ... there is actually some interesting stuff in there!
So I'm still low on time but a brief look at the integer CPU stuff suggests real improvements: Better scores than earlier generations, at slightly lower clocks. Or did I misread that? (Entirely possible, I was in a real rush.)
 