M3 core counts and performance

Intel announced at Computex today that Lunar Lake's compute chiplet is being fabbed on TSMC N3B. This is interesting for several reasons, but the one that stands out for me is that this will give us Intel's latest and greatest core, versus Apple's not-quite latest (M3) core, on the same exact process. Comparisons of PPA will be much more informative!
Yep. Precisely right. It will be the end of the node argument.
I’m very worried about the industry now that nobody can keep up with TSMC. Even when Intel had the best fabs, IBM and AMD were within spitting distance (and IBM generally had better transistors than Intel, and we had better interconnect than Intel, for most of that time). If Intel really had faith they were going to catch up, they wouldn’t do this TSMC stuff.
The next part is on 18A. Panther Lake will be an architecture based off Lunar Lake, but scaled up.

Basically, we have Arrow Lake and Lunar Lake on N3B.

The next part is Panther Lake on 18A, which covers both.
 

Ignoring the hyperbole in the article - even in Passmark the upcoming 12-core Zen 5 HX370 does not glide past the M3 Max, particularly in ST, as one commentator noted (more on that later) - I've felt for a long time that Passmark is a particularly archaic benchmark. So I decided to see how it actually measures up against other benchmarks comparing the 7945HX and the M3 Max, as well as how the benchmark describes itself and breaks down across subtests. To spoil the ending: it is definitely archaic as benchmarks go.

(Side note: finding the search page for Geekbench 5 scores is getting even harder - it doesn't even appear in Google.)

I've linked the sources used below for Passmark (AMD, Apple) and all other benchmarks (as always individual scores for benchmarks may vary quite a bit but it's the gross relationships here that are important).

Benchmark        AMD 7945HX (16) ST   Apple M3 Max (12/4) ST   AMD 7945HX (16) MT   Apple M3 Max (12/4) MT
Passmark         4049                 4780                     54710                40618
Cinebench 2024   114                  141                      1669                 1607
Geekbench 5      2130                 2150                     19472                22736
Geekbench 6      2774                 3125                     16214                21045

Here we see that in Passmark MT the 7945HX is beating the M3 Max by a substantial amount, nearly 35% (54,710 vs 40,618). By contrast, in all other tests they are effectively equivalent or the M3 Max is actually ahead. This makes the claim that the upcoming AMD HX370 beats or matches the M3 Max in the same power envelope rather questionable if that's being determined by Passmark alone.

So what is going on here? Let's take a look at Passmark:


There are 8 subtests: Integer, Compression, Prime Number, Encryption, Floating Point, Extended Instructions, String Sorting, and Physics. The Single Threaded test is just a composite of three: Floating Point, String Sorting, and Compression. That's ... certainly a choice, and I won't be discussing the ST test any further except to say that it is definitely not comparable to something like GB, where the MT and ST subtests are the same (except maybe in scale).

Subtest          AMD 7945HX (16) MT   Apple M3 Max (12/4) MT
Integer          210,751              91,352
Floating Point   125,685              131,772
Prime Number     286                  575
Encryption       43,303               24,854
Compression      714,133              530,888
Extended         52,565               22,304
String Sorting   79,250               61,651
Physics          2,447                4,253

Lots to unpack here. The weirdest result is definitely the Integer vs Floating Point pair. In the former the AMD chip absolutely annihilates the M3 Max, and in the latter the M3 Max surpasses the AMD. Here's how they are described:

The Integer Math Test aims to measure how fast the CPU can perform mathematical integer operations. An integer is a whole number with no fractional part. This is a basic operation in all computer software and provides a good indication of 'raw' CPU throughput. The test uses large sets of an equal number of random 32-bit and 64-bit integers and adds, subtracts, multiplies and divides these numbers. This test uses integer buffers totaling about 240kb per core.

The Floating Point test is basically identical except it uses floating point numbers instead of integers. So why so different? The only thing I can think of is that the Integer threads are *so* lightweight that AMD is actually getting 2x performance from SMT2. Why doesn't this happen with floating point then? Unsure; we know that Apple has great floating point performance, and it's possible that AMD has fewer FP execution resources per thread, limiting the effectiveness of SMT2. Any other suggestions?
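I don't have Passmark's source, but going by that description, I imagine the hot loop is something like this sketch (purely my own guess at the shape of the test, not their code; the only number taken from their text is the ~240 KB of buffers):

```cpp
// Hypothetical sketch of what an "integer math" microbenchmark of this kind
// might look like: add/subtract/multiply/divide big buffers of random numbers
// with no algorithmic purpose. NOT Passmark's actual code.
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    // Two buffers totaling ~240 KB per core, per the description above.
    constexpr size_t kCount = (240 * 1024) / (2 * sizeof(uint64_t));
    std::mt19937_64 rng(42);
    std::vector<uint64_t> a(kCount), b(kCount);
    for (size_t i = 0; i < kCount; ++i) {
        a[i] = rng();
        b[i] = rng() % 1000 + 1;  // keep divisors small and nonzero
    }

    // The "work": independent adds/subs/muls/divs. Every iteration is
    // independent of the last, so a wide core (or an SMT sibling thread)
    // can overlap lots of them - which is why a test like this says more
    // about raw execution resources than about any real workload.
    uint64_t sink = 0;
    for (int rep = 0; rep < 1000; ++rep) {
        for (size_t i = 0; i < kCount; ++i) {
            sink += a[i] + b[i];
            sink += a[i] - b[i];
            sink += a[i] * b[i];
            sink += a[i] / b[i];
        }
    }
    std::printf("%llu\n", (unsigned long long)sink);  // defeat dead-code elimination
    return 0;
}
```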

In the Encryption and Extended tests, again the AMD beats Apple by around double. My guess is that (double-pumped) AVX-512 (VAES for x86 in the Encryption test) is really helping the AMD chip. Not that there's necessarily anything wrong with that - testing those workloads is totally valid - and as good as Apple's NEON vectors are, they are only 128-bit. But of course there's no SME/SSVE for the M3 (or the benchmark) even though the matrix accelerator is technically present (it would definitely help with the Extended test, which tests matrix multiplication, but I'm unsure about Encryption). Not much is described for the Extended test other than that it is in units of matrices/sec, so one assumes matrix multiplication of, again, arbitrary numbers rather than matrix multiplication being used to accomplish an actual goal.
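To illustrate the vector-width point (this is just the ISA difference, nothing from the benchmark itself): with VAES on AVX-512, one instruction applies an AES round to four 128-bit blocks at once, while NEON's AES instructions work one block at a time.

```cpp
// Sketch of the vector-width gap, assuming the Encryption test is AES-bound.
// Not Passmark's code - it just shows per-instruction throughput: AVX-512
// VAES handles four 128-bit blocks per instruction, NEON handles one.
// (The x86 and Arm AES instructions also differ slightly in where the
// round key is added, so this is a rough equivalence, not an exact one.)
#include <cstdint>

#if defined(__VAES__) && defined(__AVX512F__)
#include <immintrin.h>
// One AES encryption round applied to 4 blocks packed into a 512-bit register.
static inline __m512i aes_round_4blocks(__m512i blocks, __m512i round_key) {
    return _mm512_aesenc_epi128(blocks, round_key);
}
#elif defined(__ARM_FEATURE_AES) || defined(__ARM_FEATURE_CRYPTO)
#include <arm_neon.h>
// Roughly one AES round on a single 128-bit block: AddRoundKey + SubBytes +
// ShiftRows (vaeseq) followed by MixColumns (vaesmcq). Four of these calls
// are needed to cover the same data as the single call above.
static inline uint8x16_t aes_round_1block(uint8x16_t block, uint8x16_t round_key) {
    return vaesmcq_u8(vaeseq_u8(block, round_key));
}
#endif
```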

The M3 Max turns it around in the Prime and Physics tests, where it leads the AMD chip by around double. Based on the description in the link, both are memory-bound tests, so the M3 Max's superior memory bandwidth to the CPU cores should play a huge role here. I'm not familiar with the particulars of the Physics test and what, if any, extensions it uses ... or could use.

That just leaves String Sorting and Compression, both of which the AMD wins comfortably. Nothing stands out about them based on the limited descriptions, and it is possible it is simply the larger number of P-cores/threads that helps the AMD chip win here (depending on how you count, it could have effectively 33% more execution resources, and it wins by roughly 33%).

In conclusion, Passmark is potentially overly weighted towards AVX workloads (and has no SME), though to be fair it has a couple of memory-bound tests that should favor the M3 Max as well. It depends on how often you expect a user to encounter large vector workloads vs memory-bound ones, which is why I’ll repeat my refrain that for aggregate tests it’s always better to look at subtests than the average, including for GB 5/6 above. The 7945HX is a bigger chip with more P-cores and more threads, so one should expect it to perform well in MT tasks. However, the Int/FP tests are completely abstract (possibly so is the Extended test), incredibly lightweight (a criticism that can be applied to more than just the FP/Int tests), and the results make no sense. They are almost certainly the weakest part of the benchmark (though again, a lot of the other subtests are probably too lightweight and abstract to be a real test of a modern CPU - again, this is a pretty archaic benchmark).
 
@dada_dave it could also be that passmark uses very simple testing code that has no real-world relevance. For example, if their benchmark invokes very long dependency chains, then it would favor cores with higher clocks. There is a striking discrepancy between the integer and prime number scores for example. This does not make much sense since both target the integer subsystem.
 
@dada_dave it could also be that passmark uses very simple testing code that has no real-world relevance. For example, if their benchmark invokes very long dependency chains, then it would favor cores with higher clocks. There is a striking discrepancy between the integer and prime number scores for example. This does not make much sense since both target the integer subsystem.
Yeah I noticed that as well - they say the Sieve of Atkin (the prime generator) is memory intensive, with higher-bandwidth CPUs doing better. Reading about it, since I was unfamiliar with its inner workings, that sounds reasonable. In fact, segmenting by memory page is apparently optimal in memory usage (though it can also be worse in other respects). We see that the M3 Max does very well in this test, but even so, let’s hope that if Passmark uses that segmented version, they didn’t hard-code 4K page sizes!
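For the curious, here's roughly what segmenting looks like in practice. I'm sketching it with the simpler Sieve of Eratosthenes rather than Atkin, and the page-sized segment below is exactly the kind of constant you'd hope isn't hard-coded:

```cpp
// Minimal segmented Sieve of Eratosthenes (simpler than Atkin, but the
// segmentation idea is the same): process the range in chunks small enough
// to stay resident in a page / the cache, instead of one huge flag array.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const uint64_t limit = 10'000'000;    // count primes up to this value
    const uint64_t segment_size = 4096;   // one 4 KB page of flags per segment

    // 1) Small sieve: primes up to sqrt(limit), found the ordinary way.
    uint64_t sqrt_limit = 1;
    while ((sqrt_limit + 1) * (sqrt_limit + 1) <= limit) ++sqrt_limit;
    std::vector<uint8_t> is_prime(sqrt_limit + 1, 1);
    std::vector<uint64_t> small_primes;
    for (uint64_t i = 2; i <= sqrt_limit; ++i) {
        if (!is_prime[i]) continue;
        small_primes.push_back(i);
        for (uint64_t j = i * i; j <= sqrt_limit; j += i) is_prime[j] = 0;
    }

    // 2) Segmented part: sieve [low, high] one page-sized chunk at a time.
    uint64_t count = small_primes.size();
    std::vector<uint8_t> flags(segment_size);
    for (uint64_t low = sqrt_limit + 1; low <= limit; low += segment_size) {
        const uint64_t high = std::min(low + segment_size - 1, limit);
        std::fill(flags.begin(), flags.end(), 1);
        for (uint64_t p : small_primes) {
            uint64_t start = ((low + p - 1) / p) * p;  // first multiple of p in [low, high]
            for (uint64_t j = start; j <= high; j += p) flags[j - low] = 0;
        }
        for (uint64_t i = low; i <= high; ++i) count += flags[i - low];
    }
    std::printf("primes <= %llu: %llu\n",
                (unsigned long long)limit, (unsigned long long)count);
    return 0;
}
```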

For the Integer test, I really like the dependency chain idea, as it would reduce ILP, something we know wider cores like the M-series excel at. But then we're still left with why the FP test result is so different. That I'm really struggling with.
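To make the dependency-chain idea concrete, here's a toy example of my own (nothing to do with Passmark's actual code): two loops doing the same number of multiply-adds, one as a single serial chain and one split into independent chains a wide core can overlap.

```cpp
// Two loops with identical operation counts but very different ILP.
// Loop A serializes every step on the previous result, so its runtime is set
// almost entirely by (operation latency / clock speed), regardless of how
// wide the core is. Loop B exposes four independent chains that a wide core
// can keep in flight at once, so it finishes much sooner.
#include <cstdint>
#include <cstdio>

int main() {
    const int64_t n = 100'000'000;
    const uint64_t mul = 6364136223846793005ull;  // LCG-style constants so the
    const uint64_t add = 1442695040888963407ull;  // compiler can't fold the loop away

    // A) One long dependency chain: each step needs the previous result.
    uint64_t chain = 1;
    for (int64_t i = 0; i < n; ++i) {
        chain = chain * mul + add;
    }

    // B) Four independent chains doing the same total work.
    uint64_t c0 = 1, c1 = 2, c2 = 3, c3 = 4;
    for (int64_t i = 0; i < n; i += 4) {
        c0 = c0 * mul + add;
        c1 = c1 * mul + add;
        c2 = c2 * mul + add;
        c3 = c3 * mul + add;
    }

    std::printf("%llu %llu\n",
                (unsigned long long)chain,
                (unsigned long long)(c0 ^ c1 ^ c2 ^ c3));
    return 0;
}
```

Time loop A against loop B and the ratio is a rough measure of how much a benchmark built out of long chains would understate a wide core's real throughput.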

As far as I can tell from the Int/FP/maybe Extended, you’re absolutely right on the money with the “very simple code with no real world relevance”. They’re literally just doing large amounts of operations on random numbers with no particular purpose or even algorithm.
 
As far as I can tell from the Int/FP/maybe Extended, you’re absolutely right on the money with the “very simple code with no real world relevance”. They’re literally just doing large amounts of operations on random numbers with no particular purpose or even algorithm.

It is not entirely meaningless, though. Programs do a lot of stuff like calculating addresses to find stuff and tearing bitfields out of packed records. Integer math is important housekeeping, so it needs to be efficient. That said, we are currently at a point where benchmark numbers are not all that meaningful. Unless you are talking 35% across the board, the differences in CPU performance are not measurable in real-world terms.
 
It is not entirely meaningless, though. Programs do a lot of stuff like calculating addresses to find stuff and tearing bitfields out of packed records. Integer math is important housekeeping, so it needs to be efficient. That said, we are currently at a point where benchmark numbers are not all that meaningful. Unless you are talking 35% across the board, the differences in CPU performance are not measurable in real-world terms.

Long dependency chains however are fairly uncommon. Modern cores can execute multiple such chains concurrently. Even code that looks sequential (for example, loops) can be run concurrently via speculative execution.
 
It is not entirely meaningless, though. Programs do a lot of stuff like calculating addresses to find stuff and tearing bitfields out of packed records. Integer math is important housekeeping, so it needs to be efficient.

Long dependency chains however are fairly uncommon. Modern cores can execute multiple such chains concurrently. Even code that looks sequential (for example, loops) can be run concurrently via speculative execution.
Yeah, this is why I prefer benchmarks based on actual algorithms. Even something like qsort or finding primes is actually doing something. Y-cruncher too! Even if you personally don't run those algorithms in particular, they are real algorithms that represent the way some kinds of code will actually work. It's not that I think Passmark shouldn't test the ability to do integer workloads - of course they should - but programs usually calculate addresses and tear out bitfields and then do something with that information; they make decisions based on those computations, something! As described, the Int/FP tests are almost more akin to microbenchmarks than benchmarks. They test a very specific action that is generally folded into a larger algorithm - maybe good for calculating IPC ;), but not a meaningful measure of overall multithreaded throughput.

That said, we are currently at a point where benchmark numbers are not all that meaningful. Unless you are talking 35% across the board, the differences in CPU performance are not measurable in real-world terms.

To agree with your "That said" with my own: that said, @Yoused, I do roll my eyes when I see tech companies touting 5-10% improvements in performance over their competitors. That's well within run-to-run error for most of these benchmarks! And different benchmarks purporting to test similar things can show wildly different behavior even in single thread, never mind multithread. So I agree that benchmarking has its limitations for what the end user will actually notice. Until you run the code you care about, generalized benchmarks only give you a very vague sense of relative performance. Though all benchmarks have their limits, I'd still contend that's not an excuse to use ones with even greater limitations*! :)

*There is a rationale for running native but unoptimized benchmarks (think CB R23, but equally flawed for both x86 and ARM), but that’s a little different.
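As an aside, this is the kind of sanity check I wish reviewers showed: a handful of runs per chip, with the gap compared against the run-to-run spread. The scores below are made-up placeholders just to show the idea.

```cpp
// Quick sanity check for benchmark deltas: is the claimed gap actually bigger
// than the run-to-run noise? All numbers here are invented placeholders.
#include <cmath>
#include <cstdio>
#include <vector>

static double mean(const std::vector<double>& v) {
    double s = 0;
    for (double x : v) s += x;
    return s / v.size();
}

static double stddev(const std::vector<double>& v) {
    const double m = mean(v);
    double s = 0;
    for (double x : v) s += (x - m) * (x - m);
    return std::sqrt(s / (v.size() - 1));  // sample standard deviation
}

int main() {
    std::vector<double> chip_a = {1010, 985, 1032, 998, 1021};  // hypothetical runs
    std::vector<double> chip_b = {1055, 1012, 1068, 1030, 1049};

    const double gap = mean(chip_b) - mean(chip_a);
    const double noise = std::sqrt(stddev(chip_a) * stddev(chip_a) +
                                   stddev(chip_b) * stddev(chip_b));
    std::printf("gap = %.1f points, combined run-to-run spread ~ %.1f points\n",
                gap, noise);
    // If the gap isn't comfortably larger than the spread, a "5-10% win"
    // could easily be noise plus configuration differences.
    return 0;
}
```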
 
First benchmarks of the new Zen 5/5c mobile chips are out.

There is a lot we still don't know; hopefully that will get filled in in the next few days.

There is one thing we *do* know now, though. The Zen 5, while improved, did not overtake the M3 P core, much less the M4. In fact at reasonable thin/light laptop power, they're not even at M2 levels. (Maybe not at M1 level... I don't have the numbers handy and I'm too lazy to check.)

Is this enough to hold off QC? Need more data.
 
First benchmarks of the new Zen 5/5c mobile chips are out.

There is a lot we still don't know; hopefully that will get filled in in the next few days.

There is one thing we *do* know now, though. The Zen 5, while improved, did not overtake the M3 P core, much less the M4. In fact at reasonable thin/light laptop power, they're not even at M2 levels. (Maybe not at M1 level... I don't have the numbers handy and I'm too lazy to check.)

Is this enough to hold off QC? Need more data.
About to do more bubble charts! Bit of brain fog since I just got out of surgery, so maybe not too soon, but the data is here:


I’m going to focus on CB R24 again since it’s native to everyone.
 
There is one thing we *do* know now, though. The Zen 5, while improved, did not overtake the M3 P core, much less the M4. In fact at reasonable thin/light laptop power, they're not even at M2 levels. (Maybe not at M1 level... I don't have the numbers handy and I'm too lazy to check.)

At least we can be sort of confident that they will not slag themselves with over-juicing.
 
At least we can be sort of confident that they will not slag themselves with over-juicing.
Last year, AMD processors did have that problem - nowhere near to the same extent that Intel is experiencing, and mostly AMD's fault, though at least partially the fault of board makers I think - but Zen 4 desktops were actually turning themselves to slag with excess voltage.

 
@NotEntirelyConfused (and everyone else of course! 🙃 ) Here's a thumbnail of the findings for the new AMD chip. I just added the data to the previous graph (apologies: it's a big chart, and compression wasn't kind to the fuzziness of the text):

Screenshot 2024-07-30 at 10.06.18 AM.png


In addition to the NotebookCheck article, I'm also going to talk about the Anandtech article.

There are a few things to note here:

1) The new Zen 5 mobile P-cores on N4P represent a significant jump in ST performance compared to Zen 4. They are only 10% more efficient at the clock speed they are rated for (which is 200 MHz less than the max clocks of the Zen 4 mobile chips); however, that's at better performance, implying that they could lower clock speed even more and get more efficiency, though they probably still wouldn't match the lower-end Qualcomm cores, never mind Apple cores (a rough sketch of the clock/power math is below, after the conclusion). So single-core efficiency is still way behind, though at least they improved performance.

2) These Strix Point chips are manufactured on a slightly better node (N4P) than the Apple (N5P) or Qualcomm (N4) chips. Judging from this chart, the difference isn't huge: N4 looks like a slightly denser N5P with almost identical power/performance, and N4P has similar density to N4 but either 6% better performance at the same power or 11% better efficiency at the same performance for the chip TSMC bases their reference calculations on. While that certainly helps the AMD chip here, I don't expect it to predominate, especially as these are load-idle power figures for the whole device.

3) The HX 370 is a much bigger chip with 12 cores and 24 threads, 4 more cores than the 8845HS chip it replaces. True, 8 of the cores are now "c" cores, but while they are smaller and more efficient than their non-c counterparts, for multithreaded workloads I suspect the difference is minimal in this respect. In an interview with Tom's Hardware, Mike Clark states that the Zen 5c cores are about 25% smaller than the Zen 5 cores, with most of what was taken out and rearranged having to do with the silicon that allows the cores to boost to super high frequencies. Thus for the purposes of multithreaded workloads, especially endurance tests like CB, one might be tempted to say that they are effectively P-cores. However, they also have access to dramatically less cache, especially last-level cache, than the standard P-cores, and they have high core-to-core latency with the standard P-cores since they have to communicate with them through the last-level cache. This will almost certainly limit their performance relative to the standard P-cores and, in some ways, is similar to what I suspect is limiting the Qualcomm chips in multithreaded workloads (i.e. cache and bandwidth).

4) These extra cores mean a significant percentage of the uplift in multithreaded performance simply comes from having more cores available. For instance, jumping down to the 3D Mark CPU Profile Simulation, the HX 370 only gets a 5% jump over the previous-gen chip (the 8845HS and 7840HS I believe have the same CPU) when both are restricted to 8 threads, but doubles its lead in the same benchmark at max threads. As shown in the graph above, for CB R24 its lead over its predecessor is more substantial. At 57 watts, it now gets roughly the same score (1022) as the Apple M2 Pro (1030, 60W) and Qualcomm X Elite 78 (1033, 62.1W), whereas the 8845HS was about 15-16% behind (842, 56.8W). The three newest chips all now have similar efficiencies at this point on their performance/watt curves. As stated in my previous post, it is unfortunate that we don't have X Elite 80s or 84s to compare at this point on the curve; those are higher-binned chips and might perform better. I think I saw a Techlinked video which stated that these are apparently more difficult to source for review sites. The Apple M3 Pro by contrast scores around 1055. Unfortunately NotebookCheck didn't do wall power measurements for it under CB R24 (I suspect they don't have one anymore), but it's in the mid-40W range for wall power. That would make it somewhere in the range of 25-30% more efficient at ISO-performance. The M3 Pro is on a slightly better node (N3B), but while logic transistor density scaled dramatically, SRAM did not, so the gains for practical chips are smaller; again, differences in power/performance due to the node alone aren't expected to predominate (a few percent compared to N4P). Overall, I suspect the M2 Pro, Qualcomm X Elite, and especially the M3 Pro are a good deal more die efficient than the HX 370.

5) The caveats from my previous post are expected to hold here too.

6) Anandtech compared the HX 370 to the base M3. AMD amusingly asked them to compare to the M3 in the Air, which Anandtech demurred, testing the M3 in the MacBook Pro instead. As I think you can see from the above chart, for multithreaded applications even comparing to the base M3 at all, regardless of device, especially without very well controlled power data, is also not a great idea. This is a chip which can be pushed to 120W wall power (although with such a small performance gain at that point I can't think why anyone would: a measly 4% increase in performance costs a whopping 44% more power at this point in the curve!). Anandtech measured the power draw of the HX 370's 28W setting to be around 33W. I suspect this is platform power from HWiNFO, but Anandtech didn't specify in the article. NotebookCheck measured a different device with the same chip from Asus and found that it drew 48W wall power at the 28W setting, and both got similar CB R24 scores of 927-950 (NotebookCheck-Anandtech), so I suspect we're talking similar power draws. From what I can tell this is much more similar to the M3 Pro in powermetrics and wall power for a CPU-only test, which again scores around 1050 in CB R24, making it ~10-14% more performant at ISO-power. Meanwhile, the base M3's powermetrics/wall power don't come even close to these figures under CPU load. NotebookCheck measured it in the Air and found it only drew 21W at the wall in the fanless design and got a score of about 600 pts; Anandtech measured the M3 in the MacBook Pro at 718, at presumably higher average power draw in an endurance test than the thermally limited Air, but almost certainly not anywhere near the same wattage as the HX 370 in its 28W TDP configuration.

7) The single threaded SPEC results in the Anandtech article are interesting, and here you can see the AMD chip losing in INT performance to the M3 P-core but slightly beating it in FP performance. Without power measurements, not even software ones, we can't get a sense of efficiency, but we can readily assume that the M3 is far superior in that respect. Still, it shows that indeed in ST, AMD is doing very well. Might be time to update the ISO-clock graphs (I keep saying that).

8) Geekbench 6 results tell a similar story for the new AMD chips. Here's an example (2780/15267 ST/MT) for the same ASUS model as tested by NotebookCheck, though it's unclear at what TDP. Geekbench scores for the M2 Pro are about (2663/14568), the M3 Pro about (3179/18982), and the Qualcomm Elite 80 (2845/14458). Here the 12 cores + SMT probably aren't helping AMD as much, especially relative to the M3 Pro, given that GB 6 switched some of its MT workloads to task-based parallelism to cut down on the "MoAr cores is better" phenomenon.

Conclusion: The new AMD chips are naturally still far behind the Qualcomm Elite and Apple M2 P-core in ST efficiency, but performance has been much improved. This remains the weakest element of AMD's chips, and it will likely take a while for x86 processors to match it, if they ever do. For MT, while AMD had to add more cores, and one suspects use much more silicon, to compete with the M2 Pro and Qualcomm Elite, the end user doesn't likely care how they did it, only that they now match them, even slightly beat them. This is particularly bad for Qualcomm, since they need to be much better than AMD as compatibility issues still abound and the rest of the SoC is underwhelming. If Qualcomm's die is cheaper, though, then they can compete on price, and that advantage shouldn't be discounted! Also, I've seen mixed reports about the iGPU in the AMD chip (haven't had time to look at that in depth; will try to report if there is anything interesting), but Qualcomm's was, I think, worse, and of course the AMD Strix Point can still be paired with a dGPU (though that sort of defeats the purpose of the APU? I mean, it's a sizable iGPU). For Apple, AMD may be matching the M2 Pro, but the current M3 generation remains comfortably ahead, and of course the base M4 (3715/14690 GB 6 in the fanless iPad) is expected to come to the Mac at the end of the year, with Pro/Max chips as well. Apple remains the ST and MT mobile performance-efficiency king for now.
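Regarding the efficiency-vs-clock point in (1): a rough back-of-the-envelope for why backing off the peak clocks buys outsized efficiency. Dynamic power scales roughly with f·V², and voltage has to climb with frequency near the top of the curve, so power rises much faster than performance up there. The cubic exponent below is an illustrative assumption, not a measured number for any of these chips.

```cpp
// Back-of-the-envelope: if dynamic power ~ f * V^2 and V rises roughly
// linearly with f near the top of the V/f curve, power grows roughly as f^3
// there. A modest clock reduction then costs a little performance but saves
// a lot of power. The exponent is an illustration, not a measurement.
#include <cmath>
#include <cstdio>
#include <initializer_list>

int main() {
    const double exponent = 3.0;  // assumed power ~ f^exponent near peak clocks
    for (double f : {1.00, 0.95, 0.90, 0.85}) {
        const double power = std::pow(f, exponent);
        const double perf_per_watt = f / power;  // performance assumed ~ f
        std::printf("clock %.0f%%  perf %.0f%%  power %.0f%%  perf/W %.2fx\n",
                    f * 100, f * 100, power * 100, perf_per_watt);
    }
    return 0;
}
```

Under that toy assumption, dropping clocks 10% cuts power roughly 27% for a ~23% gain in perf/W, which is why the rated-clock efficiency figure understates what these cores could do further down the curve.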
 
@NotEntirelyConfused (and everyone else of course! 🙃 ) Here's a thumbnail of the findings for the new AMD chip,[...]
Great work.
 
@NotEntirelyConfused (and everyone else of course! 🙃 ) Here's thumbnail of the findings for the new AMD chip,[...]
Fantastic writeup. I was far too lazy to do that myself but I think you hit just about every point I wanted to make. It is very frustrating that we can't count on Anandtech to do this sort of thing any more. :-(

I especially want to call attention to your analysis of AMD vs. QC. This sounds VERY scary for QC, and while pricing is critical (as I've pointed out when comparing M3/M4 to SXE), I am dubious about whether or not QC can make up the differences there.

In particular, the iGPU issue seems critical. QC is taking it on the chin for both performance and compatibility, and even if they fix the latter, they're getting their clocks cleaned on the former.

It now seems to me even clearer that they've made a grievous error not pushing low power and fanless designs. They are not going to be able to compete with AMD at the high end, I think, but they might be able to do a fanless design. It would absolutely suck compared to an M3 (much less future M4), but it could fill a significant niche that AMD and Intel currently can't (or at least aren't).

The big question I haven't seen answered yet is, how big is the silicon? And how big are the 5 and 5c cores? As you say, this feeds into cost, which is likely to matter this time around.
 
The big question I haven't seen answered yet is, how big is the silicon? And how big are the 5 and 5c cores? As you say, this feeds into cost, which is likely to matter this time around.
An annotated die shot was just posted on an AnandTech forum, in between people noting that SK appears to have given up over there (I don't blame her). No measurements yet though.
 