M3 core counts and performance

OT, but does anyone know if PC GPUs sometimes exceed their TDP the way CPUs do?
While laptop GPUs might get complicated due to arbitrary power settings from OEMs and motherboards, my impression is that, overall, GPUs from AMD and Nvidia are relatively honest when it comes to TDP. Max power draw is roughly what it says on the tin, on average. It might be possible to exceed that TDP with something like FurMark, but, in general, the board tries to stay below its TDP rating.


Just spot-checking a few of these GPUs playing a demanding game like Metro Exodus against their specs, and they seem to correspond well.
 
Geekbench Browser has finally been updated with official M3 numbers. Looking at the averaged numbers, if the M3 Max were 0.83% faster, it would have beaten the M2 Ultra. So close! Do you think it's a coincidence? Or maybe a pre-determined performance target for the M3 Max CPU?

Similar thing happens for the GPU: if the M3 Max (40 cores) GPU were just 0.16% faster, it'd beat the M1 Ultra GPU. Just another coincidence?
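
For anyone who wants to reproduce the gap math, a minimal sketch in Python (the scores below are placeholders, not the actual Geekbench averages):

```python
# Hypothetical averaged GB6 scores (placeholders, not real data).
m3_max = 21000
m2_ultra = 21175

# Percentage speedup the M3 Max would need to match the M2 Ultra.
needed = (m2_ultra / m3_max - 1) * 100
print(f"M3 Max is {needed:.2f}% short of the M2 Ultra")  # -> 0.83%
```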
 
I doubt that they fine-tuned performance to be about the same as the M2U and M1U, respectively, for just one benchmark. You can use Cinebench 2024 and the M2U will be about 10% faster than the 16-core M3M. On the other hand, if you compare Blender, the M3M's GPU will be more powerful than the M1U.
They probably used SPEC2017 or some other software for internal testing…

Edit: It even looks like, for the M1U Mac Studio, the GPU variants with 48 and 64 cores are lumped into one line in Geekbench Metal.
 
Based very roughly on GB6 numbers, Apple has not improved IPC in the M series. Since the first A-series 64-bit processor (using old GB5 numbers), [ SingleCore / GHz ] improved by a factor of about 2.5 up to the M1. Since then, that number has been consistent, in the ~740 range in GB6 for all the M-series processors. This of course reflects P-core performance and cannot estimate E-core gains.

Compared to the latest Intel processor, that is still pretty good. Being generous, estimating a clock speed of 4.6 GHz (midway between cruise and "turbo"), the rough Intel IPC estimate would be in the 670 range. However, GB6 SC is a short test that very likely runs mostly at the higher clock speed, which would put the number just a little over 510.

(These are obviously not literal IPC figures but broad estimates of what throughput looks like.)
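
A minimal sketch of the per-GHz arithmetic; the scores and clocks are illustrative assumptions, not measured data:

```python
# Rough throughput proxy: GB6 single-core score / clock (GHz).
# All numbers below are assumptions for illustration only.
m_series_sc, m_series_clock = 3000, 4.05   # M-series P-core, ~740/GHz range
intel_sc = 3080                            # hypothetical GB6 SC for a recent Intel part

print(m_series_sc / m_series_clock)   # ~741
print(intel_sc / 4.6)                 # ~670, generous mid-range clock estimate
print(intel_sc / 6.0)                 # ~513, if the short test runs mostly at turbo
```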
 
Since the complete set of GB scores for the M3 series is now posted on Primate (thanks to @Andropov for letting us know), I decided to check the GPU (Metal) scaling. Relative to the 8-core score, the 40-core score is 75% of perfect scaling.

For comparison, the M2 is at 73% for 38 cores relative to 8 cores (if we use the 38-core score for the Studio)—or 69% if we use the 38-core score for the 16" MBP.

But the most interesting part will be how M3 Ultra scales, which we won't see until the M3 Studio is released.
[attached: M3 GPU Metal scaling graph]
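
The scaling-efficiency calculation itself is simple; a sketch with placeholder scores (the real ones come from the Geekbench Browser listings):

```python
# Placeholder Metal scores; substitute the actual median values.
score_8, score_40 = 30000, 112500

perfect = score_8 * (40 / 8)        # what linear scaling from 8 cores would predict
print(f"{score_40 / perfect:.0%}")  # -> 75% of perfect scaling
```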
 
Interesting data. If I remember correctly, the scaling was even worse with Geekbench 5 Metal scores. At this point it's difficult to know whether the hardware doesn't scale or Geekbench doesn't show the scaling accurately.
 
You'd have to compare to AMD/Nvidia. At least some of it was hardware/driver, given that every generation of the M-series has improved scaling.
 
With Geekbench 5, the issue was that the tests were so short, they didn't trigger the Max or Ultra to fully power up all the GPU cores. I recall @leman investigating the issue. With Nvidia and AMD, they pretty much always power up their cores. I'm not saying that's still an issue, but I also wouldn't rule out something having an effect. It might be worthwhile comparing other benchmarks to see.
 
That was me quoting Andrei and Ryan :) and it was mostly about the Ultra. But even so, we know that at least some of it was hardware/driver on Apple's end, given that overall the M-series has improved scaling. Although I have to admit some of the GB6 results are … interesting. The M2 Pro outperforms the M3 Pro (bandwidth and the extra core overwhelming the ALU and dynamic cache improvements?) and the M1 Max has better scaling than the M2/M3 Max vs the base Mx. Odd.

 
Strange.

It seems that in the Metal tests, binned models are not distinguished from non-binned ones. The M2 Ultra mixes 76-core scores with 60-core ones.
 
Ah, I just assumed these were all full-chip results, but if they've mixed them together, then this data becomes a lot less useful. The median will depend on the mix of full and binned chips.
 
Yes, the individual scores can be very useful, but the "average" ones can be weird.
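
A toy illustration of why the pooled entry is misleading (all scores invented):

```python
import statistics

# Invented Metal scores for binned (60-core) and full (76-core) M2 Ultras.
binned = [150000, 152000, 149000]
full = [190000, 188000, 191000]

# A single pooled "M2 Ultra" entry: its median depends entirely on the mix.
print(statistics.median(binned + full))      # 170000.0, between the two groups
print(statistics.median(binned + full * 3))  # 189000.0, skewed toward full chips
```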
 
You want to look at Benchmark Charts⟶Mac Benchmark Chart⟶Metal (or OpenCL). That will give you the scores broken down by core count. That's where I got the M3 scores for my graph.

[screenshot: GB6 Mac Benchmark Chart, Metal scores by core count]


But even there, you can see some odd results. E.g., the M3 iMac's 8-core GPU score is lower than the two 8-core M2 results they've posted. Note this is not happening because the 8-core M3 result is a low outlier—see the excerpt, below, from the scaling graph I posted above; the 8-core M3 is at the left.

[excerpt from the scaling graph; the 8-core M3 is at the left]
 
While the tests @Aaronage ran may not represent the GPU at its peak power levels, we know that, according to his GPU powermetrics readings under Cinebench and Blender, the Pro's 18-core GPU required 17.5-19.5 W, which would put the 40-core Max at 38-43 W for those workloads. We may also have to account for the fact that Andrei was reporting package power while I believe @Aaronage is reporting just GPU power (correct me if I'm wrong). Thus the two results may be less different than they first appear.

For just GPU power, it's possible that CB24 really did only draw 33 W. In that case it isn't a very strenuous GPU test, and even larger GPUs should show bad scaling with it. That hypothesis should be testable with data already available, especially for Nvidia GPUs. Too late and too tired to do it now.

While all this may be different from the maximum possible power draw, I gotta think Blender with RT off has to be close. And a reading of ~45 W GPU could entail over 50 W of package power, which could in turn entail 60 W at the wall. So the only really big discrepancy would be if CB24 on the Max chip really was 33 W GPU and actually represented the maximum power draw. At least that last possibility seems unlikely given @Aaronage's results. There may still be some small discrepancies, but unless @Aaronage was reporting package power, I think we're okay.
Sorry for the delay! Yep I took GPU power readings from powermetrics 👍

Edit: Haven’t forgotten about trying Xcode GPU profiling tools, just haven’t had time this weekend.
Also, tempted to dig into powermetrics a little more. Just have a nagging concern that I give it too much weight without fully understanding it.
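
The 18-core to 40-core extrapolation in the quoted post is just linear scaling with core count; a minimal sketch (the wattages are the ones quoted, the linearity is an assumption):

```python
# Scale the measured 18-core Pro GPU power to the 40-core Max,
# assuming power grows roughly linearly with active core count.
pro_cores, max_cores = 18, 40
pro_watts = (17.5, 19.5)  # W, range from the powermetrics readings quoted above

lo, hi = (w * max_cores / pro_cores for w in pro_watts)
print(f"{lo:.0f}-{hi:.0f} W")  # -> 39-43 W (the post rounds to 38-43 W)
```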
 
It's kinda sad we don't have competent Mac hardware reviewers, and no, Max Tech ain't one.

The PC side and even consoles get better coverage from hardware YouTubers. I just saw the new Steam Deck OLED videos from Digital Foundry and Gamers Nexus, and man, they ooze quality and depth of information.
 
The notebookcheck review @theorist9 linked to was reasonably good (at least, I was impressed), but yes, overall I strongly agree that we have a massive deficit.
 
But even there, you can see some odd results. E.g., the M3 iMac's 8-core GPU score is lower than the two 8-core M2 results they've posted.
Someone running GB while Spotlight was indexing? 🙃 Because yeah ... the 10-core M3s are higher than the 10-core M2s.

[chart: 10-core M2 vs M3 Metal scores]
The M2/M3 Pro results are definitely odd.

[chart: M2 Pro vs M3 Pro Metal scores]


Again my guess is there are some tests where the extra core and bandwidth outweigh dynamic cache and ALU improvements.
 
After posting the Apple CPU and GPU performance vs. power graphs, it occurred to me it might be fun to digitize one of them and see what scaling they showed. I thus digitized the M3 CPU plot using https://plotdigitizer.com/app (the white dots indicate where I placed points):
[digitized M3 CPU performance vs. power plot]

As this curve is for all cores (thanks @Souko!), I associated the highest performance score with the 3.6 GHz all-core clock speed given by notebookcheck.net ( https://www.notebookcheck.net/Apple...ormance-and-improved-efficiency.766789.0.html ), and scaled all other performance scores correspondingly, to translate them into clock speeds as well. Since the M3 is 4P+4E, I took a coarse-grained approach, treated the efficiency cores as not contributing significantly to the power consumption, and divided by four to estimate the per-core power. I then transposed the data into the form power vs. clock speed (i.e., switching the x and y axes) and fit it to a three-parameter polynomial. Finally, I compared this curve to the one I fit using the data @leman collected for the A17 performance core.

Apple's wattage is lower, but the curve is steeper. The best fit I got for leman's data was with a polynomial of the form a + bx^2+cx^6, while the best fit for Apple's curve was obtained using one of the form a + bx^2+cx^8.

[Note this doesn't mean the scaling goes as x^6 and x^8, respectively. It's nowhere near that strong, as the value of c is far smaller than b, in both cases. Specifically, based on the relative sizes of the coefficients, the scaling has 99.74% x^2 character and 0.26% x^6 character for leman's, and 99.99% x^2 and 0.01% x^8 for Apple's.]
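
Since both forms are linear in their coefficients, the fit can be reproduced with ordinary least squares; a sketch (the data points below are placeholders standing in for the digitized values):

```python
import numpy as np

# Placeholder (clock GHz, per-core power W) pairs, not the actual digitized data.
clock = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.3, 3.6])
power = np.array([0.20, 0.46, 0.82, 1.32, 2.08, 2.76, 3.75])

# a + b*x^2 + c*x^8 is linear in (a, b, c), so plain least squares suffices.
X = np.column_stack([np.ones_like(clock), clock**2, clock**8])
(a, b, c), *_ = np.linalg.lstsq(X, power, rcond=None)

# "Character" of each term judged from coefficient magnitudes, as in the note above.
print(f"x^2 share: {b / (b + c):.2%}, x^8 share: {c / (b + c):.2%}")
```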

Here's a plot comparing the two:

[plot comparing the two fitted power-vs-clock curves]

Here are the extrapolated values:

4.06 GHz (maximum M3 SC clock)
leman's A17 data: 6.5 W
Apple's curve: 6.3 W

4.446 GHz (10% increase in clock over 4.06 GHz)
leman's A17 data: 9.3 W (43% increase in power consumption)
Apple's curve: 10.4 W (65% increase in power consumption)

So it seems a 10% increase in clock speed would cost us an ≈50% increase in power consumption. Still, the added wattage could certainly be handled by an M3 Studio (hint, hint Apple :D).
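
And the extrapolation step, using made-up coefficients of the fitted form chosen to land near the quoted numbers (not the actual fit output):

```python
# Hypothetical coefficients for Apple's a + b*x^2 + c*x^8 curve (illustrative only).
a, b, c = 0.0, 0.202, 4.0e-5

def power(x):
    """Per-core power (W) at clock x (GHz)."""
    return a + b * x**2 + c * x**8

p1, p2 = power(4.06), power(4.06 * 1.10)
print(f"{p1:.1f} W -> {p2:.1f} W (+{p2 / p1 - 1:.0%})")  # ~6.3 W -> 10.4 W (+65%)
```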

 