M3 core counts and performance

Cmaier

I’m hearing single-core performance in the next generation is around 130% of M2 (high-performance cores), and that the low-power cores are much more efficient. 130% would put Apple back on their historical 20% year-over-year performance improvement, more or less.
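As a rough sanity check on that (my own back-of-the-envelope; the ~1.5-year gap between M2 and M3 is an assumption, not inside info), annualizing a 30% cumulative gain works out to roughly 20% per year:

```swift
import Foundation

// Back-of-the-envelope: annualize a cumulative single-core gain.
// Assumption (mine, not from any leak): roughly 1.5 years between M2 and M3.
func annualizedGain(totalFactor: Double, years: Double) -> Double {
    pow(totalFactor, 1.0 / years) - 1.0
}

let perYear = annualizedGain(totalFactor: 1.30, years: 1.5)
print(String(format: "~%.0f%% per year", perYear * 100))  // ~19% per year
```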

Core count-wise it looks like:

M3: 8 CPU, 10 GPU
M3 Pro: 12 CPU, 18 GPU (vs. 10/16 M2 Pro)
M3 Max: 16 CPU (12 perf/4 effic), 40 GPU
M3 Ultra: (up to) 32 CPU (24 perf/8 effic), 80 GPU

Sources:

Should be very nice performance boosts across the board.
 
I wonder if they will be folding in SVE2 or just working those capabilities into separate chip subunits.
 
That differing number of CPU cores between the Pro and the Max (12 and 16, respectively) would be a qualitative change to their product tiers, since up to this point the Max and Pro have had identical CPU core counts (12 cores for both on the M2s, 10 for both on the M1s), though certain SKUs in the Pro line were/are available with a couple of those cores disabled: 8 cores on the low-end 14" M1 Pro MBP, and 10 cores on the M2 Pro Mini.

But since their top-end chip is limited to 2x Max for the foreseeable future, it does make sense they'd want to make the Max bigger without forcing those same changes onto the Pro. Right now it's only $200 to go from the Pro to the Max if you need the Max just for the higher RAM. With the Max's higher CPU core counts, that may change.

Any word on whether the M3 is more scalable to higher clocks than the M1/2, and whether we'll be seeing that in the desktops?

And what do you make of "The MacBook Pro models that Apple is testing internally feature 36GB and 48GB of RAM"? Granted, this is Gurman, so it's speculative, but I'm wondering if it means that Apple is shifting to 12 GB modules uniformly, such that the RAM choices across the model line will now be 12/24/36/48/96/192
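Just playing with the arithmetic (the per-tier package counts are purely my guess, for illustration), uniform 12 GB packages would give exactly that ladder:

```swift
// Hypothetical: RAM tiers if every configuration used 12 GB memory packages.
// The package counts per tier are my own guess, purely for illustration.
let packagesPerTier = [1, 2, 3, 4, 8, 16]
let tiersGB = packagesPerTier.map { $0 * 12 }
print(tiersGB)  // [12, 24, 36, 48, 96, 192]
```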

Also wondering about LPDDR5x, PCIe 5.0, and 120 Gbps TB5/USB4v2* (*hopefully they just call it USB5, but they do like their weird nomenclature).
 
Any word on whether the M3 is more scalable to higher clocks than the M1/2, and whether we'll be seeing that in the desktops?
I hear yes to the first question. I have no information on what will happen with desktops.


And what do you make of "The MacBook Pro models that Apple is testing internally feature 36GB and 48GB of RAM"? Granted, this is Gurman, so it's speculative, but I'm wondering if it means that Apple is shifting to 12 GB modules uniformly, such that the RAM choices across the model line will now be 12/24/36/48/96/192

No idea what to make of it.
 
Cliff, do you have this from your little birdies or is it all just Gurman info? In case of the former, does 130% reflect the IPC improvements alone or is it IPC+clock?
 
I don't think I read anything about the 130% that Cliff mentioned in the news items that referred to Gurman, and I somewhat doubt that Cliff would be using Gurman's information without backing it up with a second source.
When I read about the M3 rumors on Ars Technica, I almost posted the article here, but refrained, since they only referred to Gurman and I consider him an unreliable source.

But it's interesting if Apple manages to push single-thread performance again.
Also that there would be a bigger difference between Pro and Max than just the number of GPU cores.
And it seems they're sticking with four efficiency cores now. I never understood why they went with 8+2 for the M1 Pro.
 
I don't think I read anything about the 130% that Cliff mentioned in the news items that referred to Gurman, and I somewhat doubt that Cliff would be using Gurman's information without backing it up with a second source.

I agree, just asking because the links he posted all circle back to Gurman.
 
But it's interesting if Apple manages to push single-thread performance again.

They have to if they are serious about performance-oriented mobile computing. At any rate, if they improve single-thread performance by 30%, this should secure them a commanding lead for another year or two.
 
If we go by Gurman's numbers here, the configurations include 6E cores and 6-8P cores on the Pro, but 4E cores and 12P cores on the Max. I find that very peculiar.

The Max only goes into devices with more thermal headroom than the Pro, but the 16" MacBook Pro uniquely accepts both Pro and Max. It has a good cooling solution, but if raising the core count with more P cores was deemed infeasible to cool, and that's why they went with E cores instead, can we expect the 16" MacBook Pro to properly deal with the thermal load of a 12-P-core Max chip? I mean, if they allow the fans to spin significantly louder, anything is possible, but without doing so, I have doubts on that front. Which brings me to my second point: what's the point?

If you have a task that is bound to a few coordinating threads and requires single-threaded power but can also be scaled up to some extent, P cores are great. But if you can scale to almost GPU-like levels of parallelism, why not have many more efficient E cores? Intel seems to be having success with throwing many smaller cores at the problem rather than fewer big ones, so much so that they are now moving on to giant Xeon chips filled with nothing but their E cores. Doesn't seem like too bad an idea, honestly. Instead of 12P+4E, why not 18E+8P? That would probably do better in multi-threaded tasks at any given power consumption, I'm imagining.

Thoughts?
 
If we go by Gurman's numbers here, the configurations include 6E cores and 6-8P cores on the Pro, but 4E cores and 12P cores on the Max. I find that very peculiar.

Where do you read it? I’m pretty sure 12 cores on M3 Pro means 8+4, like today.

Can we expect the 16" MacBook Pro to properly deal with the thermal load of a 12-P-core Max chip? I mean, if they allow the fans to spin significantly louder, anything is possible, but without doing so, I have doubts on that front. Which brings me to my second point: what's the point?

Of course we can. The CPU clusters of M1/M2 Max use 40 watts at full load. Adding one more P-cluster will push it up to 60 watts. The chassis will barely notice it. And one can reduce the power consumption further by dropping the clocks just a little bit. What you probably lose is the ability to run CPU+GPU at full speed simultaneously, but nobody really cares about that.

If you have a task that is bound to a few coordinating threads and requires single-threaded power but can also be scaled up to some extent, P cores are great. But if you can scale to almost GPU-like levels of parallelism, why not have many more efficient E cores? Intel seems to be having success with throwing many smaller cores at the problem rather than fewer big ones, so much so that they are now moving on to giant Xeon chips filled with nothing but their E cores. Doesn't seem like too bad an idea, honestly. Instead of 12P+4E, why not 18E+8P? That would probably do better in multi-threaded tasks at any given power consumption, I'm imagining.

Thoughts?

Apple doesn't have the problem that Intel has. Intel's cores are very fast, but they also consume a lot of power. Apple's cores are also fast, but they consume much less power. We are talking about 20 watts for Intel and 5 watts for Apple to achieve practically the same level of performance. Since the limiting factor is the power dissipated by the chip, Intel can't really keep stacking P-cores; they would run out of power budget very quickly (an additional factor is die area, and Intel's P-cores are very large). For Intel, incorporating a large number of compact cores that are significantly slower but also use much less power is a way to achieve better aggregate throughput within the same area and power budget. But Apple's P-cores are not only smaller than Intel's P-cores, they also consume less power than Intel's E-cores while delivering much better performance. And Apple's E-cores are considerably slower. Intel's strategy just doesn't make sense for Apple, unless they go for faster E-cores. Anyway, fewer fast cores are always better than more slow cores for the same power and performance. More flexible, less coordination overhead, more cache reuse.
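To make the power-budget point concrete (the 60 W budget and the "same per-core performance" normalization are my own assumptions for illustration, using the rough 20 W vs. 5 W figures above):

```swift
// Hand-wavy sketch: how many full-speed P-cores fit in a fixed power budget,
// using the rough per-core figures above (~20 W Intel, ~5 W Apple for
// similar per-core performance). The 60 W budget is an arbitrary assumption.
let budgetWatts = 60.0
let intelPCoreWatts = 20.0
let applePCoreWatts = 5.0

let intelPCores = Int(budgetWatts / intelPCoreWatts)   // 3
let applePCores = Int(budgetWatts / applePCoreWatts)   // 12

// With roughly equal per-core performance, aggregate throughput scales with
// core count, so Apple fits ~4x the full-speed P-cores into the same budget.
print("Intel: \(intelPCores) P-cores, Apple: \(applePCores) P-cores")
```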
 
Where do you read it? I’m pretty sure 12 cores on M3 Pro means 8+4, like today.
Top article on MR right now - It has a table.
Of course we can. The CPU clusters of M1/M2 Max use 40 watts at full load. Adding one more P-cluster will push it up to 60 watts. The chassis will barely notice it. And one can reduce the power consumption further by dropping the clocks just a little bit. What you probably lose is the ability to run CPU+GPU at full speed simultaneously, but nobody really cares about that.
With greater fan spin, sure. But an increased wattage would mean greater fan spin. I'm not concerned about it overheating, more that a 100% CPU workload is going to sound like my 2014 did again. With 100% CPU workload, my M1 Max 16" is still rather quiet. I'd rather have that than 10-20% more performance. Waiting one more minute in a nice environment beats finishing faster in a bad one, at least given how fast it already is, to me. At the very least, I hope low power mode is more aggressive about downclocking if things get louder like this. Then I can choose a quieter environment like that and it's all good.
Apple doesn't have the problem that Intel has. Intel's cores are very fast, but they also consume a lot of power. Apple's cores are also fast, but they consume much less power. We are talking about 20 watts for Intel and 5 watts for Apple to achieve practically the same level of performance. Since the limiting factor is the power dissipated by the chip, Intel can't really keep stacking P-cores; they would run out of power budget very quickly (an additional factor is die area, and Intel's P-cores are very large). For Intel, incorporating a large number of compact cores that are significantly slower but also use much less power is a way to achieve better aggregate throughput within the same area and power budget. But Apple's P-cores are not only smaller than Intel's P-cores, they also consume less power than Intel's E-cores while delivering much better performance. And Apple's E-cores are considerably slower. Intel's strategy just doesn't make sense for Apple, unless they go for faster E-cores. Anyway, fewer fast cores are always better than more slow cores for the same power and performance. More flexible, less coordination overhead, more cache reuse.
I mean, everything's a compromise and situational. If fewer faster cores were always better, we might as well put our entire die budget towards a single mega-core; it wouldn't scale very well I'm sure, but ey. Relative to power and die space I think Apple's E cores still deliver more performance than their P cores, just like on Intel chips, even if you're right that the exact relative differences are not quite the same, and the absolute perf/watt and perf/die-size values make the situation very different as well. And if you can already parallelise to 14 CPU cores, I'm sure your task benefits more from more cores than from faster cores. Of course, if you're filling the cores with several distinct tasks, that's not necessarily the case.
 
Cliff, do you have this from your little birdies or is it all just Gurman info? In case of the former, does 130% reflect the IPC improvements alone or is it IPC+clock?
The speed increase is net. Includes clock. May vary based on device it’s sitting in, of course.
 
If we go by Gurman's numbers here, the configurations include 6E cores and 6-8P cores on the Pro, but 4E cores and 12P cores on the Max. I find that very peculiar.
Where do you read it? I’m pretty sure 12 cores on M3 Pro means 8+4, like today.
Top article on MR right now - It has a table.

Said table...

M2 vs. M3 (Gurman's table):

Pro:
  M2: 10 or 12 CPU cores (6 or 8 high-performance and 4 energy-efficient), 16 or 19 GPU cores
  M3: 12 or 14 CPU cores (6 or 8 high-performance and 6 energy-efficient), 18 or 20 GPU cores
Max:
  M2: 12 CPU cores (8 high-performance and 4 energy-efficient), 30 or 38 GPU cores
  M3: 16 CPU cores (12 high-performance and 4 energy-efficient), 32 or 40 GPU cores
Ultra:
  M2: 24 CPU cores (16 high-performance and 8 energy-efficient), 60 or 76 GPU cores
  M3: 32 CPU cores (24 high-performance and 8 energy-efficient), 64 or 80 GPU cores
 
Top article on MR right now - It has a table.

Indeed. I also checked - that's also what Gurman writes. This is very odd to me. Pro and Max share the same die layout and design (the former is a chop of the latter), so I don't understand how one could have four E cores and the other six. The number doesn't make sense to me either. I am tempted to attribute this to a misinterpretation or a misprint.


With greater fan spin, sure. But an increased wattage would mean greater fan spin. I'm not concerned about it overheating, more that a 100% CPU workload is going to sound like my 2014 did again. With 100% CPU workload, my M1 Max 16" is still rather quiet.

I’m sure it still will be quiet. There is a lot of cooling capacity in that chassis.

I'd rather have that than 10-20% more performance.

What about 50% faster? :)

I mean, everything's a compromise and situational. If fewer faster cores were always better, we might as well put our entire die budget towards a single mega-core; it wouldn't scale very well I'm sure, but ey.

There is a caveat. What I wrote is under the same power and performance conditions. That is, if you have two designs that deliver the same total compute throughput for the same power, with the only difference being the number of cores, the one with fewer cores will be better. Of course, things are never like that. You just can't make one mega-core that consumes 100 watts for the same performance as 10 small cores at 10 watts each. That's where different design compromises come into play. Intel is forced to compromise because their tech doesn't scale, AMD scales better but they are also exploring more compact core layouts. Apple has more headroom, so they don't need to (for now at least).

Relative to power and die space I think Apple's E cores still deliver more performance than their P cores

Relative to power, sure; die space, not really. An E-core cluster is a bit larger than a single P-core and delivers just a bit more performance. You could drop a P-core cluster and add 12 E-cores, but you are not making any performance wins, your design now scales worse for asymmetric workloads, and you have more agents to keep memory-coherent. Hardly a good tradeoff. If Apple's E-cores delivered 50-60% of a P-core, like Intel's, it could be a different story.

And if you can already parallelise to 14 CPU cores, I'm sure your task benefits more from more cores than from faster cores. Of course, if you're filling the cores with several distinct tasks, that's not necessarily the case.

Yes, and your last sentence is why fewer cores are better (assuming constant power/performance). For massively parallel work, many small cores work out great (hence GPUs), but if your workload is asymmetric, you risk stalls and synchronization overhead.
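A toy model of that last point (all numbers invented for illustration): two configurations with identical aggregate throughput, one with fewer but faster cores. Perfectly parallel work ties; a long dependent chain favors the faster cores.

```swift
// Toy model, all numbers invented for illustration only.
// Config A: 4 cores at relative speed 4.0; Config B: 16 cores at speed 1.0.
// Both deliver the same aggregate throughput (16 units of work per tick).
struct Config { let cores: Int; let speed: Double }

let fewerFaster = Config(cores: 4, speed: 4.0)
let moreSlower = Config(cores: 16, speed: 1.0)

// Lower bound on completion time: you can't beat the critical path running
// on a single core, nor the total work spread across all cores.
func completionTime(_ c: Config, totalWork: Double, criticalPath: Double) -> Double {
    max(criticalPath / c.speed, totalWork / (Double(c.cores) * c.speed))
}

// Perfectly parallel workload: both finish in 10 time units.
print(completionTime(fewerFaster, totalWork: 160, criticalPath: 0))   // 10.0
print(completionTime(moreSlower, totalWork: 160, criticalPath: 0))    // 10.0

// Same work with a 60-unit dependent chain: 15 vs. 60 time units.
print(completionTime(fewerFaster, totalWork: 160, criticalPath: 60))  // 15.0
print(completionTime(moreSlower, totalWork: 160, criticalPath: 60))   // 60.0
```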
 
What about 50% faster? :)
Everything has a tipping point ;)
There is a caveat. What I wrote is under the same power and performance conditions. That is, if you have two designs that deliver the same total compute throughput for the same power, with the only difference being the number of cores, the one with fewer cores will be better. Of course, things are never like that. You just can't make one mega-core that consumes 100 watts for the same performance as 10 small cores at 10 watts each. That's where different design compromises come into play. Intel is forced to compromise because their tech doesn't scale, AMD scales better but they are also exploring more compact core layouts. Apple has more headroom, so they don't need to (for now at least).
Right. I agree with that.
Relative to power, sure; die space, not really. An E-core cluster is a bit larger than a single P-core and delivers just a bit more performance. You could drop a P-core cluster and add 12 E-cores, but you are not making any performance wins, your design now scales worse for asymmetric workloads, and you have more agents to keep memory-coherent. Hardly a good tradeoff. If Apple's E-cores delivered 50-60% of a P-core, like Intel's, it could be a different story.

Yes, and your last sentence is why fewer cores are better (assuming constant power/performance). For massively parallel work, many small cores work out great (hence GPUs), but if your workload is asymmetric, you risk stalls and synchronization overhead.
Agree with the above. That said, given that most very heavy workloads that do push the CPU to 100% tend to be the ones that do scale well with more cores, I still think it could be interesting to go with the concept of 'fill it to the brim with the smallest, most efficient cores you have'. On the flip side, I guess eventually that does just become a GPU if you take that mentality far enough.
 
Very nice improvement!

I'm interested to know if any information about GPU advances has leaked. We have information on core counts, but not on the new design that was supposedly meant to ship with the A16. Will this new GPU core include ray tracing? It's here where Apple Silicon seems to lag behind the competition the most.
 
What I am really excited about is the prospect of a thin and light laptop that can really deliver the performance of a large workstation. If Cliff's figure of 30% higher single-core performance pans out, it would propel Apple far above the fastest x86 desktop CPUs around. And if the Max indeed comes with 12 P-cores, its multi-core performance should be on par with 200 W x86 desktops.

Agree with the above. That said, given that most very heavy workloads that do push the CPU to 100% tend to be the ones that do scale well with more cores, I still think it could be interesting to go with the concept of 'fill it to the brim with the smallest, most efficient cores you have'. On the flip side, I guess eventually that does just become a GPU if you take that mentality far enough.

It's also a question of the nature of your workloads. For example, I think Apple did something very smart by moving the long-vector engine off the CPU core. You sacrifice latency, but typical work that benefits from long vectors is not latency-sensitive anyway. And you don't have to pay for the large and power-hungry data buses and registers inside the core itself. So Apple can have lean, fast, extremely power-efficient cores and still offer very fast vector hardware (which they can scale at will), while Intel is in the process of abandoning 512-bit vectors because they have designed themselves into a corner.
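From the software side, that design mostly shows up as "call the library and let the hardware sort it out". A generic sketch (whether any given Accelerate call actually lands on the off-core vector/matrix hardware is an Apple implementation detail I can't confirm):

```swift
import Accelerate

// Express the vector work through Accelerate instead of hand-written SIMD;
// where it executes (NEON, the off-core vector/matrix units, ...) is up to
// Apple's framework and the chip, not the programmer.
let a = [Double](repeating: 1.5, count: 4096)
let b = [Double](repeating: 2.0, count: 4096)

// Element-wise multiply followed by a sum, i.e. a dot product.
let dot = vDSP.sum(vDSP.multiply(a, b))
print(dot)  // 12288.0
```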
 
Very nice improvement!

I'm interested to know if any information about GPU advances has leaked. We have information on core counts, but not on the new design that was supposedly meant to ship with the A16. Will this new GPU core include ray tracing? It's here where Apple Silicon seems to lag behind the competition the most.

I would be shocked if no ray tracing is included. Apple published detailed patents offering an in-depth explanation of RT technology over a year ago. These are not the kind of patents that just explore an idea; they describe things at the circuit level. I believe the technology was ready last year, but they had to postpone it because of 3nm node delays.
 
I would be shocked if no ray tracing is included. Apple published detailed patents offering an in-depth explanation of RT technology over a year ago. These are not the kind of patents that just explore an idea; they describe things at the circuit level. I believe the technology was ready last year, but they had to postpone it because of 3nm node delays.
That all sounds great. M3 Macs are going to be fantastic by the looks of it.
 