M3 core counts and performance

I'm confused about the number of Display Engines, since in each case those are equal to the number of external displays the chip supports. The total number of supported displays is one more than that, so did he miss one?
He did. Apple's name for it is DCP, and they seem to use two variants of it in each M-series chip. You always get one copy of a simplified, reduced-size DCP for the internal display of MacBooks. On top of that you get N copies of a full-size DCP capable of everything, where N is 1 on base Mx chips, 2 on Mx Pro, or 4 on Mx Max.

This is why, for example, on the M2 (non-Pro) Mac mini the maximum monitor config is one 6K @ 60 Hz and one 5K @ 60 Hz, both over Thunderbolt. The 5K display is driven by the reduced-capability DCP (which would have been dedicated to the internal display of a MacBook if the Mini weren't a Mini).
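A minimal sketch of the display-count model described above: one reduced-size DCP per die plus N full-size DCPs, with N taken from the post (1 base, 2 Pro, 4 Max). The function and dictionary names are made up for illustration; this only encodes the arithmetic from the post, not anything read from Apple documentation.

```python
# Display-count model as described in the post: every M-series die has one
# reduced-size DCP (meant for a MacBook's internal panel) plus N full-size
# DCPs. Tier names and N values come from the post; names are illustrative.
FULL_SIZE_DCPS = {"base": 1, "pro": 2, "max": 4}


def display_engines(tier: str) -> dict:
    """Return the DCP breakdown and total supported displays for a chip tier."""
    full = FULL_SIZE_DCPS[tier]
    reduced = 1  # always exactly one per die, per the post
    return {"full": full, "reduced": reduced, "total_displays": full + reduced}


# On a Mac mini there is no internal panel, so the reduced DCP is free to drive
# an external display, which is how a base M2 mini reaches two displays.
for tier in FULL_SIZE_DCPS:
    d = display_engines(tier)
    print(f"{tier:>4}: {d['full']} full + {d['reduced']} reduced = "
          f"{d['total_displays']} displays")
```

Counting only the full-size DCPs gives the "number of external displays" figure the question started from; adding the reduced one gives the total.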

With the full-size DCPs, people trying to identify blocks can go looking for a large block which appears four times on Max, two on Pro, and once on the base chip. But the small DCP always has exactly one copy per die, so it's a little harder to differentiate from other singleton random blocks. Most people don't seem to bother trying to find it. I do recall seeing at least one M1 Pro or Max die shot with it highlighted, but I can't find it right now.
 
If this is actually the x86 version running under Rosetta, it’s mighty impressive.
It isn’t Rosetta 2 since it is labeled: “Geekbench 6.2.1 Pro for macOS AArch64”. I’m not sure how the title of the test got set. Is there a feature that allows editing?
 
I was thinking the same, but the CPU being listed as VirtualApple made me doubt it. I guess it’s just a macOS VM as @Jimmyjames said 👍
 
It’s probably Geekbench for Mac running on a macOS virtual machine?
Yes. Which is why there is only a very slight slowdown. Apple's VMs run at nearly 100% of native performance. I would probably just run everything in an Apple VM if only I could use my Apple ID in it. Alas, Apple disallows this, making their Apple silicon macOS VMs nearly worthless.
 
I have my M3 Pro!
Need to get everything set up and let it settle (iCloud syncing etc.) before running any tests.
But for now, look at this Speedometer 2.1 score 😍 (+100 points over A17 Pro, +180 points over M1)
 
Very nice! Congratulations.
 
Getting ahead of myself a bit (impatient and excited!)

Ran a quick Cinebench single-thread test to check clocks and CPU/SoC power as measured by powermetrics
ST clock: 4.055GHz
CPU Power: ~4.5W

This is way lower than the core power number we saw reported for A17 at just 3.7-3.8GHz.

Interesting! 🤩

Immediate questions are:
1. Is my dumb ass misreading this? 😅
2. Is A17 Pro better than we thought? Did we put too much faith in the Geekerwan review and the numbers reported by "spec-on-iOS"?
3. Any chance powermetrics is wrong? I have no reason to believe it's wrong, just curious if anyone doubts this number
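On question 3, one cheap sanity check is to log several powermetrics samples during the run and look at the spread rather than a single reading. The sketch below assumes a log captured with something like `sudo powermetrics --samplers cpu_power -i 1000 -n 10 > log.txt` (check `man powermetrics` on your macOS version) and assumes the sampler prints lines of the form `CPU Power: 4512 mW`; verify that line format against your own log before trusting the parser. The sample text at the bottom is made up for illustration, not a real measurement.

```python
import re
import statistics

# Assumed line format from powermetrics' cpu_power sampler on Apple silicon,
# e.g. "CPU Power: 4512 mW". Verify against your own log before relying on it.
CPU_POWER_RE = re.compile(r"^CPU Power:\s*(\d+)\s*mW", re.MULTILINE)


def cpu_power_watts(log_text: str) -> list[float]:
    """Extract every CPU Power sample from a powermetrics log, in watts."""
    return [int(mw) / 1000 for mw in CPU_POWER_RE.findall(log_text)]


def summarize(samples: list[float]) -> dict:
    """Mean, spread and extremes, so one odd sample doesn't decide the question."""
    return {
        "n": len(samples),
        "mean_w": statistics.mean(samples),
        "stdev_w": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min_w": min(samples),
        "max_w": max(samples),
    }


if __name__ == "__main__":
    # Made-up sample text for illustration only, not real measurements.
    demo = "CPU Power: 4480 mW\nGPU Power: 12 mW\nCPU Power: 4520 mW\n"
    print(summarize(cpu_power_watts(demo)))
```

A tight spread around the same value across several runs would not prove powermetrics is right, but it would rule out a one-off glitch in a single sample.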
 
1. Seems correct to me, but perhaps more knowledgeable eyes should double check.
2. If 1 is correct, then 2 must be I think.
3. Bugs happen, but my understanding is these are the tools Apple uses themselves, and partly what the OS bases its power management on. Given their “relentless focus on performance per watt”, I’d be surprised if it were wrong.
 

The numbers are most likely correct; however, powermetrics has occasionally given odd results:


In single-threaded workloads, such as Cinebench R23 and SPEC 502.gcc_r, both of which are more mixed in terms of pure computation vs also memory demanding, we see the chip report 11W package power, however we’re just measuring an 8.5-8.7W difference at the wall when under use. It’s possible the software is over-reporting things here. The actual CPU cluster is only using around 4-5W under this scenario, and we don’t seem to see much of a difference to the M1 in that regard.

It should also be pointed out that the old version of Cinebench had trouble fully utilizing the core and reported lower performance and power. The new version seems to have fixed most of those issues, so I wouldn't expect the same problem, *but* you should run more tests to confirm, like Geekbench or something "bursty", to see how far the single core can go. Exciting!
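To make the discrepancy in the quoted review concrete: software reported 11 W package power while the wall only rose by 8.5-8.7 W. Wall readings also include AC-to-DC conversion losses, so the power actually reaching the board is somewhat less than the wall delta, which widens the gap rather than closing it. A minimal sketch of that arithmetic, where the 0.9 adapter efficiency is a hypothetical value, not a measured one:

```python
# Numbers from the quoted review: 11 W reported package power versus an
# 8.5-8.7 W increase measured at the wall under the same single-threaded load.
REPORTED_PACKAGE_W = 11.0
WALL_DELTA_RANGE_W = (8.5, 8.7)


def dc_estimate(wall_delta_w: float, psu_efficiency: float = 1.0) -> float:
    """Estimate DC power added under load from the AC delta at the wall.

    Wall readings include AC-to-DC conversion losses, so the power reaching
    the board is roughly delta * efficiency. psu_efficiency is an assumed
    value; 1.0 means "ignore losses", the most generous case for the
    software reading.
    """
    return wall_delta_w * psu_efficiency


def gap_w(reported_w: float, wall_delta_w: float, psu_efficiency: float = 1.0) -> float:
    """Watts by which the software figure exceeds the wall-derived estimate."""
    return reported_w - dc_estimate(wall_delta_w, psu_efficiency)


for delta in WALL_DELTA_RANGE_W:
    for eff in (1.0, 0.9):  # 0.9 is a hypothetical adapter efficiency
        print(f"wall delta {delta} W, efficiency {eff}: "
              f"gap {gap_w(REPORTED_PACKAGE_W, delta, eff):.2f} W")
```

Even in the most generous case the software figure sits about 2.3-2.5 W above the wall delta, and any other component drawing extra power under load would only make that gap larger. That is why the possibility of battery interference on a MacBook Pro matters for interpreting it.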
 
Hmmm I’m not sure I’d trust the wall power usage. I think he’s measuring on a MacBook Pro in that review. Surely the battery would be interfering in that measurement?
 
Possible but unlikely. You can see it doesn't happen in any of the other tests, including ones drawing both more and less power. Further, it doesn't jibe with the powermetrics results for the M1 Mini on those ST tasks. I no longer have the post, but Andrei posted powermetrics results for these tests on Twitter, at least for CB if not for SPEC 502.gcc_r, and if I remember right the total package power was lower. Most conclusively, he noted in the quote above that the wall power and cluster measurements were similar for the M1 Mini. To be fair, those are the only two times I've personally seen powermetrics give such weird results. So it could just be a fluke, there is still some possibility of battery interference, and it was package power, not core power, that was the issue. So again, @Aaronage's results are probably fine. But more tests are always good. The rest of us can live vicariously through him. :)
 
Seems like it could be useful!
Excellent, I believe this is now similar to Nvidia. @leman noted this was one obstacle to doing like-for-like TFLOPs comparisons between Nvidia and Apple GPUs. AMD added the ability to dual-issue FP32 instructions, but the jury is out on how useful that level of ILP is to most graphics/GPU workloads. So they now report very high TFLOPs for their GPUs, but that may not be as useful in practice.
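For context on why dual-issue inflates headline numbers: peak FP32 TFLOPS is conventionally computed as ALU lanes × FP32 ops issued per lane per clock × 2 (an FMA counts as two FLOPs) × clock. Doubling the issue rate doubles the paper figure whether or not real shaders contain enough independent FP32 instructions to use it. The lane count and clock below are hypothetical placeholders, not the specs of any Apple, AMD or Nvidia GPU.

```python
def peak_fp32_tflops(alu_lanes: int, clock_ghz: float,
                     fp32_ops_per_lane_per_clock: int = 1) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Each FP32 op is counted as an FMA (2 FLOPs). Set
    fp32_ops_per_lane_per_clock to 2 for a design that can dual-issue FP32
    when it finds independent instructions, 1 otherwise.
    """
    flops_per_clock = alu_lanes * fp32_ops_per_lane_per_clock * 2
    return flops_per_clock * clock_ghz * 1e9 / 1e12


# Hypothetical GPU: 1024 FP32 lanes at 1.4 GHz.
single = peak_fp32_tflops(1024, 1.4)
dual = peak_fp32_tflops(1024, 1.4, fp32_ops_per_lane_per_clock=2)
print(f"single-issue peak: {single:.2f} TFLOPS")
print(f"dual-issue peak:   {dual:.2f} TFLOPS (only if the ILP is there)")
```

The formula itself is the standard way vendors quote peak numbers; the point is that the 2× factor is a ceiling, so comparing a dual-issue figure against a single-issue one is not like-for-like.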
 


The video confirms @leman's test and, of course, the patent he dug up. Very cool, very, very cool.

EDIT: OHHHHHH ... that's how they did it, and why the patent kept mentioning cache and DRAM and barely mentioned registers, and why they call it Dynamic Caching. I admit I only skimmed the patent, but they are now treating a core's registers as another cache. Huh ... that's ... wow ... that's a huge change.




Whoa .. continuing on ... they've turned them all into cache! How the hell do they keep the register performance so good treating it as a cache?! What the hell. Man I hope someone deep dives into this GPU.
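A toy illustration of why treating on-chip storage as a dynamically allocated pool matters. With static allocation, the hardware has to reserve the worst-case register count a shader might ever need for every thread, which caps how many threads can be resident at once. If allocation tracks what threads actually use, more of them fit in the same pool. The pool size, register counts, and shader are all hypothetical, and this is not a description of how Apple's hardware implements Dynamic Caching.

```python
def resident_threads(pool_bytes: int, bytes_per_thread: int) -> int:
    """How many threads fit when each one holds bytes_per_thread of on-chip storage."""
    return pool_bytes // bytes_per_thread


POOL_BYTES = 64 * 1024  # hypothetical on-chip storage per core
REG_BYTES = 4           # one 32-bit register

# A hypothetical shader that needs 128 registers on one rare branch
# but only 32 on the common path.
worst_case_regs = 128
typical_regs = 32

# Static allocation: reserve the worst case up front for every thread.
static = resident_threads(POOL_BYTES, worst_case_regs * REG_BYTES)
# Dynamic allocation: threads on the common path only hold what they use.
dynamic = resident_threads(POOL_BYTES, typical_regs * REG_BYTES)

print(f"static allocation:  {static} threads")
print(f"dynamic allocation: {dynamic} threads")
```

The hard part the post is marveling at is exactly what this toy skips: handling the rare thread that does take the big branch without stalling everything, while keeping register access as fast as a plain register file.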
 
Damn ... I wish I were still on Twitter. @Jimmyjames, can you send the link to this video to Ryan Smith on Twitter? I know they don't do deep dives like they used to, but even just a writeup of this video for their audience would be really cool. Most people aren't going to watch a 30-minute video, but they might read an article.
 
Done! I didn’t suggest he do a write up as that might be a little pushy, but just said he might be interested in it.
 