Apple: M1 vs. M2

mr_roboto · Dec 9, 2022

Andropov said:
If the benchmark turns out to be true, it'd mean +15.6% Single Core and +20.8% Multicore over the Mac Studio. The multicore score scales better than 8 x the P core score, which should be due to either the 2 extra E-cores or the improvements in the µarch of the E cores on the A15. Maybe both. I'm saying should because the M1 Pro/Max had the E cores running at 2GHz (vs 1GHz on the regular M1) when under high load [source], and now that the M2 Pro/Max apparently has 4 E cores that design decision may have changed. Maybe the M2 Pro/Max E cores only go up to 1GHz, in which case the full difference in scores would be because of the µarch improvement in those cores.

That article is a bit confusing - it doesn't make it totally clear that it only covers a subset of the system's behavior.

The context is that macOS won't schedule low-QoS (background) threads on P cores under any circumstance, even when there's enough background work to use 100% of all E cores. However, the opposite is not true. Higher-prio threads are preferentially scheduled on P cores, but when all P cores are occupied, macOS is allowed to run them on E cores.

When the E cluster is under 100% load, and that load consists exclusively of background work, M1's 4-core E cluster is software-capped at 1 GHz, but M1 Pro/Max's 2-core E cluster is allowed to run at the full 2 GHz. Presumably, Apple did this so that Pro/Max wouldn't suffer any regression in background compute throughput compared to the base M1.

But as soon as any higher-prio thread runs on an E core, the E cluster's frequency is uncapped. I played around with this on an M1 Air quite a bit. It's easy to make its E cluster stay at 2 GHz indefinitely, even under sustained loads which heat the computer up and force its P cluster to throttle down.

Benchmarks like GB5 don't use low priority bands for their threads, as far as I know, so they don't measure the 1 GHz E-cluster behavior on base M1.
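The rules above can be restated as a small toy model. This is only a sketch of the behavior as described in this post, not how the macOS scheduler is actually implemented; the names `Chip`, `allowed_cluster` and `e_cluster_ceiling_ghz` are made up for illustration, and the only numbers used are the 1 GHz and 2 GHz figures already mentioned.

```python
from dataclasses import dataclass

# Frequencies quoted in the post; everything else is a hypothetical model.
CAPPED_GHZ = 1.0
FULL_GHZ = 2.0


@dataclass(frozen=True)
class Chip:
    name: str
    caps_background_only_load: bool  # True for base M1, False for M1 Pro/Max


M1 = Chip("M1", caps_background_only_load=True)
M1_PRO_MAX = Chip("M1 Pro/Max", caps_background_only_load=False)


def allowed_cluster(is_background: bool, free_p_cores: int) -> str:
    """Which cluster a runnable thread lands on under the described rules."""
    if is_background:
        return "E"  # background work never goes to P cores
    # Higher-prio work prefers P cores and spills to E cores when P is full.
    return "P" if free_p_cores > 0 else "E"


def e_cluster_ceiling_ghz(chip: Chip, background_threads: int,
                          higher_prio_threads_on_e: int) -> float:
    """Frequency ceiling of the E cluster for a given mix of work."""
    if higher_prio_threads_on_e > 0:
        return FULL_GHZ  # any higher-prio thread on an E core lifts the cap
    if background_threads > 0 and chip.caps_background_only_load:
        return CAPPED_GHZ  # background-only load on base M1
    return FULL_GHZ


print(e_cluster_ceiling_ghz(M1, 4, 0))          # background only on base M1
print(e_cluster_ceiling_ghz(M1_PRO_MAX, 2, 0))  # background only on Pro/Max
print(e_cluster_ceiling_ghz(M1, 4, 1))          # one higher-prio thread on E
```

A benchmark whose threads all count as higher-prio would only ever hit the last case on a base M1, which is the point about GB5 above.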

leman · Dec 9, 2022

mr_roboto said:
Yeah, M1 is the first generation. These are core designs shared with iPhone, and the yearly phone release cycle is a big cash cow for Apple, so a conservative approach makes sense. They would not have wanted the Mac projects to add much risk before they were fully committed to the Mac transition, and at kickoff time for the A14/M1 generation of Apple Silicon, they probably did not know yet whether they were fully committed.

I don't think this is about commitment — they were 100% committed by the moment that WWDC announcement came, but more about risk management. Apple plays a long game here. Conservative approach makes a lot of business sense, especially if your tech is already this good. I'm sure there are more interesting things to come.

For example, Maynard Handley has found some newer Apple patents (https://patents.google.com/patent/US20220334997A1 https://patents.google.com/patent/US20220342588A1) that point to more aggressive use of multi-chip technology in the future. Some big things are likely coming.

dada_dave · Dec 9, 2022

mr_roboto said:
When the E cluster is under 100% load, and that load consists exclusively of background work, M1's 4-core E cluster is software-capped at 1 GHz, but M1 Pro/Max's 2-core E cluster is allowed to run at the full 2 GHz. Presumably, Apple did this so that Pro/Max wouldn't suffer any regression in background compute throughput compared to the base M1.

But as soon as any higher-prio thread runs on an E core, the E cluster's frequency is uncapped. I played around with this on an M1 Air quite a bit. It's easy to make its E cluster stay at 2 GHz indefinitely, even under sustained loads which heat the computer up and force its P cluster to throttle down.

Benchmarks like GB5 don't use low priority bands for their threads, as far as I know, so they don't measure the 1 GHz E-cluster behavior on base M1.

Interesting! I was unaware of that latter behavior of the M1 E cores with priority threads; I assumed they were completely capped at 1GHz vs the M1 Pro/Max at 2GHz.

leman said:
I don't think this is about commitment — they were 100% committed by the moment that WWDC announcement came, but more about risk management. Apple plays a long game here. Conservative approach makes a lot of business sense, especially if your tech is already this good. I'm sure there are more interesting things to come.

For example, Maynard Handley has found some newer Apple patents (https://patents.google.com/patent/US20220334997A1 https://patents.google.com/patent/US20220342588A1) that point to more aggressive use of multi-chip technology in the future. Some big things are likely coming.

I think he meant at the start of the design of the A14/M1 chip family which would’ve been years before the WWDC announcement. But even so I agree it’s not about commitment. Rather, being conservative in some aspects of their design for the first generation of larger SOCs probably eased some of the design issues, allowed for different development priorities, etc …

mr_roboto · Dec 10, 2022

leman said:
I don't think this is about commitment — they were 100% committed by the moment that WWDC announcement came, but more about risk management. Apple plays a long game here. Conservative approach makes a lot of business sense, especially if your tech is already this good. I'm sure there are more interesting things to come.

dada_dave said:
I think he meant at the start of the design of the A14/M1 chip family which would’ve been years before the WWDC announcement. But even so I agree it’s not about commitment. Rather, being conservative in some aspects of their design for the first generation of larger SOCs probably eased some of the design issues, allowed for different development priorities, etc …

Yep, that's what I was going for, worded poorly. I do think Apple was fully committed to transitioning the Mac when they kicked off A14/M1 development, just not fully committed to doing it with A14 generation AS. The start dates for those projects had to be so long before fall 2020. There's no way they could have had full confidence everything would be ready for Mac product launch on time. I would be astonished if they made no contingency plans for delaying the Mac AS launch to a later AS generation.

On the flip side, they would have planned the A14/M1 generation to de-risk both iOS devices and Mac. No severe rocking of the boat allowed. Never designed a P core targeted at a Fmax higher than what's appropriate for a phone or tablet before? Well, is that Fmax likely to be good enough for Mac? If so, kick that can down the road a little.

leman · Dec 10, 2022

mr_roboto said:
Yep, that's what I was going for, worded poorly. I do think Apple was fully committed to transitioning the Mac when they kicked off A14/M1 development, just not fully committed to doing it with A14 generation AS. The start dates for those projects had to be so long before fall 2020. There's no way they could have had full confidence everything would be ready for Mac product launch on time. I would be astonished if they made no contingency plans for delaying the Mac AS launch to a later AS generation.

On the flip side, they would have planned the A14/M1 generation to de-risk both iOS devices and Mac. No severe rocking of the boat allowed. Never designed a P core targeted at a Fmax higher than what's appropriate for a phone or tablet before? Well, is that Fmax likely to be good enough for Mac? If so, kick that can down the road a little.

Thanks for clarifying, I now better understand what you meant, and yes, I agree with you entirely.

This is also why I don't believe it makes much sense to draw far reaching conclusions about Apple's strategy just from the M1 and M2 families.

Yoused · Dec 10, 2022

Yoused said:
That is a difference of 6% between the lowest and highest P-core clock rates.

I should note that it is easy to lose perspective. That difference is like several hundred Mac Plusses (when you factor in the 8MHz clock on a 16-bit data bus with sixteen 32-bit registers) and just shy of the base clock of the first G3 iMac.

dada_dave · Dec 10, 2022

Yoused said:
Someone on ars observed that that reported model number seemed off

According to Macrumors, there are two new model numbers in the November Steam Survey - one of which is indeed 14,6 (and the other is 15,4 interestingly). So that’s additional support for 14,6 being a real model number.

Andropov · Dec 12, 2022

Cmaier said:
As for your second question, the variability comes from variability in each process step. Each mask layer has tolerances. For example, you need to align the mask. So in step 1 say you use a mask to determine where photoresist goes. Then you etch. Then you deposit metal. Then you mask again so you can etch away some of the metal. But the new mask may not be perfectly aligned with where the first mask was. The tolerances are incredibly tight.

You are also doping the semiconductor. It’s impossible to get it exactly the same twice. The wafer has curvature to it (imperceptible to a human eye). So chips at the edges are a little different than chips in the middle. Etc. etc.

The dimensions and number of atoms we are talking about are so small that it’s hard to keep everything identical at all times. Small changes in humidity, slight differences in the chemical composition of etchants or dopants, maybe somebody sneezed in the clean room. So many things can affect the end result. Vertical cross-sections of wires are never the same on two chips (if you look at them with a powerful-enough microscope). Etc. etc.

Oh I see. It's easy to forget how close to the size of atoms these things are. Thanks!

theorist9 · Dec 12, 2022

dada_dave said:
According to Macrumors, there are two new model numbers in the November Steam Survey - one of which is indeed 14,6 (and the other is 15,4 interestingly). So that’s additional support for 14,6 being a real model number.

And, FWIW, back in June a developer named Pierre Blazquez claimed he found the model numbers 14,5, 14,6, and 14,7 in Apple code: https://appleinsider.com/articles/22/07/05/apple-is-preparing-three-new-mac-studio-models

dada_dave · Dec 12, 2022

theorist9 said:
And, FWIW, back in June a developer named Pierre Blazquez claimed he found the model numbers 14,5, 14,6, and 14,7 in Apple code: https://appleinsider.com/articles/22/07/05/apple-is-preparing-three-new-mac-studio-models

Do you think the 15,4 is real or some weird mistake in the reporting of the hardware and meant to be 14,5? I mean if it really is meant to be 15,4 that could be interesting! That should be an M3 chip undergoing testing, yes? Or have I got that wrong?

theorist9 · Dec 12, 2022

dada_dave said:
Do you think the 15,4 is real or some weird mistake in the reporting of the hardware and meant to be 14,5? I mean if it really is meant to be 15,4 that could be interesting! That should be an M3 chip undergoing testing, yes? Or have I got that wrong?

Sorry, no idea. I've never bothered to try to figure out their numbering system.


Andropov · Dec 12, 2022

dada_dave said:
Do you think the 15,4 is real or some weird mistake in the reporting of the hardware and meant to be 14,5? I mean if it really is meant to be 15,4 that could be interesting! That should be an M3 chip undergoing testing, yes? Or have I got that wrong?

Not necessarily. MacBook Pro M1 13" is MacBookPro17,1, and MacBook Pro M1 Pro/Max are MacBookPro18,X. BTW, the ID Mac14,7 is already in use: the 13" M2 MacBook Pro. No idea why they dropped the "Book" from the model ID.

theorist9 · Dec 12, 2022

Andropov said:
Not necessarily. MacBook Pro M1 13" is MacBookPro17,1, and MacBook Pro M1 Pro/Max are MacBookPro18,X. BTW, the ID Mac14,7 is already in use: the 13" M2 MacBook Pro. No idea why they dropped the "Book" from the model ID.

Ah, sorry, I wrote "14,7" when I should have written "14,8". I just corrected that in my post.

theorist9 · Oct 31, 2023

Cmaier said:
My theory was that image processing likely uses the math library, which is not optimized for M1.

I got my hands on an M1 Pro MacBook Pro, which allowed me to do more detailed testing, and was able to get a better idea of what was causing Mathematica's image processing to be so much slower on the M1 than on my 2019 iMac (i9-9900K). There were two specific commands that were responsible for the difference: Sharpen and Blur. Looking at just those by themselves, the iMac was 9x faster on Sharpen (3.5 s vs. 32.5 s), and 14x faster on Blur (2.3 s vs. 32.0 s).
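As a quick check of those ratios, here is the arithmetic spelled out. The timings are the ones reported above; the dictionary keys and the `imac_speedup` helper are just illustrative names.

```python
# Wall-clock timings in seconds, as reported in the post.
timings = {
    "Sharpen": {"imac_i9_9900k": 3.5, "m1_pro": 32.5},
    "Blur": {"imac_i9_9900k": 2.3, "m1_pro": 32.0},
}


def imac_speedup(task: str) -> float:
    """How many times faster the iMac finished the given task."""
    t = timings[task]
    return t["m1_pro"] / t["imac_i9_9900k"]


for task in timings:
    print(f"{task}: iMac is {imac_speedup(task):.1f}x faster")
# Sharpen: iMac is 9.3x faster
# Blur: iMac is 13.9x faster
```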

Further, I was able to do a detailed comparison of GB 6.2.1 SC subscores on the two machines. All were higher on the M1 than on the iMac, except for the Background Blur task, which was 15% lower (see screenshot). Things are improved with the M2, but its GB score for this task is still 6% lower than my iMac's.

Thus these image processing tasks, when done by the CPU, seem to represent an inherent challenge for AS. Interestingly, other programs also have issues with blur tasks on AS, which are fixed by enabling GPU hardware acceleration (something I've read is not built into MMA functions; and, of course, it wouldn't operate in GB's CPU benchmark):
https://github.com/brave/brave-browser/issues/26186

So could it be that Apple didn't bother optimizing CPU-based image processing because it assumes those doing image processing will be using GPU acceleration?

leman · Nov 1, 2023

theorist9 said:
Thus these image processing tasks, when done by the CPU, seem to represent an inherent challenge for AS. Interestingly, other programs also have issues with blur tasks on AS, which are fixed by enabling GPU hardware acceleration (something I've read is not built into MMA functions; and, of course, it wouldn't operate in GB's CPU benchmark):
https://github.com/brave/brave-browser/issues/26186

CPU code doing these things relies on high-throughput SIMD operations, and x86 CPUs have an advantage here simply because of their higher clocks. Both Apple Silicon and any modern x86 CPU are capable of roughly 512 bits' worth of SIMD operations per clock, but x86 is clocked higher.
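To put a rough number on the clock argument, here is a back-of-the-envelope sketch. The 512 bits per clock figure is the one above; the two clock speeds (about 3.2 GHz for an M1 P core and about 5.0 GHz for an i9-9900K at single-core turbo) are assumptions plugged in for illustration, and real sustained clocks under SIMD load may differ.

```python
SIMD_BITS_PER_CLOCK = 512  # rough per-core figure quoted above

# Assumed peak clocks in GHz; illustrative values, not measurements.
APPLE_P_CORE_GHZ = 3.2
X86_CORE_GHZ = 5.0


def peak_simd_gbit_per_s(bits_per_clock: int, clock_ghz: float) -> float:
    """Peak per-core SIMD throughput in Gbit/s: width per clock times clock."""
    return bits_per_clock * clock_ghz


apple = peak_simd_gbit_per_s(SIMD_BITS_PER_CLOCK, APPLE_P_CORE_GHZ)
x86 = peak_simd_gbit_per_s(SIMD_BITS_PER_CLOCK, X86_CORE_GHZ)

print(f"Apple P core: {apple:.1f} Gbit/s")
print(f"x86 core:     {x86:.1f} Gbit/s")
print(f"x86 advantage from clock alone: {x86 / apple:.2f}x")
```

With equal width per clock, the ratio is just the ratio of the two clocks, so the result depends entirely on which clock values are assumed.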

theorist9 said:
So could it be that Apple didn't bother optimizing CPU-based image processing because it assumes those doing image processing will be using GPU acceleration?

Achieving high SIMD throughput on the CPU is expensive, both in terms of power consumption and die area. You need wide SIMD units, high clocks, and fast caches to feed those units. If I remember correctly, Intel cores have roughly 3x higher cache bandwidth than Apple Silicon (which is not cheap!) but all this capability is unused unless running AVX512 code. Which is disabled on consumer chips anyway.

Apple focuses on power efficiency, so they choose a different implementation path. One consequence is that Apple Silicon has no chance of competing on pure throughput-oriented SIMD tasks (remember the chess engine controversy? exactly). To compensate for this, Apple has a wide vector/matrix engine (AMX) hooked to the L2 cache directly, which is a much more power efficient way of doing throughput-oriented computing on the CPU. And yes, for image processing etc., they want you to use the GPU, which is much better suited for that task anyway.

Nycturne · Nov 1, 2023

leman said:
And yes, for image processing etc., they want you to use the GPU, which is much better suited for that task anyway.

And I think Affinity Photo shows just how well that can play out when you build an image editor with that in mind these days.

Citysnaps · Nov 1, 2023

Nycturne said:
And I think Affinity Photo shows just how well that can play out when you build an image editor with that in mind these days.

With that in mind, it will be interesting to see how Matlab's image processing toolbox evolves. Especially for people who like to home-grow their own tools.
