Apple: M1 vs. M2

Yeah, the i9-13900KF is definitely a hackintosh.

The new benchmark (1889 Single Core and 14586 Multi Core) makes a lot more sense. It's interesting to see that (if true) they've gone with 4 E-cores this time. I like those cores a lot; they don't often get the praise they deserve.
The E-cores are amazing. Nobody else seems to have E-cores that are anywhere near them in perf/watt.
 
A question I raised over on the other site, where we were discussing whether Apple might be willing to do something different for its desktop machines:

It seems the only reason AMD and Intel can beat Apple in SC desktop speeds is because they offer a much larger percentage "turbo boost" over their base clocks than the M-series chips do—93% for the i9-13900K and 27% for the Ryzen 9 7950X, compared with 7% for the M1 (based on https://www.anandtech.com/show/17024/apple-m1-max-performance-review )

Assuming the M2's turbo boost (max clock/base clock for P-cores) is the same 7% as the M1's, here's what the top chips from the big three would look like if they all had the same 7% boost as the M2:

SC GB scores (assuming linear relationship between SC score and clock speed)
i9-13900K: 1,230 @ 3.2 GHz
AMD Ryzen 9 7950X: 1,850 @ 4.8 GHz
M2: 1,900 @ 3.5 GHz

Here's how the M2 would compare to the actual Intel and AMD chips if we allowed it a 27% boost:
i9-13900K: 2,227 @ 5.8 GHz
AMD Ryzen 9 7950X: 2,192 @ 5.7 GHz
M2: 2,250 @ 4.2 GHz

So why couldn't Apple implement a 27% boost over their base clock, like AMD does? Are their cores not designed to handle the needed increase in voltage? And, if so, could they be?

Assuming that power is quadratic with clock speed, this would increase power consumption for the turboed core by ~40% over what's currently used. I don't know what the max watts per core is for the M2's P-cores, but if it's, say, 5 W, then that would only be another 4 W to allow two P-cores to be boosted to 4.2 GHz, which seems insignificant for a desktop. If it's cubic, it's an additional ~7 W for two cores. Granted, it could be exponential or follow some other functional form.
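To make the arithmetic explicit, here's a minimal Python sketch of the above. It's purely a restatement of the assumptions in this post (linear score-vs-clock scaling, a 7% M-series boost, and the guessed ~5 W per P-core); small differences from the rounded figures above are just rounding.

```python
# Back-of-the-envelope check of the numbers above. This is not a model of
# real silicon; it just applies the linear score-vs-clock assumption to the
# actual GB5 scores and clocks quoted in this thread.

# name: (actual SC score, actual max clock in GHz, base clock in GHz)
chips = {
    "i9-13900K":     (2227, 5.8, 3.0),
    "Ryzen 9 7950X": (2192, 5.7, 4.5),
    "M2":            (1900, 3.5, 3.5 / 1.07),  # base inferred from a 7% boost
}

for name, (score, f_max, f_base) in chips.items():
    f_hyp = f_base * 1.07                       # everyone limited to a 7% boost
    print(f"{name}: {score * f_hyp / f_max:.0f} @ {f_hyp:.1f} GHz")

# M2 with an AMD-style 27% boost over its inferred base clock
m2_score, m2_fmax, m2_fbase = chips["M2"]
f_27 = m2_fbase * 1.27
print(f"M2 @ 27% boost: {m2_score * f_27 / m2_fmax:.0f} @ {f_27:.1f} GHz")

# Extra power per boosted P-core, taking the same ~5 W guess as the text.
p_core_w = 5.0
for exponent in (2, 3):                         # quadratic vs. cubic in clock
    extra = p_core_w * ((f_27 / m2_fmax) ** exponent - 1)
    print(f"power ~ f^{exponent}: +{extra:.1f} W per core, "
          f"+{2 * extra:.1f} W for two cores")
```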

We can do the same calculation for the M3 on N3, starting from the hypothetical 27%-boosted M2 figure above. The clock speed increased by 9.4% from A15 to A16, so I'll use the same % increase for M2 to M3. Then if we add a 7.5% increase in performance for going from N4P->N3, we get:

M3: 2,650 :p @ 4.6 GHz
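Spelling out where the 2,650 comes from (it builds on the hypothetical 27%-boosted M2 figure, not the stock M2 score):

```python
# Hypothetical M3 projection: take the 27%-boosted M2 figure above, add the
# 9.4% A15->A16 clock bump and an assumed 7.5% gain for N4P->N3.
m2_boosted_score, m2_boosted_clock_ghz = 2250, 4.2
m3_clock_ghz = m2_boosted_clock_ghz * 1.094
m3_score = m2_boosted_score * 1.094 * 1.075
print(f"M3: ~{m3_score:.0f} @ {m3_clock_ghz:.1f} GHz")   # ~2646 @ 4.6 GHz
```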
 

You can’t just increase the voltage and expect everything to work beyond whatever the max design frequency is. Bunch of reasons. First, not everything scales with the voltage. The transistor IV curves aren’t linear. The waveforms at the output of gates don’t scale the same as the waveforms at the end of wires. Some gates will speed up by more than others as you increase voltage - this can cause multiple problems, including exacerbating cross-coupling (because the transition on one wire can be more than 2x faster than the opposite transition on a neighboring wire, for example, which can inject enough noise to trigger a false transition on the slower wire, or to slow it enough to break the path). You can cause “hold time” violations where the result at the input of a latch doesn’t hold its value for long enough to be captured on the clock transition. Etc.

When we design the chip we model all the gates and the wires, and pick a couple of corners to run at (a “corner” being a voltage, a set of process characteristics, etc.). We run analyses for setup times and hold times (i.e., max delays and min delays) to figure out if the chip will work, and at what speed. If we want to guarantee that it will run at a certain boost speed, we have to put in the effort to do that.
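To make "setup" and "hold" concrete, here is a toy sketch of the bookkeeping for a single flop-to-flop path. It is nothing like real sign-off (which runs per corner, across millions of paths, with clock skew and derates); the delay numbers are invented purely for illustration.

```python
# Toy setup/hold check for one flop-to-flop path (illustration only; real
# static timing analysis runs per corner with skew, derates, etc.).
def check_path(t_clk_ns, max_delay_ns, min_delay_ns,
               t_setup_ns=0.05, t_hold_ns=0.03):
    # Setup: the slowest data arrival must beat the next clock edge minus
    # the capture flop's setup time.
    setup_slack = t_clk_ns - t_setup_ns - max_delay_ns
    # Hold: the fastest data arrival must not disturb the captured value
    # before the capture flop's hold window has passed.
    hold_slack = min_delay_ns - t_hold_ns
    return setup_slack, hold_slack

# A path that closes timing at the design clock can fail at a boosted clock.
for freq_ghz in (3.5, 4.2):
    period_ns = 1.0 / freq_ghz
    setup, hold = check_path(period_ns, max_delay_ns=0.22, min_delay_ns=0.04)
    verdict = "OK" if setup >= 0 and hold >= 0 else "FAIL"
    print(f"{freq_ghz} GHz: setup slack {setup:+.3f} ns, "
          f"hold slack {hold:+.3f} ns -> {verdict}")
```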

All that is to say, Apple just decided not to do what you are proposing (so far) :) There may also be thermal and other limitations - you may need a different package, both for thermal reasons and to bring in enough power and ground connections to handle the increased current, etc.
 
I figured this was the right place to ask ;).
 
Just a personal opinion, but from the responses over at the other place there are quite a few folks disappointed by these Geekbench numbers.

Personally, with an M2 Pro/Max/bodacious/Ultra release, I'm more interested to see what other new IP accelerators/co-processors Apple includes. The types of things that don't tell much of a story in cross-platform benchmarking software but make a tangible difference to your workflow in real production software.


Certainly I’m looking forward to a beefier neural engine, beefier GPU and I’d welcome AV1 media engine hardware encode/decode.
 
I agree, but Apple used to include huge CPU increases as well. It can't be a coincidence that they slowed down when Gerard left; look at the A16's CPU.
 
It seems the only reason AMD and Intel can beat Apple in SC desktop speeds is because they offer a much larger percentage "turbo boost" over their base clocks than the M-series chips do—93% for the i9-13900K and 27% for the Ryzen 9 7950X, compared with 7% for the M1 (based on https://www.anandtech.com/show/17024/apple-m1-max-performance-review )
Is Apple even prioritizing improving the single core performance for desktops? How important is it? I've always maintained that a high single core score is a very relevant benchmark for many daily tasks, since a significant portion of them are —sadly— single threaded. This is particularly important on a phone, for example. But high-end desktops are often bought for a different set of tasks, where most of the workload is expected to be multithreaded. Otherwise, why bother with a 28-core machine?

Take the last Intel Mac Pro, for instance. Its single-core Geekbench score barely beats 1,100 points. This is because, at a time when Intel had homogeneous CPU designs, SC performance was traded off for MC performance. Lower clocks allowed for more cores running at the same time, which was ultimately deemed more important. I think we all agree that this was suboptimal. Single core performance may not be your highest priority on a 28-core CPU, but it sucks that a $2,500+ CPU has half the single core performance of many contemporary CPUs costing a fraction of the price when you do need it.

Heterogeneous CPU designs —both Apple's and Intel's— do not need to make this tradeoff. Single core performance is consistent across the board. So when you do need to launch a single core task on a high-core count CPU, performance is not abysmal. On the Intel side of things, it's actually even slightly better than on cheaper CPUs. Now, is this an important benchmark to be optimizing for on high-core count CPUs, other than for bragging rights? I honestly don't know.

It seems like Apple is doing fine with the multicore scores. The M2 Max, at 14,586 (leaked) points, beats the competition in the laptop space (i9-12900HK: ~13,200 points). The M2 Ultra, which should get about 25,200 points, would also beat the i9-13900K at 24,189 points. The 4-die M2 should get ~50,000 points. At this point, is it relevant whether a single core is scoring 1,800 or 2,200 points?
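Incidentally, those Ultra and 4-die estimates are consistent with applying one flat scaling factor to the leaked M2 Max score. A tiny sketch (the 0.86 efficiency is simply the value implied by the ~25,200 estimate above, not a measured figure):

```python
# Project multi-die GB5 multi-core scores from the leaked M2 Max result,
# assuming a flat efficiency relative to perfect linear scaling. 0.86 is
# just the factor implied by the ~25,200 Ultra estimate; it is a guess.
m2_max_mc = 14_586
efficiency = 0.86

for dies, name in ((2, "M2 Ultra (2 dies)"), (4, "4-die part")):
    print(f"{name}: ~{dies * m2_max_mc * efficiency:,.0f}")
```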

I know some 'Pro' workflow tasks are still single-thread bound. Many Photoshop filters, for example, are still single threaded (I think). But is this the average target audience of the Mac Pro/Studio? I'm not asking if more SC performance would be useful or whether the current performance is "enough" (it's never enough). I'm asking whether it makes economic sense to optimize for single core scores on a CPU designed for massively parallel workloads.
 
It seems the only reason AMD and Intel can beat Apple in SC desktop speeds is because they offer a much larger percentage "turbo boost" over their base clocks than the M-series chips do
Infinitely: M-series chips do not offer any "turbo boost". They run at the speed that they run and they get the job done.
 
I agree, but Apple used to include huge CPU increases as well. It can't be a coincidence that they slowed down when Gerard left; look at the A16's CPU.
Yes it can. This time they didn’t have a process shrink. And many times over the past years they had similar gains - they average 20 percent single core improvement since A5, but that doesn’t mean they got 20 percent every year.
 
For reference, I updated the graph of the Geekbench scores of the last 7 years of AX chips:

[Chart: Geekbench scores for the last seven years of A-series chips]


Note that for a sustained X% YoY improvement the bar graph should look exponential, not linear.
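To see why: a constant year-over-year gain compounds, so the absolute step gets bigger every year. Using the roughly 20% average yearly single-core gain mentioned above as an example:

```python
# A constant 20% YoY improvement compounds, so the yearly increments grow;
# a bar chart of the scores should curve upward rather than rise linearly.
score = 1000.0                     # arbitrary starting score
for year in range(8):
    print(f"year {year}: {score:,.0f}")
    score *= 1.20
```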
 
Infinitely: M-series chips do not offer any "turbo boost". They run at the speed that they run and they get the job done.
The AnandTech article I linked, by Andrei Frumusanu, indicates they do indeed offer a turbo boost: The M-series offer a higher P-core clock if only one core is running, just as with AMD and Intel. That's all turbo boost is--it allows a higher speed than the all-core base clock if not all cores are running. The qualitative difference is that, with the M-series, this boost is per cluster:

"The CPU cores clock up to 3228MHz peak, however vary in frequency depending on how many cores are active within a cluster, clocking down to 3132 at 2, and 3036 MHz at 3 and 4 cores active. I say “per cluster”, because the 8 performance cores in the M1 Pro and M1 Max are indeed consisting of two 4-core clusters, both with their own 12MB L2 caches, and each being able to clock their CPUs independently from each other, so it’s actually possible to have four active cores in one cluster at 3036MHz and one active core in the other cluster running at 3.23GHz."
https://www.anandtech.com/show/17024/apple-m1-max-performance-review
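Put differently, the M-series "boost" is a short per-cluster frequency ladder. A little sketch of the M1 Pro/Max behavior, using exactly the numbers from that quote:

```python
# Per-cluster P-core clocks on the M1 Pro/Max, per the AnandTech quote above.
# Each 4-core P-cluster picks its clock based on how many of its own cores
# are active, independently of the other cluster.
M1_P_CLUSTER_MHZ = {1: 3228, 2: 3132, 3: 3036, 4: 3036}

def cluster_clock_mhz(active_cores: int) -> int:
    """Clock of one P-cluster given its number of active cores (1-4)."""
    if not 1 <= active_cores <= 4:
        raise ValueError("a P-cluster has 1 to 4 active cores")
    return M1_P_CLUSTER_MHZ[active_cores]

# e.g. one fully loaded cluster alongside a single busy core in the other:
print(cluster_clock_mhz(4), cluster_clock_mhz(1))   # 3036 3228
```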
 
Is Apple even prioritizing improving the single core performance for desktops? How important is it? … I'm asking whether it makes economic sense to optimize for single core scores on a CPU designed for massively parallel workloads.
I would distinguish three categories rather than two. For machines at the highest end, we have:

1) Laptops, where SC performance is traded off against battery life and portability
2) Desktops, which afford ultimate SC performance (and can also offer higher core counts than laptops).
3) Workstations, where SC performance is traded off against core count/MT performance.

There are people doing serious work for whom #2 (ultimate SC performance) is important. They don't need mobility, and they don't need 20+ cores; they just want a machine that is as responsive* as possible, and that will complete their single-threaded tasks with less wait time. A lot of scientific programs, like Mathematica, are mostly single-threaded. As to whether Apple is prioritizing improving SC performance for that category, the answer is clearly that it hasn't thus far, and that's what the concern is -- that they've done a superb job producing chips optimized for categories 1** and 3***, but that producing chips optimized for category 2 requires a different design, and that desktops aren't a big enough market share for them to care about.

*With my 2019 i9 iMac (measured GB SC = 1375), I routinely get delays (sometimes accompanied by a beachball) of 1 – 3 s when using Word, Excel, PowerPoint, and Acrobat Pro. I'd like those delays to be imperceptible, from which I infer the point of diminishing returns would be SC performance about 10x faster (=> 0.1 s – 0.3 s delays), i.e., GB5 SC ~14,000 (with enough increase in RAM and SSD speeds so those don't become bottlenecks). Of course I'm not going to get that, but this illustrates how much room for user-noticeable improvement there is. Plus you've got the significantly longer wait times with scientific programs, like Mathematica, where a series of calculations can take many seconds to many minutes; that's noticeable, because Mathematica is often used interactively.

**high portability while maintaining high SC performance

***high core counts while maintaining high SC performance
 
they do indeed offer a turbo boost: The M-series offer a higher P-core clock if only one core is running … "The CPU cores clock up to 3228MHz peak, however vary in frequency depending on how many cores are active within a cluster, clocking down to 3132 at 2, and 3036 MHz at 3 and 4 cores active. …

That is a difference of 6% between the lowest and highest P-core clock rates. Raptor Lake (the 13000 series) by contrast has a turbo boost rate of about +90% (close to double) for the i9 P-cores and +115% for i9 E-cores. It makes the M-series clock rate look essentially flat.

Of course, the M-series processors are not really designed for high-clock performance and generate nearly equivalent SC scores at clock rates lower than x86 base clock rates. Real "turbo boost" would be of minimal advantage to M-series processors.
 
In my original post I clearly defined "turbo boost" as max clock/base clock (and specifically max P/base P for the M-series), and found that it was 7% for the M1 based on Anandtech's figures: 3.2 GHz/3.0 GHz = 1.07 (yes, it's more exactly 6% if you don't round).

You responded with this absolutist (and incorrect) statement:
Infinitely: M-series chips do not offer any "turbo boost".

Then, when I pointed out it was incorrect, you responded (above) by essentially repeating what I said to start with--that the boost of the M-series is much smaller than AMD's and Intel's—including reporting, within rounding, the same figures I did (~6-7% for M-series, ~90% for Intel):
It seems the only reason AMD and Intel can beat Apple in SC desktop speeds is because they offer a much larger percentage "turbo boost" over their base clocks than the M-series chips do—93% for the i9-13900K and 27% for the Ryzen 9 7950X, compared with 7% for the M1 (based on https://www.anandtech.com/show/17024/apple-m1-max-performance-review )

I know you tend to like to argue with my posts, which is fine—but there should be at least some basis for the argument, and I'm not seeing the logic here.
 
Yes it can. This time they didn’t have a process shrink. And many times over the past years they had similar gains - they average 20 percent single core improvement since A5, but that doesn’t mean they got 20 percent every year.
Yes, you may be right. I guess I need to wait and see a 3 nm A-series chip.
 
My question is because the stats you posted say USBx2?
14” has 2, I guess, while the 16” has 3.

The base M1/M2 dies have fewer USB/TB controllers than the Pro/Max dies (2 vs 4 I believe). 14” and 16” MBPs both have 3 USB/TB ports. The M1 Mac Mini uses an external USB controller to get more USB ports beyond the two driven by the TB controllers.

The Mac Studio with the M1 Max has 4 TB buses, hooked up to the rear ports, and a USB controller(s?) driving the front USB-C ports and the rear USB-A ports. The M1 Ultra has 8 TB buses, and so the front USB-C ports are TB-capable.
 