M1 Pro/Max - additional information

Cmaier

From: https://www.anandtech.com/show/1702...cialflow&utm_source=twitter&utm_medium=social

Some highlights:

  • GPU running at up to 1296 MHz. That implies very high "IPC", since many cards with comparable performance run at much higher frequencies
  • 512-bit-wide LPDDR5 memory interface
  • 48MB SLC cache
  • Cores: 3.2GHz peak, 128KB L1D (3-cycle load-to-load latency), 12MB L2 cache
  • DRAM latency about 15ns higher than the M1's
  • A single core can saturate up to 102GB/s of memory bandwidth -- 2 cores 186GB/s -- 3 cores 224GB/s -- 4 cores 243GB/s, which is the maximum the CPU cores can drive (see the sketch after this list). So the CPU cannot, by itself, use all 400+GB/s.
  • Power usage varies widely with workload: 0.2W at idle, 34W in Cinebench R23 MT, 92W in Aztec High Off + 511.povray_rMT
    • In all cases lower than the Intel Core i9-11980HK, often much lower, while achieving comparable-to-much-higher performance
  • As expected, single-core performance comparable to M1
  • Multicore: generally trounces the AMD Ryzen 5980HS (35W) and Intel Core i9-11980HK (45W)
  • On SPECfp memory-bound tests (which, I know from experience, are something CPU designers think about), its performance is "absolutely absurd"
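
Those per-core bandwidth figures are striking. For anyone curious how numbers like that get measured, here is a minimal single-threaded read-bandwidth probe in C. This is a sketch of my own, not AnandTech's methodology: the 256MB buffer size and repetition count are arbitrary assumptions.

```c
/* Rough single-core memory bandwidth probe (illustrative sketch).
 * Streams a buffer much larger than the 48MB SLC so reads come from DRAM. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define WORDS (256UL * 1024 * 1024 / sizeof(uint64_t)) /* 256MB of uint64_t */
#define REPS  8

int main(void) {
    uint64_t *buf = malloc(WORDS * sizeof(uint64_t));
    if (!buf) return 1;
    for (size_t i = 0; i < WORDS; i++) buf[i] = i;  /* fault in every page */

    struct timespec t0, t1;
    volatile uint64_t sink = 0;  /* keeps the loop from being optimized away */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < REPS; r++) {
        uint64_t sum = 0;
        for (size_t i = 0; i < WORDS; i++) sum += buf[i];  /* streaming reads */
        sink += sum;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double gbytes = (double)REPS * WORDS * sizeof(uint64_t) / 1e9;
    printf("~%.1f GB/s single-core read bandwidth\n", gbytes / secs);
    free(buf);
    return 0;
}
```

Pin one copy of that loop to each core and, per the figures above, the aggregate should plateau around 243GB/s rather than scaling to the full 400+GB/s.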

Conclusion:

"On the CPU side, doubling up on the performance cores is an evident way to increase performance – the competition also does so with some of their designs. How Apple does it differently, is that it not only scaled the CPU cores, but everything surrounding them. It’s not just 4 additional performance cores, it’s a whole new performance cluster with its own L2. On the memory side, Apple has scaled its memory subsystem to never before seen dimensions, and this allows the M1 Pro & Max to achieve performance figures that simply weren’t even considered possible in a laptop chip. The chips here aren’t only able to outclass any competitor laptop design, but also competes against the best desktop systems out there, you’d have to bring out server-class hardware to get ahead of the M1 Max – it’s just generally absurd."
 
So a question for our resident brain - @Cmaier: Do benchmarks really matter compared to real use?

I’m so far out of the loop on this it’s unreal (time was I geeked out on 8086, 80286 and 80386 chips etc.), but in this day and age, do these various Geekbench benchmarks tell the whole story of the system as a whole - CPU, bus, memory, SSDs etc. - and how they’re used together?

Not least, do all these benchmarks take into account the mitigations for Spectre/Meltdown-type issues that are still evident and are addressed in the OS?
 

Well, benchmarks are an indication. The better the benchmark, the better it correlates to real work.

My wife has an M1 MacBook, and it is, in real use, much faster than my 2016 MBP at everything, just like the benchmarks would predict. These new MBPs are much, much faster. Whether that translates into a real advantage for you depends on what you are doing. For things like video editing, for example, the difference will be readily apparent, and you’ll be able to do things like edit many more simultaneous streams. If you’re running Word or a web browser, the difference will largely be that the machine will be silent (unlike the competition), and have much longer battery life. But the speed won’t do much for you.
 
I was basing this question on the dick-measuring contest over in MR - the "My Intel better than your Arm" thread.

Sounds like Alder Lake may have higher scores, but does that really matter? Especially when one looks at power and eventual usage (I gather AL is destined for desktop boxes first and not mobile).
 

Since I’m suspended I haven’t seen what they are writing, but it seems odd. The only AL Geekbench numbers I’ve seen are 1287/8950, which are much lower than, say, the M1 Max (1750/11500). But even if they had higher benchmark scores, I would imagine it would be at a much higher power consumption.
 
Curiosity got the better of me so I took a look at that thread. Those benchmark scores barely beat the M1 Max, in a chip that won’t be available for months and will burn a lot more power than the M1 Max. So while it appears to be a nice chip for those stuck in x86-land, it’s got a long way to go to beat Apple.

In fact, there are some early signs that the power usage may spike to 115W for the duration of these benchmarks, and may even hit a peak of 200W. That’s insane. But if all you care about is winning benchmarks, that’d do it.
 
And, by the way, around the time Alder Lake hits the market, so will M2. M2 should match or exceed it in single core performance, at much lower power. Which means that M2 Max will destroy it in multi-core performance.
 

On whether benchmarks tell the whole story: it depends how you benchmark. There are real-world uses for matrix multiplication and the conjugate gradient method. If that weren’t the case, things like CUDA would be far less useful.
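
For a concrete sense of what such a kernel looks like, here is a bare-bones conjugate-gradient solver in C. This is purely an illustrative sketch of mine (the tridiagonal test matrix, problem size, and iteration count are my own assumptions, not any benchmark's actual workload).

```c
/* Bare-bones conjugate gradient on a 1D Laplacian: the kind of
 * bandwidth-hungry kernel fp benchmarks exercise. Illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N     (4 * 1024 * 1024)   /* ~32MB per vector: bigger than most caches */
#define ITERS 200                 /* fixed iteration count, benchmark-style */

static double dot(const double *a, const double *b) {
    double s = 0.0;
    for (long i = 0; i < N; i++) s += a[i] * b[i];
    return s;
}

/* y = A*x for the symmetric positive-definite tridiagonal Laplacian */
static void matvec(const double *x, double *y) {
    for (long i = 0; i < N; i++) {
        y[i] = 2.0 * x[i];
        if (i > 0)     y[i] -= x[i - 1];
        if (i < N - 1) y[i] -= x[i + 1];
    }
}

int main(void) {
    double *x  = calloc(N, sizeof(double));   /* initial guess: 0 */
    double *r  = malloc(N * sizeof(double));  /* residual b - A*x, = b here */
    double *p  = malloc(N * sizeof(double));  /* search direction */
    double *Ap = malloc(N * sizeof(double));
    if (!x || !r || !p || !Ap) return 1;
    for (long i = 0; i < N; i++) { r[i] = 1.0; p[i] = 1.0; } /* b = all ones */

    double rr = dot(r, r);
    for (int it = 0; it < ITERS; it++) {
        matvec(p, Ap);
        double alpha = rr / dot(p, Ap);       /* optimal step along p */
        for (long i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(r, r);
        for (long i = 0; i < N; i++) p[i] = r[i] + (rr_new / rr) * p[i]; /* new direction */
        rr = rr_new;
    }
    printf("residual norm after %d iterations: %.3e\n", ITERS, sqrt(rr));
    return 0;
}
```

Each iteration streams every vector through the core, so once the working set outgrows the caches the results track memory bandwidth more than clock speed; that is the kind of behavior behind the "absolutely absurd" memory-bound SPECfp numbers above.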

Another impressive benchmark, which Affinity claims is representative of performance using its apps. Beats a $6,000 graphics card.


Do keep in mind, those workstation cards typically carry a very high markup. In NVidia's case, the consumer cards were historically crippled on double-precision floating-point calculations. Maya, AutoCAD, and the like were historically certified with workstation drivers. If you're merely comparing FLOPS, you could compare against the fastest non-workstation cards. They often have similar FLOPS at a much lower cost, albeit with some limitations (like NVidia and the double-precision thing).

M2 is where I might start looking at going back to a Mac. I don't have a compelling reason to use one these days other than that I like the OS. I don't really carry a laptop, but I could go for a Mini or similar, depending on the price of 1TB with 16GB or more of RAM (even at 16, I have run out of memory).



No way you can sustain a 200W power draw without a machine that sounds like a jet engine. That's also at a point where the difference in the electric bill actually becomes noticeable if you use it enough.
 

Exactly my point. You can do it for a short while to win a benchmark, but in practical use it’s a nightmare. Yet some published numbers suggest that’s what is happening.
 
Just been reading some articles that think that Apple screwed the pooch with the Pro & Max CPUs…

Apparently the Apple laptop chips are going to be no match for the yet-to-be-released Intel desktop chips…

(Wattage be damned)
 

Meanwhile Apple is laughing its way to the bank and Intel is in a panic.
 
Not sure if it means anything, but Geekbench seems to be showing some M1 Pro MacBooks out in the wild. Multicore scores are proportionally lower than scores for a 12-core Ryzen 9, with single-core scores right on top of each other.

Apart from being 64-bit-only, what is Apple doing that so blows Qualcomm, et al., out of the water? Does it have something to do with Android?
 

Apple has much higher memory bandwidth, and, as far as I know, wider issue, deeper reorder buffers, and higher clock speed?
 
How does clock speed scale? If a CPU scores x on Geekbench at frequency φ, how close would it be likely to get to 2x at 2φ (assuming it could run at that speed)?
 

Assuming all else is equal, it scales fairly linearly, up until it runs into a bandwidth problem (e.g. it has to stop to clear memory accesses, or to wait for more instructions to be fetched). So, as long as you aren't already saturating buses, it's essentially linear.
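
To put a rough number on that (a back-of-the-envelope model of my own, not from anything above): split a run into core-bound work and memory-stall time, so runtime at frequency f is roughly t(f) = C/f + M, where C is the core-bound cycle count and M is stall time that doesn't shrink with the clock. Doubling the clock then gives a speedup of t(f)/t(2f) = (C/f + M) / (C/(2f) + M), which is the full 2x when M = 0 but only about 1.33x when memory stalls already make up half the runtime.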
 

Yeah I definitely put extra memory, cores, and disk space to good use at times, and I wouldn't buy something running a 200W chip.


On the clock-speed question: clock speed determines the maximum frequency at which you can issue instructions on a given port. This is one factor in determining maximum theoretical throughput, noting that superscalar processors may issue instructions on different ports in the same cycle, and those supporting SIMD extensions may operate on multiple data items in parallel. Realized throughput is still limited by memory bandwidth, the maximum number of in-flight instructions the processor can support, and other architecture-specific details such as the size of the reorder buffer. It's further limited by imposed constraints such as memory fences.
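
As a worked example of that peak-throughput arithmetic (numbers of my own choosing, purely illustrative, not published figures for any particular core): a 3.2GHz core with four 128-bit SIMD pipes, each able to retire one four-lane single-precision fused multiply-add per cycle, peaks at 3.2e9 cycles/s × 4 pipes × 4 lanes × 2 ops = roughly 102 GFLOPS. Realized throughput is whatever fraction of that the memory system, reorder-buffer depth, and dependency chains let you sustain.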
 