# M1 Pro/Max - additional information



## Cmaier

From: https://www.anandtech.com/show/1702...cialflow&utm_source=twitter&utm_medium=social

Some highlights:

- GPU running at 1296 MHz (max). That means it has a very high "IPC", since many cards with comparable performance run at much higher frequencies.
- 512-bit wide LPDDR5
- 48MB SLC cache
- CPU cores: 3.2GHz peak, 128KB L1D (3-cycle load-to-load latency), 12MB L2 cache
- DRAM latency about 15ns higher than M1
- A single core can saturate up to 102GB/s of memory bandwidth; 2 cores 186GB/s; 3 cores 224GB/s; 4 cores 243GB/s, which is the maximum the CPU cores can stress. So the CPU cannot, by itself, use all 400+GB/s.
- Power usage is all over the map: 0.2W at idle, 34W in Cinebench R23 MT, 92W in Aztec High Off + 511.povray_rMT.
- In all cases, less than the Intel i9-11980HK, often much less, while achieving comparable-to-much-higher performance.
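A quick back-of-the-envelope in Python, using only the bandwidth figures quoted above, shows how the per-core scaling efficiency falls off:

```python
# Back-of-the-envelope: how far the measured aggregate bandwidth falls
# short of perfect per-core scaling, using the figures quoted above.
single_core = 102  # GB/s, one core
measured = {1: 102, 2: 186, 3: 224, 4: 243}  # GB/s, from the article

for cores, bw in measured.items():
    ideal = single_core * cores
    efficiency = bw / ideal
    print(f"{cores} cores: {bw} GB/s measured vs {ideal} GB/s ideal "
          f"({efficiency:.0%} scaling efficiency)")
```

By four cores the aggregate is only about 60% of perfect scaling, consistent with the observation that the CPU cluster alone can't use the full 400+GB/s.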

As expected, single-thread performance is comparable to M1.
Multicore: generally trounces the AMD Ryzen 5980HS (35W) and Intel Core i9-11980HK (45W).
On SPECfp memory-bound tests (which, I know from experience, are something CPU designers think about), its performance is "absolutely absurd."

Conclusion:

"On the CPU side, doubling up on the performance cores is an evident way to increase performance – the competition also does so with some of their designs. How Apple does it differently, is that it not only scaled the CPU cores, but everything surrounding them. It’s not just 4 additional performance cores, it’s a whole new performance cluster with its own L2. On the memory side, Apple has scaled its memory subsystem to never before seen dimensions, and this allows the M1 Pro & Max to achieve performance figures that simply weren’t even considered possible in a laptop chip. The chips here aren’t only able to outclass any competitor laptop design, but also competes against the best desktop systems out there, you’d have to bring out server-class hardware to get ahead of the M1 Max – it’s just generally absurd."


----------



## Cmaier

Meanwhile, Intel’s “M1 killer” needs bursts of 215W consumption.  LOL.









Intel Alder Lake-P and M Processor Power Limits Listed (www.tomshardware.com): "Reigning in transients"


----------



## Yoused

What is the performance in crayymps? 2x-4x?


----------



## User.191

So a question for our resident brain - @Cmaier: Do benchmarks really matter over real use?

I’m so far out of the loop on this it’s unreal (time was I geeked out on 8086, 80286 and 80386 chips etc.), but in this day and age, do these various Geekbench benchmarks tell the whole story, made up of CPU, bus, memory, SSDs etc. and how they’re used as a whole?

Not least, do all these benchmarks take into account all the mitigations for Spectre/Meltdown-type issues that are still evident and are addressed in the OS?


----------



## ronntaylor

MissNomer said:


> So a question for our resident brain - @Cmaier: *Do benchmarks really matter over real use?*



Thank you for this question. I see all this technical talk and start to wonder if I'm having a stroke or aneurism.


----------



## Cmaier

MissNomer said:


> So a question for our resident brain - @Cmaier: Do benchmarks really matter over real use?
> 
> I’m so far out of the loop on this it’s unreal (time was I geeked out on 8086, 80286 and 80386 chips etc.), but in this day and age, do these various Geekbench benchmarks tell the whole story, made up of CPU, bus, memory, SSDs etc. and how they’re used as a whole?
> 
> Not least, do all these benchmarks take into account all the mitigations for Spectre/Meltdown-type issues that are still evident and are addressed in the OS?




Well, benchmarks are an indication.  The better the benchmark, the better it correlates to real work.

My wife has an M1 MacBook, and it is, in real use, much faster than my 2016 MBP at everything, just like the benchmarks would predict. These new MBPs are much much faster.  Whether that translates into a real advantage for you depends on what you are doing. For things like video editing, for example, the difference will be readily apparent, and you’ll be able to do things like edit many more simultaneous streams.  If you’re running Word or a web browser, the difference will largely be that the machine will be silent (unlike the competition), and have much longer battery life.  But the speed won’t do much for you.


----------



## User.191

Cmaier said:


> Well, benchmarks are an indication.  The better the benchmark, the better it correlates to real work.
> 
> My wife has an M1 MacBook, and it is, in real use, much faster than my 2016 MBP at everything, just like the benchmarks would predict. These new MBPs are much much faster.  Whether that translates into a real advantage for you depends on what you are doing. For things like video editing, for example, the difference will be readily apparent, and you’ll be able to do things like edit many more simultaneous streams.  If you’re running Word or a web browser, the difference will largely be that the machine will be silent (unlike the competition), and have much longer battery life.  But the speed won’t do much for you.



I was basing this question on the dick measuring contest over in MR - the "My Intel better than your Arm" thread.

Sounds like while Alder Lake may have higher scores, does that really matter? Especially when one looks at power and eventual usage (I gather AL is destined for desktop boxes first, not mobile).


----------



## Cmaier

MissNomer said:


> I was basing this question on the dick measuring contest over in MR - the "My Intel better than your Arm" thread.
> 
> Sounds like while Alder Lake may have higher scores, does that really matter? Especially when one looks at power and eventual usage (I gather AL is destined for desktop boxes first, not mobile).




Since I’m suspended I haven’t seen what they are writing, but it seems odd. The only AL Geekbench numbers I’ve seen are 1287/8950, which are much lower than, say, M1 Max (1750/11500). But even if they had higher benchmark scores, I would imagine it would be at a much higher power consumption.


----------



## Cmaier

MissNomer said:


> I was basing this question on the dick measuring contest over in MR - the "My Intel better than your Arm" thread.
> 
> Sounds like while Alder Lake may have higher scores, does that really matter? Especially when one looks at power and eventual usage (I gather AL is destined for desktop boxes first, not mobile).



Curiosity got the better of me so I took a look at that thread. Those benchmark scores barely beat M1 Max, in a chip that won’t be available for months, and which will burn a lot more power than M1 Max. So while it appears to be a nice chip for those stuck in x86-land, it’s got a long way to go to beat Apple.

In fact, there are some early signs that the power usage may spike to 115W for the duration of these benchmarks, and may even hit a peak of 200W. That’s insane. But if all you care about is winning benchmarks, that’d do it.


----------



## Cmaier

Another impressive benchmark, which Affinity claims is representative of performance using its apps.  Beats $6000 graphics card.









Apple M1 Max GPU beats $6,000 AMD Radeon Pro W6900X in Affinity benchmark (9to5mac.com): "A new benchmark test run with Affinity's tool shows that the M1 Max's GPU beats the $6,000 AMD Radeon Pro W6900X for some tasks."


----------



## Cmaier

And, by the way, around the time Alder Lake hits the market, so will M2. M2 should match or exceed it in single core performance, at much lower power. Which means that M2 Max will destroy it in multi-core performance.


----------



## thekev

MissNomer said:


> So a question for our resident brain - @Cmaier: Do benchmarks really matter over real use?
> 
> I’m so far out of the loop on this it’s unreal (time was I geeked out on 8086, 80286 and 80386 chips etc.), but in this day and age, do these various Geekbench benchmarks tell the whole story, made up of CPU, bus, memory, SSDs etc. and how they’re used as a whole?
> 
> Not least, do all these benchmarks take into account all the mitigations for Spectre/Meltdown-type issues that are still evident and are addressed in the OS?




It depends how you benchmark. There are real world uses for matrix multiplication and the conjugate gradient method. If that wasn't the case, things like CUDA would be far less useful.



Cmaier said:


> Another impressive benchmark, which Affinity claims is representative of performance using its apps.  Beats $6000 graphics card.
> 
> Apple M1 Max GPU beats $6,000 AMD Radeon Pro W6900X in Affinity benchmark (9to5mac.com)




Do keep in mind, those workstation cards typically carry a very high markup. In Nvidia's case, the consumer cards were historically crippled on double-precision floating-point calculations. Maya, AutoCAD, and the like were historically certified with workstation drivers. If you're merely comparing FLOPS, you could compare against the fastest non-workstation cards. They often have similar FLOPS at a much lower cost, albeit with some limitations (like Nvidia and the double-precision thing).



Cmaier said:


> And, by the way, around the time Alder Lake hits the market, so will M2. M2 should match or exceed it in single core performance, at much lower power. Which means that M2 Max will destroy it in multi-core performance.




M2 is where I might start looking at going back to a Mac. I don't have a compelling reason to use one these days other than that I like the OS. I don't really carry a laptop, but I could go for a Mini or similar, depending on the price of 1TB with 16GB or more of RAM (even at 16, I have run out of memory).




Cmaier said:


> Curiosity got the better of me so I took a look at that thread. Those benchmark scores barely beat M1 Max, in a chip that won’t be available for months, and which will burn a lot more power than M1 Max. So while it appears to be a nice chip for those stuck in x86-land, it’s got a long way to go to beat Apple.
> 
> In fact, there are some early signs that the power usage may spike to 115W for the duration of these benchmarks, and may even hit a peak of 200W. That’s insane. But if all you care about is winning benchmarks, that’d do it.



No way you can sustain that power draw without a machine that sounds like a jet engine. That's also at a point where the difference in the electric bill actually becomes noticeable if you use it enough.


----------



## Cmaier

thekev said:


> No way you can sustain that power draw without a machine that sounds like a jet engine. That's also at a point where the difference in the electric bill actually becomes noticeable if you use it enough.




Exactly my point. You can do it for a short while to win a benchmark, but in practical use it’s a nightmare. Yet some published numbers suggest that’s what is happening.


----------



## User.191

Just been reading some articles that think that Apple screwed the pooch with the Pro & Max CPUs…

Apparently the Apple laptop chips are going to be no match for the yet to be released Intel Desktop chips…

(Wattage be damned)


----------



## Cmaier

MissNomer said:


> Just been reading some articles that think that Apple screwed the pooch with the Pro & Max CPUs…
> 
> Apparently the Apple laptop chips are going to be no match for the yet to be released Intel Desktop chips…
> 
> (Wattage be damned)




Meanwhile Apple is laughing its way to the bank and Intel is in a panic.


----------



## Yoused

Not sure if it means anything, but Geekbench seems to be showing some M1 Pro MacBooks out in the wild. Multicore scores are proportionally lower than scores for a 12-core Ryzen 9, with single-core scores right on top of each other.

Apart from being 64-bit-only, what is Apple doing that so blows Qualcomm, et al, out of the water? Does it have something to do with android?


----------



## Cmaier

Yoused said:


> Not sure if it means anything, but geekbench seems to be showing some M1 pro macbooks out in the wild. Multicore scores are proportionally lower than scores for a 12-core Ryzen 9, with single core scores right on top of each other.
> 
> Apart from being 64-bit-only, what is Apple doing that so blows Qualcomm, et al, out of the water? Does it have something to do with android?




Apple has much higher memory bandwidth, and, as far as I know, wider issue, deeper reorder buffers, and higher clock speed?


----------



## Yoused

How does clock speed scale? If a cpu scores x on gb at φ, how close would it be likely to get to 2x at 2φ (assuming it could run at that speed)?


----------



## Cmaier

Yoused said:


> How does clock speed scale? If a cpu scores x on gb at φ, how close would it be likely to get to 2x at 2φ (assuming it could run at that speed)?




Assuming all else equal, it scales fairly linearly, up until it runs into a bandwidth problem (e.g. it has to stop to clear memory accesses, or to wait for more instructions to be fetched).  So, as long as you aren't already saturating buses, it's essentially linear.
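As a toy illustration of that linear-until-saturation behavior, here is an Amdahl-style sketch in Python. The scores, clocks, and compute-bound fraction are assumed numbers purely for illustration, not measurements of any real chip:

```python
def scaled_score(base_score, base_clock, new_clock, compute_fraction=0.9):
    """Toy model: the compute-bound fraction of a benchmark scales with
    clock speed; the memory/fetch-bound remainder does not. All
    parameters are illustrative assumptions, not measured values."""
    speedup = new_clock / base_clock
    relative_time = compute_fraction / speedup + (1 - compute_fraction)
    return base_score / relative_time

# Doubling the clock on a workload that is 90% compute-bound:
print(scaled_score(1000, 3.2, 6.4))  # ~1818, well short of 2x
```

The closer `compute_fraction` is to 1 (i.e. the further you are from saturating buses), the closer the result gets to perfectly linear scaling.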


----------



## thekev

Cmaier said:


> Exactly my point. You can do it for a short while to win a benchmark, but in practical use it’s a nightmare. Yet some published numbers suggest that’s what is happening.




Yeah I definitely put extra memory, cores, and disk space to good use at times, and I wouldn't buy something running a 200W chip. 



Yoused said:


> How does clock speed scale? If a cpu scores x on gb at φ, how close would it be likely to get to 2x at 2φ (assuming it could run at that speed)?




Clock speed determines the maximum frequency at which you can issue instructions on a given port. This is one factor in determining maximum theoretical throughput, noting that superscalar processors may issue instructions on different ports in the same cycle, and those supporting SIMD extensions may execute on multiple data items in parallel. Realized throughput is still limited by memory bandwidth, the maximum number of in-flight instructions the processor can support, and other architecture-specific details such as the size of the reorder buffer. It's further limited by imposed constraints such as memory fences.
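That theoretical upper bound is just a product of the factors named above. A minimal sketch in Python, where the port count and SIMD width are hypothetical values for illustration, not any real core's specs:

```python
def peak_ops_per_second(clock_hz, issue_ports, simd_lanes):
    """Upper bound on per-core throughput: one instruction per port per
    cycle, each operating on simd_lanes data items. Real code falls
    short of this due to memory bandwidth, in-flight instruction
    limits, reorder-buffer size, memory fences, and so on."""
    return clock_hz * issue_ports * simd_lanes

# A hypothetical 3.2 GHz core with 4 FP issue ports and 4-wide SIMD:
print(f"{peak_ops_per_second(3.2e9, 4, 4):.2e} ops/s")  # 5.12e+10
```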


----------



## SuperMatt

Farhad Manjoo wrote a piece for the NY Times about the new chips (paywall removed):









Opinion | The Chip That Could Transform Computing (www.nytimes.com): "Apple’s custom processors suggest that computers are nowhere near hitting their performance limits."


----------



## Yoused

One has to wonder if the blistering performance of M1 is in part due to better code design by Apple. Obviously the architecture has a feature or two within it that works in favor of macOS/iOS (probably something that erases Cocoa object overhead), and the software teams have a closer connection with the hardware teams, but Windows is still such a mess because MS has the genius boneheads who do everything the hard way.


----------



## Cmaier

Yoused said:


> One has to wonder if the blistering performance of M1 is in part due to better code design by Apple. Obviously the architecture has a feature or two within it that works in favor of macOS/iOS (probably something that erases Cocoa object overhead), and the software teams have a closer connection with the hardware teams, but Windows is still such a mess because MS has the genius boneheads who do everything the hard way.




Apple certainly ekes out some performance that way, but run Windows on M1 and it will still destroy everything else running Windows in performance per watt.


----------



## Andropov

Yoused said:


> One has to wonder if the blistering performance of M1 is in part due to better code design by Apple. Obviously the architecture has a feature or two within it that works in favor of macOS/iOS (probably something that erases Cocoa object overhead), and the software teams have a closer connection with the hardware teams, but Windows is still such a mess because MS has the genius boneheads who do everything the hard way.



I'd add proper thread and process management to the list of things that Apple does well (and Microsoft/Intel doesn't). This is something that is not captured by most benchmarks (as they are run at full power on all threads, which shouldn't be too difficult to schedule), but it's noticeable in normal use.

In fact, after a couple weeks now using the M1 Pro MBP, one of the things that surprises me the most about the new chips is how you can be running something CPU-intensive in the background (a numerical simulation, for instance), using all cores, and still have a perfectly smooth UI without a single dropped frame. This isn't simply a consequence of the CPU being very fast (my previous MBP was a 2019 16" i9, which is already plenty fast), there's something else going on. I don't know how they do it, but it'd be hard to overstate how much faster this makes the computer *feel*.


----------



## Nycturne

Andropov said:


> I'd add proper thread and process management to the list of things that Apple does well (and Microsoft/Intel doesn't). This is something that is not captured by most benchmarks (as they are run at full power on all threads, which shouldn't be too difficult to schedule), but it's noticeable in normal use.
> 
> In fact, after a couple weeks now using the M1 Pro MBP, one of the things that surprises me the most about the new chips is how you can be running something CPU-intensive in the background (a numerical simulation, for instance), using all cores, and still have a perfectly smooth UI without a single dropped frame. This isn't simply a consequence of the CPU being very fast (my previous MBP was a 2019 16" i9, which is already plenty fast), there's something else going on. I don't know how they do it, but it'd be hard to overstate how much faster this makes the computer *feel*.



Apple’s QoS system is well suited for this particular goal. Add in GCD which makes thread pools easier to manage than anything I did in .NET land ages ago (keep in mind it was in the .NET 4 era) and it’s not hard for a developer to mark their work appropriately for the system. The new Swift concurrency system seems to be taking this system and making it even more performant by addressing weaknesses in current thread pool designs, while also taking notes on what worked well.

Part of it though is that the highest priority is “User Interactive”. This is the priority that event handling is done at, and a priority level a developer would pretty much never set on work themselves. It means that the system is always able to prioritize the UI threads for responsiveness in apps at the expense of everything else. Does this mean your simulation might be delayed? Yes. But does the user care about that if they can do something else with the system? Generally no.


----------



## Andropov

Nycturne said:


> Apple’s QoS system is well suited for this particular goal. Add in GCD which makes thread pools easier to manage than anything I did in .NET land ages ago (keep in mind it was in the .NET 4 era) and it’s not hard for a developer to mark their work appropriately for the system.



Yes! It's almost unbelievable that Apple shipped GCD in 2009. It's made the transition to heterogeneous cores much simpler than on other platforms.



Nycturne said:


> The new Swift concurrency system seems to be taking this system and making it even more performant by addressing weaknesses in current thread pool designs, while also taking notes on what worked well.



I haven't looked that much into Swift concurrency yet, as it's been iOS 15-only (until very recently) and at work we are required to support at least two major iOS versions, so I don't know the impact it has in multithread performance. My impression from what I saw at the WWDC was that it made multithreaded code much easier to write. This may compel software companies to start refactoring monolithic single-threaded code to offload some work to other threads. This was already technically possible, but by making it much easier to write, it may cause developers to actually do it, which I think was the ultimate goal of the whole concurrency feature.



Nycturne said:


> Part of it though is that the highest priority is “User Interactive”. This is the priority that event handling is done at, and a priority level a developer would pretty much never set on work themselves.



I wonder how big of an effect the naming scheme of Apple's QoS had in this. While I haven't really written many multiplatform simulations, I believe the alternatives would be using OpenMP (where you can't set a priority, as far as I can tell) or POSIX threads. The API to get the min/max values of pthread priorities relies on calling `sched_get_priority_max()`, so I think it'd be far more common to carelessly set a thread priority to max using the value returned by that function than to set a GCD queue priority to `.userInteractive`, since the semantics are much clearer. You wouldn't explicitly set a queue priority to `.userInteractive` for anything that wasn't actually user-interactive; it just looks wrong. But setting a pthread priority to the maximum value doesn't 'look' wrong.


----------



## mr_roboto

Andropov said:


> I wonder how big of an effect the naming scheme of Apple's QoS had in this. While I haven't really written many multiplatform simulations, I believe the alternatives would be using OpenMP (where you can't set a priority, as far as I can tell) or POSIX threads. The API to get the min/max values of pthread priorities relies on calling `sched_get_priority_max()`, so I think it'd be far more common to carelessly set a thread priority to max using the value returned by that function than to set a GCD queue priority to `.userInteractive`, since the semantics are much clearer. You wouldn't explicitly set a queue priority to `.userInteractive` for anything that wasn't actually user-interactive; it just looks wrong. But setting a pthread priority to the maximum value doesn't 'look' wrong.



This is actually a great point.  Names in interfaces are important!  Heck, names in code are important.  Whenever I write some RTL, I try to do a variable name revision pass right after I first get it working.  This isn't just about making the code clear for future maintainers, it also gets present-me to rethink stuff I just did.  The act of changing names to be more meaningful often leads me to bugfixes and even substantial optimizations.


----------



## Nycturne

Andropov said:


> I haven't looked that much into Swift concurrency yet, as it's been iOS 15-only (until very recently) and at work we are required to support at least two major iOS versions, so I don't know the impact it has in multithread performance. My impression from what I saw at the WWDC was that it made multithreaded code much easier to write. This may compel software companies to start refactoring monolithic single-threaded code to offload some work to other threads. This was already technically possible, but by making it much easier to write, it may cause developers to actually do it, which I think was the ultimate goal of the whole concurrency feature.




In some ways it makes things harder in existing code, because of the GCD-isms present in existing APIs. I'm in the middle of trying to convert some CoreData sync code to use concurrency as a learning exercise, and you're forced to operate in both worlds when using background contexts (which you still want to use for performance reasons). So you've got your async/await code that handles networking that's still dispatching into a GCD queue to perform the updates. Yay? And Actors don't really help in this specific case. 

But the real performance boost comes from the fact that tasks under Swift concurrency are not associated with specific threads or queues. Allowing cooperative task management on a given thread opens opportunities for switching work without a heavy-weight context switch to a new thread, or even doing work synchronously on the same thread when possible. That and being able to prevent thread explosions, which GCD is still vulnerable to. 



Andropov said:


> I wonder how big of an effect the naming scheme of Apple's QoS had in this. While I haven't really written many multiplatform simulations, I believe the alternatives would be using OpenMP (where you can't set a priority, as far as I can tell) or POSIX threads. The API to get the min/max values of pthread priorities relies on calling `sched_get_priority_max()`, so I think it'd be far more common to carelessly set a thread priority to max using the value returned by that function than to set a GCD queue priority to `.userInteractive`, since the semantics are much clearer. You wouldn't explicitly set a queue priority to `.userInteractive` for anything that wasn't actually user-interactive; it just looks wrong. But setting a pthread priority to the maximum value doesn't 'look' wrong.




I absolutely think giving clear semantics helps. It still uses something similar to pthread priorities under the hood, but an API that helps developers understand how to use it properly is going to get used properly a lot more often. Apple does this a lot though, sometimes to the point of getting in the way. I'm reminded of how UITextInput works.


----------



## Andropov

Nycturne said:


> In some ways it makes things harder in existing code, because of the GCD-isms present in existing APIs. I'm in the middle of trying to convert some CoreData sync code to use concurrency as a learning exercise, and you're forced to operate in both worlds when using background contexts (which you still want to use for performance reasons). So you've got your async/await code that handles networking that's still dispatching into a GCD queue to perform the updates. Yay? And Actors don't really help in this specific case.



Interesting. I have a side project where I'm learning to use async/await, but I'm only using it for the network calls, I still plan to manage everything else using GCD queues. So far it hasn't made the code as good looking as in the WWDC sample videos, I haven't found a way around wrapping a lot of it in Task {...} and DispatchQueue.main.async{...} blocks. Where I think it would immensely simplify the code is in the project I develop at work (iOS app for a major fashion brand). The whole project is littered with completion handlers everywhere and sometimes it's very difficult to follow the control flow of the app. Async/await would make everything much more readable.



Nycturne said:


> Allowing cooperative task management on a given thread opens opportunities for switching work without a heavy-weight context switch to a new thread, or even doing work synchronously on the same thread when possible.



That's a very good point. Hadn't thought of it that way.


----------



## Andropov

By the way, since we're in the M1 Pro/M1 Max thread, any theories of why Apple chose to go with only two efficiency cores? This is my CPU History right now, just replying to this thread in Safari:

(Attachment 9728: CPU History screenshot)
Looks like it would benefit from having at least the 4 efficiency cores of the M1, right? It's always running the two efficiency cores at max utilization. And looking at the floorplan of the M1 Pro/Max, it's not like the efficiency cores take a huge amount of die space, so I'm puzzled as to why they chose to remove two of them. They even kept the 4MB L2 cache of the M1, but shared between two cores instead of four.


----------



## User.45

Andropov said:


> By the way, since we're in the M1 Pro/M1 Max thread, any theories of why Apple chose to go with only two efficiency cores? This is my CPU History right now, just replying at this thread in Safari:
> View attachment 9728
> 
> Looks like it would benefit from having at least the 4 efficiency cores of the M1, right? It's always running the two efficiency cores at max utilization. And looking at the floorplan of the M1 Pro/Max, it's not like the efficiency cores take a huge amount of die space, so I'm puzzled as to why they chose to remove two of them. They even kept the 4MB L2 cache of the M1, but shared between two cores instead of four.



Noob question, but could it be that their efficiency is maximized at their 100% load? So they are designed to run 100% all the time for the ubiquitous tasks that keep the system “alive”?


----------



## Cmaier

Andropov said:


> By the way, since we're in the M1 Pro/M1 Max thread, any theories of why Apple chose to go with only two efficiency cores? This is my CPU History right now, just replying at this thread in Safari:
> View attachment 9728
> 
> Looks like it would benefit from having at least the 4 efficiency cores of the M1, right? It's always running the two efficiency cores at max utilization. And looking at the floorplan of the M1 Pro/Max, it's not like the efficiency cores take a huge amount of die space, so I'm puzzled as to why they chose to remove two of them. They even kept the 4MB L2 cache of the M1, but shared between two cores instead of four.




It does seem a little puzzling, but I assume that they profiled a lot of instruction traces and determined that 2 got you most of the benefit of 4, and that the spillover to the P-cores was minimal.


----------



## Yoused

I am not sure that those gauges are really indicative, though. I had a background thread that had a work queue and a lock for other threads to add to the queue. When I used a spinlock, my CPU meter was pegged all the time. Then I switched the code to use a condition lock, meaning the thread was frozen until something was put into the queue and the meter bottomed out.

A spinlock does basically nothing, so the meter was pegged on nothing really happening. All those gauges tell you is that code is constantly running, not that it is doing much. There is always some housekeeping to be done, but when you have real work to do, call in the big guns.
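The spinlock-versus-condition-lock difference described above can be reproduced in any language; here is a minimal sketch of the condition-lock version in Python (the queue, worker, and sentinel are made up for illustration):

```python
import threading
from collections import deque

work = deque()
cond = threading.Condition()
results = []

def worker():
    while True:
        with cond:
            # Sleep until a producer signals, instead of spinning:
            # the thread consumes no CPU while the queue is empty.
            while not work:
                cond.wait()
            item = work.popleft()
        if item is None:  # sentinel: shut down
            break
        results.append(item * 2)

t = threading.Thread(target=worker)
t.start()

with cond:
    work.extend([1, 2, 3, None])
    cond.notify()
t.join()
print(results)  # [2, 4, 6]
```

A spinlock version would replace `cond.wait()` with a busy loop repeatedly checking the queue, which pegs a core (and the CPU meter) even when nothing is happening.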


----------



## Andropov

Cmaier said:


> It does seem a little puzzling, but I assume that they profiled a lot of instruction traces and determined that 2 got you most of the benefit of 4, and that the spillover to the P-cores was minimal.



Hm. _If_ there's spillover to the P-cores. 

This got me thinking, as it immediately raises another question: why does the M1 have 4 efficiency cores then? A plausible (but untested) explanation would be that maybe the M1 Pro/Max schedules threads more aggressively to the P-cores, sending tasks with a lower (but not the lowest) QoS to the P-cores by default, where the M1 would have sent them to the E-cores (to maximize battery life instead of performance), and that's why the M1 has two additional E-cores.

But then, why not have four E-cores anyway? Surely any professional has more background tasks running than the average M1 user. Maybe Apple saw that as a problem, as there's often quite a few useless utilities wasting battery in the background (CCloud, I'm looking at you). It would be interesting to know if the lowest QoS tasks can spill to the P-cores on the M1 Pro/Max. On the regular M1 they were locked to the E-cores. This effectively sets a cap on how much .background QoS tasks can consume.

If they're still locked to the E-cores, maybe Apple just set a maximum power budget for purely background tasks with the lowest QoS (to avoid discharging the battery too fast just idling), decided that two E-cores are enough for that, and that there's no need to have another two for middle-to-low QoS since they're going to get scheduled to the P-cores anyway.


----------



## mr_roboto

Andropov said:


> Looks like it would benefit of having at least the 4 efficiency cores of the M1, right? It's always running the two efficiency cores at max utilization.



You have to be careful about measuring this - under low loads I've seen CPU cores reported at high utilization just because Apple's power control loops aren't detecting enough demand to ramp frequencies up all the way.


----------



## Pumbaa

Cmaier said:


> It does seem a little puzzling, but I assume that they profiled a lot of instruction traces and determined that 2 got you most of the benefit of 4, and that the spillover to the P-cores was minimal.



This inspired me to enable the CPU History window on my M1 Mini. So far, it has been rare for the four E-cores to be utilized more than 50%.


----------



## Hrafn

Pumbaa said:


> This inspired me to enable the CPU History window on my M1 Mini. So far, it has been rare for the four E-cores to be utilized more than 50%.



Me too, but with only one screen, I needed to set it so it's not always on top.  I'm mostly just using the efficiency cores.


----------



## mr_roboto

I've also wondered why they chose to drop to two E cores.  It seems like there's often enough work for them to do, they're so profoundly efficient compared to P cores, and they're so tiny. 

The topic comes up at about 24:25 in this interview with Tim Millet and Tom Boger (Apple VPs), and they mention something interesting:

Upgrade #379: They Feed on Memory Bandwidth - Relay FM
www.relay.fm

We're joined by Apple VPs Tom Boger and Tim Millet to discuss Apple's chip-design philosophy and how it factored into the company's first high-end Mac chips, the M1 Pro and M1 Max.




According to Tim, the top end of the E core's perf/Watt curve (max performance and power) has some overlap with the bottom end of the P core's curve, so spilling some "E" type tasks to P cores isn't so bad.  (unstated: as long as the P core stays at the bottom end of its clock range!)

I don't think this is the whole story.  Everyone Apple sends out to do post-launch interviews has clearly been put through a lot of interview prep; they're very slick about funneling the conversation towards positive things which promote the product while avoiding saying anything in negative terms.  But this does help explain why E cores ended up on the chopping block.  Perhaps there was a desperate need to reclaim a bit of area because some other block was over budget and they didn't want to grow the total die size.  It's simple triage at that point: find something which users won't miss a lot, and remove it.

I'd still rather have four E cores and a (very) slightly larger die.  They're cool!  Literally and figuratively.


----------



## Cmaier

mr_roboto said:


> I've also wondered why they chose to drop to two E cores.  It seems like there's often enough work for them to do, they're so profoundly efficient compared to P cores, and they're so tiny.
> 
> The topic comes up at about 24:25 in this interview with Tim Millet and Tom Boger (Apple VPs), and they mention something interesting:
> 
> Upgrade #379: They Feed on Memory Bandwidth - Relay FM
> www.relay.fm
> 
> We're joined by Apple VPs Tom Boger and Tim Millet to discuss Apple's chip-design philosophy and how it factored into the company's first high-end Mac chips, the M1 Pro and M1 Max.
> 
> According to Tim, the top end of the E core's perf/Watt curve (max performance and power) has some overlap with the bottom end of the P core's curve, so spilling some "E" type tasks to P cores isn't so bad.  (unstated: as long as the P core stays at the bottom end of its clock range!)
> 
> I don't think this is the whole story.  Everyone Apple sends out to do post-launch interviews has clearly been put through a lot of interview prep; they're very slick about funneling the conversation towards positive things which promote the product while avoiding saying anything in negative terms.  But this does help explain why E cores ended up on the chopping block.  Perhaps there was a desperate need to reclaim a bit of area because some other block was over budget and they didn't want to grow the total die size.  It's simple triage at that point: find something which users won't miss a lot, and remove it.
> 
> I'd still rather have four E cores and a (very) slightly larger die.  They're cool!  Literally and figuratively.



I don’t think it’s an area issue. Those things are tiny. About the same size as the extra neural stuff that isn’t even used.


----------



## mr_roboto

Maybe 1/4 the area of one neural cluster, in fact, assuming the person who annotated Anandtech's die photo got things correct. They didn't even save half the area of M1's E cluster: the shared L2 is still full size, and the L2 looks like it's at least a third of the 4-core E cluster area.

The M1 Pro layout is also the top part of M1 Max, and it looks densely packed, though it's hard to tell for sure in die photos.  The extra neural engine cluster is only present in the M1 Max.


----------



## Cmaier

mr_roboto said:


> Maybe 1/4 the area of one neural cluster, in fact, assuming the person who annotated Anandtech's die photo got things correct. They didn't even save half the area of M1's E cluster: the shared L2 is still full size, and the L2 looks like it's at least a third of the 4-core E cluster area.
> 
> The M1 Pro layout is also the top part of M1 Max, and it looks densely packed, though it's hard to tell for sure in die photos.  The extra neural engine cluster is only present in the M1 Max.




Right. So whatever their reason to not add more E’s, it wasn’t space.  Maybe had something to do with bandwidth.


----------



## Nycturne

Andropov said:


> Interesting. I have a side project where I'm learning to use async/await, but I'm only using it for the network calls, I still plan to manage everything else using GCD queues. So far it hasn't made the code as good looking as in the WWDC sample videos, I haven't found a way around wrapping a lot of it in Task {...} and DispatchQueue.main.async{...} blocks. Where I think it would immensely simplify the code is in the project I develop at work (iOS app for a major fashion brand). The whole project is littered with completion handlers everywhere and sometimes it's very difficult to follow the control flow of the app. Async/await would make everything much more readable.




Concurrency is really meant to replace GCD queues. They solve the same problem in different ways, and so trying to use both just makes it harder.
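The readability difference Andropov describes shows up in any language; here's the same dependent two-request flow in Python (the `fetch` coroutine is a made-up stand-in for a network call), first callback-style, then with async/await:

```python
import asyncio

# Hypothetical fetch: simulates a network call returning data for a URL.
async def fetch(url):
    await asyncio.sleep(0)           # stand-in for real I/O
    return f"payload({url})"

# Callback style (shown for comparison, not executed here):
# control flow nests one level deeper for every dependent request.
def load_profile_cb(loop, done):
    def on_user(task):
        user = task.result()
        t2 = loop.create_task(fetch(f"/avatar?for={user}"))
        t2.add_done_callback(lambda t: done(user, t.result()))
    loop.create_task(fetch("/user")).add_done_callback(on_user)

# async/await: the same flow reads top to bottom, like sequential code.
async def load_profile():
    user = await fetch("/user")
    avatar = await fetch(f"/avatar?for={user}")
    return user, avatar

result = asyncio.run(load_profile())
print(result)
```

The nesting in the callback version is exactly what makes a completion-handler-littered codebase hard to follow; the awaited version keeps error paths and control flow linear.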



Andropov said:


> That's a very good point. Hadn't thought of it that way.




Interesting how things come full circle. Preemptive multitasking was seen as hands-down better than cooperative multitasking, and now async/await is bringing cooperative scheduling back (and not just in Swift, either).



Cmaier said:


> I don’t think it’s an area issue. Those things are tiny. About the same size as the extra neural stuff that isn’t even used.




Considering Apple's approach is to use QoS to decide which cores work lands on (rather than inferring "light loads"), I suspect how the chips are used by the OS is a factor. The M1 is a shared chip that runs both macOS and iOS, but the QoS levels play out differently on the two. iOS, with only one active app at a time, has fewer threads at user-initiated QoS or higher to deal with; anything in the background is relegated to a low QoS, and a lot of it is managing I/O. macOS, on the other hand, can't push priority down on background apps as aggressively, since they could be running work the user asked for that's still time-sensitive. 

I'm left wondering if it's because macOS doesn't leverage the E cores to the same extent iOS does, and the M1 has 4 E cores for the sake of the iPad, not macOS. If background work runs a little longer, that's fine. But on the M1 Macs, the extra E cores still provide a little better multicore support than not having them.


----------



## Cmaier

Nycturne said:


> Concurrency is really meant to replace GCD queues. They solve the same problem in different ways, and so trying to use both just makes it harder.
> 
> 
> 
> Interesting how things come full circle. Preemptive multitasking was seen as hands down better than cooperative multitasking, and now async/await is bringing it back (not just for Swift either).
> 
> 
> 
> Considering Apple's approach is to use QoS to manage which cores work gets put on (rather than "light loads"), I suspect there are some factors at play as to how the chips are used by the OS. M1 is a shared chip that runs macOS and iOS, but how the QoS levels play out are different between the two. iOS with only one active app at a time will have fewer user-initiated threads (or higher) to deal with, and anything in the background is relegated to a low QoS, and a lot of it is managing I/O. macOS on the other hand can't push priority down on background apps as aggressively, since it could be running work the user asked for that's still time sensitive.
> 
> I'm left wondering if it's because macOS doesn't leverage the E cores to the same extent iOS does, and the M1 has 4 E cores for the sake of the iPad, not macOS. If background work runs a little longer, that's fine. But on the M1 Macs, the extra E cores still provide a little better multicore support than not having them.




That’s a possibility, of course, but Apple does control macOS, and there are plenty of threads the OS runs that could run on the E-cores. I’m guessing they just simulated various possibilities and found that it made little difference.


----------



## Andropov

More info about the disabled 2nd Neural Engine on the M1 Max: 

https://www.twitter.com/i/web/status/1460926428254326786/

I wonder what could have happened. It must be very close to being functional.


----------



## Cmaier

Andropov said:


> More info about the disabled 2nd Neural Engine on the M1 Max:
> 
> https://www.twitter.com/i/web/status/1460926428254326786/
> 
> I wonder what could have happened. It must be very close to being functional.



Weird. If it isn’t fused off, seems to me that apple intends to enable it at some point.


----------



## SuperMatt

Andropov said:


> More info about the disabled 2nd Neural Engine on the M1 Max:
> 
> https://www.twitter.com/i/web/status/1460926428254326786/
> 
> I wonder what could have happened. It must be very close to being functional.



Ready to activate when Skynet becomes active…


----------



## Pumbaa

Cmaier said:


> Weird. If it isn’t fused off, seems to me that apple intends to enable it at some point.



This, for some reason, made me think of when they wanted to charge for the 802.11n enabler.


----------



## Andropov

Cmaier said:


> Weird. If it isn’t fused off, seems to me that apple intends to enable it at some point.



Noob question, but what are the advantages of fusing off the block vs. power gating it? (Which I assume is how they're doing it.) Is it just to completely avoid leakage power?


----------



## Cmaier

Andropov said:


> Noob question, but what are the advantages of fusing off the block vs power gating it? (Which I assume is how they're doing it). Is it just done to completely avoid power leaking?




Yep. Fusing also prevents someone from hacking the device and turning it back on.


----------



## aeronatis

Cmaier said:


> Weird. If it isn’t fused off, seems to me that apple intends to enable it at some point.




I realised that, when using Pixelmator Pro, my M1 Max MacBook Pro completed ML Super Resolution in exactly the same time as my old M1 MacBook Air, which suggests the app runs the task on the Neural Engine (both chips have the same number of active Neural Engine cores). If the extra Neural Engine cores can be activated in a future software update, that could make tasks like this much quicker. Not holding my breath, though.


----------



## throAU

mr_roboto said:


> According to Tim, the top end of the E core's perf/Watt curve (max performance and power) has some overlap with the bottom end of the P core's curve, so spilling some "E" type tasks to P cores isn't so bad. (unstated: as long as the P core stays at the bottom end of its clock range!)
> 
> I don't think this is the whole story. Everyone Apple sends out to do post-launch interviews has clearly been put through a lot of interview prep; they're very slick about funneling the conversation towards positive things which promote the product while avoiding saying anything in negative terms. But this does help explain why E cores ended up on the chopping block. Perhaps there was a desperate need to reclaim a bit of area because some other block was over budget and they didn't want to grow the total die size. It's simple triage at that point: find something which users won't miss a lot, and remove it.
> 
> I'd still rather have four E cores and a (very) slightly larger die. They're cool! Literally and figuratively.




Could possibly be purely due to targeting the MBA and 13" MBP previously, and/or Apple being less confident about yields on the (at that time) brand-new 5nm process.

i.e., the new machines didn't get only 2 E cores because something had to be cut... maybe the 4 E cores on the earlier machines were a hedge... and now Apple is more confident.

4E + 4P on the original M1 could have been a hedge against not getting great yields on the larger P cores (and we could have maybe had 6 core M1s with 4x E cores and 2x P cores, or 3 of each or whatever depending on how they yielded).

Could've been that with 4E + 4P, the E cores handle the background tasks but also sort of cap the power consumption on those machines, while still providing better multicore performance than the P cores alone.

Maybe as they got better yields than expected, gained confidence in manufacturing, or simply decided that performance is more of a priority than pure battery life on the new pro machines, they went for a different mix within a given die size (i.e. the original choice of 4 E cores was more a target for the iPad + small MacBook than the 2 E cores on the bigger machines was about taking space away).


----------



## Cmaier

throAU said:


> Could possibly be purely due to targeting of the MBA and 13" MBP previously and/or apple being less confident about yields on the brand new (at that time) 5nm process.
> 
> i.e., the new machines didn't get only 2 E cores because they had some cut off... maybe the 4 E cores on the earlier machines was a hedge... and now apple are more confident.
> 
> 4E + 4P on the original M1 could have been a hedge against not getting great yields on the larger P cores (and we could have maybe had 6 core M1s with 4x E cores and 2x P cores, or 3 of each or whatever depending on how they yielded).
> 
> Could've been that 4E + 4P = the E cores are used for background tasks, but also sort of cap the power consumption on those machines whilst providing better performance than the P cores alone.
> 
> Maybe as they got better yield than expected, got more confidence with manufacturing or simply deciding that the performance is more of a priority than pure battery life on the new pro machines, they went for a different mix within a given die size (i.e. the choice of 4 E cores originally was more of a target against iPad + small MacBook, more than the 2 E cores on bigger machines was taking things away from those for space reasons).



Hey! Good to see you here.


----------



## throAU

Cmaier said:


> Hey! Good to see you here.




Hey, blame <name has been withheld to protect the guilty> inviting me 



But yeah, knowing the shrewd way Apple is run with regards to logistics and things of that nature, it wouldn't surprise me if the E-core decision was driven as much or more by production risk than by outright software performance reasons.


----------



## mr_roboto

throAU said:


> Could possibly be purely due to targeting of the MBA and 13" MBP previously and/or apple being less confident about yields on the brand new (at that time) 5nm process.
> 
> i.e., the new machines didn't get only 2 E cores because they had some cut off... maybe the 4 E cores on the earlier machines was a hedge... and now apple are more confident.
> 
> 4E + 4P on the original M1 could have been a hedge against not getting great yields on the larger P cores (and we could have maybe had 6 core M1s with 4x E cores and 2x P cores, or 3 of each or whatever depending on how they yielded).
> 
> Could've been that 4E + 4P = the E cores are used for background tasks, but also sort of cap the power consumption on those machines whilst providing better performance than the P cores alone.



I don't think these ideas make sense TBH.  Remember that M1 is what Apple would've called A14X if they hadn't chosen to start the transition by fall 2020. A12 was 2P+4E and A12X was 4P+4E, same as the relationship between A14 and A14X aka M1.  This wasn't an experimental config, it was one they knew well.

I expected them to stick with the 4-core E cluster in what we now know as M1 Pro and Max just because that's the path of least resistance and four E cores are pretty useful.  So it's interesting that they removed them, and that it's not just harvesting die with defective CPUs - every M1 Pro or Max chip has to yield two of two E cores, because there are only two.


----------



## mr_roboto

Also: welcome aboard!  Much less trollin' around these parts.


----------



## throAU

mr_roboto said:


> This wasn't an experimental config, it was one they knew well.




Yeah the configuration was not experimental, but previous variants were on 7nm or larger.

A14/M1 is on *5nm* (the first parts on that process, if I'm not mistaken - and Apple booked basically all of TSMC's capacity for it; i.e., they bet big), and perhaps Apple was less confident or more risk-averse about yield. i.e., the M1 was designed as a 4+4 in the hope of getting at least a 3E+3P or 4E+3P configuration out of it in volume. Too much P-core emphasis might have meant a higher defect rate, given the dependence on the larger P cores.

Maybe when they got good yields on the P cores and everything else with 5nm, they figured they could be less conservative and go with a more P core heavy configuration without risk of them not yielding.

Could be other logistics/yield reasons for the M1 being the way it was (more E-core heavy) too. Point being that maybe it wasn't purely based on performance, and was more business/risk related.



Could also purely be that they were for low-power notebooks and they deemed the E cores likely to handle 90% of the stuff people do on those machines. As it is with my 14", the E cores do 90% of the work the majority of the time I'm using the machine, even with only two of them. Unless I'm running something compute-heavy, the P cores (especially the last 4) are mostly idle (see attached).


----------



## B01L

Cmaier said:


> In fact, there are some early signs that the power usage may spike to 115W for the duration of these benchmarks, and may even hit peak of 200W.  That’s insane. But if all you care about is winning benchmarks, that’d do it.




We have a friendly sort of competition over on the Small Form Factor Network forums; Performance Per Liter, aka PPL, which is on Round Four right now...

One of the changes that might be made to the "competition" are longer durations of the benchmarks, which could introduce heat soak into the SFF systems, causing thermal throttling...?

You might win the quarter-mile drag race, but can you finish the 24 Hours of Le Mans...?!? ;^p



thekev said:


> M2 is where I might start looking at going back to a Mac. I don't have a compelling reason to use one these days other than I like the OS. I don't really carry a laptop, but I could go for a Mini or similar, depending on the price of 1TB with 16GB or more of ram (even at 16, I have run out of memory).




I really want to get a M1 Max-powered Mac mini, but I also can see the sense of waiting for a second (or even third) gen product; shake the bugs out of the hardware & (by then) a more robust offering of ASi native / Metal optimized software packages...?


----------



## Cmaier

B01L said:


> We have a friendly sort of competition over on the Small Form Factor Network forums; Performance Per Liter, aka PPL, which is on Round Four right now...
> 
> One of the changes that might be made to the "competition" are longer durations of the benchmarks, which could introduce heat soak into the SFF systems, causing thermal throttling...?
> 
> You might win the quarter mile drag race, but can you finish the 24 Hours of LeMans...?!? ;^p
> 
> 
> 
> I really want to get a M1 Max-powered Mac mini, but I also can see the sense of waiting for a second (or even third) gen product; shake the bugs out of the hardware & (by then) a more robust offering of ASi native / Metal optimized software packages...?



Welcome!

As the CTO at AMD explained to me one time, burning 200W is, regardless of performance, bad, at least if you want to sell a lot of processors.  Server farms, server rooms - indeed, any building - only have so much incoming electrical capacity and cooling capacity. Our customers didn’t want to have to spend millions of dollars retrofitting or building new facilities in order to get more computing capacity.


----------



## Nycturne

throAU said:


> Maybe when they got good yields on the P cores and everything else with 5nm, they figured they could be less conservative and go with a more P core heavy configuration without risk of them not yielding.
> 
> Could be other logistic/yield reasons for the M1 being the way it was (more E core heavy) too.  Point being that maybe it wasn't purely based on performance and more business reason/risk related.



Keep in mind M1 is _also_ an iOS chip, which pushes background user apps down in priority, while macOS doesn't. So on iOS, having more E cores means background work can continue without impacting the user experience, thanks to iOS's limited background API surface and the ability to set those tasks to background QoS with high confidence that it's the right QoS. macOS doesn't follow that pattern: apps that are active but not in the foreground can run work at higher QoS levels that gets assigned to the P cores, causing more contention for those cores.

That said, since the E cores can be used for work assigned to the P cores when the P cores are full, then there’s no real harm in having a couple more than you _need_. And it allows the M1 to squeeze more work through and have better latencies under load than without them.

Now, when you have “enough” P cores, you aren’t going to spill as many threads onto the E cores. The M1 Pro/Max, with 2 performance core clusters, will favor the first core cluster, then the second, then the E cores when it comes to work assigned to P cores. 

The specific measurements are something only Apple holds, but I suspect Apple saw that, by keeping the second cluster idle unless there's work for it, the second cluster absorbs _most_ of the spillover from the first, so the E cluster isn't needed as much for spillover and can be dedicated to the low-priority work where latency doesn't matter.


----------



## Cmaier

Nycturne said:


> Keep in mind M1 is _also_ an iOS chip, which pushes background user apps down in priority, while macOS doesn’t. So on iOS, having more E cores means you can allow background work to continue without impacting the user experience due to the limited background API functionality iOS has and the ability to set the QoS on those things to background with high confidence that it is the right QoS. MacOS doesn’t follow that pattern, and apps that are active, but not in the foreground can run work at higher QoS levels that would get assigned to the P cores, causing more contention for those cores.
> 
> That said, since the E cores can be used for work assigned to the P cores when the P cores are full, then there’s no real harm in having a couple more than you _need_. And it allows the M1 to squeeze more work through and have better latencies under load than without them.
> 
> Now, when you have “enough” P cores, you aren’t going to spill as many threads onto the E cores. The M1 Pro/Max, with 2 performance core clusters, will favor the first core cluster, then the second, then the E cores when it comes to work assigned to P cores.
> 
> The specific measurements are something only Apple holds, but I suspect that Apple saw that by trying to keep the second cluster idle unless there’s work to do, that second cluster handles _most_ of the spill over coming from the first cluster, and that the E cluster isn’t needed as much for spillover, and so it can be dedicated more towards handling the low priority work where latency doesn’t matter.




Hmm. So macOS favors filling one P-cluster before sending anything to the second? I wonder if that's so they can flip back and forth between the two to reduce hot spots, or a power-conservation move: once you start using a cluster for something you have to keep it powered, so there's no point powering up the second unless you need it. (At the coarsest level, it takes multiple cycles to power up a block, so you don't want to be flipping it off and on needlessly.)


----------



## throAU

Cmaier said:


> Hmm. So macos favors keeping 1 p-cluster full before going to the second p cluster for anything?




Definitely seems to be the case from what I've observed/heard elsewhere.

Seems that it heavily prioritises the first four P cores until load reaches a certain percent.


----------



## Yoused

Cmaier said:


> Hmm. So macos favors keeping 1 p-cluster full before going to the second p cluster for anything? I wonder if that’s so they can flip back and forth between the two in order to reduce hot spots, or a power conservation move - once you start using a cluster for something you have to keep it powered, so no point powering up the second unless you need it. (At the coarsest level, it takes multiple cycles to power up a block, so you don’t want to be flipping it off and on needlessly).



IIUC, each cluster shares a L2, so filling up one may be more power-efficient than, say, running 2 in one cluster and 2 in the other. If very high raw performance, rather than efficiency, is what you are after, it might be more effective to use clusters sparsely so that each core has more L2 to work with – although, that might depend on how memory-intensive the work is.


----------



## mr_roboto

Cmaier said:


> Hmm. So macos favors keeping 1 p-cluster full before going to the second p cluster for anything? I wonder if that’s so they can flip back and forth between the two in order to reduce hot spots, or a power conservation move - once you start using a cluster for something you have to keep it powered, so no point powering up the second unless you need it. (At the coarsest level, it takes multiple cycles to power up a block, so you don’t want to be flipping it off and on needlessly).



I got my M1 Max a few days ago.  I've done a little powermetrics-watching while starting/stopping CPU eater processes.  The two performance clusters are P0 and P1, and it sure looks like macOS is optimizing for power. With the system idle P1 spends a lot of time power gated (0 mW).  (Note: to see this you may need to use powermetrics' -i option to set a shorter sampling interval, P1 does get woken every so often and the longer the sampling interval the less likely it is that it averaged 0mW over a whole interval.)

I can start up one, two, three, or four CPU eaters on an otherwise idle machine and I never see P1 go beyond mostly-off until the fourth eater is running.

(edit: to clarify, when you start the fourth process, P1 doesn't go full active, it just gets woken up a bit more since presumably there's a higher chance of system threads occupying a few cores pushing the scheduler to want more than 6 cores active.  P1 doesn't stay on consistently unless I start a fifth CPU eater.)


----------



## Cmaier

mr_roboto said:


> I got my M1 Max a few days ago.  I've done a little powermetrics-watching while starting/stopping CPU eater processes.  The two performance clusters are P0 and P1, and it sure looks like macOS is optimizing for power. With the system idle P1 spends a lot of time power gated (0 mW).  (Note: to see this you may need to use powermetrics' -i option to set a shorter sampling interval, P1 does get woken every so often and the longer the sampling interval the less likely it is that it averaged 0mW over a whole interval.)
> 
> I can start up one, two, three, or four CPU eaters on an otherwise idle machine and I never see P1 go beyond mostly-off until the fourth eater is running.
> 
> (edit: to clarify, when you start the fourth process, P1 doesn't go full active, it just gets woken up a bit more since presumably there's a higher chance of system threads occupying a few cores pushing the scheduler to want more than 6 cores active.  P1 doesn't stay on consistently unless I start a fifth CPU eater.)



Fascinating. Intel's strategy, at least back when I followed such things, was to spread things out to avoid overheating.


----------



## mr_roboto

Cmaier said:


> Fascinating. Intels strategy, at least back when I followed such things, was to spread things out to avoid overheating.



Maybe you only get pushed into that when individual cores need lots more than 5W @ Fmax.


----------



## Cmaier

mr_roboto said:


> Maybe you only get pushed into that when individual cores need lots more than 5W @ Fmax.



I was trying to calculate the actual active channel density in M1 the other day, but I couldn’t get the math to work (it requires a bunch of guesses as to the average N-channel area.). I suspect it’s lower than Intel, possibly due to some 1-of-N routing stuff that they may be doing based on what they got from the Intrinsity folks, but who knows.


----------



## Nycturne

Cmaier said:


> Hmm. So macos favors keeping 1 p-cluster full before going to the second p cluster for anything? I wonder if that’s so they can flip back and forth between the two in order to reduce hot spots, or a power conservation move - once you start using a cluster for something you have to keep it powered, so no point powering up the second unless you need it. (At the coarsest level, it takes multiple cycles to power up a block, so you don’t want to be flipping it off and on needlessly).




I keep going back to eclecticlight.co, since it's a great way to confirm whether weird stuff I see is accurate: https://eclecticlight.co/2021/11/04/m1-pro-first-impressions-2-core-management-and-cpu-performance/


----------



## Cmaier

Nycturne said:


> I keep going back to eclecticlight.co, since it's a great way to confirm whether weird stuff I see is accurate: https://eclecticlight.co/2021/11/04/m1-pro-first-impressions-2-core-management-and-cpu-performance/



Fascinating


----------



## throAU

Nycturne said:


> I keep going back to eclecticlight.co, since it's a great way to confirm whether weird stuff I see is accurate: https://eclecticlight.co/2021/11/04/m1-pro-first-impressions-2-core-management-and-cpu-performance/



Yeah, I discovered that site a couple of weeks ago; heaps of interesting info there about Apple Silicon adoption.


----------

