Where do we think Apple Silicon goes from here?

Hard to know at this point. Whatever this thing is, it would likely be an option for Mac Pros, too, though they'd sell a lot more of them in iMac Pros. And if there are 12 cores, how many are efficiency cores, if any? Lots of unknowns.
Good point about it likely being an option for Mac Pro, and it fits - since M1 Pro/Max were disclosed it's been clear they need three chip designs to cover their full line.
  • M1: Tablet / lightweight notebook / low-end desktop
  • M1 Pro/Max: High performance notebook / midrange desktop (I bet we're gonna see M1 Max in the Mini and the mainstream 27" class iMacs)
  • M1 ???: High end desktop / workstation
If this guy's accurate, here's my SWAG on what Apple might be doing in the M1 ??? (maybe "Extreme"?):
  • somewhat derivative of M1 Max, no need to totally reinvent the wheel here
  • replace the 2c E cluster with a 4c P cluster, this gets us to 12 cores
  • same GPU core count, or maybe a bit more? But we don't really want to blow the die size up too much.
  • same memory interface (available tests suggest M1 Max memory BW is overkill, won't need more with a modest increase in core count)
  • coherent off-die interconnect to support 1/2/4 die configs
  • more PCIe
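Spelling out the multi-die math in that guess (my arithmetic only, nothing confirmed):

```swift
// Speculated per-die layout: the two 4-core P clusters from M1 Max,
// plus a third 4-core P cluster in place of the 2-core E cluster.
let coresPerDie = 3 * 4   // 12 P cores, no E cores

// Core counts across the speculated 1/2/4-die configurations:
for dies in [1, 2, 4] {
    print("\(dies) die(s): \(dies * coresPerDie) P cores")
}
// Prints 12, 24, and 48 P cores respectively.
```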
 
That is an interesting question. If you take a look at the benchmarks for Alder Lake, the efficiency cores don't seem to add anything to the performance equation, at least for gaming. Now, I realize I'm comparing apples to fungus, and that these are different situations. Apple has control over the whole widget, while Intel has to pray that Microsoft can bother to optimize the Windows 11 scheduler for the efficiency cores. Of course, Apple has a different architecture, doesn't target gaming, and can have the macOS team working hand-in-glove with the Apple Silicon engineers. That being said, it does make me wonder if Apple will bother with efficiency cores on the professional Macs, where thermals and energy consumption are far less of a consideration.
If Apple thinks that isn’t a benefit on the desktops, or has some other scheduler trick up their sleeves, I could see Apple ditching it on the highest end parts.

But given how the E cores also act as a sink for low-priority work, letting it get done without having to share cores with user work (in many cases), it seems a bit odd to go heavy on AMP just to go back to SMP for the Mac Pro. That said, I don't think a Mac Pro with 4 dies would get great use out of 4-8 E cores. But a couple wouldn't be a bad thing, as you do get some benefits from having fewer context switches hitting threads running stuff important to the user.

If this guy's accurate, here's my SWAG on what Apple might be doing in the M1 ??? (maybe "Extreme"?):
  • somewhat derivative of M1 Max, no need to totally reinvent the wheel here
  • replace the 2c E cluster with a 4c P cluster, this gets us to 12 cores
  • same GPU core count, or maybe a bit more? But we don't really want to blow the die size up too much.
  • same memory interface (available tests suggest M1 Max memory BW is overkill, won't need more with a modest increase in core count)
  • coherent off-die interconnect to support 1/2/4 die configs
  • more PCIe

One thing that I wonder is if this die exists, can it be paired with an M1 Max die to get 20 P cores and 2 E cores in a two-die setup? More from the standpoint that this sort of asymmetric layout might make some sense if Apple wanted to keep their AMP design without being stuck with more E cores than they need for the multi-die designs. But if an iMac is supposed to get one of these dies, rather than two dies, then probably not the direction Apple would go.
 
One thing that I wonder is if this die exists, can it be paired with an M1 Max die to get 20 P cores and 2 E cores in a two-die setup? More from the standpoint that this sort of asymmetric layout might make some sense if Apple wanted to keep their AMP design without being stuck with more E cores than they need for the multi-die designs. But if an iMac is supposed to get one of these dies, rather than two dies, then probably not the direction Apple would go.
I think we'll see 2-die iMacs. But I don't think they'll be asymmetric.

While I'm a huge fan of the E cores and am actually disappointed in only having two in M1 Pro/Max, I think it's plausible Apple drops them from desktop-only chips. Their P cores should be efficient enough for desktop systems.
 
  • same GPU core count, or maybe a bit more? But we don't really want to blow the die size up too much
For what it is worth, Luke Miani did a video speculating on this topic. He spoke with Dylan about the upcoming iMac Pro, and Dylan believes that the GPU core count is going to go up as well. It's not clear if that is Dylan's speculation or if he has received some unverified information on the subject which he simply hasn't decided to share yet.
 
I think we'll see 2-die iMacs. But I don't think they'll be asymmetric.

While I'm a huge fan of the E cores and am actually disappointed in only having two in M1 Pro/Max, I think it's plausible Apple drops them from desktop-only chips. Their P cores should be efficient enough for desktop systems.

After dissecting the code in Apple’s scheduler, I’m not surprised they only went with two. What would you expect the extra E cores to be doing, might I ask?

That said, Apple's scheduler gets benefits from having them around that go beyond just power usage, in ways that impact the perceived performance by users and how snappy the device is. And the M1 Pro/Max in particular get that benefit without having to pay for more E cores than will actually get used in practice.

For me the question really is, is Apple willing to accept the overhead of running low-priority tasks on the same cores as everything else (i.e. SMP) on these desktop systems? Overhead their other SoCs don’t currently pay? Or are they going to employ some other scheduler tricks to mimic it in other ways, such as defining a CPU cluster as “for background” use?
 
After dissecting the code in Apple’s scheduler, I’m not surprised they only went with two. What would you expect the extra E cores to be doing, might I ask?
When I first got this M1 Max I watched powermetrics a lot (as you do if you're a weirdo like me), and observed that under circumstances where my M1 Air only had to use E cores, the M1 Max was usually spinning up P cores. Basically, there are light productivity scenarios where four E cores are nice to have.

That said, Apple's scheduler gets benefits from having them around that go beyond just power usage, in ways that impact the perceived performance by users and how snappy the device is. And the M1 Pro/Max in particular get that benefit without having to pay for more E cores than will actually get used in practice.

For me the question really is, is Apple willing to accept the overhead of running low-priority tasks on the same cores as everything else (i.e. SMP) on these desktop systems? Overhead their other SoCs don’t currently pay? Or are they going to employ some other scheduler tricks to mimic it in other ways, such as defining a CPU cluster as “for background” use?
For high performance desktops I think they can do fine with zero E cores. They'd also be fine with nonzero, but I just don't think they need them when a battery isn't in the picture.

Responsiveness and low-pri tasks shouldn't be an issue. When P cores are a scarce resource, being able to offload low priority tasks is nice. But if you build a big machine with tons of P cores, they're not scarce anymore. (I don't see what the point would be in reserving a cluster for background tasks, btw.)

So for all that I'm a huge fan of Icestorm, Macs don't seem to need more than four of them, and high performance desktop Macs shouldn't need them at all. The only case you can make for them is that technically they're better perf/W and perf/mm^2 than Firestorm, but that comes at the cost of needing nearly 4x as many threads and substantially worse interactive performance. Amdahl's law always gets you in the end. Also, I suspect Apple thinks that embarrassingly parallel compute should be shifted to the GPU cores instead.
 
Alder Lake does support it. It isn't a new feature either; Apple has been using it for years as part of the Video Toolbox API. The catch is that, in general, the hardware encode blocks are fast but not terribly flexible. If it doesn't support the codec you want to use, you are SOL. So the main advantage Apple has here is that they don't have to wait on Intel for certain codecs (ProRes), and they can tune it for their cases more specifically, even if the result isn't as size-efficient at the same quality as x265 is for HEVC/H.265 video.
Are media encoding blocks suitable for arbitrarily high quality settings? It is often said that hardware encoding outputs worse quality video than software encoding even at the highest setting, but I don't know if that's true. It's one of those things that I see repeated everywhere but can't find the source.

Responsiveness and low-pri tasks shouldn't be an issue. When P cores are a scarce resource, being able to offload low priority tasks is nice. But if you build a big machine with tons of P cores, they're not scarce anymore. (I don't see what the point would be in reserving a cluster for background tasks, btw.)
If you had, say, 20 P cores and a compute task running on 20 threads, couldn't a background task running alongside them cause unwanted context switches and decrease performance? I don't know how big of an impact it could have, but that's the #1 reason I can think of why having at least 2 E cores would be nice to have on desktop.

It's also nice that low priority tasks get confined to the E cores so a background QoS process can't take up too many CPU resources, but I guess background tasks could be capped via software scheduling to achieve the same effect on a P-only CPU.
 
When I first got this M1 Max I watched powermetrics a lot (as you do if you're a weirdo like me), and observed that under circumstances where my M1 Air only had to use E cores, the M1 Max was usually spinning up P cores. Basically, there are light productivity scenarios where four E cores are nice to have.

Which is odd because the scheduler follows some specific rules:
- Background level threads are not promoted out of the E cores. Instead they get throttled when the E cores are full.
- User level threads are assigned P cores. If the P cores run out, then they can take over the E cores either through spillover (if the E cores aren’t idle) or work stealing (if the E cores are idle). Work stealing in particular is designed to keep latency of user-initiated work down as much as possible.

Apple's scheduler doesn't assign "light loads" to E cores and then spin up P cores once a light load is exceeded. It very much follows the "race to sleep" model, with threads running user-initiated work on the P cores first as a way to keep latency down. So unless the developer explicitly defines work as utility or background, it's not going to wind up on the E cores on macOS.
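To make that opt-in concrete, here's roughly what it looks like from the developer side (a minimal GCD sketch; the QoS classes are real API, the two worker functions are hypothetical):

```swift
import Dispatch

func reindexLibrary() { /* hypothetical maintenance work */ }
func renderPreview()  { /* hypothetical user-facing work */ }

// Work explicitly tagged background is what lands on the E cores,
// and per the rules above it won't get promoted off of them.
DispatchQueue.global(qos: .background).async {
    reindexLibrary()
}

// User-initiated work races to sleep on the P cores first; it only
// touches the E cores via spillover or work stealing.
DispatchQueue.global(qos: .userInitiated).async {
    renderPreview()
}
```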

This scheduler makes a ton of sense on iOS in particular, where every process other than the 1-3 in the foreground can have its priority overridden and any work it does in the background shunted over to the E cores. So having the extra E cores there makes a lot of sense. Less so on macOS, where processes that don't have first responder status can still run user-initiated work. But there's still quite a bit of background work and many threads that need to be scheduled.

For high performance desktops I think they can do fine with zero E cores. They'd also be fine with nonzero, but I just don't think they need them when a battery isn't in the picture.

Responsiveness and low-pri tasks shouldn't be an issue. When P cores are a scarce resource, being able to offload low priority tasks is nice. But if you build a big machine with tons of P cores, they're not scarce anymore. (I don't see what the point would be in reserving a cluster for background tasks, btw.)
The issue is one of context switching impacting the higher-priority tasks. You can't just starve out low-priority threads; you have to give them CPU time at some point. So there is an advantage to shunting them off somewhere else, where the overhead of context switches isn't being paid during user-initiated work as often. The M1 Pro/Max gets the benefits of both more energy efficiency, by placing limits on how much power background tasks can consume, and less interruption on the P cores, which helps improve task latency and the race to sleep on the P cores.

Keep in mind this sort of overhead is one thing that Swift Concurrency helps reduce with its cooperative threading model, to the point that Apple explicitly calls it out as a benefit in their WWDC talks. So it makes sense that Apple would care about it when looking at the number of background threads being kicked around, which is quite high compared to 20 years ago. But it still works well on a *nix-style system with a lot of daemons and services.
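For anyone who hasn't watched those talks, the shape of it (a sketch; syncMailboxes is hypothetical):

```swift
// Swift Concurrency runs tasks on a cooperative pool of roughly one
// thread per core; tasks suspend at await points instead of being
// preempted, which is where the context-switch savings come from.
func syncMailboxes() async { /* hypothetical background work */ }

Task(priority: .background) {
    await syncMailboxes()   // background priority -> E-core eligible
}
```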
 
Are media encoding blocks suitable for arbitrarily high quality settings? It is often said that hardware encoding outputs worse quality video than software encoding even at the highest setting, but I don't know if that's true. It's one of those things that I see repeated everywhere but can't find the source.
It's mostly a tradeoff in that a hardware encoder is aimed at realtime performance. So while you can get similar quality, you don't get the same efficiency in the final result. And if you run into a type of content that the hardware encoder has some issues with (extra macroblocking, etc.), then the quality can suffer a bit, and it's not like the hardware block will get a bugfix.
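For reference, asking for the hardware path through Video Toolbox looks roughly like this (a sketch, not production code; 1080p HEVC picked arbitrarily):

```swift
import CoreMedia
import VideoToolbox

// Require the hardware encoder rather than a software fallback.
let spec = [kVTVideoEncoderSpecification_RequireHardwareAcceleratedVideoEncoder as String: true] as CFDictionary

var session: VTCompressionSession?
let status = VTCompressionSessionCreate(
    allocator: kCFAllocatorDefault,
    width: 1920, height: 1080,
    codecType: kCMVideoCodecType_HEVC,
    encoderSpecification: spec,
    imageBufferAttributes: nil,
    compressedDataAllocator: nil,
    outputCallback: nil, refcon: nil,
    compressionSessionOut: &session
)

if status == noErr, let session = session {
    // The realtime bias discussed above: favor throughput over
    // squeezing out the best quality-per-bit.
    _ = VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime, value: kCFBooleanTrue)
}
```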

If you had, say, 20 P cores and a compute task running on 20 threads, couldn't a background task running alongside them cause unwanted context switches and decrease performance? I don't know how big of an impact it could have, but that's the #1 reason I can think of why having at least 2 E cores would be nice to have on desktop.

It's also nice that low priority tasks get confined to the E cores so a background QoS process can't take up too many CPU resources, but I guess background tasks could be capped via software scheduling to achieve the same effect on a P-only CPU.
Apple's scheduler does schedule based on the CPU cluster, where it currently favors the P core clusters in order. So it will fill up one cluster, then the next, and then the next. E core clusters are just a different type of cluster to the scheduler, one that automatically gets low-priority threads assigned to it. So yeah, you could designate the "last" P cluster as the one receiving these low-priority threads without much upheaval in how Apple's AMP scheduler works, while maintaining the benefits. This is the sort of work done at interrupt time. About the only real way to break this is through overriding thread priority, which can inform the scheduler what to do but isn't something baked into the scheduler itself, and as far as I know, the platform itself doesn't use this approach for the active user (unlike iOS, which likely does).

It turns out eclecticlight dug into the scheduler as well, and posted similar findings yesterday: https://eclecticlight.co/2022/01/25/scheduling-of-threads-on-m1-series-chips-second-draft/

A couple interesting tidbits I didn't know before the article:
* M1 will not ramp up the core frequency on the E cores when they're only handling background-priority threads, holding it at ~1 GHz. But when spillover or work stealing happens, it will kick it up to ~2 GHz.
* M1 Pro/Max will ramp the E cores up to ~2 GHz whenever there are background threads that need time on the cores (i.e. aren't sleeping waiting on I/O, etc.), even when there's only one thread per core.
* the taskpolicy command can be used to push a process onto the E cores, giving access to the thread priority override behaviors. The command has been there a while, but has less impact on SMP schedulers like the one used for Intel Macs.
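On that last tidbit: the programmatic equivalent of taskpolicy -b is a setpriority() call with Darwin-specific arguments (a sketch; the constants are the macro values from <sys/resource.h>, spelled out since C macros don't import into Swift):

```swift
import Darwin

let prioDarwinProcess: Int32 = 4    // PRIO_DARWIN_PROCESS: act on a whole process
let prioDarwinBG: Int32 = 0x1000    // PRIO_DARWIN_BG: the background band

// Demote the current process (pid 0 = self); on Apple Silicon its
// threads then get confined to the E cores like background-QoS work.
if setpriority(prioDarwinProcess, 0, prioDarwinBG) != 0 {
    perror("setpriority")
}
```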
 
I think we'll see even more focus on GPU horsepower and memory bandwidth.

AR/VR/"the meta verse" is the next big thing (though all the current AR/VR gear is very much still early prototype level stuff) and there's a HUGE amount of additional 3d processing required for this to properly flourish.

I think I remember Carmack(?) or one of the other big game developers talking about theoretical VR 3d processing requirements to get something close to reality and it was something like 16k resolution per eye (in order to get good resolution over a decent FOV) and 120+ FPS. That's an amazing amount of 3d processing and texture/display bandwidth. Oh - and you need to do it on battery! No, cloud processing won't help because of the response time. Has to be on-device.
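Back-of-the-envelope on those numbers, assuming "16k" means a 15360x8640 panel per eye (my assumption; the original figure may have meant something else):

```swift
// Display scanout alone for that speculated target, before any rendering:
let (width, height) = (15_360, 8_640)   // one reading of "16k" per eye
let eyes = 2, fps = 120
let bytesPerPixel = 4                   // uncompressed 8-bit RGBA

let pixelsPerSecond = width * height * eyes * fps       // ~31.9 billion/s
let scanoutGBps = Double(pixelsPerSecond * bytesPerPixel) / 1e9
print(scanoutGBps)   // ~127 GB/s just to push finished frames to the panels
```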

We're nowhere near that yet (most hardware is running in like... 8-10% of those pixel numbers - if even anywhere near that), but you know that's what Apple will be shooting for at least eventually.


Don't get me wrong. Even Quest 2 level onboard VR is good and fun for gaming - but for proper augmented reality with lots of additional UI, etc. overlaid, it's just not high-res enough, and the FOV is nowhere near what we want. And it's too bulky.

There's huge scope for more 3d processing in a more efficient power/thermal envelope. I think the M-series SoCs are well positioned to dominate that market. It's still very early days yet in terms of what our processing requirements for "must have" levels of VR/AR hardware will be moving forwards.
 
Good point about it likely being an option for Mac Pro, and it fits - since M1 Pro/Max were disclosed it's been clear they need three chip designs to cover their full line.
  • M1: Tablet / lightweight notebook / low-end desktop
  • M1 Pro/Max: High performance notebook / midrange desktop (I bet we're gonna see M1 Max in the Mini and the mainstream 27" class iMacs)
  • M1 ???: High end desktop / workstation

I kind of suspected the reason we haven't seen a new Mac mini based on M1 Pro/M1 Max is they didn't want to simply jam notebook internals into the existing case - i.e., it'll be one of the E-core-less/desktop-specific chip designs, even though the notebook chips would be a significant performance increase vs. the current Mini options.
 
Good point about it likely being an option for Mac Pro, and it fits - since M1 Pro/Max were disclosed it's been clear they need three chip designs to cover their full line.
  • M1: Tablet / lightweight notebook / low-end desktop
  • M1 Pro/Max: High performance notebook / midrange desktop (I bet we're gonna see M1 Max in the Mini and the mainstream 27" class iMacs)
  • M1 ???: High end desktop / workstation
If this guy's accurate, here's my SWAG on what Apple might be doing in the M1 ??? (maybe "Extreme"?):
  • somewhat derivative of M1 Max, no need to totally reinvent the wheel here
  • replace the 2c E cluster with a 4c P cluster, this gets us to 12 cores
  • same GPU core count, or maybe a bit more? But we don't really want to blow the die size up too much.
  • same memory interface (available tests suggest M1 Max memory BW is overkill, won't need more with a modest increase in core count)
  • coherent off-die interconnect to support 1/2/4 die configs
  • more PCIe

I think we'll see 2-die iMacs. But I don't think they'll be asymmetric.

While I'm a huge fan of the E cores and am actually disappointed in only having two in M1 Pro/Max, I think it's plausible Apple drops them from desktop-only chips. Their P cores should be efficient enough for desktop systems.

Drop the E cores, have three 4-core P clusters, swap LPDDR5 for LPDDR5X, add eight more GPU cores, for 40 total per die; I give you the M1 Ultra...!

M1 Ultra
  • 12-core CPU (all Performance cores)
  • 40-core GPU
  • 16-core Neural Engine
  • 256GB LPDDR5X RAM
  • 500GB/s UMA bandwidth
Dual M1 Ultra
  • 24-core CPU (all Performance cores)
  • 80-core GPU
  • 32-core Neural Engine
  • 512GB LPDDR5X RAM
  • 1TB/s UMA bandwidth
LPDDR5X is pin-compatible with LPDDR5, with a 33% performance boost, while using 20% less power...
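The bandwidth guess roughly checks out, assuming the same 512-bit interface M1 Max uses today (my assumption) with LPDDR5X at 8533 MT/s:

```swift
// Peak bandwidth = bus width in bytes x transfer rate.
func peakGBps(busBits: Double, megaTransfers: Double) -> Double {
    (busBits / 8) * megaTransfers / 1000
}

print(peakGBps(busBits: 512, megaTransfers: 6400))  // 409.6 -- M1 Max, LPDDR5
print(peakGBps(busBits: 512, megaTransfers: 8533))  // ~546  -- same bus, LPDDR5X
// 8533/6400 is about 1.33, i.e. the quoted 33% boost, and ~546 GB/s
// lands near the ~500GB/s figure above.
```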

I kind of suspected the reason we haven't seen a new Mac mini based on M1 Pro/M1 Max is they didn't want to simply jam notebook internals into the existing case - i.e., it'll be one of the E-core-less/desktop-specific chip designs, even though the notebook chips would be a significant performance increase vs. the current Mini options.

With the LONG lead times on custom-configured MBP laptops, one would think the delay on an M1 Pro/Max Mac mini could simply be chip allocation; why introduce a new high-end ASi Mac mini when you cannot even get orders filled for the MBPs...?

Although I would not say no to a single SoC M1 Ultra Mac mini...!
 
Maybe 4x 3-core clusters might be more likely (i.e. binned variants of a 16-core die).

Then again, economy-of-scale-wise, I'd be betting on Apple just tiling M1 Pro/Max with multiple sockets (or at least multiple dies on package) on the desktop pro machines. I'm not sure Apple would release something called "Ultra" - "Max" is already their top-end descriptor for stuff. And Max is short for "maximum", after all.

Take a leaf out of AMD's book - build huge numbers of the same (smaller) lego bricks to get manufacturing efficiency and economy of scale and just "glue them together". Assuming the architecture has been built to scale this way - but you'd certainly hope it has.

For the types of workloads that require these massive amounts of processing, I'm not sure that the socket to socket latency would be an issue; we're not talking about gaming machines here.
 
Drop the E cores, have three 4-core P clusters, swap LPDDR5 for LPDDR5X, add eight more GPU cores, for 40 total per die; I give you the M1 Ultra...!

M1 Ultra
  • 12-core CPU (all Performance cores)
  • 40-core GPU
  • 16-core Neural Engine
  • 256GB LPDDR5X RAM
  • 500GB/s UMA bandwidth
Dual M1 Ultra
  • 24-core CPU (all Performance cores)
  • 80-core GPU
  • 32-core Neural Engine
  • 512GB LPDDR5X RAM
  • 1TB/s UMA bandwidth
LPDDR5X is pin-compatible with LPDDR5, with a 33% performance boost, while using 20% less power...



With the LONG lead times on custom-configured MBP laptops, one would think the delay on an M1 Pro/Max Mac mini could simply be chip allocation; why introduce a new high-end ASi Mac mini when you cannot even get orders filled for the MBPs...?

Although I would not say no to a single SoC M1 Ultra Mac mini...!
Quoting Digitimes yesterday:
The US-based brand's 14- and 16-inch MacBook Pros, which received major upgrades in processors, panels, and industrial designs, have enjoyed robust demand since they were launched in October 2021.

However, the two notebooks still suffered from component shortages during that period. In addition to short supply of power ICs from TI, the unsatisfactory yield rates for the new miniLED panels also limited the numbers of notebooks delivered in the fourth quarter of 2021.

Digitimes Research expects the yield rate for the LCD modules for the two notebooks to improve significantly in the first quarter of 2022 and the two products' combined shipments are expected to surpass two million units in the quarter, up more than 10% sequentially.

Digitimes notebook article.
This is exactly the sort of thing they have good info on. I see no reason to assume that Apple has SoC supply problems, nor has that ever been mentioned in supply chain reports.
 
With the LONG lead times on custom-configured MBP laptops, one would think the delay on an M1 Pro/Max Mac mini could simply be chip allocation; why introduce a new high-end ASi Mac mini when you cannot even get orders filled for the MBPs...?

Although I would not say no to a single SoC M1 Ultra Mac mini...!

I imagine they had some prediction about supply chain, and since the Mini is a low volume product, they may have just back-burnered it.

That being said, if a 10-P-core Mini Pro shows up, I'm in line! Extra points if it looks like a Borg cube :D

FWIW, I've got an '18 Mini, works fantastic, runs 24/7 for personal, [lots of] development work, you-name-it computing chores, and while the CPU is solid (it's the i7 flavor), as you know, the GPU is pretty craptacular. I don't do anything that's very graphic intensive, but I occasionally am aware of the lack of performance, plus, I'd like to swap to 2 x 27" 4K displays (currently using 2 x Dell 25" QHDs) and not have to worry about driving the extra pixels :D

(... and yeah, I considered an eGPU a few times, just a little too flaky for me ...)
 
Get themselves aligned with AAA gaming and moderate their pricing.
 
(... and yeah, I considered an eGPU a few times, just a little too flaky for me ...)
You know, it occurs to me that the transactional difference between PCIe and Thunderbolt is essentially nil. If there aren't already, there ought to be TB monitors out there that have an eGPU slot right in the monitor chassis. It would be all but indistinguishable from having the card on the motherboard.
 
(... and yeah, I considered an eGPU a few times, just a little too flaky for me ...)
I think it depends on the eGPU and how well it is designed to work with macOS. I've got a Blackmagic RX 580 paired with my 2018 Mac mini. I've never had any issues using it within macOS, and despite it not being supported, have used it with Windows 10 through Boot Camp to play an occasional Windows-only game. You do need to follow a specific procedure to get it working, but it's not that hard. However, much like Boot Camp, it's a dead end. I haven't been able to get Windows 11 working on my mini despite trying the various hacks, so the writing is on the wall.

My setup is the polar opposite of your Mac mini, in that I got the base model i3 when they were announced, mainly because I was aware of how strong the rumors of the ARM transition were, so this was originally supposed to be a stopgap purchase. When I realized that the transition would take some time, I ended up pimping it out with an eGPU, an external SSD to supplement the pathetic 128GB internal drive, an upgrade from 8GB to 64GB of system memory, and a 21.5-inch LG UltraFine off of eBay. I like to say that my computer is held together with sticks and bubble gum, yet somehow it works.

I wouldn't recommend an eGPU to anyone these days, unless you absolutely need the graphics power and require an Intel Mac. eGPU support is going extinct along with x86 on the Mac, so the investment isn't worth it, at this point. When my Intel Mac mini is no longer capable of running the software I require, then I'll likely replace it with another mini, probably an M3 generation, assuming TSMC's roadmaps hold.
 
(... and yeah, I considered an eGPU a few times, just a little too flaky for me ...)
Yeah, I kept having DisplayPort dropouts with a Vega 56 and I gave up after dealing with that.

I wouldn't recommend an eGPU to anyone these days, unless you absolutely need the graphics power and require an Intel Mac. eGPU support is going extinct along with x86 on the Mac, so the investment isn't worth it, at this point. When my Intel Mac mini is no longer capable of running the software I require, then I'll likely replace it with another mini, probably an M3 generation, assuming TSMC's roadmaps hold.

Also agree. I’m rather glad Apple’s GPUs aren’t super anemic for basic Metal/3D work like the Intel iGPU was. For me, CAD + Affinity Photo is the main benchmark of good realtime performance, and the 2018 Mini just couldn’t do the job for my hobby level work. But the basic M1 Mini could, and the M1 Pro/Max MBP is honestly “headroom to spare” for the work I do. 120fps realtime compositing in Affinity Photo in complicated projects is hilarious.
 
But the basic M1 Mini could, and the M1 Pro/Max MBP is honestly “headroom to spare” for the work I do.
"Headroom to spare" is a very nice thing to have, though (unless it comes with its own set of drawbacks).

When it comes to future development of Apple silicon, various in-house wireless solutions and circuitry dedicated to supporting AR/VR seem likely.
 