x86 CPUs from AMD & Intel current/future releases.

Lots of good points, but this is always a problem, I feel. Apple is all too often prone to putting all its attention on the new shiny object.

If all that happens at WWDC is an announcement that every built-in app or utility is getting a dedicated person or team, and that they will pursue a life of predictable, boring, incremental improvements, I’d be overjoyed.
 
I wonder if we might get GPTK 2.0 at WWDC? It might bring Vulkan translation?
I don’t think Vulkan translation is necessarily a problem, is it? The problem with DirectX was specifically DX12: the open source community was having trouble with it. I believe they had made progress before Apple’s release, but prior to GPTK there wasn’t a good, performant solution.

Ah, I do see that MoltenVK only goes up to Vulkan 1.2. So maybe?
One problem is also that Apple’s success obscures the issues. One criticism of Jonathan’s point, only somewhat alluded to in the original Mastodon thread, is that Apple operates the largest gaming platform in the world and the largest app platform in the world. Now, Jonathan could (and did) counter that the most interesting stuff, graphically, isn’t being done there, and that many developers view Metal as a cost of doing business with iOS. I don’t have that level of insight, so I’ll assume he’s correct. But the point remains: to Apple, Metal underpins the most successful platform anywhere, so where’s the problem? This point delivers yet another reason why Apple won’t abandon Metal anytime soon (which everyone agrees on), but also why Apple’s upper echelons may not see the problems Jonathan brings up.
 
On the note of Apple’s scheduling, I don’t think “E cores first, then P cores” is really an accurate way to describe it. Intel is doing that in a sort of containment way, as if their P cores just use too much power and they want to keep “most real work” on the E cores.

Apple doesn’t do this. They use E cores for background or some general tasks, depending on circumstances, for two reasons:

A: free up P cores, which they can use more efficiently than Intel can.
B: save energy without hurting UX too much, and save area instead of just adding four more P cores that are scheduled at lower clocks.

Here’s my comment in response to someone — anyone here have input?

Post in thread 'Qualcomm Snapdragon Thread'
https://forums.anandtech.com/threads/qualcomm-snapdragon-thread.2616013/post-41226702


IMG_3880.jpeg



(Apple DOES make a lot of use of their E cores, and I think more so on iOS, which makes a ton of sense, but the above isn’t how I’d describe macOS scheduling.)
 
As I note, if anything Apple will even spill work over to the E cores from the P cores for certain high-priority stuff, if the P cluster is too occupied.

But not the other way around. I think they do migrate work, but not for that reason, and not because they’re scared of P-core energy use.
 
I wonder if Apple does it this way because they can. macOS has quite a sophisticated QoS system for processes that evolved from Grand Central Dispatch. I don’t think Windows has something as good, although I admit it’s a while since I looked.

In essence, tasks can be marked with a tag indicating the kind of priority (background, interactive, network, I/O, etc.), and this lets the OS assign those tasks to the appropriate CPU cores.

EDIT: this document in the xnu sources explains Apple‘s current state of the art “Clutch Scheduler”.
https://github.com/apple-oss-distributions/xnu/blob/rel/xnu-6153/osfmk/kern/sched_clutch.md
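As a rough illustration, here’s how I’d sketch that tag-to-cluster mapping as a toy Python model. The QoS class names mirror Apple’s Dispatch QoS tiers, but the cluster preferences are just my reading of how macOS behaves from this thread, not documented scheduler behavior:

```python
# Toy model of QoS-tag-based core assignment. The QoS names mirror
# Apple's Dispatch QoS classes; the cluster preference lists are my
# own interpretation of the thread, NOT Apple's documented policy.

QOS_CORE_PREFERENCE = {
    "userInteractive": ["P"],       # latency-critical UI work: P cores
    "userInitiated":   ["P"],       # the user is actively waiting
    "default":         ["P", "E"],  # prefers P, may run on E
    "utility":         ["E", "P"],  # prefers E, may use P
    "background":      ["E"],       # E cores only
}

def eligible_cores(qos: str) -> list[str]:
    """Return the core clusters a task with this QoS tag may run on,
    in order of preference."""
    return QOS_CORE_PREFERENCE[qos]
```

So a background-tagged task never competes with interactive work on the P cluster, which is the whole point of the tagging.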
 
Slight tangent: I remember when Apple first announced GCD, some “real” programmers dismissed it, because how hard is multithreaded programming, really? 🙃 But yeah, obviously at the OS level it’s been super helpful, especially with the advent of heterogeneous cores. I can understand why some, especially for certain applications, don’t want to lose the ability to pin a process to a particular core, but overall …
 
Right, it does help.

But Android also has a scheduler that moves things by priority and user interactivity, and it works well; I think the big difference is the cores themselves.

Apple can use, and wants to use, the P cores in a way Intel really doesn’t, because of power.

I will have to come back and explain what I mean, but while everyone makes use of E cores, Intel’s containment strategy here leads me to believe they’re being quite severe for the average case, with how often the P cluster gets used depending on power policy.
 
For what it’s worth that’s my sense of it as well, but I’m not enough of an expert to really give you the extra input that you requested in the previous post. @Andropov, @Nycturne, and I think @leman (and probably others here that’s just off the top of my head) know waaaay more about how Apple scheduling works and its ins and outs than me.
 
Absolutely. I was always on Ars Technica at the time of Snow Leopard, and one of their “Big Hitters” was Peter Bright, aka Dr Pizza. An awful person for reasons I won’t go into here, but you can Google if you wish. He was a very vocal Windows expert and dismissed GCD at the time. Within a year he had taken the GCD sources and tried to make his own version for Windows.

Really great programmers can indeed do concurrency well, and they can still do that. Most can’t or shouldn’t, and that is who GCD is for.
 
Intel is still making WAY more progress than AMD on total platform power (like the power delivery and PMIC changes they say help, the partitioning where the E cluster has its own voltage plane, new fabric stuff, the explicit focus on P-core efficiency in the design, etc.).

But I suspect they’re still hurting enough on that last part, and on the ring-bus issue, to really need this to compete with Apple and QC. Which, I mean, is fine if it works, but I suspect Apple, Arm, and QC all have generally better tools to work with on this issue any way you slice it.
 
I know just enough to know that the guy saying “Apple does this too” is definitely wrong, based on the QoS tiers. But I am not saying using E cores is bad; on iOS I am pretty sure E cores are used far more often, and often for some UI tasks.

The way I can also be sure of this is not just Apple’s own documents, but also Qualcomm. They are using 12 fat cores, and even clocked down you can see they do well at low floors (Apple’s can too). On top of that, early web browsing/video/etc. results (where idle power is important, but only half the equation) seem to indicate they’re still doing fine. Also, Apple still throws 6-12 P cores at some SKUs, so….

That, and my own use of M1 and M2 Macs. The E cores are used a lot for background tasks and sometimes mid-priority stuff, but man, those P cores absolutely get used for general work, and it usually doesn’t look like spillover from the E cores.

But like you, I don’t know the nitty-gritty. I know about Android, Windows, and Linux scheduling, but Apple’s is different.

One question I have: if you don’t make use of GCD, what happens then? Does the work get the default QoS, in that 20-30 range, and then the kernel decides? How does it decide?
 
I don't remember. But yeah, in terms of what you wrote with regard to overall design philosophy, I completely agree about Intel's E-cores vs. Apple's and Arm's E-cores. One can make an argument that Intel's E-cores are somewhat analogous to Arm middle cores, but with a greater emphasis on saving die area (especially in Alder Lake, after the disaster that was Rocket Lake, and especially against Zen 3), not just wattage in multithreaded throughput (obviously the two are related, but ...). Apple's E-core is just really interesting. In some ways it is becoming more of a middle core, certainly in terms of performance, but as much as the Pro and base Mx processors have emphasized more E-cores in their latest designs, I have to agree that the P-cores are still the heavy lifters even of multithreaded throughput, and the Max shows this. I can't see Apple releasing an M4/M5 processor like the 14900K, with 8 P-cores and 16 E-cores, for high-powered devices. That just wouldn't make sense, but for Intel it was almost a necessity.
 
I asked Howard Oakley. It seems they get assigned a QoS class of default and preferentially run on the P cores.
View attachment 29814

Thank you Jimmy! Awesome.

Exactly what I was thinking, as you can tell, and it’s also a point against the idea that macOS is, like, deliberately lurching toward putting tasks on the E cores by default. Even if a QoS isn’t specified, it seems that is not true.

I am pretty sure (Leman will know more, and I need to go back on this) that the top tier, QoS 33 (user-interactive or user-initiated), is exclusive to the P cores, more than just preferential.

And then default (numbering in the 20s) is preferential, but it depends.

QoS 9 seems to be background, if defined, and that stuff always runs on the E cores no matter what, which is good.
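Putting those tiers together, here’s a toy Python sketch of the placement policy as I understand it from this thread. The band numbers (33, the 20s, 9) and the rules are my own reading, not Apple’s actual (much more involved) clutch scheduler:

```python
# Toy sketch of per-tier task placement as described in this thread:
#   33   = user-interactive/user-initiated -> P cores, exclusively
#   ~20s = default                         -> prefers P, spills to E
#   9    = background                      -> E cores, always
# This is an interpretation for illustration, not Apple's scheduler.

def place_task(priority: int, p_cores_free: bool) -> str:
    """Pick a cluster ('P' or 'E') for a task at this priority."""
    if priority >= 33:
        # Top tier: queue on the P cluster even if it's busy.
        return "P"
    if priority <= 9:
        # Background: never competes for P cores.
        return "E"
    # Default band: preferential P, but spills to the E cluster
    # when the P cores are fully occupied.
    return "P" if p_cores_free else "E"
```

Note the asymmetry: default work can spill P→E, but background work never migrates E→P, which matches the earlier point that the spillover only goes one way.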
 
Yeah, another thing is that Apple actually could add even more E cores, and it would boost their MT/W and perf/W at low loads, and it would be fine, but there’s just no reason to beyond a point. Specifically, something like that 8+16 split would be extremely stupid. 12 P + 6-10 E? That’s another matter, and they honestly should do that for the M4 Max, but they’re not relying on E cores as their main cores on Macs.
 
Default is definitely a reasonably high priority.
1717709219042.png


I am curious how responsive ordinary tasks will be on Lunar Lake if things default to the E cores. They may well be fast enough, but if they get migrated to the P cores, is there a chance of stutter or lag?
 
On iOS and iPadOS, FWIW, I am pretty sure the scheduling turnover point (where they’d decide to run something on a P core) differs, and/or the frequency ramping does, which makes a lot of sense.

As a tangent:

If you put your phone in Low Power Mode, it’ll cut ST down to something like 850-950 in GB6, I think. But you won’t necessarily feel this as much as you’d expect, and I think part of that is down to greater use of E cores at baseline, besides different frequency ramping (which is important).

Another tangent:

You can feel it a bit more on macOS, where it chops the P cores down to 1.9 GHz on regular M chips and, I think, 2.7 GHz on Pro/Max.

But even then, one thing I noticed is that a 1.9 GHz M2 MacBook still feels so much faster and more responsive than Intel stuff that was “in theory” around the same peak ST.

I think the reason for this is partially the E cores, but also frequency ramping and ST stability. An Ice Lake MacBook might be able to do 1000-1100 in GB5 ST (like a low-power-mode M1/M2), but that was using 15+ watts, and only at peak; you’d be pretty quickly throttled down from it, and the thermals then limited your other cores. So what they could effectively ramp to for user interactions, and sustain, was not really that peak, even if you didn’t care about power.

Whereas 1.9 GHz on an M laptop can still be ramped to quite quickly without hurting power, and can be held there indefinitely with respect to heat and power.

Just an underrated dimension to “performance,” and why I think ST perf/W curves, and with them heat, are still important.
 
This is one reason I add an asterisk to real-world battery tests with mixed loads, even though I favor those tests over video rundowns. I think claims like “AMD is pretty close to Apple in power-saving mode for automated web browsing” understate things, because you’re not actually measuring the small moments of performance, a button press or a webpage loading. So the responsiveness could be notably inferior, because they have to sacrifice it to save power, albeit with similar battery life to Apple running things normally.

Still beats all-out MT loads or video tests though, because you get idle power intermittently + real ST use + some background stuff.
 