M5 Pro and Max unveiled

The only reason I could see for that changing is if the Super core were *a lot* more power hungry, but it's the same core as what we used to call P in the base M5, so we know for a fact it's not an unreasonably power-hungry core that would make it infeasible to rely on for most tasks

Agreed! But this line of logic is why I thought they wouldn't go with a Middle Core and they did! :) I'm not saying my "thread scheduling change to take advantage of the new core type" hypothesis is right either btw, I'm also a fan of the "un-core efficiency improvements" hypothesis. But the former is possible.
 
Again, you appear to be conflating the computational demands of different workloads. The battery life improvement has been quoted for workloads where over 95% of the CPU time is spent waiting in a low-power mode, and where most cores are powered down. The power consumption of such workloads is dominated by the baseline power and other involved IP blocks - memory, caches, data buses, networking components, etc. That’s what “uncore” refers to.
I am not conflating anything. I'm literally just comparing the fact that the Max chip objectively gets less battery life, even in light usage, than the Pro chip. Changing the core configuration of the Max chip to use fewer HP cores is going to help battery life generally speaking. And it did. If they had chosen to do so without creating a third type of core, you'd also get a sidegrade in CPU performance like the M3 Pro did, but they didn't, so now you get 15% increased multi-threaded performance overall, M5 Max vs M4 Max
 
I don't think anyone is disputing that this new core type will change how Apple designs SoCs ... it already has ... or that this is a big deal ... it very much is
Due respect, half this thread is about the name change, so forgive me, but up until the last 4 posts most of it hasn't been focused on what the implications are now and for other chips lol.

I think the misapprehension you are under is that the battery life tests are multi-core?
I'd like you to show that the M4 Max and M4 Pro get the same battery life on light tasks, using official or unofficial metrics. Until then I don't know why you're saying that. I don't have any misapprehension that streaming is a hardware-accelerated light task on M chips lol.
 
It’s not ganging up just because we disagree. I promise you none of us are coordinating our responses in any way. In fact, us four don’t even seem to be addressing the same points (though there appears to be some overlap).
I don’t think we are ganging up on you at all, in fact I think it’s a very civil and constructive conversation. You can take as much time as you need to respond too, this is not a synchronous medium. Just something to think about, if you have multiple people (all experts in their own right) challenging your argument, maybe it’s a good time to reanalyze your premises?
To clarify, I should have used the phrase "pile on," because I wasn't trying to say it was some conspiracy, so I'm sorry if it came across negatively like that. But let's be real also: all four of you disagreed with me in some way or another for the last 3 pages, saying basically the same stuff and even citing one another's comments to explain why you disagree with me.

Even in claiming you "weren't ganging up on me" 2 people said that same thing within 3 mins of each other lmfao. The thread was very active with all of us contributing very quickly, so I felt compelled to keep it going, which isn't your fault at all, but also added to the pressure. You guys are great, and I enjoy reading what you write. I just needed to say what happened specifically
 
Even in claiming you "weren't ganging up on me" 2 people said that same thing within 3 mins of each other lmfao.
That’s a bit of a weird thing to mention. If you said “you four each molested my cat,” the fact that each of us says “no we didn’t” within a few minutes of each other is not proof of anything, either.
 
Uhmmmmm okay. This conversation is getting a little strange to me. Like on multiple levels. I'm sorry to have upset you in previous days, apparently, but at this point I'm not really sure what I did that warranted it, why you're acting like this, or why you even said that. This entire thread is pretty out of character for you. I'm going to end my part of this thread here. Sorry to have upset you.
 
Uhmmmmm okay. This conversation is getting a little strange to me. Like on multiple levels. I'm sorry to have upset you in previous days, apparently, but at this point I'm not really sure what I did that warranted it, why you're acting like this, or why you even said that. This entire thread is pretty out of character for you. I'm going to end my part of this thread here. Sorry to have upset you.
? I’m not upset at all. Not one bit. I simply disagree with a premise of yours.
 
Agreed! But this line of logic is why I thought they wouldn't go with a Middle Core and they did! :) I'm not saying my "thread scheduling change to take advantage of the new core type" hypothesis is right either btw, I'm also a fan of the "un-core efficiency improvements" hypothesis. But the former is possible.
Absolutely. I wouldn’t rule it out either. Just have no expectations of changes there given current knowledge and evidence. But definitely possible
 
? I’m not upset at all. Not one bit. I simply disagree with a premise of yours.
You just reworded my feedback and equated it to claiming you guys did that, and I really struggle to understand why you picked that as a metaphor; it didn't even need a metaphor or analogy. You're either upset or aren't thinking your own responses to me through enough. I don't appreciate it either way.

I didn't accuse you of that. I literally just asked -- lighthearted in intention, if not execution -- for you to stop piling on. You guys were doing that, and then when called out you piled on again in denial within minutes of each other. Do you see why changing the feedback to something as weird and irrelevant as what you said is not only inappropriate but counterproductive to the entire point of this specific feedback? I wasn't even looking for an apology, good god
 
You just reworded my feedback and equated it to claiming you guys did that, and I really struggle to understand why you picked that as a metaphor; it didn't even need a metaphor or analogy. You're either upset or aren't thinking your own responses to me through enough. I don't appreciate it either way.

I didn't accuse you of that. I literally just asked -- lighthearted in intention, if not execution -- for you to stop piling on. You guys were doing that, and then when called out you piled on again in denial within minutes of each other. Do you see why changing the feedback to something as weird and irrelevant as what you said is not only inappropriate but counterproductive to the entire point of this specific feedback? I wasn't even looking for an apology, good god
I know you didn’t accuse us of that. I was making an argument by way of analogy, a common rhetorical technique.
 
Could the M5 Pro and Max already be on N2? It has been in production for several months.
Apple's press page says "two third-generation 3-nanometer dies".

Usually the first few months of a new node are risk production - yields may be low, wafer throughput definitely low-ish.

From my sources, I believe the CPU cores are all on the same die, and the GPU is on the other die.
An interesting question (to me, anyways) is where the memory controllers live. You'd like them to be on the same die as the CPUs to minimize latency, but on the same die as the GPUs to scale memory controller count / bandwidth with GPU core count.

My guess is CPU + media engine + IO on one die, MCs + GPU on another, and the MC+GPU die has two shorelines (north and south) dedicated to die-to-die interconnect. (Allowing construction of an Ultra with two GPU die surrounded by two CPU die endcaps.)
 
An interesting question (to me, anyways) is where the memory controllers live. You'd like them to be on the same die as the CPUs to minimize latency, but on the same die as the GPUs to scale memory controller count / bandwidth with GPU core count.

My guess is CPU + media engine + IO on one die, MCs + GPU on another, and the MC+GPU die has two shorelines (north and south) dedicated to die-to-die interconnect. (Allowing construction of an Ultra with two GPU die surrounded by two CPU die endcaps.)

My guess is the same. What's more, given how Apple's memory hierarchy works, the memory controllers will very likely be close to the SLC cache. It would be very interesting to see whether the memory access latency has changed for the CPU. I am also curious whether it's a 2D die arrangement or 2.5D/3D.
 
I'm not really a fan of the term "super" core. Do we get "hyper" cores in a few years then?
I can understand that they want to distinguish between high-performance, more-efficient performance, and efficiency cores, but still...

Us in 2030: "They've gone to plaid!"

Also, this architecture looks similar to current Intel CPUs: a handful of performance cores, with a dozen more-efficient cores.
But knowing Apple, I'm guessing that both the efficiency and the "medium" performance cores are likely better than Intel's efficiency cores (are those still directly derived from Atom cores, as in the beginning?).

I've harped on this before, but Intel's approach is mostly kneecapped (IMO) by the fact that the scheduler Intel uses tries to leverage the efficiency cores for user-interactive stuff to keep the system from turbo-boosting into oblivion. So you get this weird setup where certain tasks will shift back and forth depending on if the scheduler thinks the thread will benefit from it. End result is that your performance is a little less deterministic than you might expect. Apple's approach is deterministic, and because of their overall efficiency, they can favor speed for user-interactive tasks.

It doesn't upset me. It perplexes me. I'm so confused that this thread is more upset about names than hyped about the performance lol. I mean I understand talking about it, but that's like the narrative of the whole thread

The "super" cores are the same as what we got in the fall. So, not much new there to get excited about. We can already guess there's a bit of a ST uplift over M5, but it's going to be similar to the uplift between base M4 and M4 Pro/Max in the MacBook Pros. These new middle ground cores are new, and interesting, but there's not as much meat to gnaw on off a press release. Not yet. Same with the chiplet fusion they are doing. What are you expecting to analyze here?

30% MT uplift over the M4 Pro/Max is good.

Possibly. I'm just not really sure what the desired characteristics of changing the scheduling behaviour would be.
If something is explicitly labelled to be background work that isn't time sensitive, put it on the most efficient core we have, no-brainer. If all we have is background-tier jobs, clock-gate that cluster too.
Any work that may affect the responsiveness of the system, or how quickly a task the user may be waiting on finishes, schedule on S cores first; and if there are so many parallel threads not yielding the CPU that we need to spill into the P cores, do that.

Agreed with everything you said here, and I'd just add that simple, deterministic behavior in a scheduler is a benefit. It allows developers to make better choices about how they tag work for the scheduler. Complexity just adds more things to track and get right.
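The tiered policy sketched above can be written down as a toy placement function. This is purely illustrative (not Apple's actual scheduler), and the cluster sizes are hypothetical: background work always goes to the E cluster, interactive work fills the S (super) cluster first and spills into the middle P cores.

```python
# Toy sketch of a deterministic, QoS-tiered core-placement policy,
# as described in the posts above. Cluster sizes are hypothetical.

E_CORES, P_CORES, S_CORES = 4, 6, 4  # hypothetical cluster sizes

def place(qos, busy):
    """Pick a cluster for a new runnable thread.

    qos  -- "background" or "interactive"
    busy -- dict of currently busy core counts per cluster
    """
    if qos == "background":
        return "E"          # most efficient cores, always
    if busy["S"] < S_CORES:
        return "S"          # fastest cores first
    if busy["P"] < P_CORES:
        return "P"          # spill into the middle cores
    return "S"              # everything full: oversubscribe S, time-slice

busy = {"E": 0, "P": 0, "S": 0}
placements = []
for qos in ["interactive"] * 6 + ["background"] * 2:
    cluster = place(qos, busy)
    busy[cluster] += 1
    placements.append(cluster)

# 6 interactive threads: 4 land on S, 2 spill to P; background lands on E
print(placements)  # ['S', 'S', 'S', 'S', 'P', 'P', 'E', 'E']
```

The point of the sketch is the determinism: given the same tags and load, the same thread always lands in the same place, which is exactly the property the post above argues developers can plan around.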
 
Explain to me why you think it's the N1 chip contributing most of the battery life increase and not the new chip, despite the fact that the MBA has the N1 chip and doesn't have any battery increase at all. I'm not the expert.

I don't know if it's the N1 chip contributing or something else (your point about the MBA having the same chip and not experiencing an increase is a good one). My point is that I don't see how the changes to the CPU clusters could have resulted in significant power saving, unless they are also accompanied by a new, more efficient on-chip network and improvements to other IP blocks (=uncore) that drive these improvements.
 
Due respect, half this thread is about the name change, so forgive me, but up until the last 4 posts most of it hasn't been focused on what the implications are now and for other chips lol.

With equally due respect, other posters disagree with you and you are at least as responsible as anyone else for filling up the thread with posts on the subject.

I'd like you to show that the M4 Max and M4 Pro get the same battery life on light tasks, using official or unofficial metrics. Until then I don't know why you're saying that. I don't have any misapprehension that streaming is a hardware-accelerated light task on M chips lol.
I think you meant to say that you "don't have any misapprehension that streaming is not a light task," or, without the double negative, that you "don't have any misapprehension that streaming is a heavy task"? Because you are agreeing with the statement that streaming and web browsing are light tasks, yes? I'm just making sure that we are all on the same page.

The reason I bring this up is that I can only think of two reasons why you might think changing the E- to P-cores has an effect on battery life tests:

1) that the P-cores are being used in these battery life tests because:
a) the battery life tests being quoted are MT tests such that all the cores are being used
b) Apple has changed the scheduler such that the battery life test is running on the P-core instead of the S-core (this is the addendum I brought up earlier)

2) that unused cores, especially S-cores, have very high idle power draws

We have explained 1a) doesn't work as an explanation and I think you agree, yes?

Number 2 doesn't work either. You can confirm it yourself using powermetrics on your Mac. Powermetrics has its issues, but you can see the CPU power when idling (well, not full idle - after all, powermetrics is running!). For my computer it's using about 0.1W of power, give or take (again, powermetrics is on); if I turn on, say, Cinebench ST, it jumps to over 5W, and even a light task will easily use 3W. Overall idle power on the 16" MBP M4 Max is ~6W and Cinebench ST uses ~22W (for ~16W of load-minus-idle power). Notice that overall idle power is 60x the "idle" (not completely idle) power of the CPU, while the device's load power is only ~4x the CPU's? When clusters are off, they draw little to no power.

That leaves 1b) or efficiency improvements in the un-core as explanations for these particular results.
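The ratios quoted two posts up can be checked with quick arithmetic. The wattage figures below are the poster's own powermetrics/wall-power measurements, not independent data:

```python
# Back-of-envelope check of the ratios quoted above (poster's figures,
# 16" MBP M4 Max): CPU-cluster power vs whole-device power.

cpu_idle_w, cpu_st_load_w = 0.1, 5.0    # powermetrics CPU power: idle vs Cinebench ST
device_idle_w, device_st_w = 6.0, 22.0  # whole-device power: idle vs Cinebench ST

# At idle the device draws ~60x what the CPU clusters do...
idle_ratio = round(device_idle_w / cpu_idle_w)
print(idle_ratio)  # 60

# ...but under ST load the device draws only ~4x the CPU figure.
load_ratio = round(device_st_w / cpu_st_load_w)
print(load_ratio)  # 4
```

The asymmetry is the argument in a nutshell: at idle, nearly all the power budget sits outside the CPU clusters, so core-architecture changes alone can't explain an idle-dominated battery test improving.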

My "argument" was literally posting the battery life increase and then the wonderful @Cmaier asking if that was instead due to the networking chip, to which I explained why I thought not, instead using the engineering change of the chip to explain why I thought the battery life increased for the M5 Max chip.

I'm not technically apt enough, nor do I have the resources, nor do I honestly at this point want to spend the effort and energy delving into the nitty-gritty of individual performance claims and cross-comparing using a chip that isn't even out. To be clear, I was speaking about general performance benchmarks, like Geekbench, when using the M3 Pro situation as context.

My official "position" is this:
1. the new P core is more performant than E, and more efficient than S.
2. This allowed them to change the core configuration from a 12 HP/4 HE setup to 6 HP/12 HE, which combined with the new fusion tech allowed them to 1) boost overall performance (I was using code compilation as one example), 2) boost battery life, and 3) maintain thermal efficiency.
3. N1 chip helps

It is a fact that the M5 Max chip has increased its battery life and performance and core count across both models.

MacBook Pro 14":
20 hours of streaming, and 13 hours of wireless web for M5 Max
18 hours/13 hours for M4 Max

22 hours/14 hours for M5 Pro
22 hours/14 hours for M4 Pro

MacBook Pro 16":
22 hours of streaming, and 16 hours of web for M5 Max
21 hours/14 hours for M4 Max

24 hours/17 hours for M5 Pro
24 hours/17 hours for M4 Pro

13" MacBook Air:
18 hours streaming, 15 hours wireless web for M5
18 hours/15 hours for M4

15" MacBook Air:
18 hours streaming, and 15 hours of wireless web for M5
18 hours streaming, and 15 hours of wireless web for M4

Explain to me why you think it's the N1 chip contributing most of the battery life increase and not the new chip, despite the fact that the MBA has the N1 chip and doesn't have any battery increase at all. I'm not the expert.

Fundamentally we don't disagree that changing E-cores to P-cores could and should lead to increases in battery life or at least performance efficiency under load! It absolutely should! All we are saying is that the battery life tests Apple are quoting aren't under load. Thus, this particular set of tests are more likely to be affected by other features of the SOC than the core architecture in and of itself - now it's possible that the un-core elements were able to be rearchitected in part because of the change to P-cores and fewer S-cores - e.g. changes in the cache and fabric becoming more efficient - but that's more of a second order effect. The primary change would be the un-core.

Now what's weird to me is that the 14" and 16" M5 Maxes have such differing increases in battery life between streaming and web compared to their M4 counterparts. The web test gets 0 uplift for the 14" M5 Max, while, as a percentage, the 16" M5 Max's web improvement is much better than the same machine's streaming test. Meanwhile the 14" gets a much better streaming uplift as a percentage than the 16". Very odd. Also slightly odd that the M5 Max shows differences compared to the M4 Max while the M4/M5 Pro are reported as identical.
 
My guess is the same. What's more, given how Apple's memory hierarchy works, the memory controllers will very likely be close to the SLC cache. It would be very interesting to see whether the memory access latency has changed for the CPU. I am also curious whether it's a 2D die arrangement or 2.5D/3D.
Would each die (i.e. CPU and GPU) have its own memory controllers with its respective SLC, L2, and L1 caches, with the SLCs then having to coherently synchronise both dies' state via the Fusion interconnect? That is what UltraFusion does for the Mx Ultras, if I'm not wrong.
 
Would each die (i.e. CPU and GPU) have its own memory controllers with its respective SLC, L2, and L1 caches, with the SLCs then having to coherently synchronise both dies' state via the Fusion interconnect? That is what UltraFusion does for the Mx Ultras, if I'm not wrong.
Processors have their own cache controllers, but those are different from memory controllers. Each CPU, in fact, needs its own cache controllers (for the non-shared caches).
 
My guess is the same. What's more, given how Apple's memory hierarchy works, the memory controllers will very likely be close to the SLC cache. It would be very interesting to see whether the memory access latency has changed for the CPU. I am also curious whether it's a 2D die arrangement or 2.5D/3D.
I believe it'll be 2D, or 2.5D if you count an interconnect bridge die as .5D worth of stacking. Logic-on-logic stacking is problematic due to thermal issues, so I discount that. And if Apple had 3D-stacked a memory (SRAM) die, they ought to have talked about it in the "Newsroom" PR.

A thought which has occurred to me is that despite everything changing, nothing huge has changed relative to M4 Pro and Max. We're still looking at three CPU clusters, 20/32(binned)/40 GPU cores, 256/384(binned)/512-bit wide memory, and about the same I/O. Everything's upgraded, but mostly incrementally.

I'd venture a guess that this generation was deliberately designed that way to manage risk, so that even at a relatively late stage in development they could just flip M5 Pro and Max back to monolithic designs. Designers' jobs would be to make sure it all works with a monolithic floorplan, and also with that floorplan cleaved in two and a Fusion interconnect inserted in the middle.

Why would they be risk averse even though they've proved 'Fusion' interconnect reasonably well? Well, we the public don't know what the packaging yields and costs were in any generation of Ultra. But we can observe that Ultras went only into the kind of low volume, high price computers where Apple might well have been eating higher costs to gain experience and knowledge. When setting out to push Fusion lower in the product stack, especially with a new generation of TSMC advanced packaging in the mix, they might have wanted an escape hatch.

So it'll be interesting to see what they do in M6. We could see bigger changes once Fusion is deemed safe enough to design around in high volume products.

All speculation, of course.
 
Would each die (i.e. CPU and GPU) have its own memory controllers with its respective SLC, L2, and L1 caches, with the SLCs then having to coherently synchronise both dies' state via the Fusion interconnect? That is what UltraFusion does for the Mx Ultras, if I'm not wrong.
In Apple SoCs the "SLC" is memory-side cache. It's evenly divided into slices, with one slice per memory controller. Each slice only caches the contents of its local memory controller's DRAM. This should mean there's no need for SLC-to-SLC coherency traffic.

That said, SLCs do need to participate in coherency. Whenever a bus agent sends out a read request, it has to be filled by whomever has the current canonical copy of the data. That could be a CPU L1 or L2, a GPU cache (if present, but it seems likely even Apple GPUs have some cache of their own), the SLC slice dedicated to that chunk of DRAM, or the DRAM itself.
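The "one slice per memory controller" arrangement described above can be sketched as a toy address map. The slice count, line size, and the simple modulo interleave are all assumptions for illustration; Apple's actual hash is undocumented:

```python
# Toy model of a memory-side SLC, as described above: the cache is split
# into slices, one per memory controller, and every physical address has
# exactly one home slice. Constants and the interleave function are
# hypothetical, chosen only to illustrate the idea.

NUM_MCS = 8        # hypothetical memory-controller (and SLC-slice) count
LINE_BYTES = 128   # hypothetical cache-line size

def slice_for(addr):
    """Which SLC slice (and hence which MC's DRAM) owns this address."""
    line = addr // LINE_BYTES
    return line % NUM_MCS  # simple interleave across controllers

# Consecutive cache lines land on consecutive slices, spreading bandwidth
# across all controllers, while any single line has exactly one home
# slice - which is why no SLC-to-SLC coherency traffic is needed.
homes = [slice_for(line * LINE_BYTES) for line in range(10)]
print(homes)  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
```

The single-home-slice property is what does the work in the argument above: a read request can be snooped by caches and by the one slice that owns the address, but two slices never hold copies of the same line.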
 
In Apple SoCs the "SLC" is memory-side cache. It's evenly divided into slices, with one slice per memory controller. Each slice only caches the contents of its local memory controller's DRAM. This should mean there's no need for SLC-to-SLC coherency traffic.

That said, SLCs do need to participate in coherency. Whenever a bus agent sends out a read request, it has to be filled by whomever has the current canonical copy of the data. That could be a CPU L1 or L2, a GPU cache (if present, but it seems likely even Apple GPUs have some cache of their own), the SLC slice dedicated to that chunk of DRAM, or the DRAM itself.
So macOS would need logic to place a GPU workload's data in the memory attached to the GPU die, to make access more efficient. That means macOS APIs would have to offer workload-specific flags for developers to set when allocating memory.
 