M5 Pro and Max unveiled

You have 7 new instructions coming in, and potentially more than a hundred still-executing instructions in various stages of completion (I don’t know how many pipeline stages they have), and any of those 7 incoming instructions can have an operand that isn’t knowable until one of those still-executing instructions completes - it’s a lot to keep track of, and you need big CAMs with a ton of I/O ports to keep track of it and figure out how to rename registers or defer issue. It was bad enough when I had to do it for Sun and we had only 3+ incoming instructions.
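
To make the bookkeeping above concrete, here is a minimal software sketch of renaming one incoming group against in-flight results. It is an illustration only, not how any Apple or Sun core actually does it: the `Instr` type, the `rename_group` helper, and the register and tag names are all hypothetical, and real hardware does these lookups in parallel with CAMs rather than in a loop.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    dest: Optional[str]      # architectural destination register, if any
    srcs: Tuple[str, ...]    # architectural source registers


def rename_group(group, rename_map, ready, next_tag):
    """Rename one incoming group in program order (hypothetical model).

    rename_map: architectural register -> physical tag currently holding it
    ready:      set of physical tags whose values have already been produced
    Returns a list of (dest_tag, src_tags, waiting_on) and the next free tag.
    """
    out = []
    for ins in group:
        # Each source must be looked up; this is the CAM-like search.
        src_tags = tuple(rename_map[s] for s in ins.srcs)
        # Sources still being computed force the instruction to wait.
        waiting_on = tuple(t for t in src_tags if t not in ready)
        dest_tag = None
        if ins.dest is not None:
            # A fresh physical register removes false (WAR/WAW) dependencies.
            dest_tag = f"p{next_tag}"
            next_tag += 1
            rename_map[ins.dest] = dest_tag
        out.append((dest_tag, src_tags, waiting_on))
    return out, next_tag


# x3 is still in flight (p3 not ready); the group also depends on itself.
rename_map = {"x1": "p1", "x2": "p2", "x3": "p3"}
ready = {"p1", "p2"}
group = [
    Instr("x4", ("x1", "x3")),
    Instr("x1", ("x4", "x2")),
    Instr("x5", ("x1",)),
]
renamed, next_free = rename_group(group, rename_map, ready, 10)
for entry in renamed:
    print(entry)
```

Even in this toy version, every source of every incoming instruction has to be checked against both older in-flight results and earlier instructions in the same group, which is why the port count grows so quickly with width.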
 
You have 7 new instructions coming in, and potentially more than a hundred still-executing instructions in various stages of completion (I don’t know how many pipeline stages they have), and any of those 7 incoming instructions can have an operand that isn’t knowable until one of those still-executing instructions completes - it’s a lot to keep track of, and you need big CAMs with a ton of I/O ports to keep track of it and figure out how to rename registers or defer issue. It was bad enough when I had to do it for Sun and we had only 3+ incoming instructions.
I think the SP (I'll split the difference!) cores are up to what 10-wide or something ridiculous?
 
yeah, something like that (which also may account for the alleged 70% speed for the M vs P). But 7-wide in your second-class core is amazing, especially since they apparently kept the power way down.
I think the E-cores are 6-wide now! It's incredible what they've done with lower cores.
 
I still don’t care about any AI stuff, but I’m glad they are improving general compute with #’s 4 and 5.
I care about the "AI" stuff because it is directly relevant to the whole PCC fake drama from Gurman and others. In other words, Siri lol

Decreased TTFT (time to first token), especially a 5X improvement relative to M3, changes the equation from "impossible" to "possible" for PCC using the highest-end M5.

If prompt processing on the highest-end M3 is 250 tokens/s and the highest-end M5 is 5X that, it goes from 250 to 1,250 tokens/s on a single chip, let alone the 8 chips the PCC ensemble documentation describes. That is a massive leap.

Generally speaking, the more machines you add to a cluster, the faster it gets: 4X the Macs gives 3.25X or better prompt-processing performance. Even if scaling dropped to 70% of the ideal, an 8-Mac ensemble would be 5.6X faster than one chip: 7,000 tokens per second.
This data ("3.25X" for 4) is sourced from Awni Hannun.

The average Siri request, barring stuff like web searching, will likely be around 10,000 tokens, with many way smaller. This means that even for more complex tasks, the processing time (TTFT) will be less than 1.5 seconds to convert all your data into a format usable for TM inference (tokenization). This doesn't include KV cache, which speeds up follow-up requests too.

1.5 seconds vs 7 seconds to process 1 request for Siri before beginning to respond to the user, hypothetically. This doesn't include the time it takes to retrieve info from the semantic index, upload info, etc. But you can see the previous speed was likely a non-starter for Apple. It's now a starter. This was always known, by the way. So anyone claiming surprise is likely trying to spin some narrative about them not knowing what they're doing. They know what they're doing, people.
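
For anyone who wants to check the arithmetic, here is a small sketch of the estimate. The inputs are the hypothetical figures from this post (250 tokens/s for M3, a 5X gain for M5, 8 chips, 70% scaling efficiency, a 10,000-token request), not measurements, and the function names are made up for illustration.

```python
def ensemble_tokens_per_s(single_chip_tps, n_chips, scaling_efficiency):
    """Prompt-processing throughput of a cluster under a flat efficiency factor."""
    return single_chip_tps * n_chips * scaling_efficiency


def ttft_seconds(prompt_tokens, tokens_per_s):
    """Time to process the whole prompt before the first output token."""
    return prompt_tokens / tokens_per_s


# Assumed figures from the post, not benchmarks.
M3_TPS = 250.0
M5_TPS = M3_TPS * 5          # 1,250 tokens/s on one chip
N_CHIPS = 8
EFFICIENCY = 0.70
PROMPT_TOKENS = 10_000

m3_ensemble = ensemble_tokens_per_s(M3_TPS, N_CHIPS, EFFICIENCY)
m5_ensemble = ensemble_tokens_per_s(M5_TPS, N_CHIPS, EFFICIENCY)
m3_ttft = ttft_seconds(PROMPT_TOKENS, m3_ensemble)
m5_ttft = ttft_seconds(PROMPT_TOKENS, m5_ensemble)

print(f"M3 ensemble: {m3_ensemble:.0f} tokens/s, TTFT {m3_ttft:.2f} s")
print(f"M5 ensemble: {m5_ensemble:.0f} tokens/s, TTFT {m5_ttft:.2f} s")
```

Under these assumptions the 8-chip M3 case lands at roughly 7 seconds and the M5 case just under 1.5 seconds, which is where the two numbers above come from.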

TTFT was the main weak spot of M chips. This is sufficiently addressed (to understate) to entertain the idea they can use PCC and not resort to using TPUs.

You also didn't mention MIE, which isn't "AI." Memory Integrity Enforcement! I think this is super cool for a Mac too! :)
 

View attachment 38245


Some potentially cool info here if accurate

Very interesting! The clocks on M-cores are quite high. We also take a hit in CPU cache sizes. I’m surprised that they say the power consumption would increase.

I think the SP (I'll split the difference!) cores are up to what 10-wide or something ridiculous?

I’ve seen mentions of 10-wide decode for M4, it’s possible that M5 is wider.
 
What do you mean by this?
Nothing, sorry. I crossed it out in my original post. It was based on something I'd read that I have no confidence in now.

However... if this leak about 7-wide for the new P core is accurate, that turns out to be true anyway: AMD's smaller ("c") cores are the same architecturally as their regular cores, just not laid out for high clocks. Intel's small ("E") cores are architecturally smaller. So in this sense only Apple's choice aligns with Intel's.

So under the new naming, M4 Max would be 4P and 12S, and the M5 Max is 12P and 6S.
No, the M4 Max is 12S/4E, while the M5 Max is 6S/12P.

I guess "performance" sounds better than "mid" :)
I'm not defending this choice, but I think Qcom did the same thing, as have a number of other vendors, who labelled their top tier "Premium" or something like that, and their mid-tier "Performance".

That’s not a new thing, is it? I thought that each port had a dedicated controller since M1. At least my M3 max seems to.
No, it's not new. You can see the individual TB controllers on the published die shots.
 
That’s not a new thing, is it? I thought that each port had a dedicated controller since M1. At least my M3 max seems to.
Yes, but I wasn't sure. I just pointed it out in case. They mentioned it as a feature (as they should, since most computers don't even use TB, let alone dedicated controllers lol).
Like, the highest-end M chip has dedicated controllers for all TB5 ports, right? But I'm not 100% sure about previous Pro chips, for example. I thought so, but it's worth saying again to be sure.
 
The SSD is among the fastest, if not the fastest built-in, in the world for consumers at 14.5 GB/s for the 8TB model. This is crazy and amazing! There are external SSDs, but I don't think any pre-installed SSDs are 14.5 GB/s for any other consumer notebook!
 
I'm not defending this choice, but I think Qcom did the same thing, as have a number of other vendors, who labelled their top tier "Premium" or something like that, and their mid-tier "Performance".

Yep, it's a marketing issue. As @Cmaier said before, keep "performance cores" to mean the same thing as before, and suddenly your new chip looks like a downgrade. But rename them to "super core" and "keep" 12 performance cores, and to an uninformed customer this looks like a big improvement. It's dumb, but it makes sense unfortunately.

My only gripe is the naming. It should be "max" and "pro" cores.
 
Yep, it's a marketing issue. As @Cmaier said before, keep "performance cores" to mean the same thing as before, and suddenly your new chip looks like a downgrade. But rename them to "super core" and "keep" 12 performance cores, and to an uninformed customer this looks like a big improvement. It's dumb, but it makes sense unfortunately.

My only gripe is the naming. It should be "max" and "pro" cores.
I mean, is it a marketing issue? Again, I discussed this before @Cmaier and I say this because of what I specifically said, but before that:

The M3 Pro had a downgrade in CPU core ratios. They didn't adjust the names. This isn't a naming thing. I also don't understand the preoccupation with the names rather than the tech.

So anyways, back to what I said earlier:
M5 Max has 6 super cores (previously known as performance), with 12 performance (previously known as the lower end, high P/w core).

Translation:

The core config has changed from 12/4 to 6/12. This yields a 15% increase in general CPU performance while maintaining great thermal efficiency and battery life. If they had purely just downgraded to 6/12 from 12/4, multithreaded performance would likely be about the same overall. But it isn't. The new P core boosts overall performance enough to make up for having 6 fewer S cores (previously known as performance).

When it changes again in 2nm, I suspect S cores will increase due to 2nm and these brand new P cores.

This will create breakthrough performance increases in general for the products that need the absolute most performance while not needing a brand new electrical circuit to run it (cough, NVIDIA).
 
Yep, it's a marketing issue. As @Cmaier said before, keep "performance cores" to mean the same thing as before, and suddenly your new chip looks like a downgrade. But rename them to "super core" and "keep" 12 performance cores, and to an uninformed customer this looks like a big improvement. It's dumb, but it makes sense unfortunately.

My only gripe is the naming. It should be "max" and "pro" cores.
To me it’s a fascinating architectural question. For some workloads, I can imagine that the M4 Max-type core allotment would be better. But laptops are heat-limited, so…

This is why Ultra fascinates me. I can imagine them going quite a different direction on Ultra, if they choose to make a new CPU die.
 
I mean, is it a marketing issue? Again, I discussed this before @Cmaier and I say this because of what I specifically said, but before that:

The M3 Pro had a downgrade in CPU core ratios. They didn't adjust the names. This isn't a naming thing. I also don't understand the preoccupation with the names rather than the tech.

So anyways, back to what I said earlier:
Eh. M3 Pro is a well-known outlier. And people who are buying Pro are already willing to sacrifice performance (for cost, most likely). “Max” purchasers are generally looking to maximize performance at any cost, and core count is an unpleasantly attractive way to try to compare two CPUs. Seems pretty clear to me that Apple prefers people not to make direct comparisons in core counts between M4 Max and M5 Max (particularly people who aren’t willing to look under the hood to see what’s going on).
 
Eh. M3 Pro is a well-known outlier. And people who are buying Pro are already willing to sacrifice performance (for cost, most likely). “Max” purchasers are generally looking to maximize performance at any cost, and core count is an unpleasantly attractive way to try to compare two CPUs. Seems pretty clear to me that Apple prefers people not to make direct comparisons in core counts between M4 Max and M5 Max (particularly people who aren’t willing to look under the hood to see what’s going on).
It doesn't change the fact that if it were purely a naming thing to cover the different core config that they could have done the same naming scheme with M3 Pro. They also didn't compare the M3 vs M2 reduction.

The difference now is that there is a legitimate new core, P, which as I described is what will enable a lot more performance with high efficiency for the more powerful products that need it.

Otherwise you're constantly drawing up E cores using more and more wattage, which affects the products that rely on high efficiency the most.

The more interesting thing to discuss is what this might mean for future A chips!
 
To me it’s a fascinating architectural question.

Absolutely. If the "70% of the big core's performance" figure mentioned previously is accurate, this is a transition from a 100% + 100% + 15% to a 100% + 70% + 70% configuration. For multi-core workloads, this could be a win. This kind of partitioning should also work particularly well with Apple's cluster architecture — the six cores share the L2 and can work on a problem collaboratively. In fact, it also might be a decent architecture for an Ultra (and it aligns well with the time-tested server CPU model).
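
As a rough sanity check of that comparison, here is a tiny sketch that just sums the per-cluster figures. It assumes each term is the relative throughput of one whole cluster and that clusters add linearly, which ignores memory bandwidth, thermals, scheduling, and any per-core generational gains; the variable names are made up.

```python
# Relative throughput per cluster, normalised to one big-core cluster = 1.0.
# These are the figures quoted above, read as per-cluster values (an assumption).
old_config = [1.00, 1.00, 0.15]   # two big clusters + one efficiency cluster
new_config = [1.00, 0.70, 0.70]   # one big cluster + two mid clusters

old_total = sum(old_config)
new_total = sum(new_config)
ratio = new_total / old_total

print(f"old: {old_total:.2f}, new: {new_total:.2f}, ratio: {ratio:.3f}")
```

With these inputs the new layout comes out around 12% ahead on paper, and the result is only as good as the 70% and 15% figures.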

I hope the new CPU architecture manual will be released soon and we can read all about these new cores.


It doesn't change the fact that if it were purely a naming thing to cover the different core config that they could have done the same naming scheme with M3 Pro. They also didn't compare the M3 vs M2 reduction.

The difference now is that there is a legitimate new core, P, which as I described is what will enable a lot more performance with high efficiency for the more powerful products that need it.

The rationale for the new cores is perfectly understood. The conversation about the naming is not because of the new core, but because Apple has changed what the previous names meant. The types of cores Apple introduced here are traditionally known as "M (middle)" cores or similar. So yeah, that conversation is entirely marketing. Again, I think everyone here understands what they are trying to solve by changing the names. It's just a funny thing to do from an enthusiast perspective. It's like, you know, you have perfectly fine butter, and then you start selling a somewhat inferior butter, rename your original butter to "super-butter", and call the new inferior product "butter". Yeah, I know, this is not a good comparison at all, because the new mid-range cores are not inferior at all; they just solve a different problem. But that is the kind of reaction many people will have.

Apple's marketing might have messed up a bit by not using the term "super cores" already when A19 was introduced. Then at least the change would have been consistent. I think this shows that the decision was taken rather late.
 
core, but because Apple has changed what the previous names meant
It changed *one* name, and for good reason

So yeah, that conversation is entirely marketing
I don't know which conversation, but not mine, because I've been focused on what it means technically for users rather than naming. Not to mention the whole "industry uses performance and efficiency nomenclature so Apple is just causing confusion etc etc etc," which is wrong, as I explained with Arm's designs, which use 3 tiers on a chip and offer 4 tiers in general.

Apple's marketing might have messed up a bit by not using the term "super cores" already when A19 was introduced
I argue they didn't. If they did, it would have hinted at where they were going. Very un-Apple. Again, zero leaks whatsoever happened regarding this as far as I can tell, which means competitors learn the direction at the same time consumers do, rather than before.

think this shows that the decision was taken rather late.

What's really interesting about this framing is that it places more weight on rumors than official announcements and engineering. I'd love for you to explain more about what you are thinking!
 
Apologies if this has already been mentioned and I missed it. Do we think there is any advantage in terms of Performance Per Area from having 6 Super + 12 Performance cores vs 12 Performance (now Super) cores and 6 efficiency cores, as we did previously?
 
I care about the "AI" stuff because it is directly relevant to the whole PCC fake drama from Gurman and others. In other words, Siri lol

Decreased TTFT (time to first token), especially a 5X improvement relative to M3, changes the equation from "impossible" to "possible" for PCC using the highest-end M5.

If prompt processing on the highest-end M3 is 250 tokens/s and the highest-end M5 is 5X that, it goes from 250 to 1,250 tokens/s on a single chip, let alone the 8 chips the PCC ensemble documentation describes. That is a massive leap.

Generally speaking, the more machines you add to a cluster, the faster it gets: 4X the Macs gives 3.25X or better prompt-processing performance. Even if scaling dropped to 70% of the ideal, an 8-Mac ensemble would be 5.6X faster than one chip: 7,000 tokens per second.
This data ("3.25X" for 4) is sourced from Awni Hannun.

The average Siri request, barring stuff like web searching, will likely be around 10,000 tokens, with many way smaller. This means that even for more complex tasks, the processing time (TTFT) will be less than 1.5 seconds to convert all your data into a format usable for TM inference (tokenization). This doesn't include KV cache, which speeds up follow-up requests too.

1.5 seconds vs 7 seconds to process 1 request for Siri before beginning to respond to the user, hypothetically. This doesn't include the time it takes to retrieve info from the semantic index, upload info, etc. But you can see the previous speed was likely a non-starter for Apple. It's now a starter. This was always known, by the way. So anyone claiming surprise is likely trying to spin some narrative about them not knowing what they're doing. They know what they're doing, people.

TTFT was the main weak spot of M chips. This is sufficiently addressed (to understate) to entertain the idea they can use PCC and not resort to using TPUs.

You also didn't mention MIE, which isn't "AI." Memory Integrity Enforcement! I think this is super cool for a Mac too! :)
Not that you're required to respond, and I'd like anyone else to add to this, but I did mention you specifically and ask for your thoughts specifically @Cmaier so I am curious if this framing helps explain why it matters even if you don't personally make use of TMs (I don't either).
 