Oh? 7-wide decoding always tickles me. (And makes me glad I didn’t have to design that scheduler)
You have 7 new instructions coming in, and potentially more than a hundred still-executing instructions in various stages of completion (I don’t know how many pipeline stages they have), and any of those 7 incoming instructions can have an operand that isn’t knowable until one of those still-executing instructions completes - it’s a lot to keep track of, and you need big CAMs with a ton of I/O ports to keep track of it and figure out how to rename registers or defer issue. It was bad enough when I had to do it for Sun and we had only 3+ incoming instructions.
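For a rough feel for that bookkeeping, here's a toy sketch in Python. It's purely illustrative (no real Apple or Sun design works like this; hardware does these lookups with a rename table and CAMs, in parallel, every cycle): each incoming instruction's source registers are checked against the set of still-in-flight producers to decide whether it can issue or must wait.

```python
# Toy model of the dependency tracking described above: rename each
# destination register, and record which in-flight producers each new
# instruction must wait on before it can issue.

class ToyScheduler:
    def __init__(self):
        self.next_tag = 0
        self.rename = {}        # arch register -> tag of its latest producer
        self.in_flight = set()  # tags of instructions not yet completed

    def dispatch(self, dest, srcs):
        """Rename one incoming instruction.

        Returns (tag, waits): its new tag, and the set of in-flight
        producer tags whose results it needs before it can issue.
        """
        waits = {self.rename[r] for r in srcs
                 if r in self.rename and self.rename[r] in self.in_flight}
        tag = self.next_tag
        self.next_tag += 1
        self.rename[dest] = tag     # later readers of `dest` wait on this tag
        self.in_flight.add(tag)
        return tag, waits

    def complete(self, tag):
        """An executing instruction finished; its dependents may now issue."""
        self.in_flight.discard(tag)

s = ToyScheduler()
t0, w0 = s.dispatch("r1", [])      # r1 = ...    : no dependencies
t1, w1 = s.dispatch("r2", ["r1"])  # r2 = f(r1)  : must wait on t0
print(w0, w1)                      # first set is empty; second contains t0
```

In silicon, every one of the 7 decoded instructions does this lookup simultaneously, against every in-flight instruction, which is why the CAM size and port count blow up as decode width grows.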
I think the SP (I'll split the difference!) cores are up to what, 10-wide or something ridiculous?
yeah, something like that (which also may account for the alleged 70% speed for the M vs P). But 7-wide in your second-class core is amazing, especially since they apparently kept the power way down.
I think the E-cores are 6-wide now! It's incredible what they've done with the lower-tier cores.
> I still don’t care about any AI stuff, but I’m glad they are improving general compute with #’s 4 and 5.

I care about the "AI" stuff because it is directly relevant to the whole PCC fake drama from Gurman and others. In other words, Siri lol
M5 Pro and Max Specs Leaked
M4 Pro: performance cores 4.51GHz, efficiency cores 2.59GHz; M5 Pro: super cores 4.61GHz, performance cores 4.38GHz (forums.macrumors.com)
Some potentially cool info here if accurate
5. Thunderbolt 5 ports on all Pro and Max chips now have their own dedicated controller in the M5.
> What do you mean by this?

Nothing, sorry. I crossed it out in my original post. It was based on something I'd read that I have no confidence in now.
> So under the new naming, M4 Max would be 4P and 12S, and the M5 Max is 12P and 6S.

No, the M4 Max is 12S/4E, while the M5 Max is 6S/12P.
> I guess "performance" sounds better than "mid"

I'm not defending this choice, but I think Qcom did the same thing, as have a number of other vendors, who labelled their top tier "Premium" or something like that, and their mid-tier "Performance".
> That’s not a new thing, is it? I thought that each port had a dedicated controller since M1. At least my M3 max seems to.

No, it's not new. You can see the individual TB controllers on the published die shots.
Yes, but I wasn't sure, so I pointed it out just in case. They mentioned it as a feature (as they should, since most computers don't even use TB, let alone dedicated controllers lol).
> Yep, it's a marketing issue. As @Cmaier said before, keep "performance cores" to mean the same thing as before, and suddenly your new chip looks like a downgrade. But rename them to "super core" and "keep" 12 performance cores, and to an uninformed customer this looks like a big improvement. It's dumb, but it makes sense unfortunately.

I mean, is it a marketing issue? Again, I discussed this before @Cmaier, and I say this because of what I specifically said, but before that:
My only gripe is the naming. It should be "max" and "pro" cores.
M5 Max has 6 super cores (previously known as performance cores), with 12 performance cores (previously the lower-end, high-performance-per-watt tier).
Translation:
The core config has changed from 12S/4E to 6S/12P. This yields a ~15% increase in general CPU performance while maintaining great thermal efficiency and battery life. If they had purely downgraded from 12/4 to 6/12 with the old cores, overall multithreaded performance would likely be about the same. But it isn't: the new P core boosts overall performance enough to make up for 6 fewer S cores (previously known as performance cores).
When it changes again at 2nm, I suspect S-core counts will increase, thanks to 2nm and these brand-new P cores.
This will create breakthrough performance increases in general for the products that need the absolute most performance while not needing a brand new electrical circuit to run it (cough, NVIDIA).
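As a sanity check on that ~15% claim, here is some back-of-envelope arithmetic. The per-core throughput numbers below are my illustrative assumptions, not measurements; the only figure taken from this thread is the ~70% mid-tier-vs-top-tier ratio mentioned earlier.

```python
# Hypothetical per-core throughput, in arbitrary units. All values are
# assumptions for illustration only.
S_OLD = 1.00   # M4-generation top-tier core (baseline)
E_OLD = 0.35   # assumed M4 efficiency core
S_NEW = 1.15   # assumed uplift for the new top-tier (super) core
P_NEW = 0.70   # new mid-tier core at ~70% of top tier, per the thread

m4_max = 12 * S_OLD + 4 * E_OLD   # 12S + 4E config
m5_max = 6 * S_NEW + 12 * P_NEW   # 6S + 12P config
print(f"relative MT throughput: {m5_max / m4_max:.2f}x")
```

Swap in your own estimates; the point is only that 12 mid-tier cores at ~0.7x can more than offset losing 6 top-tier cores, and with these made-up numbers the reshuffle lands right around the claimed ~15%.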
To me it’s a fascinating architectural question. For some workloads, I can imagine that the M4 Max-type core allotment would be better. But laptops are heat-limited, so…
Eh. M3 Pro is a well-known outlier. And people who are buying Pro are already willing to sacrifice performance (for cost, most likely). “Max” purchasers are generally looking to maximize performance at any cost, and core count is an unpleasantly attractive way to try to compare two CPUs. Seems pretty clear to me that Apple prefers people not to make direct comparisons in core counts between M4 Max and M5 Max (particularly people who aren’t willing to look under the hood to see what’s going on).
The M3 Pro had a downgrade in CPU core ratios. They didn't adjust the names. This isn't a naming thing. I also don't understand the preoccupation with the names rather than the tech.
So anyways, back to what I said earlier:
It doesn't change the fact that if it were purely a naming thing to cover the different core config that they could have done the same naming scheme with M3 Pro. They also didn't compare the M3 vs M2 reduction.
The difference now is that there is a legitimate new core, P, which as I described is what will enable a lot more performance with high efficiency for the more powerful products that need it.
> …core, but because Apple has changed what the previous names meant

It changed *one* name, and for good reason.
> So yeah, that conversation is entirely marketing

I don't know which conversation, but not mine, because I've been focused on what it means technically for users rather than naming. Not to mention the whole "industry uses performance and efficiency nomenclature so Apple is just causing confusion etc etc etc," which is wrong, as I explained with Arm's designs, which use 3 tiers on a chip and offer 4 tiers in general.
> Apple's marketing might have messed up a bit by not using the term "super cores" already when A19 was introduced. I think this shows that the decision was taken rather late.

I argue they didn't. If they had, it would have hinted at where they were going. Very un-Apple. Again, zero leaks whatsoever happened regarding this as far as I can tell, which means competitors learn the direction at the same time consumers do, rather than before.
Not that you're required to respond, and I'd like anyone else to add to this, but I did mention you specifically and ask for your thoughts, @Cmaier, so I am curious if this framing helps explain why it matters even if you don't personally make use of TMs (I don't either).
Decreased TTFT (time to first token), reportedly 5X better relative to M3, changes the equation from "impossible" to "possible" for PCC using the highest-end M5.
If prompt processing on the highest-end M3 is 250 tokens/s, and the highest-end M5 is 5X that, then a single chip goes from 250 to 1,250 tokens/s, let alone the 8 chips the PCC ensemble documentation describes. This is a massive leap.
Generally speaking, the more Macs you add to a cluster, the faster prompt processing gets: 4X the Macs gives 3.25X or more the performance. Even if scaling efficiency dropped to 70% at 8 Macs, the ensemble would still be 5.6X faster: 7,000 tokens per second.
The scaling data ("3.25X" for 4 Macs) is sourced from Awni Hannun.
The average Siri request, barring stuff like web searching, will likely be 10,000 tokens, with many way smaller. This means even for more complex tasks, the processing time (TTFT) will be less than 1.5 seconds to convert all your data into a format usable for TM inference (tokenization). This doesn't include the KV cache, which speeds up follow-up requests too.
1.5 seconds vs 7 seconds to process 1 request for Siri before beginning to respond to the user, hypothetically. This doesn't include the time it takes to retrieve info from the semantic index, upload info, etc. But you can see the previous speed was likely a non-starter for Apple. It's now a starter. This was always known, by the way. So anyone claiming surprise is likely trying to spin some narrative about them not knowing what they're doing. They know what they're doing, people.
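Spelling out the arithmetic behind those numbers (the 250 tok/s baseline, the 5X uplift, the 70% scaling efficiency, and the 10,000-token request size are all the assumed figures from above, not independent measurements):

```python
# Back-of-envelope TTFT math for a hypothetical PCC ensemble, using
# the assumed figures from the discussion above.
m3_pp = 250                # prompt processing, tokens/s, highest-end M3 (assumed)
m5_pp = 5 * m3_pp          # 1,250 tok/s after the claimed 5X uplift
macs = 8                   # PCC ensemble size per Apple's documentation
efficiency = 0.7           # pessimistic scaling at 8 nodes (8 * 0.7 = 5.6X)

ensemble_pp = m5_pp * macs * efficiency   # 7,000 tok/s
prompt_tokens = 10_000                    # a complex Siri request (assumed)

ttft_m5 = prompt_tokens / ensemble_pp                  # ~1.4 s
ttft_m3 = prompt_tokens / (m3_pp * macs * efficiency)  # ~7.1 s
print(f"M5 ensemble TTFT ~{ttft_m5:.1f}s vs M3 ~{ttft_m3:.1f}s")
```

That ~1.4 s vs ~7 s gap is the whole "non-starter to starter" argument in one line of division.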
TTFT was the main weak spot of M chips. It is now sufficiently addressed (to understate it) to entertain the idea that they can use PCC and not resort to TPUs.
You also didn't mention MIE, which isn't "AI": Memory Integrity Enforcement! I think this is super cool for a Mac too!