M5 Pro and Max unveiled

My logic is the following — it is much more economical to produce two small dies than one large die due to how defects work. And that could really add up with such an expensive process.
Maynard is here so he can speak for himself, but briefly, that ignores the costs (including yield issues, since the process isn't perfect) of packaging.
 
Maynard is here so he can speak for himself, but briefly, that ignores the costs (including yield issues, since the process isn't perfect) of packaging.
Packaging costs are a lot less than die manufacturing costs. Defect-driven yield loss also scales worse than linearly: double the size of the die, and you more than double the likelihood of a defect. Yield typically follows an exponential model, roughly yield = e^(-kA), where A is the die area and k reflects the defect density.

Wafers are very expensive, so if I can cut my die size in half (from reticle limit to ½), my yield increases so substantially that the costs of packaging are tiny in comparison. These packaging technologies are sort of as complex as die manufacturing was 10 or more years ago, so it’s comparatively cheap.
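The economics above can be put in rough numbers. A minimal sketch of the classic exponential (Poisson) yield model, yield = e^(-D·A); the defect density and die areas below are made-up illustrative values, not figures for any real process:

```python
import math

def poisson_yield(area_mm2: float, defect_density: float) -> float:
    """Expected fraction of defect-free dies under yield = e^(-D*A)."""
    return math.exp(-defect_density * area_mm2)

D = 0.002                  # hypothetical defects per mm^2 (illustrative only)
big, small = 800.0, 400.0  # near-reticle die vs. half-size die, in mm^2

y_big = poisson_yield(big, D)
y_small = poisson_yield(small, D)

# Silicon cost scales with wafer area consumed per *good* die: area / yield.
cost_big = big / y_big             # one monolithic die per package
cost_small = 2 * small / y_small   # two half-size dies per package

print(f"yield: big die {y_big:.1%}, small die {y_small:.1%}")
print(f"silicon per good package: monolithic {cost_big:.0f} mm^2, "
      f"two-die {cost_small:.0f} mm^2")
```

With these (invented) numbers the half-size die yields more than twice as well, so even after fabricating two of them per package you burn far less wafer area per good package, which is the headroom that pays for the packaging step.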
 
Where do you think that Baidu user got the clocks, width info and LLC information from? lol

It’s from the hubweb.cn site, which I clearly posted before posting the Baidu link.
Sorry, not sure I follow? I’m simply saying I don’t see any mention of L1 or L2 in the linked post. I’m sure it’s my issue, as I said.

Edit: Oh, I was referring to the last link you posted. I didn’t know there was a previous link.
 
It looks like there is a misunderstanding on my part. On AnandTech, a user corrected me and said it’s like this. Looks like I read the Chinese labels wrong; I apologise for the confusion.

It should be like below:
S core: 1 MB of L2 per core + 16MB shared L3
P core: 16 MB shared L2

Thanks, that would make more sense to me. What's interesting is that it's not far off from what we had until now. Each core had priority fast access to a part of the L2 (I think it was around 2 MB), and slightly slower access to the rest of the L2. So in a way each core did have its own "private" L2, but other cores could access it directly.

So I wonder whether this new info is more of that, or whether we indeed get a new intermediate level of cache. If it's the latter, it's hard to imagine Apple going back to a classical L1->L2->L3 hierarchy after having had a superior solution for a while (their shared L2 essentially did the same job as a traditional L3, but was faster). A new 1 MB of very fast cache (more like an L1.5) could be interesting, though.
 
Thanks, that would make more sense to me. What's interesting is that it's not far off from what we had until now. Each core had priority fast access to a part of the L2 (I think it was around 2 MB), and slightly slower access to the rest of the L2. So in a way each core did have its own "private" L2, but other cores could access it directly.

So I wonder whether this new info is more of that, or whether we indeed get a new intermediate level of cache. If it's the latter, it's hard to imagine Apple going back to a classical L1->L2->L3 hierarchy after having had a superior solution for a while (their shared L2 essentially did the same job as a traditional L3, but was faster). A new 1 MB of very fast cache (more like an L1.5) could be interesting, though.
yeah, “shared” may just mean “outside the cores.” Physical distribution is different from logical distribution, so hard to interpret any of this.
 
 
Geekbench is claiming the Ps are in a single 12-core cluster. That presumably means only 2 SMEs. Though I guess there's nothing stopping them from putting two SMEs in a single cluster.
 
By the way, if the M5 cores ship with extra cache (which is not present in A19), could it explain the IPC increase we see over the iPhone chips?
 
Maynard is here so he can speak for himself, but briefly, that ignores the costs (including yield issues, since the process isn't perfect) of packaging.

Sorry, I forgot to reply to this. While packaging is obviously not free, making chips on these newer processes is getting extraordinarily expensive. The move to MCM we’ve observed recently is a direct consequence of that. If building monolithic chips were cheaper, Intel wouldn’t bother with their tile architecture, and neither would Apple. Large chips that exceed reticle size are obviously an exception, but none of these chips are that large.
 
Obviously the total available bandwidth for the M5 Max is 614 GB/s. Does anyone know approximately how much the GPU would be able to use? The total bandwidth must be shared between all parts of the system. Can the GPU use 400 GB/s? 500 GB/s? Any ideas?
 
Obviously the total available bandwidth for the M5 Max is 614 GB/s. Does anyone know approximately how much the GPU would be able to use? The total bandwidth must be shared between all parts of the system. Can the GPU use 400 GB/s? 500 GB/s? Any ideas?

That would depend entirely on the bandwidth between the GPU cores and the SLC, and also on the work distribution across the GPU cores themselves. I don't think this has been tested comprehensively yet? We do know that there were link limitations between the SLC and the CPU L2 on previous architectures, for example.
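For what it's worth, the only part of this that's easy to probe from userspace is host-side copy bandwidth; a crude sketch follows (the buffer size and run count are arbitrary choices, and this says nothing about what the GPU cores can actually pull through the SLC, which would need a Metal compute kernel):

```python
# Crude host-side probe of sustained memory bandwidth via a large array
# copy. Exercises only the CPU path to DRAM, not the GPU-to-SLC path.
import time
import numpy as np

N = 256 * 1024 * 1024 // 8  # 256 MiB of float64, well past any cache level
src = np.ones(N)
dst = np.empty_like(src)

best = float("inf")
for _ in range(5):          # take the best of a few runs
    t0 = time.perf_counter()
    np.copyto(dst, src)
    best = min(best, time.perf_counter() - t0)

# One read plus one write per element = 2x the buffer size of traffic.
gb_moved = 2 * src.nbytes / 1e9
print(f"~{gb_moved / best:.1f} GB/s sustained copy bandwidth")
```

A number like this will still land well below the headline 614 GB/s, since a single copy stream from the CPU side rarely saturates the fabric.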
 
I mean, I realize that GB6 is still a pretty short test and not indicative of real-world use, but that MC score is 7% lower than the top MC score for x86 – a Threadripper with 64 HT cores. Granted, there is an enormous difference in clock speed, but still.
 
Hi!

I admit I find this new design fascinating. Three different core types? Using Fusion to combine different subunits into a single SoC, as opposed to using it to combine two SoCs as we previously saw with the Ultra? It does make me wonder a little, as performance is getting WAY ahead of the ability of most software to use it all.
 
I mean, I realize that GB6 is still a pretty short test and not indicative of real-world use, but that MC score is 7% lower than the top MC score for x86 – a Threadripper with 64 HT cores. Granted, there is an enormous difference in clock speed, but still.

At the same time, GB6 is not a good test for throughput workloads — by design. It is meant as a test of typical user-facing software. A Threadripper would still be considerably faster on a parallel number-crunching workload, for example.
 