M4 Mac Announcements

M4 Pro with 14 cores? That's interesting. I am curious to learn more, hopefully later today. They either went back to 4-core clusters, or we get 8 E-cores, or something crazy like 8 P-cores in a cluster (which I personally think is unlikely, but would be a total killer).
 
I am curious if M4 Pro or M4 Max has higher L2 cache per P-core cluster.

M1 Pro / M1 Max: 4 P-cores per cluster, 12 MB shared L2
M2 Pro / M2 Max: 4 P-cores per cluster, 16 MB shared L2
M3 Pro / M3 Max: 6 P-cores per cluster, 16 MB shared L2

M3 Pro and M3 Max increased the number of P-cores per cluster by 50%, but the L2 cache size was unchanged.
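The consequence for the per-core share of cluster L2 is easy to work out; a quick sketch using the figures above (the core counts and cache sizes are as listed, the per-core split is just arithmetic, not how the cache is actually partitioned):

```python
# Shared cluster L2 divided by P-cores per cluster, per generation.
clusters = {
    "M1 Pro/Max": (4, 12),  # (P-cores per cluster, shared L2 in MB)
    "M2 Pro/Max": (4, 16),
    "M3 Pro/Max": (6, 16),
}

per_core = {chip: l2 / cores for chip, (cores, l2) in clusters.items()}
for chip, mb in per_core.items():
    print(f"{chip}: {mb:.2f} MB of L2 per P-core")
```

So on a naive per-core basis, M3 Pro/Max actually went backwards (about 2.67 MB per core vs. 4 MB on M2), even though the total cluster L2 stayed at 16 MB.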
 
On cluster size... just a reminder that Apple's design methodology seems to allow them to be flexible about the number of cores per cluster in different family members. There's plenty of precedent:

M1: 4-core E cluster. M1 Pro/Max: 2-core E cluster.
M3: 4-core P cluster. M3 Pro/Max: 6-core P cluster.
M3, M3 Max: 4-core E cluster. M3 Pro: 6-core E cluster.

I believe it's true that in all cases the cluster's shared L2 never changes size even when the number of cores changes.

To me this means we don't have enough to make any solid guesses about what the core counts in M4 Pro mean.
 
Oh absolutely, but having them be the same across these three family members appeals to my sense of consistency. :) That’s not necessarily a good reason, but a better one might be that sharing the same cluster design across multiple dies could reduce the overall workload for Apple's engineers. Plus, it would actually be necessary if Gurman is right about the Pro being a cut-down Max: surely you would want a consistent P-core cluster size within a die, i.e. for the Max, Apple probably doesn’t want one 8-P-core cluster and one 4-P-core cluster. If Gurman is right.

An 8-P-core cluster would be great for inter-thread communication, but it would also increase resource contention if they don’t increase the L2 per cluster and the number of AMX units per cluster too.
 
Yes, this is what I was getting at last week. Doing an 8-core cluster would be a significant flex if it comes with more cluster resources. Can they build a larger cache and still make their timing? A bigger AMX seems less hard, and significantly beneficial for the relevant workloads.
 
I'm not sure how a larger AMX/SME unit would work. They can't really make the ALUs themselves larger, since that would break the baseline assumptions (it has to remain a 512x512-bit unit). I suppose they could introduce multiple ALUs that can operate in parallel, but then you need to do data synchronization and movement between them, which sounds tricky and expensive...

For SME performance, 2x clusters of 4 P-cores is probably better.

Note that M4 does have some limited functionality that allows it to combine work from multiple threads. If one thread only uses a part of available SME resources, using multiple threads per cluster will improve performance. However, if one thread uses all available SME resources, using multiple threads does nothing. All this makes SME tricky to use in practice. I will publish a new, updated analysis within the next few days.
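That thread-combining behavior can be captured with a toy model (my own sketch, not based on any published Apple numbers): threads in a cluster share one SME unit, and their demands add up only until the unit is saturated.

```python
def sme_cluster_throughput(per_thread_demand, n_threads, unit_capacity=1.0):
    """Toy model: all P-cores in a cluster share one SME unit.

    per_thread_demand -- fraction of the unit a single thread can keep busy.
    Returns aggregate utilization, capped at the unit's capacity.
    """
    return min(per_thread_demand * n_threads, unit_capacity)

# One thread that saturates the unit: extra threads add nothing.
print(sme_cluster_throughput(1.0, 1))   # 1.0
print(sme_cluster_throughput(1.0, 4))   # still 1.0
# Threads that each use a quarter of the unit combine until it is full.
print(sme_cluster_throughput(0.25, 2))  # 0.5
print(sme_cluster_throughput(0.25, 4))  # 1.0
```

Which also illustrates why two clusters of 4 P-cores would beat one cluster of 8 for SME-heavy work: two units means two independent capacity caps.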
 
I'm not sure how a larger AMX/SME unit would work. They can't really make the ALUs themselves larger, since that would break the baseline assumptions (it has to remain a 512x512-bit unit). I suppose they could introduce multiple ALUs that can operate in parallel, but then you need to do data synchronization and movement between them, which sounds tricky and expensive...
I was thinking having simply more than one AMX unit per cluster. That should be theoretically doable no?
For SME performance, 2x clusters of 4 P-cores is probably better.
Aye.
Note that M4 does have some limited functionality that allows it to combine work from multiple threads. If one thread only uses a part of available SME resources, using multiple threads per cluster will improve performance. However, if one thread uses all available SME resources, using multiple threads does nothing. All this makes SME tricky to use in practice. I will publish a new, updated analysis within the next few days.
Interesting!
 
I was thinking having simply more than one AMX unit per cluster. That should be theoretically doable no?

I am not sure that is a good idea. SME requires per-thread data storage, and that storage has to be local to the accelerator for performance reasons (direct communication between the CPU and the SME unit is limited to exchanging control information such as addresses and offsets). To make separate SME units work, you'd either need to pin threads to specific units or move data between the units. Both sound complicated and inefficient.

Multiple matrix ALUs with shared storage per SME unit would work (similar to how CPUs have multiple ports today), no idea how difficult or costly it would be to implement in practice. To my amateur eyes, balancing work between two large units sounds like a lot of extra overhead. But maybe that is Apple's future direction, who knows.
 
Five P-cores per cluster, I don't think anyone would have guessed that :D

Maybe there are actually two clusters with 6 cores each, but one is disabled. We will see what M4 Max looks like and whether it is a chopped-down die or a fully separate design. If M4 Max has 18 performance cores, that will literally embarrass the rest of the industry.
 
10P+4E for the Mac Mini with M4 Pro 👀
Whoa. For the … Pro CPU?

Five P-cores per cluster, I don't think anyone would have guessed that :D
Nope! Well okay not me anyway!
Maybe there are actually two clusters with 6 cores each, but one is disabled. We will see what M4 Max looks like and whether it is a chopped-down die or a fully separate design. If M4 Max has 18 performance cores, that will literally embarrass the rest of the industry.
This is nuts! The Max may just go back to having the same CPU count as the Pro (unless as you say the Pro has two CPU cores disabled) but increase GPU.

I have to admit, I had kinda hoped they’d do the opposite for the Pro SOC, fewer P-cores but increase GPU core count by even more.
 
Five P-cores per cluster, I don't think anyone would have guessed that :D
How do we know it's 5P + 5P? Could be 4P+6P as well.

It's good to see M4 Pro returning the Pro line to greatness. M3 Pro didn't look very good, considering how they downgraded the memory bandwidth while the multicore performance was almost identical to M2 Pro.

Based on the 10P+4E configuration of M4 Pro, I estimate it will exceed 1800 points in Cinebench 2024 Multi Core. That would make it 80% faster than M3 Pro, and even faster than M3 Max (1700 points).
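The comparison works out like this (the ~1000-point M3 Pro figure is implied by the "80% faster" claim above, not an official score):

```python
# Cinebench 2024 Multi Core scores as quoted in the thread;
# m3_pro is back-derived from the "80% faster" claim.
m3_pro, m3_max, m4_pro_estimate = 1000, 1700, 1800

print(f"vs M3 Pro: {m4_pro_estimate / m3_pro - 1:.0%} faster")
print(f"vs M3 Max: {m4_pro_estimate / m3_max - 1:.1%} faster")
```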

And this suggests that M4 Max will be an absolute beast...

They are saving the best for last! Who else is excited for tomorrow's event?

New MacBook Pros + M4 Max chip.
 
By the way, there is something very odd about the M4 Pro RAM bandwidth. The M4 is quoted at 120GB/s, that’s the usual LPDDR5X. But M4 Pro is a whopping 273GB/s, more than double! I have difficulty understanding what memory technology that is. If the RAM standard is the same, this would indicate a 320-bit interface with ECC. If it’s still a 192- or 256-bit interface, then it must be some new RAM tech. This is too fast even for LPDDR6.

Edit: as pointed out by @The Flame and others, this is most likely LPDDR5X-8553 running on a 256bit memory interface. My mistake!
 
By the way, there is something very odd about the M4 Pro RAM bandwidth. The M4 is quoted at 120GB/s, that’s the usual LPDDR5X. But M4 Pro is a whopping 273GB/s, more than double! I have difficulty understanding what memory technology that is. If the RAM standard is the same, this would indicate a 320-bit interface with ECC. If it’s still a 192- or 256-bit interface, then it must be some new RAM tech. This is too fast even for LPDDR6.
Not sure it can be anything good, given I’ve been reliably informed SoC design is the same as “designing a Domino's pizza”.
 
By the way, there is something very odd about the M4 Pro RAM bandwidth. The M4 is quoted at 120GB/s, that’s the usual LPDDR5X. But M4 Pro is a whopping 273GB/s, more than double! I have difficulty understanding what memory technology that is. If the RAM standard is the same, this would indicate a 320-bit interface with ECC. If it’s still a 192- or 256-bit interface, then it must be some new RAM tech. This is too fast even for LPDDR6.
Seems pretty obvious to me.

M4 Pro has LPDDR5X-8533 mated to a 256-bit memory bus.

= 8.533 GT/s × 256 bits ÷ 8 bits per byte
= 273 GB/s.

LPDDR5X-8533 has been on the market for quite a while now. For example, Intel's Lunar Lake has LPDDR5X-8533 mated to a 128-bit memory bus, which gives it 136 GB/s of bandwidth.
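The same formula covers all the configurations mentioned in the thread; a quick sketch (the 7500 MT/s rate for the base M4 is inferred from its quoted 120 GB/s on a 128-bit bus, not an Apple-confirmed spec):

```python
def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits):
    # transfers/second x bits per transfer, converted to GB/s
    return transfer_rate_mts * bus_width_bits / 8 / 1000

print(peak_bandwidth_gbs(8533, 256))  # ~273 GB/s -- M4 Pro
print(peak_bandwidth_gbs(8533, 128))  # ~136.5 GB/s -- Lunar Lake
print(peak_bandwidth_gbs(7500, 128))  # 120 GB/s -- base M4 (inferred rate)
```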
 
The new M4 Mac Mini looks pretty sweet. I can finally ditch my 2019 Intel Mini that uses 30 - 40 watts 24/7 running my eight outdoor security cameras and my home automation software. Looking forward to a large boost in video processing for the security cams. I just need to figure out the right amount of memory/storage, and deciding on M4 vs M4 Pro chips.

Also thinking about a dedicated M4 Mini for running X-Plane flight simulator with three displays.
 