M4 Mac Announcements

According to Apple, 512 GB is big enough for their representative 600B parameter model. And I assume this means just big enough, i.e., if 700B would fit, they would have said that instead.

Currently, quantizing parameters at 6 bits results in a model that is very close to the original. At 8 bits (one byte) per parameter, 512 GB holds 512 billion parameters; at 6 bits, that stretches to 512 billion × 8/6 ≈ 682 billion parameters, with the upper limit being somewhere between that and a cool trillion.
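A quick sanity check of that arithmetic (the 6-bit figure and 512 GB capacity are from the post; everything here ignores KV cache, activations, and OS overhead, which the next paragraph gets into):

```python
def max_params_billions(ram_gb: float, bits_per_param: float) -> float:
    """Rough upper bound on how many parameters fit in ram_gb of memory,
    ignoring KV cache, activations, and OS overhead."""
    ram_bits = ram_gb * 1e9 * 8          # decimal GB -> bits
    return ram_bits / bits_per_param / 1e9

# 512 GB at 8 bits/param -> ~512B params; at 6 bits -> ~682B params
print(int(max_params_billions(512, 8)))   # 512
print(int(max_params_billions(512, 6)))   # 682
```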

However, that doesn’t account for the memory required to run the model and preserve the context of whatever inputs it has processed. So once you start bumping that 600B number up, it starts to sound like tech bro bullshit.

Plus, currently, there aren’t any dense models that break 600B outside of data centers, and sparse models (e.g. mixture of experts) have, or will soon have, inference tricks to load the most likely next experts into RAM while cranking out the current tokens (e.g. certain kinds of speculative decoding).

Anyway, I’m also disappointed that the new Ultra is the M3 generation, but 512GB RAM is *truly* awesome for those in ML. And, for those who aren’t, the M3 generation did add ray tracing among other stuff, so it’s probably a pretty compelling upgrade for graphics professionals, not to mention that ML has set roots down in such fields as well.
 
Seems like this may be a focus for them. The devil’s in the details, but that’s a big jump.
View attachment 34087
Unfortunately, that is a marketing cheat. They are using models too big to fit in the M1 Ultra’s memory, so that machine ends up hitting swap. For models that do fit into memory, token generation is nearly identical, since it’s memory-bandwidth bound. Prompt processing and other compute-heavy tasks are the biggest problem for Macs in machine learning, and the M3 Max is better than the M1 Max there, but not by much.

Frankly, not going M4 Ultra also doesn’t hurt that much there, because M4 GPU compute is only a little better than M3 GPU compute. The extra memory bandwidth would have been nice, and I wish they had put faster memory on the M3 Ultra. But currently, Macs are a painful compromise: while 128 to 512 GB of RAM brings joy, using large prompts exposes the compute weakness.

Apple needs, at the very minimum, more than 2x the GPU compute, and better if they can get as close to 5x as they can. The NPU does basic image and audio ML processing; it is not yet designed for the big tasks like LLMs and image and video generation. I don’t know whether that means putting matrix or tensor cores into the GPU or a major enhancement to the ALU pipelines themselves, but I am hoping the M5 generation is a massive leap in compute resources.
 
This is a really good point. There were M3 Apple chip IDs that never got used, though I think it’ll be 33 and 34, right? Because there were two Maxes in the end?

So yeah we may really be looking at the release of M3 Ultras ... huh ... weird. They may not call them M3s but ... that's the family of chips they may belong to ...

Small correction: @exoticspice1 was right, 32 and 33 are indeed the missing identifiers; 34 was the 384b M3 Max for some reason, and @Altaic says that 33 disappeared? So not sure if both are the M3 Ultra or just the 32.
BTW, here's my current active CPID list:
Code:
CPID 0x6000  # M1 Pro
CPID 0x6001  # M1 Max
CPID 0x6002  # M1 Ultra

CPID 0x6010  # ?

CPID 0x6020  # M2 Pro
CPID 0x6021  # M2 Max
CPID 0x6022  # M2 Ultra

CPID 0x6030  # M3 Pro
CPID 0x6031  # M3 Max (512b)
CPID 0x6032  # M3 Ultra (not mass-produced)
CPID 0x6033  # MYSTERY (CPID disappeared)
CPID 0x6034  # M3 Max (384b)

CPID 0x6040  # M4 Pro
CPID 0x6041  # M4 Max (384b & 512b)

CPID 0x6050  # M5 Pro


CPID 0x8101  # A14 Bionic
CPID 0x8103  # M1
CPID 0x8110  # A15 Bionic
CPID 0x8112  # M2
CPID 0x8120  # A16 Bionic
CPID 0x8122  # M3
CPID 0x8130  # A17 Pro
CPID 0x8132  # M4
CPID 0x8140  # A18 (& A18 Pro?)
CPID 0x8142  # M5
CPID 0x8150  # A19
CPID 0x8152  # M6
CPID 0x8160  # A20
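For what it’s worth, the IDs above follow a visible pattern: in the 0x60xx family the second-to-last hex digit is the generation and the last digit is Pro/Max/Ultra, while in the 0x81xx family IDs ending in 0 are A-series and those ending in 2 are base M-series (with A14 and M1 as oddballs). A toy decoder sketching that pattern (the pattern is my inference from the list, not any Apple spec):

```python
# Toy decoder for the CPID pattern in the list above; the mapping is
# inferred from the list itself, not from any official documentation.
SUFFIX = {0x0: "Pro", 0x1: "Max", 0x2: "Ultra"}

def decode_cpid(cpid: int) -> str:
    family = cpid >> 8           # 0x60 = Pro/Max/Ultra dies, 0x81 = base dies
    gen = (cpid >> 4) & 0xF      # second-to-last hex digit
    variant = cpid & 0xF         # last hex digit
    if family == 0x60:
        m = 1 if gen == 0 else gen   # 0x600x is the M1 generation; 0x601x unused
        return f"M{m} {SUFFIX.get(variant, '?')}"
    if family == 0x81:
        if cpid == 0x8101:
            return "A14"             # oddball: ends in 1
        if cpid == 0x8103:
            return "M1"              # oddball: ends in 3
        if variant == 0:
            return f"A{14 + gen}"    # 0x8110 = A15, 0x8120 = A16, ...
        if variant == 2:
            return f"M{gen + 1}"     # 0x8112 = M2, 0x8122 = M3, ...
    return "unknown"

print(decode_cpid(0x6031))  # M3 Max
print(decode_cpid(0x8132))  # M4
```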
 
M4 Ultra Geekbench CPU score. Probably will do better in Cinebench 2024 Multi-Core, but oof, a tough sell when the M4 Max matches this in this test.
View attachment 34105
 
M4 Ultra Geekbench CPU score. Probably will do better in Cinebench 2024 Multi-Core, but oof, a tough sell when the M4 Max matches this in this test.
View attachment 34105
Alas, the same 4.05 GHz clocks as on the M3 Max, so no bump there.
I guess these things aren’t for CPU workloads.
If this is real, it's possible they rushed to post it, and thus didn't wait for Spotlight indexing to finish.
 
M4 Ultra Geekbench CPU score. Probably will do better in Cinebench 2024 Multi-Core, but oof, a tough sell when the M4 Max matches this in this test.
View attachment 34105

M4 Ultra Geekbench Compute score.
View attachment 34106
Small typo: M3 Ultra :)

Good to have confirmation about 15,14. Wasn’t expecting that so soon.
 
M4 Ultra Geekbench Compute score.
View attachment 34106
If that's real, that's excellent scaling, since it's 1.81x the M3 Max. By comparison, the M2 Ultra did not scale as well, being only 1.57x the M2 Max.

It would also put an Apple Silicon GPU on top of GB's Metal ranking for the first time. Though that's not saying much, as the AMD cards are from 2020 or before, since (I assume) they're limited to those that could run on the Intel Mac Pro.
1741319739092.png
 
Maybe. I guess could also be for more parallel workloads? Others have said GB6 doesn’t scale (by design) as well as other tests. We’ll see with Cinebench, or GB5 etc.
Yes, GB6 deliberately changed from using embarrassingly parallel multi-core tests to a mix. Some tests still scale perfectly, others have thread synchronization points which prevent this.


If you look at the individual test scores, GB6 includes a raytracer benchmark, which should behave similarly to CB24. In single-thread, the M4 Max beats the M3 Ultra on this one by a factor of 1.17x, but in multicore that flips to the M3 Ultra winning by 1.6x. Some probably naive napkin math suggests that's not quite perfect scaling on the M3 Ultra's part, but still:
if you do lots of embarrassingly parallel CPU compute, the M3 Ultra should be the highest-performing Mac. (Unless your code can make use of SME, in which case that probably flips back to the M4 Max.)
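One way to run that napkin math (the 1.17x and 1.6x ratios are from the scores above; treating the M3 Ultra as two M3 Max dies with the M4 Max matching one die's core count is my simplifying assumption):

```python
# Ratios quoted above (GB6 raytracer subtest):
st_m4max_vs_m3ultra = 1.17   # single-thread: M4 Max leads
mt_m3ultra_vs_m4max = 1.60   # multicore: M3 Ultra leads

# Assuming the M4 Max scales perfectly across the same core count as one
# M3 Max die, the M3 Ultra's implied scaling over a single die is the
# product of the two ratios -- short of the ideal 2x for two dies.
implied = mt_m3ultra_vs_m4max * st_m4max_vs_m3ultra
print(f"~{implied:.2f}x, versus an ideal 2.00x")  # ~1.87x
```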
 
Lol. Confirmed (from json) 80 core M3 Ultra GPU.
View attachment 34109
Hmmm … given the other score, one of them has to be wrong, done at low power, or done during indexing. GB6 typically varies by 10–15%, not by over 30%.

What do we think about the idea that the perfect customer for the M3 Ultra is Apple? This is a chip for PCC (Private Cloud Compute).

Maybe?
 
What do we think about the idea that the perfect customer for the M3 Ultra is Apple? This is a chip for PCC (Private Cloud Compute).
I was thinking the same. I think the M3 Ultra is here because of Baltra. I think the Baltra chip is the result of Apple confronting what they really need in a PCC chip, but it is not coming out until 2026, which is too far away to endure the limitations of the M2 Ultra cloud.

I also don’t think the delay of “conversational” Siri until 2026 is a coincidence. The new Siri needs Baltra, I’m thinking.
 
Anyone know the model no. of the M4 Max Studio (i.e., ##,#), so I could search for it in GB? I don't expect they gave it a higher max clock than the M4 Max MBP, but would like to confirm.
 