M4 Mac Announcements

One thing I am curious about: there are charts on which you can compare M-series Macs against NVIDIA and AMD graphics cards; in OpenCL, the Mac GPU comes in at less than half the score of the dGPUs, but in Metal the separation is quite a bit closer, with the highest Mac trailing the highest card by around 5%. I realize that OpenCL has some serious deficiencies and should not be relied on as a good measure. What I am curious about is whether there are performance/efficiency comparisons between Metal and the other graphics APIs. How does Metal compare to Vulkan, DirectX and OpenCL for the same jobs?
Comparing across APIs when the hardware is different is extremely difficult. For OpenCL/GL on macOS, we do occasionally have the same (AMD) hardware running the other APIs, but then the OpenGL/CL implementation on macOS is practically deprecated and/or running through a Metal translation layer anyway. In general, though, I’ve tried looking at this using benchmarks that run the same task under different APIs (Geekbench, some 3DMark ones, Aztec Ruins, etc.), and while I haven’t charted them all out rigorously, I’ve never noted a consistent pattern. From what I gather from people who work in the field, none of the modern APIs (DirectX, Vulkan, Metal) is innately, substantially superior with regard to performance; drivers for the particular hardware matter a lot and basically swamp most other factors, even competing with or surpassing hardware differences.
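For anyone who wants to chart it themselves, here is a minimal sketch of the kind of tabulation I mean: collect same-task scores under each API, normalize each task to its best result, and see whether any one API is consistently ahead. The benchmark names and numbers below are illustrative placeholders only, not real measurements.

```python
# Sketch: normalize same-task benchmark scores to the best result per task and
# look for a consistent API-level pattern. All numbers are made up for illustration.
scores = {
    # task: {API: score}
    "aztec_ruins_high":  {"metal": 98.0, "vulkan": 100.0, "dx12": 96.0},
    "wildlife_extreme":  {"metal": 100.0, "vulkan": 94.0},
    "compute_benchmark": {"metal": 91.0, "opencl": 85.0, "vulkan": 100.0},
}

for task, by_api in scores.items():
    best = max(by_api.values())
    print(task, {api: round(s / best, 2) for api, s in by_api.items()})
```

With real data, the expectation (per the above) is that the "winner" bounces around by title and driver rather than lining up behind one API.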
 
Some scores for a variety of LLMs being run on M3 Ultra, M3 Max and a 5090
From the review here: https://creativestrategies.com/mac-studio-m3-ultra-ai-workstation-review/

I re-read the article (thanks again for linking it) and have some additional thoughts:

(1) In describing the table comparing a 5090 PC to AS Macs, he says "Below is just a quick ballpark of the same prompt, same seed, same model on 3 machines from above. This is all at 128K token context window (or largest supported by the model) and using llama.cpp on the gaming PC and MLX on the Macs....The theoretical performance of an optimized RTX 5090 using the proper Nvidia optimization is far greater than what you see above on Windows, but this again comes down to memory. RTX 5090 has 32GB, M3 Ultra has a minimum of 96GB and a maximum of 512GB. [emphasis his]"

The problem is that, when he presents that table, he doesn't explicitly provide the size of the model he's using, so I don't know the extent to which it exceeds the 32 GB of VRAM on the 5090. I don't understand why tech people omit such obvious stuff in their writing. Well, actually, I do; they're not trained as educators, and thus not trained to ask "If someone else were reading this, what key info would they want to know?" OK, rant over. Anyway, can you extract this info from the article?
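For a sense of scale, here's a rough back-of-the-envelope sketch of how quickly weights plus a 128K-token KV cache can blow past 32 GB. All numbers are my own assumptions (a hypothetical 70B-parameter model at 4-bit quantization with a Llama-70B-style attention layout), not figures from the article:

```python
# Back-of-the-envelope LLM memory estimate. Every number here is an assumption
# (hypothetical 70B model, 4-bit weights, 80 layers, 8 KV heads, head dim 128),
# not data from the review.

def weight_gb(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_len, layers, kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache in GB: 2 (K and V) * layers * kv_heads * head_dim * context."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1e9

weights = weight_gb(params_billion=70, bits_per_weight=4)                   # ~35 GB
kv = kv_cache_gb(context_len=128_000, layers=80, kv_heads=8, head_dim=128)  # ~42 GB
print(f"weights ~{weights:.0f} GB + 128K KV cache ~{kv:.0f} GB = ~{weights + kv:.0f} GB")
```

In that scenario the weights alone already exceed 32 GB before any context, which is presumably the kind of spillover he's alluding to, but knowing the actual model and quantization would settle it.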

(2) This is interesting:
"You can actually connect multiple Mac Studios using Thunderbolt 5 (and Apple has dedicated bandwidth for each port as well, so no bottlenecks) for distributed compute using 1TB+ of memory, but we’ll save that for another day."
I've read you can also do this with the Project DIGITS boxes. It would be interesting to see a shootout between an M3 Ultra with 256 GB RAM ($5,600 with 60-core GPU or $7,100 with 80-core GPU) and 2 x DIGITS ($6,000, 256 GB combined VRAM). Or, if you can do 4 x DIGITS, then that ($12,000, 512 GB VRAM) vs. a 512 GB Ultra ($9,500 with 80-core GPU).
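Taking the list prices quoted above at face value, the $/GB of memory works out roughly as follows (just arithmetic on those numbers, nothing independently verified):

```python
# $/GB of (unified or combined) memory, using the prices quoted in the post.
configs = {
    "M3 Ultra 256 GB (60-core GPU)": (5_600, 256),
    "M3 Ultra 256 GB (80-core GPU)": (7_100, 256),
    "2 x DIGITS (combined)":         (6_000, 256),
    "M3 Ultra 512 GB (80-core GPU)": (9_500, 512),
    "4 x DIGITS (combined)":         (12_000, 512),
}
for name, (price, gb) in configs.items():
    print(f"{name:32s} ${price:>6,} / {gb} GB = ${price / gb:5.2f}/GB")
```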

(3) And this is surprising:
"...almost every AI developer I know uses a Mac! Essentially, and I am generalizing: Every major lab, every major developer, everyone uses a Mac."
How can that be, given that AI-focused data centers are commonly NVIDIA/CUDA-based? To develop for those, you would (I assume) want to be working on an NVIDIA workstation. Is the fraction of AI developers writing code for data center use really that tiny?
 
The interesting thing here is that in the fanless Air (and presumably iPad), the M4 is indeed constrained in terms of performance, but that actually makes it a much more efficient performer.
Why is that interesting? Isn't it completely expected? Being lower on the curve implies being more efficient.

(3) And this is surprising:
"...almost every AI developer I know uses a Mac! Essentially, and I am generalizing: Every major lab, every major developer, everyone uses a Mac."
How can that be, given that AI-focused data centers are commonly NVIDIA/CUDA-based? To develop for those, you would (I assume) want to be working on an NVIDIA workstation. Is the fraction of AI developers writing code for data center use really that tiny?
I don't have a large sample to observe but I'll bet they all use Mac laptops, remotely accessing AI servers over ssh (or possibly RDC?).
 
Why is that interesting? Isn't it completely expected? Being lower on the curve implies being more efficient.
I was surprised by the amount of efficiency gained. That implies Apple is pushing the base M4 (and potentially the others) much further along its curve than I had thought. That it was pushed further than the base M3 was obvious, but it wasn’t clear what the shape of that curve was until now, especially with the two added E-cores and the change in architecture. In brief, it was the “much more efficient performer” that was interesting (to me).
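To illustrate what I mean by the curve: if dynamic power goes roughly as f·V² while performance goes roughly as f, then backing off the top of the curve gives up a little performance for a lot of power. A toy sketch, with operating points I've invented purely for illustration (not M4 measurements):

```python
# Toy frequency/voltage curve: performance ~ f, dynamic power ~ f * V^2.
# Both operating points below are made up to illustrate the shape of the trade-off.

def perf_and_power(freq_ghz, volts):
    perf = freq_ghz                   # performance roughly tracks frequency
    power = freq_ghz * volts ** 2     # dynamic power, ignoring constants and leakage
    return perf, power

points = {
    "pushed (actively cooled)":  (4.5, 1.10),
    "constrained (fanless Air)": (4.0, 0.95),
}
for name, (f, v) in points.items():
    perf, power = perf_and_power(f, v)
    print(f"{name:28s} perf={perf:.2f}  power={power:.2f}  perf/W={perf / power:.2f}")
```

With those made-up numbers, giving up about 11% of the performance cuts power by roughly a third, i.e. a markedly better perf/W, which is the kind of gap that surprised me in the Air results.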
 
I don't have a large sample to observe but I'll bet they all use Mac laptops, remotely accessing AI servers over ssh (or possibly RDC?).
I was referring to code development rather than access: my assumption was that if they want to develop code locally for use on an NVIDIA-based AI server, they'd want to write that code on an NVIDIA-based workstation.

Or are you saying most who develop server-based AI models only use their personal computers to access the server, and do their development work on the server system itself (on, say, dedicated development nodes that are firewalled from the production nodes)? That's also possible but, in that case, the Mac's AI capabilities become irrelevant, which wouldn't make sense within the context of the article (the article's author was saying the high percentage of Mac users among AI developers indicates how suitable the Mac is for AI development work).
 
Or are you saying most who develop server-based AI models only use their personal computers to access the server, and do their development work on the server system itself (on, say, dedicated development nodes that are firewalled from the production nodes)? That's also possible but, in that case, the Mac's AI capabilities become irrelevant, which wouldn't make sense within the context of the article (the article's author was saying the high percentage of Mac users among AI developers indicates how suitable the Mac is for AI development work).
I expect it's some combination of both. I don't have data, though.
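For what it's worth, targeting an NVIDIA server doesn't strictly require writing the code on NVIDIA hardware. With a framework like PyTorch, for example, the same script can run on the Mac's GPU locally (via the MPS backend) for development and smoke tests, then unchanged on CUDA over ssh. A minimal sketch of that kind of device-agnostic setup (my speculation about the workflow, not something from the article):

```python
# Device-agnostic PyTorch: runs on CUDA on an NVIDIA server, on MPS on an
# Apple Silicon Mac, and falls back to CPU elsewhere.
import torch

def pick_device() -> torch.device:
    if torch.cuda.is_available():            # NVIDIA server over ssh
        return torch.device("cuda")
    if torch.backends.mps.is_available():    # Apple Silicon laptop, local debugging
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
print(device, model(x).shape)
```

If the workflow looks like that, the Mac's local GPU still matters for day-to-day iteration even when the heavy runs happen on the server.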
 