M4 Mac Announcements

One thing I am curious about: there are charts on which you can compare M-series Macs against NVIDIA and AMD graphics cards. In OpenCL, the top Mac GPU scores less than half what the top dGPUs do, but in Metal the separation is much closer, with the highest Mac trailing the highest card by only around 5%. I realize that OpenCL has serious deficiencies and should not be relied on as a good measure. What I am curious about is whether there are performance/efficiency comparisons between Metal and the other graphics APIs. How does Metal compare to Vulkan, DirectX, and OpenCL for the same jobs?
Comparing across APIs when the hardware is different is extremely difficult. For OpenCL/GL on macOS, we do occasionally have the same (AMD) hardware running the other APIs, but the OpenGL/CL implementation on macOS is practically deprecated and/or running through a Metal translation layer anyway. In general, though, I’ve tried looking at this using benchmarks that run the same task under different APIs (Geekbench, some 3DMark tests, Aztec Ruins, etc.), and while I haven’t charted them all out rigorously, I’ve never noticed a consistent pattern. From what I gather from people who work in the field, none of the modern APIs (DirectX, Vulkan, Metal) is innately, substantially superior with respect to performance; drivers for the particular hardware matter a lot and tend to swamp most other factors, sometimes rivaling or exceeding the impact of the hardware differences themselves.
 
Some scores for a variety of LLMs run on an M3 Ultra, an M3 Max, and an RTX 5090.
From the review here: https://creativestrategies.com/mac-studio-m3-ultra-ai-workstation-review/

I re-read the article (thanks again for linking it) and have some additional thoughts:

(1) In describing the table comparing a 5090 PC to AS Macs, he says "Below is just a quick ballpark of the same prompt, same seed, same model on 3 machines from above. This is all at 128K token context window (or largest supported by the model) and using llama.cpp on the gaming PC and MLX on the Macs....The theoretical performance of an optimized RTX 5090 using the proper Nvidia optimization is far greater than what you see above on Windows, but this again comes down to memory. RTX 5090 has 32GB, M3 Ultra has a minimum of 96GB and a maximum of 512GB. [emphasis his]"

The problem is that, when he presents that table, he doesn't explicitly provide the size of the model he's using, so I don't know the extent to which it exceeds the 32 GB of VRAM on the 5090. I don't understand why tech people omit such obvious stuff in their writing. Well, actually, I do; they're not trained as educators, and thus not trained to ask, "If someone else were reading this, what key info would they want to know?" OK, rant over. Anyway, can you extract this info from the article?
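For anyone wanting to reproduce that kind of comparison on the Mac side, here is a minimal sketch using the mlx-lm Python package. The model name, prompt, and token count below are placeholders I picked, not the settings from the article, and the llama.cpp command in the comment is only a rough PC-side equivalent.

```python
# Sketch of the Mac side of the comparison using mlx-lm (MLX on Metal).
# Model, prompt, and generation length are illustrative placeholders,
# not the settings used in the linked review.
import mlx.core as mx
from mlx_lm import load, generate

mx.random.seed(0)  # fix the seed so runs are comparable across machines

# Any MLX-converted checkpoint (e.g. from the mlx-community Hugging Face org) works here.
model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")

prompt = "Summarize the trade-offs between unified memory and discrete VRAM."
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
# verbose=True prints tokens/sec, which is the number being compared in the table.

# Rough llama.cpp equivalent on the 5090 box (GGUF model, 128K context):
#   llama-cli -m model.gguf -p "<same prompt>" -c 131072 -n 256 --seed 0
```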

(2) This is interesting:
"You can actually connect multiple Mac Studios using Thunderbolt 5 (and Apple has dedicated bandwidth for each port as well, so no bottlenecks) for distributed compute using 1TB+ of memory, but we’ll save that for another day."
I've read you can also do this with the Project DIGITS boxes. It would be interesting to see a shootout between an M3 Ultra with 256 GB RAM ($5,600 with 60-core GPU or $7,100 with 80-core GPU) and 2 x DIGITS ($6,000, 256 GB combined VRAM). Or, if you can do 4 x DIGITS, then that ($12,000, 512 GB VRAM) vs. a 512 GB Ultra ($9,500 with 80-core GPU).
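As a rough illustration of what that multi-machine setup looks like in software, here is a minimal sketch assuming MLX's distributed module (mx.distributed) launched MPI-style with one process per Mac; the all-reduce shown is generic, not whatever pipeline the article has in mind.

```python
# Hypothetical sketch of combining work across two Mac Studios, assuming
# mlx.core.distributed with an MPI-style launcher (one process per machine).
# Illustrative only; not the article's actual configuration.
import mlx.core as mx

group = mx.distributed.init()
print(f"node {group.rank()} of {group.size()}")

# Each node computes its own partial result (e.g. from its shard of a model)...
local_partial = mx.ones((1024,)) * group.rank()

# ...and the partial results are combined with an all-reduce over the link
# between the machines (Thunderbolt 5 networking, in the scenario above).
combined = mx.distributed.all_sum(local_partial)
mx.eval(combined)
```

Launched with something like `mpirun -np 2 --host studio1,studio2 python shard.py`, each machine contributes its local tensor and receives the summed result.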

(3) And this is surprising:
"...almost every AI developer I know uses a Mac! Essentially, and I am generalizing: Every major lab, every major developer, everyone uses a Mac."
How can that be, given that AI-focused data centers are commonly NVIDIA/CUDA-based? To develop for those, you would (I assume) want to be working on an NVIDIA workstation. Is the fraction of AI developers writing code for data-center use really that tiny?
 