Some scores for a variety of LLMs run on an M3 Ultra, an M3 Max, and an RTX 5090.
From the review here:
https://creativestrategies.com/mac-studio-m3-ultra-ai-workstation-review/
[Attachment 34155: benchmark table from the review]
I re-read the article (thanks again for linking it) and have some additional thoughts:
(1) In describing the table comparing a 5090 PC to AS Macs, he says "Below is just a quick ballpark of the same prompt, same seed, same model on 3 machines from above. This is all at 128K token context window (or largest supported by the model) and using llama.cpp on the gaming PC and MLX on the Macs....
The theoretical performance of an optimized RTX 5090 using the proper Nvidia optimization is far greater than what you see above on Windows, but this again comes down to memory. RTX 5090 has 32GB, M3 Ultra has a minimum of 96GB and a maximum of 512GB. [emphasis his]"
The problem is that, when he presents that table, he doesn't explicitly provide the size of the model he's using, so I don't know the extent to which it exceeds the 32 GB of VRAM on the 5090. I don't understand why tech people omit such obvious stuff in their writing. Well, actually, I do; they're not trained as educators, and thus not trained to ask "If someone else were reading this, what key info would they want to know?" OK, rant over. Anyway, can you extract this info from the article?
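In the meantime, here's the kind of back-of-the-envelope check I'd want to do once the model size is known. This is just a sketch in Python; the parameter counts and bits-per-weight figures below are my own illustrative assumptions, not numbers from the review:

```python
# Rough estimate: do a model's weights alone fit in the 5090's 32 GB of VRAM?
# Parameter counts and bits/weight are illustrative assumptions, not figures
# from the article; the KV cache for a 128K-token context adds several GB more.

def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights, in GB (ignores KV cache/activations)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

VRAM_GB = 32  # RTX 5090

for label, params_b, bits in [
    ("8B  @ ~4.5 bits/weight", 8, 4.5),   # typical effective size of a 4-bit quant
    ("32B @ ~4.5 bits/weight", 32, 4.5),
    ("70B @ ~4.5 bits/weight", 70, 4.5),
    ("70B @ ~8.5 bits/weight", 70, 8.5),
]:
    gb = weights_gb(params_b, bits)
    verdict = "fits in" if gb < VRAM_GB else "exceeds"
    print(f"{label}: ~{gb:.0f} GB of weights -> {verdict} {VRAM_GB} GB")
```

The point being: anything around 70B parameters, even at 4-bit, already spills past 32 GB before you account for a 128K-token KV cache, so it matters a lot whether his test model was in the ~8B, ~30B, or ~70B class.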
(2) This is interesting:
"You can actually connect multiple Mac Studios using Thunderbolt 5 (and Apple has dedicated bandwidth for each port as well, so no bottlenecks) for distributed compute using 1TB+ of memory, but we’ll save that for another day."
I've read you can also do this with the Project DIGITS boxes. It would be interesting to see a shootout between an M3 Ultra with 256 GB of RAM ($5,600 with the 60-core GPU, or $7,100 with the 80-core GPU) and 2 x DIGITS ($6,000, 256 GB of combined unified memory). Or, if you can link 4 x DIGITS, then that ($12,000, 512 GB) vs. a 512 GB Ultra ($9,500 with the 80-core GPU).
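For what it's worth, here's the price-per-GB arithmetic on those configurations, using the prices quoted above (a sketch only; DIGITS pricing is still the announced figure and multi-unit support isn't confirmed):

```python
# $/GB of (unified) memory for the configurations mentioned above.
# Prices are the list/announced figures quoted in the post; street prices
# and whether more than 2 DIGITS units can be linked are still open questions.

configs = {
    "M3 Ultra 256 GB (60-core GPU)": (5600, 256),
    "M3 Ultra 256 GB (80-core GPU)": (7100, 256),
    "2 x DIGITS (128 GB each)":      (6000, 256),
    "M3 Ultra 512 GB (80-core GPU)": (9500, 512),
    "4 x DIGITS (if supported)":     (12000, 512),
}

for name, (price_usd, mem_gb) in configs.items():
    print(f"{name}: ${price_usd:,} for {mem_gb} GB -> ${price_usd / mem_gb:.0f}/GB")
```

Of course $/GB isn't the whole story; memory bandwidth, the interconnect between boxes, and software support would decide the actual shootout.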
(3) And this is surprising:
"...almost every AI developer I know uses a Mac! Essentially, and I am generalizing: Every major lab, every major developer, everyone uses a Mac."
How can that be, given that AI-focused data centers are commonly NVIDIA/CUDA-based? To develop for those, you would (I assume) want to be working on an NVIDIA workstation. Is the fraction of AI developers writing code for data center use really that tiny?