BREAKING: High end M5 chips release imminent

I am referring to Strix Halo and GB10/N1X.

EDIT: accidentally hit the reply button too soon.

These are both large CPU/GPU combos with varying degrees of unified memory, essentially akin to Apple’s Max chips, except both have only 256-bit buses, so roughly half the bandwidth, depending on the memory speed.

This puts them at a significant disadvantage for LLMs, before we even get to linking Maxes to make an Ultra. Even wilder, GB10 and Halo were specced with generative AI in mind, whereas I don’t believe it was in Apple’s sights for the Max.
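
For a rough sense of scale, here’s a back-of-the-envelope sketch (Python) of how bus width and memory speed turn into bandwidth, and what that implies for memory-bound LLM decoding. The data rates and model size are illustrative assumptions (LPDDR5X-8000 for the 256-bit parts, LPDDR5-6400/8533 for the Max-class parts), not confirmed specs:

```python
# Peak memory bandwidth = bus width (bits) / 8 * per-pin data rate (GT/s).
# Data rates below are illustrative assumptions, not confirmed specs.

def peak_bandwidth_gbs(bus_bits: int, data_rate_gtps: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_bits / 8 * data_rate_gtps

configs = {
    "256-bit LPDDR5X-8000 (Strix Halo / GB10 class)": (256, 8.0),
    "512-bit LPDDR5-6400 (M1 Max class)": (512, 6.4),
    "512-bit LPDDR5X-8533 (newer Max class)": (512, 8.533),
}

model_gb = 40  # e.g. a ~70B-parameter model at 4-bit quantization (assumption)

for name, (bus, rate) in configs.items():
    bw = peak_bandwidth_gbs(bus, rate)
    # Memory-bound decoding streams roughly the whole model per token,
    # so tokens/s is capped by bandwidth / model size (a loose upper bound).
    print(f"{name}: ~{bw:.0f} GB/s, <= ~{bw / model_gb:.1f} tok/s on a {model_gb} GB model")
```

So a 256-bit LPDDR5X part lands around 256 GB/s versus roughly 400-550 GB/s for a 512-bit Max, which is the gap that matters for token generation.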
 
To hit graphics performance targets similar to 2020 dGPUs, M1 Max needed bandwidth in the same class. However, since M1 Max also has a bunch of high performance, latency-sensitive CPU cores, it couldn't use GDDR. That seems to be what pushed Apple towards wide LPDDR memory configurations.
The key point here is LPDDR: Apple faces retail market dynamics, in which portable devices (mostly MacBooks but also iPads) want to maximize battery life. One of Nvidia's targets is gamers, who want max GPU power at whatever cost (mostly, I think, for bragging rights). Nvidia's market reach is broader than Apple's, so they can have a diverse product lineup, while Apple does better with a narrower product spread. iGPU + wide LPDDR seems to be a very good compromise. Apple will probably never catch Nvidia on the top end, but they at least match them in the middle range, which is good enough for most of Apple's market.
 
Nvidia and AMD both passed on 512-bit buses, 4 years after Apple went for it. It's interesting that Apple chose to be that bold and aggressive whenever the Max design was laid out, 2017 or 2018.

NVIDIA and AMD both use much faster RAM, so they don’t need wide buses to hit their performance targets on consumer GPUs. Apple uses relatively slow (by GPU standards) RAM, so they need a wider bus. The Max’s bandwidth is nothing special by GPU standards. The 512-bit bus is a necessity, not an advantage. It’s essentially a “poor man’s” HBM, just slower and cheaper.
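
To put rough numbers on the “faster RAM, narrower bus” point, here’s a quick sketch; the per-pin data rates are ballpark figures for illustration, not exact part specs:

```python
# Approximate per-pin data rates (Gb/s per pin); ballpark figures only.
per_pin_gbps = {
    "GDDR6": 16.0,
    "GDDR6X": 21.0,
    "LPDDR5": 6.4,
    "LPDDR5X": 8.5,
}

target_gbs = 400  # target bandwidth in GB/s, roughly M1-Max-class

for mem, rate in per_pin_gbps.items():
    # pins needed = (GB/s target * 8 bits/byte) / (Gb/s per pin)
    pins = target_gbs * 8 / rate
    print(f"{mem}: ~{rate} Gb/s per pin -> ~{pins:.0f}-bit bus for ~{target_gbs} GB/s")
```

With GDDR6-class memory, a 150-200-bit bus is enough to match the Max’s bandwidth, while LPDDR5 needs roughly 500 bits to get there, which is exactly the trade each vendor made.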

BTW, NVIDIA is supposed to release their “big” ML workstation soon, featuring larger GPUs and HBM. Curious about the pricing.
 
dGPU engineers mostly use GDDRn memory. GDDRn offers much better bandwidth per data bus pin than DDRn/LPDDRn, but as I understand it, worse latency. GPUs tend to be less sensitive to latency, so that's fine, and the greater bandwidth per pin lets dGPUs get away with a narrower memory interface than they'd need with other memory interface standards.

To hit graphics performance targets similar to 2020 dGPUs, M1 Max needed bandwidth in the same class. However, since M1 Max also has a bunch of high performance, latency-sensitive CPU cores, it couldn't use GDDR. That seems to be what pushed Apple towards wide LPDDR memory configurations.

I am actually curious about the latency of GDDR. From what I understand the latency is worse because of the larger burst size (?), but finding numbers has been challenging. I saw some mentions of ~ 30ns for GDDR5 (compared to ~ 15ns for DDR5). Then again, the RAM access latency on Apple Silicon is in the ballpark of 100-150ns. Apple's memory hierarchy has never been known for its low latency, and the inter-cluster communication is quite slow as well. Apple, however, is very good at hiding the latency with their large caches and deep out-of-order execution.
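
To put "hiding the latency" in concrete terms, here's a Little's-law style estimate; the bandwidth, latency, and cache-line size are assumptions in the ballpark discussed above:

```python
# Little's law: traffic in flight = bandwidth * latency.
# All numbers are illustrative assumptions, not measured values.

bandwidth_gbs = 400     # ~M1 Max class peak bandwidth, GB/s
latency_ns = 120        # DRAM access latency in the 100-150 ns ballpark
cache_line_bytes = 128  # assuming 128-byte lines, as commonly reported for Apple Silicon

bytes_in_flight = bandwidth_gbs * latency_ns           # GB/s * ns = bytes
outstanding_lines = bytes_in_flight / cache_line_bytes

print(f"~{bytes_in_flight / 1024:.0f} KiB (~{outstanding_lines:.0f} outstanding "
      f"{cache_line_bytes}-byte lines) must be in flight to sustain "
      f"{bandwidth_gbs} GB/s at {latency_ns} ns")
```

That works out to a few hundred outstanding cache lines, which is why big caches, aggressive prefetching, and deep out-of-order windows matter so much here.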

I'd guess that the main reasons for LPDDR are power consumption and reliability. GDDR runs hot, and that's just not what you want for your system memory in a portable or ultracompact system. Also, unless I am very mistaken, LPDDR offers considerably higher density than GDDR.
 
I am actually curious about the latency of GDDR. From what I understand the latency is worse because of the larger burst size (?), but finding numbers has been challenging. I saw some mentions of ~ 30ns for GDDR5 (compared to ~ 15ns for DDR5). Then again, the RAM access latency on Apple Silicon is in the ballpark of 100-150ns.
I’m a little confused: are you comparing the time to read memory onto the bus from the DDR chip itself with the time to read memory into the core? Because DDR-based CPUs take 60-100 ns to read data as well.

Apple’s chips are on the high end for latency (compared to desktop based DDR systems anyway) but not THAT extreme. :)

Apple's memory hierarchy has never been known for its low latency, and the inter-cluster communication is quite slow as well. Apple, however, is very good at hiding the latency with their large caches and deep out-of-order execution.

I'd guess that the main reasons for LPDDR are power consumption and reliability. GDDR runs hot, and that's just not what you want for your system memory in a portable or ultracompact system. Also, unless I am very mistaken, LPDDR offers considerably higher density than GDDR.

NVIDIA and AMD both use much faster RAM, so they don’t need wide buses to hit their performance targets on consumer GPUs. Apple uses relatively slow (by GPU standards) RAM, so they need a wider bus. The Max’s bandwidth is nothing special by GPU standards. The 512-bit bus is a necessity, not an advantage. It’s essentially a “poor man’s” HBM, just slower and cheaper.

BTW, NVIDIA is supposed to release their “big” ML workstation soon, featuring larger GPUs and HBM. Curious about the pricing.
He clarified that he’s comparing to GB10 and Strix Halo rather than GDDR-based dGPUs. As for the latter, apart from the professional versions that are outside most people’s budget, those often lack the RAM capacity for ML work anyway.
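
A quick footprint estimate makes the capacity point concrete; the parameter counts, quantization levels, and memory sizes below are just examples:

```python
# Rough weights-only footprint: parameters * bytes per parameter, plus some overhead.
# Parameter counts, quantization, and capacities are illustrative examples.

def footprint_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.1) -> float:
    """Approximate footprint in GB, with ~10% overhead for KV cache, activations, etc."""
    return params_billion * bits_per_weight / 8 * overhead

models = [("8B", 8, 4), ("32B", 32, 4), ("70B", 70, 4), ("70B", 70, 8)]
capacities = {
    "24 GB consumer dGPU": 24,
    "128 GB unified memory (Strix Halo / GB10 / Max class)": 128,
}

for name, params, bits in models:
    need = footprint_gb(params, bits)
    fits = [dev for dev, cap in capacities.items() if need <= cap]
    print(f"{name} @ {bits}-bit: ~{need:.0f} GB -> fits: {', '.join(fits) or 'none'}")
```

A 70B-class model at 4-bit already spills out of a 24 GB card, which is where the big unified-memory pools earn their keep even with less bandwidth.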
 