BREAKING: High end M5 chips release imminent

I am referring to Strix Halo and GB10/N1X.

EDIT: accidentally hit the reply button too soon.

These are both large CPU/GPU combos with varying degrees of unified memory, essentially akin to Apple’s Max chips, except both have only 256-bit buses, so roughly half the bandwidth, depending on the memory speed.

This puts them at a significant disadvantage for LLMs, before we even get to linking Maxes to make an Ultra. Even wilder, GB10 and Halo were specced with generative AI in mind, whereas I don’t believe it was in Apple’s sights for the Max.
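
For a rough sense of scale, here’s a back-of-the-envelope sketch (Python) of how bus width and memory speed turn into bandwidth, and what that implies for memory-bound LLM decoding. The data rates and model size are illustrative assumptions (LPDDR5X-8000 for the 256-bit parts, LPDDR5-6400/8533 for the Max-class parts), not confirmed specs:

```python
# Peak memory bandwidth = bus width (bits) / 8 * per-pin data rate (GT/s).
# Data rates below are illustrative assumptions, not confirmed specs.

def peak_bandwidth_gbs(bus_bits: int, data_rate_gtps: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_bits / 8 * data_rate_gtps

configs = {
    "256-bit LPDDR5X-8000 (Strix Halo / GB10 class)": (256, 8.0),
    "512-bit LPDDR5-6400 (M1 Max class)": (512, 6.4),
    "512-bit LPDDR5X-8533 (newer Max class)": (512, 8.533),
}

model_gb = 40  # e.g. a ~70B-parameter model at 4-bit quantization (assumption)

for name, (bus, rate) in configs.items():
    bw = peak_bandwidth_gbs(bus, rate)
    # Memory-bound decoding streams roughly the whole model per token,
    # so tokens/s is capped by bandwidth / model size (a loose upper bound).
    print(f"{name}: ~{bw:.0f} GB/s, <= ~{bw / model_gb:.1f} tok/s on a {model_gb} GB model")
```

So a 256-bit LPDDR5X part lands around 256 GB/s versus roughly 400-550 GB/s for a 512-bit Max, which is the gap that matters for token generation.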
 
To hit graphics performance targets similar to 2020 dGPUs, M1 Max needed bandwidth in the same class. However, since M1 Max also has a bunch of high performance, latency-sensitive CPU cores, it couldn't use GDDR. That seems to be what pushed Apple towards wide LPDDR memory configurations.
The key point here is LPDDR: Apple faces retail market dynamics, in which portable devices (mostly MacBooks but also iPads) want to maximize battery life. One of Nvidia's targets is gamers, who want max GPU power at whatever cost (mostly, I think, for bragging rights). Nvidia's market reach is broader than Apple's, so they can have a diverse product lineup, while Apple does better with a narrower product spread. iGPU + wide LPDDR seems to be a very good compromise. Apple will probably never catch Nvidia on the top end, but they at least match them in the middle range, which is good enough for most of Apple's market.
 
Nvidia and AMD both passed on 512-bit buses, 4 years after Apple went for it. It's interesting that Apple chose to be that bold and aggressive whenever the Max design was laid out, 2017 or 2018.

NVIDIA and AMD both use much faster RAM, so they don’t need wide buses to hit their performance targets on consumer GPUs. Apple uses relatively slow (by GPU standards) RAM, so they need a wider bus. The Max’s bandwidth is nothing special by GPU standards. The 512-bit bus is a necessity, not an advantage. It’s essentially a “poor man’s” HBM, just slower and cheaper.
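
To put rough numbers on the “faster RAM, narrower bus” point, here’s a quick sketch; the per-pin data rates are ballpark figures for illustration, not exact part specs:

```python
# Approximate per-pin data rates (Gb/s per pin); ballpark figures only.
per_pin_gbps = {
    "GDDR6": 16.0,
    "GDDR6X": 21.0,
    "LPDDR5": 6.4,
    "LPDDR5X": 8.5,
}

target_gbs = 400  # target bandwidth in GB/s, roughly M1-Max-class

for mem, rate in per_pin_gbps.items():
    # pins needed = (GB/s target * 8 bits/byte) / (Gb/s per pin)
    pins = target_gbs * 8 / rate
    print(f"{mem}: ~{rate} Gb/s per pin -> ~{pins:.0f}-bit bus for ~{target_gbs} GB/s")
```

With GDDR6-class memory, a 150-200-bit bus is enough to match the Max’s bandwidth, while LPDDR5 needs roughly 500 bits to get there, which is exactly the trade each vendor made.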

BTW, NVIDIA is supposed to release their “big” ML workstation soon, featuring larger GPUs and HBM. Curious about the pricing.
 
dGPU engineers mostly use GDDRn memory. GDDRn offers much better bandwidth per data bus pin than DDRn/LPDDRn, but as I understand it, worse latency. GPUs tend to be less sensitive to latency, so that's fine, and the greater bandwidth per pin lets dGPUs get away with a narrower memory interface than they'd need with other memory interface standards.

To hit graphics performance targets similar to 2020 dGPUs, M1 Max needed bandwidth in the same class. However, since M1 Max also has a bunch of high performance, latency-sensitive CPU cores, it couldn't use GDDR. That seems to be what pushed Apple towards wide LPDDR memory configurations.

I am actually curious about the latency of GDDR. From what I understand the latency is worse because of the larger burst size (?), but finding numbers has been challenging. I saw some mentions of ~ 30ns for GDDR5 (compared to ~ 15ns for DDR5). Then again, the RAM access latency on Apple Silicon is in the ballpark of 100-150ns. Apple's memory hierarchy has never been known for its low latency, and the inter-cluster communication is quite slow as well. Apple, however, is very good at hiding the latency with their large caches and deep out-of-order execution.
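
To put "hiding the latency" in concrete terms, here's a Little's-law style estimate; the bandwidth, latency, and cache-line size are assumptions in the ballpark discussed above:

```python
# Little's law: traffic in flight = bandwidth * latency.
# All numbers are illustrative assumptions, not measured values.

bandwidth_gbs = 400     # ~M1 Max class peak bandwidth, GB/s
latency_ns = 120        # DRAM access latency in the 100-150 ns ballpark
cache_line_bytes = 128  # assuming 128-byte lines, as commonly reported for Apple Silicon

bytes_in_flight = bandwidth_gbs * latency_ns           # GB/s * ns = bytes
outstanding_lines = bytes_in_flight / cache_line_bytes

print(f"~{bytes_in_flight / 1024:.0f} KiB (~{outstanding_lines:.0f} outstanding "
      f"{cache_line_bytes}-byte lines) must be in flight to sustain "
      f"{bandwidth_gbs} GB/s at {latency_ns} ns")
```

That works out to a few hundred outstanding cache lines, which is why big caches, aggressive prefetching, and deep out-of-order windows matter so much here.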

I'd guess that the main reasons for LPDDR are power consumption and reliability. GDDR runs hot, and that's just not what you want for your system memory in a portable or ultracompact system. Also, unless I am very mistaken, LPDDR offers considerably higher density than GDDR.
 
I am actually curious about the latency of GDDR. From what I understand the latency is worse because of the larger burst size (?), but finding numbers has been challenging. I saw some mentions of ~ 30ns for GDDR5 (compared to ~ 15ns for DDR5). Then again, the RAM access latency on Apple Silicon is in the ballpark of 100-150ns.
I’m a little confused: are you comparing the time to read memory onto the bus from the DDR chip itself with the time to read memory into the core? Because DDR-based CPUs take 60-100 ns to read data as well.

Apple’s chips are on the high end for latency (compared to desktop based DDR systems anyway) but not THAT extreme. :)

Apple's memory hierarchy has never been known for its low latency, and the inter-cluster communication is quite slow as well. Apple, however, is very good at hiding the latency with their large caches and deep out-of-order execution.

I'd guess that the main reasons for LPDDR are power consumption and reliability. GDDR runs hot, and that's just not what you want for your system memory in a portable or ultracompact system. Also, unless I am very mistaken, LPDDR offers considerably higher density than GDDR.

NVIDIA and AMD both use much faster RAM, so they don’t need wide buses to hit their performance targets on consumer GPUs. Apple uses relatively slow (by GPU standards) RAM, so they need a wider bus. The Max’s bandwidth is nothing special by GPU standards. The 512-bit bus is a necessity, not an advantage. It’s essentially a “poor man’s” HBM, just slower and cheaper.

BTW, NVIDIA is supposed to release their “big” ML workstation soon, featuring larger GPUs and HBM. Curious about the pricing.
He clarified that he’s comparing to GB10 and Strix Halo rather than GDDR-based dGPUs. As for the latter, apart from the professional versions that are outside most people’s budget, those often lack the RAM capacity for ML work anyway.
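
A quick footprint estimate makes the capacity point concrete; the parameter counts, quantization levels, and memory sizes below are just examples:

```python
# Rough weights-only footprint: parameters * bytes per parameter, plus some overhead.
# Parameter counts, quantization, and capacities are illustrative examples.

def footprint_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.1) -> float:
    """Approximate footprint in GB, with ~10% overhead for KV cache, activations, etc."""
    return params_billion * bits_per_weight / 8 * overhead

models = [("8B", 8, 4), ("32B", 32, 4), ("70B", 70, 4), ("70B", 70, 8)]
capacities = {
    "24 GB consumer dGPU": 24,
    "128 GB unified memory (Strix Halo / GB10 / Max class)": 128,
}

for name, params, bits in models:
    need = footprint_gb(params, bits)
    fits = [dev for dev, cap in capacities.items() if need <= cap]
    print(f"{name} @ {bits}-bit: ~{need:.0f} GB -> fits: {', '.join(fits) or 'none'}")
```

A 70B-class model at 4-bit already spills out of a 24 GB card, which is where the big unified-memory pools earn their keep even with less bandwidth.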
 