According to Apple, 512 GB is big enough for their representative 600B parameter model. And I assume this means just big enough, i.e., if 700B would fit, they would have said that instead.
Currently, quantizing parameters to 6 bits yields a model that is very close to the original. 512 GB could quite easily fit 512 billion * 8/6 ≈ 682 billion parameters, with the upper limit somewhere between that and a cool trillion parameters (which is roughly what 4-bit quantization would buy you).
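Here's that back-of-the-envelope arithmetic spelled out (I'm using decimal gigabytes to match the numbers above; whether Apple means GB or GiB is an assumption either way):

```python
ram_bits = 512e9 * 8  # 512 GB (decimal) of unified memory, in bits

for bits_per_param in (8, 6, 4):
    params = ram_bits / bits_per_param
    print(f"{bits_per_param}-bit quantization: ~{params / 1e9:.0f}B parameters")

# 8-bit quantization: ~512B parameters
# 6-bit quantization: ~683B parameters
# 4-bit quantization: ~1024B parameters
```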
However, that doesn’t account for the memory required to actually run the model and to preserve the context of whatever inputs it has processed (the KV cache). So once you start bumping that 600B number up, it starts to sound like tech bro bullshit.
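For a sense of scale, here's a rough KV-cache estimate. The config numbers are hypothetical, just loosely shaped like a very large transformer, and the formula assumes grouped-query attention with 16-bit cache entries:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    # 2x for the separate key and value tensors cached at every layer
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch

# Hypothetical config (illustrative only, not any real model's numbers)
gb = kv_cache_bytes(n_layers=120, n_kv_heads=8, head_dim=128, seq_len=128_000) / 1e9
print(f"~{gb:.0f} GB of KV cache at a 128k context")  # ~63 GB
```

Tens of gigabytes of context on top of the weights eats into that headroom fast.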
Plus there currently aren’t any dense models that break 600B outside of data centers, and sparse models (e.g. mixture of experts) have, or will soon have, inference tricks that load the parts most likely to be needed next into RAM while the model is cranking out the current tokens (e.g. certain kinds of speculative decoding).
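A minimal sketch of what one such trick could look like, assuming a hypothetical setup where individual experts can be moved between SSD and RAM and a router scores how likely each expert is to fire on upcoming tokens (all the names here are made up for illustration):

```python
import heapq

def prefetch_experts(router_scores, loaded, capacity, load_fn, unload_fn):
    """Keep the `capacity` highest-scoring experts resident in RAM.

    router_scores: per-expert likelihood of being needed soon
    loaded:        set of expert ids currently in RAM
    load_fn / unload_fn: hypothetical callbacks moving an expert SSD <-> RAM
    """
    wanted = set(heapq.nlargest(capacity, range(len(router_scores)),
                                key=lambda e: router_scores[e]))
    for e in loaded - wanted:
        unload_fn(e)  # evict experts unlikely to fire soon
    for e in wanted - loaded:
        load_fn(e)    # pull likely experts into RAM ahead of time
    return wanted

# Toy usage: 8 experts, room for 2 in RAM
prefetch_experts([0.1, 0.7, 0.05, 0.9, 0.0, 0.2, 0.1, 0.3],
                 loaded=set(), capacity=2,
                 load_fn=lambda e: print(f"load expert {e}"),
                 unload_fn=lambda e: print(f"evict expert {e}"))
```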
Anyway, I’m also disappointed that the new Ultra is the M3 generation, but 512 GB of RAM is *truly* awesome for those in ML. And for those who aren’t, the M3 generation did add ray tracing among other things, so it’s probably a pretty compelling upgrade for graphics professionals, not to mention that ML has put down roots in those fields as well.