AMD isn't using advanced packaging tech like Apple, just conventional organic substrates. The IFOP (Infinity Fabric On Package) links connecting their Core Complex Dies (CCDs) to the IO die are narrow and run at a high clock rate per wire, which means they require SERDES.
It's a bit harder to be certain about Apple's setup as they give out even less detail than AMD, but the numbers we do have say their interface is very wide and slow. On the one hand, power per bit-lane should be way lower with Apple's approach - AMD's packaging means transmission lines are longer and should require a lot more energy to yank around, and needing a SERDES for each wire costs both silicon and power. On the other hand, Apple has a lot more wires in their die-to-die interconnect. I have no great feel for who wins here.
Apple arguably had a tougher problem to solve: they had to make their solution support splitting the GPU in two. In AMD's Zen desktop products, they don't have as large a GPU, and it's always completely contained within the IO die (which is where the memory controllers live), so no need to pollute the IFOP links with GPU traffic. While Apple's GPU cores probably don't need to talk to each other much, they should need to talk to memory attached to the other die quite a bit.
Yet another difference between the two systems is that there is no local DRAM memory controller in a CCD. So AMD's CPU cores always have to go off-die to reach DRAM on a cache miss, and the IFOP links are almost certainly higher latency than Apple's die-to-die interconnect. The tradeoffs here are just so very different.
As an aside, in this realm, people usually measure in units of picojoules per bit transported from one chip to another. Kind of a neat unit.
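To make that unit concrete, here's a quick back-of-envelope calculation as a tiny Swift snippet. Both numbers in it are made up for illustration - they're not published AMD or Apple figures - but they show how pJ/bit translates into watts at a given traffic level.

```swift
import Foundation

// Back-of-envelope: power cost of die-to-die traffic at a given pJ/bit figure.
// Both inputs are assumptions for illustration, not real AMD/Apple specs.
let picojoulesPerBit = 1.0                       // assumed link efficiency, pJ/bit
let gigabytesPerSecond = 100.0                   // assumed sustained traffic, GB/s
let bitsPerSecond = gigabytesPerSecond * 1e9 * 8
let watts = picojoulesPerBit * 1e-12 * bitsPerSecond
print(String(format: "%.2f W", watts))           // 1 pJ/bit at 100 GB/s ≈ 0.8 W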
My intuition, for what it's worth, is that the bottleneck probably isn't UltraFusion. It seems intentionally overengineered, as if Apple wanted to make certain that if there were problems scaling their GPU up to be huge, the interconnect wouldn't be one of them.
Informative, thanks @mr_roboto!
Yeah, I've been hoping for more tests from reviewers, but of course that isn't financially plausible: it requires running a lot of tests on now-old products, and even if you do get interesting results (not guaranteed), that isn't going to draw as many eyeballs. Well, maybe now that the M2s are out, more outlets than just MaxTech will revisit.

I don't think there's much pure forum posting can do to provide definitive answers. The best approach I can think of is acquiring three M1 systems (Pro, Max, and Ultra), getting really familiar with Metal, and going to town writing microbenchmarks. Wherever you find things which don't scale the same from Pro to Max to Ultra, you can investigate with Apple's performance monitor / counter features, which I've heard are pretty good. Apple provides these tools to enable developers to figure out performance bugs in their code, but they should also be able to provide some insight into why some particular thing doesn't scale as well from Max to Ultra as you'd expect.
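For what it's worth, here's a minimal sketch of the kind of microbenchmark I mean: a plain streaming-copy kernel whose achieved bandwidth you'd compare across Pro, Max, and Ultra. The kernel, buffer size, and threadgroup size are all arbitrary choices for illustration, not a tuned test; a real scaling study would sweep sizes and access patterns and then dig into the counters where the numbers diverge.

```swift
import Foundation
import Metal

// Minimal GPU copy-bandwidth sketch. All sizes here are arbitrary assumptions.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;
kernel void stream_copy(device const float4 *src [[buffer(0)]],
                        device float4       *dst [[buffer(1)]],
                        uint gid [[thread_position_in_grid]]) {
    dst[gid] = src[gid];   // pure streaming copy: bandwidth-bound, not compute-bound
}
"""

let device = MTLCreateSystemDefaultDevice()!
let library = try! device.makeLibrary(source: kernelSource, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "stream_copy")!)
let queue = device.makeCommandQueue()!

let elementCount = 64 * 1024 * 1024                          // 64 Mi float4s ≈ 1 GiB per buffer (assumed)
let byteCount = elementCount * MemoryLayout<SIMD4<Float>>.stride
let src = device.makeBuffer(length: byteCount, options: .storageModePrivate)!
let dst = device.makeBuffer(length: byteCount, options: .storageModePrivate)!

let cmd = queue.makeCommandBuffer()!
let enc = cmd.makeComputeCommandEncoder()!
enc.setComputePipelineState(pipeline)
enc.setBuffer(src, offset: 0, index: 0)
enc.setBuffer(dst, offset: 0, index: 1)
enc.dispatchThreads(MTLSize(width: elementCount, height: 1, depth: 1),
                    threadsPerThreadgroup: MTLSize(width: 256, height: 1, depth: 1))
enc.endEncoding()
cmd.commit()
cmd.waitUntilCompleted()

// gpuStartTime/gpuEndTime are GPU-side timestamps (in seconds) for this command buffer.
let seconds = cmd.gpuEndTime - cmd.gpuStartTime
let gigabytesMoved = Double(2 * byteCount) / 1e9             // one read + one write per element
print(String(format: "%.1f GB/s", gigabytesMoved / seconds))
```

Once something like this shows a scaling anomaly, that's where Xcode's GPU capture and Metal's performance counters would come in to figure out where the time is actually going.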