Excellent, you have data! I have questions...
So to start with, you could clearly build a Max by using a CPU tile and two Pro GPU tiles (as long as the GPU tiles had two sets of fusion connections). And you could make an Ultra-ish thing with one CPU tile and FOUR GPU tiles. So why not do that? It's a big winner on the metrics you've mentioned (die size & thus cost). ...I guess maybe they *are* doing that, but we won't know until we see die shots. But I'll guess they aren't because that's not a great way to get to good performance (at least so far).
Do you have any sense of the magnitude of the tradeoffs involved, going from monolithic to not? I mean, you have a formula for yield, good, and a respectable claim that packaging isn't that expensive. What about loss in the packaging process? Do you know what that's like? And, most difficult, do you have any sense of what it costs in power and performance to go from monolithic to chiplets?
I really don't have a clue and you're the first person I've encountered online who might.
Oh, of course, there's one more thing we could maybe put a number on: The value to Apple in having chiplets they can play with in the lab in ways far removed from what they're already selling (say, 16 GPU tiles in a switched mesh... or 16 CPU tiles!). That (deriving a value, not playing in the lab!) seems impossibly difficult, though.
There are multiple factors at play here. As a CPU designer, pretty much my job description was to trade off these factors to acceptable levels in order to find the best solution.
So we’ve discussed cost and yield. Performance, package size, and cooling are others. We can also add second-order effects like RF noise, etc.
So if you split the GPU into multiple die, that can have multiple effects depending on how you do it. One thing to keep in mind is every time you have to send a signal between chips you (1) increase the time it takes for those signals and (2) increase power consumption. This isn’t even necessarily because splitting a die moves things farther apart; you are sticking I/O cells (drivers, receivers) in the path, which have non-zero propagation times. The off-die wiring is also bigger, clunkier, and thus has more parasitic capacitance, so you also add a slew time penalty as you need to charge and discharge those relatively-big capacitors. All of that also costs power.
You also are increasing bus length and increasing the effective bus capacitance because now instead of just two I/O cells per bus line, you have three, each of which has an input capacitance.
On the other hand, you may have a clever physical floor plan that means you save power by physically arranging things in a more optimal way. (I doubt it. Even if GPUs don’t need to talk to each other very much, I *think* they are heavily coupled to shared local memory structures).
Another consideration is architectural. GPUs talk to each other (perhaps through shared memory) much more than they talk to the CPU (I believe; I am a CPU guy, not a GPU guy). You want them tightly coupled to each other. The CPU and GPU don’t talk much to each other (though they share memory), so by splitting CPU from GPU you are “cutting” the design at a point where there aren’t a lot of wires crossing, and where the latency is easier to hide.
Further, as I said, yield falls off exponentially with die area. So if you are going from an X-sized monolithic die to two ½ X-sized die (one for CPU, one for GPU), that’s a big improvement in yield. If you then split the ½ X-sized GPU die into two ¼ X-sized die, the additional improvement in yield is smaller.
Example:
Reticle sized monolithic die: yield = 67%
2 half-reticle die: yield for each = 82%
1 half-reticle CPU die, two ¼-reticle GPU die: yield for CPU = 82%, yield for each GPU die = 91%
These numbers show the trend, but since I don’t know the actual defect density, don’t take them as completely true. I assumed D0 = 0.4, which is a reasonable guess.
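For anyone who wants to play with the numbers, here is a minimal sketch of that trend. It assumes the simple Poisson yield model, Y = exp(-A × D0), with die area A measured as a fraction of a full reticle and D0 = 0.4 read as defects per reticle-sized area. Both the choice of model and that normalization are assumptions made for illustration; the post does not say which formula or units were used. Under those assumptions the yields come out at roughly 67%, 82%, and 90.5%, in line with the figures above.

```python
import math

# Assumption: Poisson yield model, Y = exp(-A * D0).
# A is die area as a fraction of a full reticle; D0 is taken as
# defects per reticle-sized area (units assumed, not stated in the post).
D0 = 0.4


def poisson_yield(area_fraction, d0=D0):
    """Fraction of defect-free die for a die covering `area_fraction` of a reticle."""
    return math.exp(-area_fraction * d0)


# Full reticle, then repeatedly halve the die area.
sizes = [1.0, 0.5, 0.25, 0.125]
yields = [poisson_yield(a) for a in sizes]

# Yield gained by each successive halving (in absolute terms).
gains = [smaller - larger for larger, smaller in zip(yields, yields[1:])]

for a, y in zip(sizes, yields):
    print(f"{a:>6} reticle: yield = {y:.1%}")
for a, g in zip(sizes[1:], gains):
    print(f"halving down to {a} reticle adds {g * 100:.1f} points of yield")
```

Each halving buys fewer points of yield than the one before it. Change `D0` to see how sensitive the picture is to the defect density guess.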
So, as you can see, it’s diminishing returns as you get smaller and smaller. And given the problems it causes in terms of the power/delay issue each time you cross die boundaries, it quickly becomes not worth the trouble.