Well, it's obvious why the SLC is near the memory controllers. Placement of the GPU may be coincidental, or it may be that the GPU being the biggest consumer of memory bandwidth, at least in some scenarios, makes putting it there actually worthwhile.
I suspect there is a modest but measurable advantage to having the RAM mounted PoP, as opposed to using CAMMs - the shorter distance probably allows for a lower pJ/b. I don't know that for sure though. (@Cmaier?) Whether that's enough to matter is another thing entirely.
I really think this entire story is garbled nonsense. We already have Mx chips with 512-bit-wide memory, successfully delivering devastating bandwidth numbers. Apple knows how to do this, and they are doing it already. And guess what? They're not using vias, it's all shoreline. So chip area is not a major factor, though chip perimeter might be. They could at least in part solve that by making chips much less square, giving each one more perimeter for the same area.
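To put rough numbers on the "less square" idea, here's a quick sketch of how much perimeter you gain at a fixed area as the aspect ratio grows. The 400 mm² area is a made-up round number for illustration, not any real chip's die size, and `perimeter_for_area` is just a helper name I picked.

```python
import math

def perimeter_for_area(area_mm2: float, aspect_ratio: float) -> float:
    """Perimeter (mm) of a rectangle with the given area and width:height ratio."""
    height = math.sqrt(area_mm2 / aspect_ratio)
    width = aspect_ratio * height
    return 2 * (width + height)

# Hypothetical die area, chosen only to make the arithmetic easy to follow.
AREA_MM2 = 400.0

square = perimeter_for_area(AREA_MM2, 1.0)
for ratio in (1.0, 2.0, 4.0):
    p = perimeter_for_area(AREA_MM2, ratio)
    gain = p / square - 1
    print(f"{ratio:.0f}:1  perimeter = {p:.1f} mm  ({gain:+.0%} vs square)")
```

The gain depends only on the aspect ratio, not the area: 2:1 buys only about 6% more edge, while 4:1 buys 25%. So "much less square" is doing real work in that sentence. This ignores everything else that matters (reticle limits, yield, how much of the edge is actually usable for memory PHYs), so treat it as geometry only.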
Now, is this substantially different from what they're doing with Ax chips? It does seem so, and maybe with them area really is a limiting factor. But it's not like they don't already know how to do it another way. So if bandwidth really does motivate them, then... it still plays out like I said in my last post.