I am not sure your math adds up.
Yes, UltraFusion is a cost that has to be paid for every single Max die produced, but it's not like a dedicated GPU die would be any different. You would still need to provision some sort of high-bandwidth interface on the smaller die to connect it to the GPU die. And the GPU die itself would require extra tape-outs and production resources, which are already precious. The beauty of the Max approach is its reusability: one can adapt to market needs and direct production resources to where the demand is.
I also don't think that a GPU die is a necessity to enable truly exceptional products. Die area and manufacturing capacity are probably the biggest problems, and the poor scaling of SRAM with node shrinks means that compute becomes more and more expensive. But if one splits the SoC functionality across multiple stacked dies, each manufactured on a process optimised for its role, one can maximise process utilisation. Right now, probably less than 60% of the M2 Max die is used for compute. Imagine if one could increase that to 90% by moving the supporting functionality onto a separate die (still within the same SoC package). This could mean a dramatic improvement in performance with only a negligible increase in cost. This is what all this stuff is about:
View attachment 24375
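To put rough numbers on the 60% vs. 90% point, here is a back-of-the-envelope sketch. The 60% figure is only my guess from above, the 500 mm² die area is a made-up placeholder rather than a measured value, and the function name is purely illustrative. It only compares silicon area available for compute and says nothing about how performance scales with that area.

```python
def compute_area(die_area_mm2: float, compute_fraction: float) -> float:
    """Return the die area (mm^2) spent on compute for a given fraction."""
    if not 0.0 <= compute_fraction <= 1.0:
        raise ValueError("compute_fraction must be between 0 and 1")
    return die_area_mm2 * compute_fraction


# Hypothetical die area, used only for illustration (not a measured figure).
DIE_AREA_MM2 = 500.0

# Rough guess from the post: under 60% of the die is compute today.
current = compute_area(DIE_AREA_MM2, 0.60)

# Scenario from the post: supporting logic moves to a second die,
# leaving about 90% of the same-size die for compute.
stacked = compute_area(DIE_AREA_MM2, 0.90)

# Relative increase in compute area on a die of unchanged size.
gain = stacked / current

print(f"compute area now:     {current:.0f} mm^2")
print(f"compute area stacked: {stacked:.0f} mm^2")
print(f"relative gain:        {gain:.2f}x")
```

Note that the ratio does not depend on the assumed die area at all: going from 60% to 90% is 1.5x the compute area on the leading-edge die, whatever its size.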
As to the rest, I do hope we will see compute modules one day, even if that's just for the Mac Pro...