Perhaps one advantage of an isolated dGPU is freedom of sloppiness. The CPU sends the code (pretty compact), description files (still pretty small) and textures (generally not so small), and the GPU chomps on them to produce the result. If the output is going to the screen, the GPU just handles it and the processor goes on doing what it was doing, unaffected (assuming the display is connected to the card).
Ultimately, the data transferred is usually a lot smaller than the working set for the frame, and if the GPU has enough room, it can generate intermediate content with abandon (such as hidden surfaces that do not show up in the final output) – sloppiness. The final result (the image) will still be smaller to pass back, if that is even necessary, than the GPU's overall working set. I believe much the same is true for heavy math jobs, though to a lesser degree.
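Here is a rough back-of-envelope sketch (in Python) of the asymmetry I'm describing. Every number in it is an assumption picked for illustration, not a measurement from any real card:

    MB = 1024 * 1024

    # What the CPU ships to a discrete card each frame: command buffers,
    # updated uniforms/constants, maybe some streamed vertices. (Assumed.)
    per_frame_upload = 2 * MB

    # What the GPU actually touches while rendering: textures, geometry,
    # framebuffers, depth buffer, plus throwaway intermediates such as the
    # hidden surfaces that never show up in the final image. (Assumed.)
    gpu_working_set = 4096 * MB

    # What would cross the bus back to the CPU if it needed the result:
    # one 4K RGBA frame. If the card is driving the display, even this
    # copy never happens.
    readback = 3840 * 2160 * 4   # roughly 32 MB

    print(f"per-frame upload : {per_frame_upload / MB:7.1f} MB")
    print(f"GPU working set  : {gpu_working_set / MB:7.1f} MB")
    print(f"readback (4K)    : {readback / MB:7.1f} MB")
    print(f"working set is {gpu_working_set / per_frame_upload:.0f}x the per-frame upload")

With numbers anywhere in that neighbourhood, the traffic over the bus is a tiny fraction of what the GPU churns through locally, which is what lets the card be as wasteful as it likes inside its own memory.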
So a UMA iGPU has to be a little more elegant in how it runs, making more judicious use of limited memory space and bus bandwidth. The code has to be more efficient, which is why an iGPU will almost always draw less power for the same job: it gets the job done differently. It would seem that the dGPU is the sledgehammer approach.