Apple M5 rumors

I don't think that's correct for GPU architectures such as Nvidia's Lovelace and Intel’s Xe2, which have dedicated pipes for matrix operations. So while the traditional vector pipes do rasterization, the matrix pipes can simultaneously do upscaling.

I don’t know about Intel, but as far as I am aware no current Nvidia architecture can do matrix operations concurrently with other operations. Tensor cores work together with the rest of the system to do matrix multiplication; they are not a standalone unit. The Nvidia scheduler can only dispatch a single instruction per cycle anyway.
 
I think the main question is whether your hardware can interleave GPU and NPU execution well enough. Sebastian mentions that using the NPU for upscaling would create a bubble. This is only the case if the upscaling step is slower than the frame generation.
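A back-of-the-envelope way to state that condition (a toy model in Swift, with invented numbers; only the max() relationship is the point):

```swift
// Toy steady-state model: the NPU upscales frame N while the GPU renders
// frame N+1. Timings are invented for illustration.
struct FrameTiming {
    let renderMs: Double   // GPU time to render one frame
    let upscaleMs: Double  // NPU time to upscale one frame (incl. sync)

    // Overlapped across frames, throughput is bounded by the slower stage.
    var pipelinedFrameMs: Double { max(renderMs, upscaleMs) }

    // The GPU only stalls (the "bubble") when upscaling is the slower stage.
    var gpuBubbleMs: Double { max(0, upscaleMs - renderMs) }
}

let t = FrameTiming(renderMs: 8.0, upscaleMs: 6.5)
print(t.pipelinedFrameMs, t.gpuBubbleMs) // 8.0 0.0 → no bubble
```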
Sebastian said something along the lines of upscaling taking place before the end of the graphics pipeline. If so, I guess that means some (many?) game engines do rasterize > upscale > other postprocessing.

What's unclear to me is why you can't get some parallelism anyway. In TBDR GPUs, rasterization completes in tile-sized chunks. As each tile finishes, its pixel data can be tossed over to the upscaler while the GPU tile engines are working on other tiles in the same frame. Seems like there should be scope for GPU/upscaler parallelism, even if the upscaler is running in the Neural Engine.

(It is very possible I have missed something. I haven't done any GPU programming in about 20 years.)
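Roughly the shape I'm picturing, as a toy producer/consumer sketch in Swift. renderTile and upscaleTile are made-up stand-ins, not real Metal or Neural Engine APIs:

```swift
import Foundation

struct Tile { let index: Int }

// Made-up stand-ins, not real Metal/Neural Engine APIs.
func renderTile(_ i: Int) -> Tile { Tile(index: i) }  // pretend GPU tile work
func upscaleTile(_ t: Tile) { }                       // pretend NPU work

let upscaler = DispatchQueue(label: "upscaler")  // consumer stage
let frameDone = DispatchGroup()

for i in 0..<64 {                        // "GPU" walks the tile grid...
    let tile = renderTile(i)
    upscaler.async(group: frameDone) {   // ...tossing each finished tile
        upscaleTile(tile)                // to the upscaler while later
    }                                    // tiles are still rendering
}
frameDone.wait()  // frame is complete once every tile is upscaled
```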
 

I am not sure that tile-level parallelism would work well. Tiles are very small and probably don’t have enough data to be processed on an NPU efficiently (and if they do, the synchronization overhead would likely be massive). In addition, you need to sample across tile boundaries to do upscaling (pixels around the tile edges). Finally, tiles can be flushed prematurely (buffer overflows, transparency), so you have to wait until the end of the rendering pipeline anyway.

However, you don’t need to do anything too complex to get concurrency. The GPU can start working on the next frame while the NPU is doing the upscaling. As long as the upscaling (including synchronization) runs as fast as or faster than the rendering phase, you should get good GPU utilization. It just boils down to whether your NPU is fast enough.
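In code terms this is just a two-stage pipeline. A minimal sketch in Swift, assuming made-up renderFrame/upscaleFrame stand-ins (not real APIs) and invented timings:

```swift
import Foundation

struct Frame { let index: Int }

// Stand-ins, not real APIs; sleep times are invented.
func renderFrame(_ i: Int) -> Frame {
    Thread.sleep(forTimeInterval: 0.008)   // pretend 8 ms of GPU work
    return Frame(index: i)
}
func upscaleFrame(_ f: Frame) {
    Thread.sleep(forTimeInterval: 0.006)   // pretend 6 ms of NPU work
}

let npu = DispatchQueue(label: "npu")       // pretend-NPU stage
let done = DispatchGroup()
let inFlight = DispatchSemaphore(value: 1)  // at most one frame queued behind the GPU

for i in 0..<120 {
    let frame = renderFrame(i)    // "GPU": render frame i
    inFlight.wait()               // only blocks if upscaling is the bottleneck
    npu.async(group: done) {      // "NPU": upscale frame i...
        upscaleFrame(frame)
        inFlight.signal()         // ...while the GPU starts frame i+1
    }
}
done.wait()
```

With these numbers upscaling is the faster stage, so after the first frame the wait() never blocks and the GPU stays busy.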
 

Makes a lot of sense, if true. At the very least it gives them flexibility to mix and match CPU and GPU counts for different market segments.
 