The Flame
Power User
Why have Tensor units within the GPU?
A thread by Sebastian Aaltonen, a graphics industry veteran.
This is the reason why you want tensor units inside the GPU's compute units instead of an NPU for graphics processing.
Upscaling runs in the middle of the frame, so you'd have a GPU->NPU->GPU dependency, which would create a bubble. Also, the NPU would be unused for most of the frame duration.
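To put rough numbers on that bubble (purely illustrative, not from the thread, and all the timings are made up), a toy model of a frame with a mid-frame NPU upscale looks like this:

```cuda
// Toy model of a mid-frame GPU->NPU->GPU dependency.
// All timings are invented for illustration.
#include <cstdio>

int main()
{
    const float pre_upscale_ms   = 10.0f; // assumed GPU work before the upscale
    const float npu_upscale_ms   = 3.0f;  // assumed NPU upscale pass
    const float sync_overhead_ms = 0.5f;  // assumed fence/signal round trip
    const float post_upscale_ms  = 3.0f;  // assumed GPU work after the upscale (post FX, UI)

    // The GPU can't start its post-upscale passes until the NPU signals,
    // so it sits idle for the whole NPU pass plus the sync overhead.
    float gpu_bubble = npu_upscale_ms + sync_overhead_ms;
    float frame      = pre_upscale_ms + gpu_bubble + post_upscale_ms;

    printf("GPU idle (bubble): %.1f ms of a %.1f ms frame (%.0f%%)\n",
           gpu_bubble, frame, 100.0f * gpu_bubble / frame);
    // The flip side: the NPU is busy for only a small slice of the frame.
    printf("NPU busy: %.1f ms (%.0f%% of the frame)\n",
           npu_upscale_ms, 100.0f * npu_upscale_ms / frame);
    return 0;
}
```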
NPU would need to:
1. Support low latency fence sync with GPU
2. Have shared memory and shared LLC with GPU
3. Be able to read+write GPU swizzled texture layouts (see the sketch after this list)
4. Be able to read+write DCC compressed GPU textures
And you'd still have the above-mentioned problem.
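On point 3: "swizzled" means the texels aren't stored row by row. The real layouts are vendor-specific and undocumented, but a Morton/Z-order interleave is the classic textbook example of the kind of pattern an external NPU would have to reproduce to read a GPU texture in place (sketch only, not any vendor's actual layout):

```cuda
// Illustrative swizzle only: real GPU texture layouts are vendor-specific.
// Morton/Z-order interleaves the bits of (x, y), so texels that are close
// in 2D stay close in memory, which is what GPU caches are organised around.
#include <cstdint>
#include <cstdio>

__host__ __device__ uint32_t morton2d(uint32_t x, uint32_t y)
{
    uint32_t code = 0;
    for (uint32_t i = 0; i < 16; ++i) {
        code |= ((x >> i) & 1u) << (2 * i);       // x bits go to even positions
        code |= ((y >> i) & 1u) << (2 * i + 1);   // y bits go to odd positions
    }
    return code;
}

int main()
{
    // The texel at (3, 5) lands at offset 39 instead of y * width + x.
    printf("swizzled offset: %u, linear offset (256-wide image): %u\n",
           morton2d(3, 5), 5u * 256u + 3u);
    return 0;
}
```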
If your GPU is waiting for the NPU in the middle of the frame, you might as well put the tensor units in the GPU and get 1, 2, 3 and 4 for free. That's why you put tensor units inside the GPU's compute units: it's simpler, and it reuses all of the GPU's units and register files.
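Here's what "tensor units inside the compute units" looks like from the programmer's side. This is a minimal standard-CUDA sketch (nvcuda::wmma, sm_70 or newer, FP16 inputs; not code from the thread): a warp issues a tensor-core matrix multiply in the middle of an ordinary kernel, with the fragments living in the same register file and the inputs/outputs in ordinary GPU memory, so there's no fence, no copy, and no second device to keep fed.

```cuda
// Minimal sketch: tensor-core math issued from inside a regular CUDA kernel.
// Requires sm_70+ and a launch of at least one full warp, e.g.
//   tile_mma<<<1, 32>>>(dA, dB, dC, 16, 16, 16);
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void tile_mma(const half* A, const half* B, float* C,
                         int lda, int ldb, int ldc)
{
    // One warp computes one 16x16 output tile.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, lda);   // plain global-memory reads
    wmma::load_matrix_sync(b, B, ldb);
    wmma::mma_sync(acc, a, b, acc);      // the tensor-core multiply-accumulate
    wmma::store_matrix_sync(C, acc, ldc, wmma::mem_row_major);

    // Any further shader-style work (activation, blending with other render
    // data, writing into the frame's resources) can continue right here.
}
```

Compile with nvcc -arch=sm_70 (or newer). The point isn't the matrix multiply itself, it's that the tensor op is just another instruction in the same stream as the rest of the frame's work.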
Sony or Nvidia (GPU tensor) vs Microsoft/Qualcomm AutoSR (NPU) comparison:
1. The NPU runs after the GPU has finished the frame, not in the middle of it. That means it upscales the UI too, which is not nice.
2. It adds one frame of latency: the NPU processes frame N in parallel while the GPU renders frame N+1.
TV sets also use NPUs for upscaling, since added latency is not an issue there. GPU tensor cores are better when latency matters (gaming). Games also want to compose low-res 3D content with a native-resolution UI, and an NPU is not a good fit for that kind of workload.
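And to show the shape of the latency argument with made-up numbers (illustrative only, not measurements):

```cuda
// Toy timeline: from the start of frame N, when can the upscaled image
// reach the display? All numbers are invented for illustration.
#include <cstdio>

int main()
{
    const float vsync_ms       = 16.7f; // 60 Hz refresh
    const float render_ms      = 12.0f; // assumed low-res 3D render
    const float gpu_upscale_ms = 1.5f;  // assumed in-frame tensor upscale
    const float ui_ms          = 0.5f;  // native-res UI composite
    const float npu_ms         = 5.0f;  // assumed NPU pass over the finished frame

    // GPU-tensor path: the upscale is just another pass inside frame N and the
    // UI is composited at native resolution afterwards; the image makes the
    // very next vsync.
    float gpu_done    = render_ms + gpu_upscale_ms + ui_ms;  // 14.0 ms < 16.7 ms
    float gpu_display = vsync_ms;

    // Post-frame NPU path: the GPU finishes frame N (UI and all, so the UI
    // gets upscaled too) and immediately starts frame N+1 while the NPU works
    // in parallel; frame N misses its vsync and is shown one refresh later.
    float npu_done    = render_ms + ui_ms + npu_ms;          // 17.5 ms > 16.7 ms
    float npu_display = 2.0f * vsync_ms;

    printf("GPU tensor: image done at %.1f ms, displayed at %.1f ms\n",
           gpu_done, gpu_display);
    printf("NPU:        image done at %.1f ms, displayed at %.1f ms (+1 frame)\n",
           npu_done, npu_display);
    return 0;
}
```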