And it appears the type of computations needed can also affect core utilization. For instance, according to this article, half the ALUs (the article calls them shader cores, and NVIDIA calls them CUDA cores) in Ampere (3000-series) are FP-only, and half can do INT or FP. If so, and if your task is INT-heavy, it seems some cores might remain idle. Not sure how Apple's M-series, or NVIDIA's Ada Lovelace (4000-series)*, work in this regard.
There is also the issue of the theoretical peak being an abstract number with a lot of confounding variables. I suspect it is straight-up impossible to come close to max theoretical throughput, simply because you cannot feed the units at a high enough rate. Maybe a card, with its separate memory block, could get closer than a UMA-based GPU, but what effect does the transfer of a big wad of data have on net performance?
I mean, granted, a discrete GPU running games will typically not have to shift as much data, as it would be driving the display itself, but if you are doing the heavy math stuff or rendering, the big wad of data does eventually have to end up back in main memory. People interested in non-gaming production will be affected by the transfers.
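To put a rough shape on the transfer question, here is a back-of-the-envelope sketch in Python. It is only a toy model under my own assumptions: the function name `effective_tflops` is made up, the card is assumed to compute at its full peak, and the copy over the link is assumed to happen strictly after the compute with no overlap. The numbers are invented for illustration, not measurements of any real card.

```python
def effective_tflops(flops, peak_tflops, bytes_moved, link_gb_per_s):
    """Net TFLOPS when a job computes at peak and then copies its data over a link.

    Toy model: compute and transfer run back to back with no overlap.
    """
    compute_s = flops / (peak_tflops * 1e12)
    transfer_s = bytes_moved / (link_gb_per_s * 1e9)
    return flops / (compute_s + transfer_s) / 1e12


# Illustrative, made-up numbers (not measurements of any real card):
job_flops = 1e12      # one trillion floating-point operations
peak = 30.0           # hypothetical peak TFLOPS
link = 25.0           # hypothetical link bandwidth in GB/s

for gigabytes in (0, 1, 8):
    net = effective_tflops(job_flops, peak, gigabytes * 1e9, link)
    print(f"{gigabytes} GB moved: {net:.2f} net TFLOPS of {peak:.0f} peak")
```

In this model a UMA GPU corresponds to `bytes_moved = 0`, which ignores that it shares its memory bandwidth with the CPU, so treat it as a way to see the trend rather than a prediction.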
And for the curious, who have not seen it, here is Asahi's reverse-engineering peek at Apple's GPU architecture.
NVIDIA's RTX 3000 cards make counting teraflops pointless
www.engadget.com
*Just found this about Ada Lovelace: https://wccftech.com/nvidia-ada-lov...-than-ampere-4th-gen-tensor-3rd-gen-rt-cores/
"each sub-core will consist of 128 FP32 plus 64 INT32 units for a total of 192 units."
I don't know how to interpret this. Does it mean that, with Lovelace, you no longer have ALUs that can do both FP and INT, and that they've instead separated out the capability? NVIDIA says the Lovelace Tensor cores have separate INT and FP paths, but I believe there are many fewer of those than the shader cores.
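For what it's worth, here is a small Python sketch of how the two layouts would behave if you take them at face value. Everything in it is an assumption on my part: `lane_utilization` is a made-up helper, the model ignores scheduling, latency and memory, and the "separated" layout is just one possible reading of the quoted 128 FP32 plus 64 INT32 figure, not a confirmed description of Lovelace. The 64/64 split is only an illustrative half-and-half example of the Ampere arrangement described above.

```python
def lane_utilization(int_fraction, fp_only, shared, int_only=0):
    """Fraction of ALU lanes kept busy per clock in a toy issue model.

    int_fraction: share of the instruction mix that is INT (the rest is FP).
    fp_only / shared / int_only: lanes that do FP only, FP or INT, INT only.
    """
    total = fp_only + shared + int_only
    fp_fraction = 1.0 - int_fraction
    # Ops per clock cannot exceed the lane count ...
    limits = [total]
    # ... nor what the INT-capable lanes can absorb of the INT share ...
    if int_fraction > 0:
        limits.append((shared + int_only) / int_fraction)
    # ... nor what the FP-capable lanes can absorb of the FP share.
    if fp_fraction > 0:
        limits.append((fp_only + shared) / fp_fraction)
    return min(limits) / total


# Half FP-only, half FP-or-INT (the Ampere-style split described above):
print(lane_utilization(1.0, fp_only=64, shared=64))   # pure INT work: 0.5
print(lane_utilization(0.0, fp_only=64, shared=64))   # pure FP work: 1.0

# Hypothetical fully separated layout, 128 FP lanes plus 64 INT lanes:
for mix in (0.0, 0.5, 1.0):
    print(mix, round(lane_utilization(mix, fp_only=128, shared=0, int_only=64), 3))
```

Under these assumptions the shared layout only loses lanes once the mix is more than half INT, while a fully separated layout would leave some lanes idle on anything other than a roughly two-to-one FP-to-INT mix.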