At the risk of stealing
@theorist9’s joke, I think my previous wall of text might’ve been a tad incoherent. Hopefully this wall of text will be better!
The main purpose of managed memory is to let you lean on the much larger pool of CPU memory and stream that data in and out of the GPU as needed, with minimal coding from the programmer. While I mentioned that the other direction is technically possible, I admitted I couldn't think of a practical use case for it. What I was pushing back on was the idea that the memory is necessarily mirrored on both sides - that if you have 24GB of data on the CPU, you have to have 24GB of space reserved on the GPU. Again, that's technically possible, but it's not required and often not what's wanted - especially if the GPU only has, say, 8GB of memory (in which case it isn't even possible). Say 1/3 of the data set is resident on the GPU being worked on and 2/3 is on the CPU waiting its turn: that takes up 8GB on the GPU and 16GB on the CPU. The total memory in use is still just 24GB, spread across 16GB of CPU RAM and 8GB of VRAM.
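To make that concrete, here's a minimal sketch in CUDA terms (just one particular managed-memory API - the same idea applies elsewhere; the 24GB size, the 8GB-VRAM assumption, and the `scale` kernel are placeholders for the example above). The 24GB allocation succeeds even though the GPU only has 8GB, because pages live wherever they were last touched rather than being mirrored:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: touching an element is what triggers the page fault
// that migrates its page to the GPU.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // 24GB of managed memory on a GPU with only 8GB of VRAM.
    size_t n = 24ull * 1024 * 1024 * 1024 / sizeof(float);
    float *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    // First touch on the CPU: pages are resident in system RAM
    // (this assumes the host actually has the RAM to back them).
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    // The kernel touches everything; the driver migrates pages to VRAM on
    // fault and evicts cold pages back to system RAM as VRAM fills up.
    scale<<<(unsigned)((n + 255) / 256), 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]); // faults the page back to the CPU
    cudaFree(data);
    return 0;
}
```

(Caveat: oversubscribing VRAM like this needs hardware/driver support for demand paging - on NVIDIA that means a Pascal-or-newer GPU, and on Windows the behavior differs.)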
I *could* achieve the same thing (most of the time) by managing it manually: explicitly setting up buffers of 16GB on the CPU and 8GB on the GPU and moving the data back and forth myself. Most of the time that will give faster performance. But letting the driver handle it is a lot easier (less code) and comes with the advantage that if I don't know ahead of time which data I'll need when, I can rely on page faults to migrate the right data over as I touch it. The driver and the GPU keep track of what data is where, what needs to be migrated, and what space on the CPU/GPU can be allocated or freed given that the other device currently holds some fraction of the data - i.e. if the total CPU RAM available in our example were 32GB instead of 16, I'd still have 16GB of CPU RAM free to allocate to other tasks, not 8, even when using managed memory.
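There's also a middle ground worth mentioning: managed-memory APIs generally let you hand the driver explicit hints when you *do* know the access pattern, which recovers most of the performance of hand-rolled staging buffers without the bookkeeping. A sketch, again in CUDA terms (the `process` kernel and the 2GB chunk size are made up for illustration):

```cpp
#include <algorithm>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real work on each chunk.
__global__ void process(float *chunk, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) chunk[i] += 1.0f;
}

// Walk a large managed buffer in chunks, prefetching instead of waiting
// for page faults - roughly what the manual double-buffer scheme does,
// except the driver still owns the residency bookkeeping.
void process_in_chunks(float *data, size_t n_total, int device) {
    size_t chunk = 2ull * 1024 * 1024 * 1024 / sizeof(float);
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    for (size_t off = 0; off < n_total; off += chunk) {
        size_t n = std::min(chunk, n_total - off);
        // Hint: migrate this chunk to the GPU ahead of the kernel launch.
        cudaMemPrefetchAsync(data + off, n * sizeof(float), device, stream);
        process<<<(unsigned)((n + 255) / 256), 256, 0, stream>>>(data + off, n);
        // Hint: push the finished chunk back to system RAM so later
        // chunks have VRAM to land in.
        cudaMemPrefetchAsync(data + off, n * sizeof(float), cudaCpuDeviceId, stream);
    }
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}
```

Even without the hints the faulting path still works; the hints just spare you the first-touch stalls when the access pattern is predictable.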