I ran a simple test to check for Dynamic Caching on my A17 Pro. With a sample compute shader that needs a lot of registers on a conditional path that is never taken, my M1 Max takes a 60% hit in threads per core and a 25% hit in performance. On the A17 Pro there is no difference. This is most impressive.
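To make the threads-per-core effect concrete, here is a toy occupancy model. It is only a sketch of the general idea, not Apple's actual mechanism: the struct and function names, the register file size, the thread cap, and the per-path register counts are all hypothetical values I picked for illustration, not measured or published figures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// All names and numbers here are hypothetical, chosen only for illustration;
// they are not published figures for any Apple GPU.
struct CoreModel {
    std::uint32_t registerFileSize; // registers available per core
    std::uint32_t maxThreads;       // cap on resident threads per core
};

struct ShaderModel {
    std::uint32_t hotPathRegs;  // registers used by the path that actually runs
    std::uint32_t coldPathRegs; // registers used by the never-taken branch
};

// Resident threads when each thread reserves `regsPerThread` registers.
std::uint32_t residentThreads(const CoreModel& core, std::uint32_t regsPerThread) {
    assert(regsPerThread > 0);
    return std::min(core.maxThreads, core.registerFileSize / regsPerThread);
}

// Static allocation: reserve for the worst case over all paths up front.
std::uint32_t staticOccupancy(const CoreModel& core, const ShaderModel& s) {
    return residentThreads(core, std::max(s.hotPathRegs, s.coldPathRegs));
}

// Lazy allocation: reserve only what the executed path touches
// (assumes the cold path is never executed, as in the test above).
std::uint32_t lazyOccupancy(const CoreModel& core, const ShaderModel& s) {
    return residentThreads(core, s.hotPathRegs);
}
```

With a made-up core of 65536 registers and a 1024-thread cap, and a shader that uses 64 registers on the hot path and 160 on the cold one, staticOccupancy gives 409 threads and lazyOccupancy gives 1024, a drop of about 60%. I picked the inputs so the drop lands near what I saw on the M1 Max; they explain the shape of the effect, not the real hardware parameters.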
In fact, it is easy to underestimate what Apple did here. The practical impact on everyday users will be small, but as GPU algorithms become more complex, this can unlock significant performance advantages. Even more, I can imagine it unlocking new classes of algorithms, with real dynamic memory allocation on the GPU (I'm curious whether Apple has something planned for the next Metal update). And it's very hard to pull off engineering-wise, since registers have to be allocated lazily. This gives Apple GPUs a level of sophistication beyond anything else on the market. Very, very impressive.