I gather M4 represents a larger microarchitecture change than some thought - the cores are now 10-wide instead of 8-wide. Interesting.
I think we’ve now gotten just about as wide as we’re going to get. My expertise isn’t Arm - I designed schedulers for SPARC and x86-64 - but it’s hard for me to imagine that going wider wouldn’t lead to rapidly diminishing returns. Instructions tend to depend on the results of other instructions, so as you get wider and wider, you’ll find that more and more often you can’t find N independent instructions to issue in the same cycle. And if you’re relying on speculative execution (branch prediction, etc.), the penalty for a mistaken guess gets progressively worse, because a wider machine has more speculative work in flight to throw away. Unless, of course, you go to something like multithreading, where you can issue completely unrelated instructions from different threads.
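For anyone who wants to play with the dependency argument, here is a toy Python sketch. It is not a model of M4 or of any real core. Everything in it is an assumption made up for illustration: the function name `simulate_ipc`, the synthetic stream where each instruction reads two results from at most `max_dist` instructions back, single-cycle latency, an unlimited scheduling window, and perfect branch prediction. It just shows how the instructions-per-cycle figure can be compared across issue widths under those assumptions.

```python
import random


def simulate_ipc(width, n_instr=20000, max_dist=8, deps_per_instr=2, seed=0):
    """Toy model: greedy issue of a synthetic instruction stream.

    Assumptions (all hypothetical): unit latency, unlimited window,
    no branches, dependencies drawn uniformly from the previous
    `max_dist` instructions.
    """
    rng = random.Random(seed)   # same seed -> same dependency graph for every width
    ready = [0] * n_instr       # first cycle in which instruction i's result is usable
    issued = {}                 # cycle -> number of instructions issued in that cycle
    last_cycle = 0

    for i in range(n_instr):
        # An instruction cannot issue before all of its producers have finished.
        earliest = 0
        for _ in range(deps_per_instr):
            d = rng.randint(1, max_dist)
            if i - d >= 0:
                earliest = max(earliest, ready[i - d])

        # Take the first cycle at or after `earliest` that still has a free slot.
        cycle = earliest
        while issued.get(cycle, 0) >= width:
            cycle += 1

        issued[cycle] = issued.get(cycle, 0) + 1
        ready[i] = cycle + 1    # unit latency
        last_cycle = max(last_cycle, cycle)

    return n_instr / (last_cycle + 1)


for w in (1, 2, 4, 8, 10, 16):
    print(f"width {w:2d}: IPC ~ {simulate_ipc(w):.2f}")
```

Changing `max_dist` or `deps_per_instr` changes how much parallelism the synthetic stream has, so the numbers only mean something relative to each other. Real instruction streams, finite windows, and mispredicts would all need to be added before drawing conclusions about actual hardware.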
If they add more registers, perhaps they can reduce the likelihood of dependencies. It would be interesting to model that with real instruction streams.