Rumored M4 MBP Geekbench

I gather M4 represents a larger microarchitecture change than some thought - the cores are now 10 wide instead of 8 wide. Interesting.

I think we’ve now gotten just about as wide as we’re going to get. My expertise isn’t Arm - I designed schedulers for SPARC and x86-64 - but it’s hard for me to imagine that going wider wouldn’t lead to rapidly diminishing returns. Instructions tend to depend on the results of other instructions, and as you get wider and wider, more and more often you won’t be able to find N instructions without dependencies to issue in the same cycle. And if you’re relying on speculative execution (branch prediction, etc.), the penalty for a mistaken guess gets progressively worse, because more in-flight work has to be thrown away. Unless, of course, you go to something like multithreading, where you can issue completely unrelated instructions from different threads.

If they add more registers, they can perhaps reduce the likelihood of dependencies. It would be interesting to model that with real instruction streams.
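
For anyone who wants to play with the idea, here is a rough toy model, not a real experiment. It uses a synthetic random instruction stream rather than a real one, and it assumes unit latency, perfect renaming (only read-after-write dependencies count), an unlimited scheduling window, and no branches. The names `make_stream` and `ipc` are made up for this sketch. Treating "more registers" as "pick sources and destinations from a larger random pool" is a crude stand-in for what a compiler would actually do with them.

```python
import random

def make_stream(n, num_regs, seed=0):
    """Synthetic stream of (src1, src2, dst) register triples.

    Assumption: registers are picked uniformly at random, which is
    NOT how real compiled code behaves.
    """
    rng = random.Random(seed)
    return [(rng.randrange(num_regs), rng.randrange(num_regs), rng.randrange(num_regs))
            for _ in range(n)]

def ipc(stream, width):
    """Instructions per cycle under greedy issue, at most `width` per cycle.

    Assumptions: unit latency, perfect renaming (RAW dependencies only),
    unlimited window, no branch mispredictions.
    """
    ready_cycle = {}   # register -> cycle in which its latest writer issued
    issued = {}        # cycle -> number of instructions issued in that cycle
    last_cycle = 0
    for src1, src2, dst in stream:
        # A consumer can issue one cycle after its latest producer issued.
        earliest = max(ready_cycle.get(src1, -1), ready_cycle.get(src2, -1)) + 1
        cycle = earliest
        while issued.get(cycle, 0) >= width:   # that cycle's issue slots are full
            cycle += 1
        issued[cycle] = issued.get(cycle, 0) + 1
        ready_cycle[dst] = cycle
        last_cycle = max(last_cycle, cycle)
    return len(stream) / (last_cycle + 1)

if __name__ == "__main__":
    for num_regs in (16, 32, 64):
        stream = make_stream(2000, num_regs)
        row = ", ".join(f"w={w}: {ipc(stream, w):.2f}" for w in (4, 8, 10, 16))
        print(f"{num_regs} regs -> {row}")
```

Because the window is unlimited and nothing is ever mispredicted, this is an optimistic bound. A more honest version would cap the window and charge a flush penalty, and would replay a real trace instead of random triples.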
 
Just to be clear, I assume you mean architectural registers, not entries in the register file?

Personally, I hope they’ll do something akin to AMD’s 3D V-Cache at some point. I can totally see that as a way to improve performance without needing to go wider or increase clocks too much, assuming there are meaningful memory stalls as it is, of course. Used properly, more cache like that might also facilitate more aggressive prediction behavior, since it can reduce the cost of a mispredict if neither path needs to touch RAM and both fit in cache.
 
Yeah, I meant architectural registers.
 