SME: new microbenchmarks

leman

To distract everyone from the election, here is an updated version of my SME analysis: https://github.com/tzakharko/m4-sme-exploration

Note that my previous results contained a bug which massively overestimated the I8 performance. I had reported I8 MAC at 16 TOPS; in reality it is only 4 TOPS. Apologies for the confusion.

I’ve rewritten and streamlined the entire project, implemented a new benchmark template, and added memory tests. Most of the results are still being analyzed, so for now only the matrix operation report is uploaded, but I’ll be adding the other sections as I progress with them.

Looking forward to your feedback!
 
Some M4 stats:


That's a bit low IMO. The P-core cluster can sustain 2000 GFLOPS FP32 outer product, and the memory subsystem is optimized for matmul. I thought that peak performance should be achievable in practice. I'd expect around 3800-4000 GFLOPS.
 
Yeah, I noticed Dougall saying the P-core cluster would only be running at 3.32 GHz, which is a much bigger difference between peak and SME clocks than in previous models. Or are you already taking that into account in your calculations?
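To see how much the clock matters, a rough back-of-the-envelope model is: peak = MACs per outer-product instruction × instructions per cycle × clock. The sketch below is only that, a sketch under stated assumptions: it assumes a 512-bit streaming vector length, one outer-product instruction retired per cycle, and 32-bit accumulator tiles where widening ops (such as int8 into int32) fold several products into each accumulator element. `outer_product_peak_gmacs` is a hypothetical helper, and none of these parameters are measured values from the posts above.

```python
def outer_product_peak_gmacs(svl_bits, in_bits, acc_bits, freq_ghz, instr_per_cycle=1.0):
    """Hypothetical helper: peak G MAC/s for one SME-style outer-product pipeline.

    Assumptions (not measured): the accumulator tile is (svl_bits / acc_bits)
    squared, and widening ops fold acc_bits / in_bits products into each
    accumulator element.
    """
    tile_dim = svl_bits // acc_bits  # e.g. 512 / 32 = 16 rows and 16 columns
    depth = acc_bits // in_bits      # 1 for FP32, 2 for FP16 -> FP32, 4 for int8 -> int32
    macs_per_instr = tile_dim * tile_dim * depth
    return macs_per_instr * instr_per_cycle * freq_ghz


if __name__ == "__main__":
    # Assumed 512-bit vector length and the 3.32 GHz clock mentioned above.
    for label, in_bits in [("FP32", 32), ("I8->I32", 8)]:
        gmacs = outer_product_peak_gmacs(512, in_bits, 32, freq_ghz=3.32)
        # A MAC is often counted as 2 ops (multiply + add) in FLOPS/TOPS figures.
        print(f"{label}: {gmacs:.0f} GMAC/s, {2 * gmacs:.0f} GOPS")
```

Because the result scales linearly with `freq_ghz`, any drop in the cluster clock while SME is active lowers the theoretical peak by the same ratio. Whether a quoted figure counts MACs or individual ops also changes it by a factor of 2.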
 
