All M5 P/M/E uARCH diagrams and SPEC results

exoticspice1 (Site Champ, joined Jul 19, 2022, 449 posts)
M5 P core:

[Image: M5 P core block diagram]


M5 M core:
[Image: M5 M core block diagram]


M5 E core:
[Image: M5 E core block diagram]



SPEC CPU 2017 details on M cores:
[Image: SPEC CPU 2017 results for the M cores]




Sourced from Baidu.
 

See, THIS is the kind of thing that means something to me…. I used to have notebooks full of scribbles like this, and every whiteboard in our cubicle farm had this stuff all over it.

Also, of course they call them P, M, E and none of that silly “S” stuff. Engineers do their thing, and half the time they see a marketing name for something and have no idea which thing that refers to.
 
So, to me, the most interesting thing is the allocation of different features to different ALUs/FPUs, and of course the number of each (and of LSUs). Everything else is more or less consistent across most RISC processors: a very standard block diagram (though the sizes of buffers, caches and such vary from core to core).
 
Load/store is interesting. On the M, the address generation units can each only feed either the load queue or the store queue, but not both. Same on the P, looks like. But the E has fewer AGUs and allows them to drive either queue.
 
I'm looking at Apple's "Apple Silicon CPU Optimization Guide" as a reference. It hasn't been updated for M5 yet, but it seems that load/store resources didn't change at all from M1 through M4, so I'm guessing it will still be accurate for M5 P and E cores. Apple provides these µop throughput limits (all numbers per cycle):

P cores burst: 3 LD + 2 ST address + 2 ST data
P cores sustained: 4 µops, 2 writes into cache

E cores burst: (2 LD, or 1 LD + 1 ST address, or 2 ST address) + 2 ST data
E cores sustained: 2 µops, 1 write into cache
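To make the burst limits above concrete, here is a minimal sketch that checks whether one cycle's worth of load/store µops fits. It is only an interpretation of the figures as quoted: it assumes the three P-core pools (loads, store-address, store-data) are independent, and that on E cores loads and store-address µops share two AGU slots. The function name `fits_burst` is hypothetical.

```python
def fits_burst(core: str, loads: int, st_addr: int, st_data: int) -> bool:
    """Return True if one cycle's load/store µops fit the quoted burst limits.

    Assumption: P-core pools are independent; E-core loads and
    store-address µops compete for the same two AGU slots.
    """
    if min(loads, st_addr, st_data) < 0:
        raise ValueError("µop counts must be non-negative")
    if core == "P":
        # 3 LD + 2 ST address + 2 ST data
        return loads <= 3 and st_addr <= 2 and st_data <= 2
    if core == "E":
        # (2 LD, or 1 LD + 1 ST address, or 2 ST address) + 2 ST data
        return loads + st_addr <= 2 and st_data <= 2
    raise ValueError(f"unknown core type: {core!r}")
```

For example, `fits_burst("E", 1, 1, 2)` is allowed, while `fits_burst("E", 2, 1, 0)` is not, because the E core cannot issue two loads and a store address in the same cycle.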

edit: I should note that the doc states that only stores are cracked into two µops, one address and one data. Loads are a single µop that both generates the address and loads the data.
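Combining the sustained limits with the cracking rule gives a rough lower bound on how many cycles a block of memory operations needs. This is a sketch, not Apple's model: it assumes the sustained µop limit counts store-data µops as well as loads and store addresses, and that every store performs exactly one cache write. The names `SustainedLimits` and `min_lsu_cycles` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SustainedLimits:
    uops_per_cycle: int          # sustained load/store µops per cycle
    cache_writes_per_cycle: int  # sustained store writes into the cache per cycle


# Figures as quoted above (M1-M4 era guide, assumed unchanged for M5)
P_CORE = SustainedLimits(uops_per_cycle=4, cache_writes_per_cycle=2)
E_CORE = SustainedLimits(uops_per_cycle=2, cache_writes_per_cycle=1)


def min_lsu_cycles(loads: int, stores: int, core: SustainedLimits) -> int:
    """Lower bound on cycles needed to retire the given loads and stores."""
    # Loads are one µop; stores crack into an address µop and a data µop.
    uops = loads + 2 * stores
    # Assumption: each store ends in exactly one cache write.
    return max(math.ceil(uops / core.uops_per_cycle),
               math.ceil(stores / core.cache_writes_per_cycle))
```

With 100 loads and 50 stores, this gives 50 cycles on a P core and 100 on an E core. In both cases the µop limit, not the cache-write limit, is the bottleneck.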
 
I only looked at the P. The M has two ALUs with multipliers, and the E has only 1. But no divider on the M (and 1 on the E)? That makes no sense either.

Probably because it's a fan analysis and not an official Apple document.
 
No clue if this fan analysis is accurate to the actual architecture, but I will say one thing: the P core in the M5 lineup can now be used by apps just like an S core.

I saw a test in Logic Pro that compared M2 Pro vs M3 Pro, and M4 Pro vs M5 Pro. What's interesting is that Logic Pro usually can only utilize high-performance cores. With M5, the P core is now also recognized as a high-performance core, even though it counts as the high-efficiency (HE) core in Apple's HP:HE ratio.

The effects are this:
M3 Pro has 5:6 (HP:HE) vs M2 Pro 6:4. M3 Pro can do 10 fewer tracks in Logic Pro and has 1 fewer HP core.

M5 Pro has 6:12 vs M4 Pro 8:4. M5 Pro can do 70 more tracks despite having 2 fewer HP cores.

Looking at other tests, it seems this ability to use the HE core the same way as an HP core is unique to M5's new P core. So it would be interesting to see what exactly enables that new functionality, because it has dramatic implications for future apps! It's so cool :D

Thanks @exoticspice1 I hope you're enjoying your new notebook :)
 
The M-core diagram doesn't look right to me; the others seem to be copies of the M4 info from Apple's CPU manual. The 1 MB private L2 is certainly interesting.

Looking at the SPEC results, it appears that E- and M-cores have identical IPC in INT workloads, so I'd expect the port layout to be very similar if not identical. At the same time, M-cores get an FP boost. I suspect that the M-cores might have a fourth FP unit, which would make sense for an architecture optimized for professional many-core workloads. Or it could be the effect of increased load/store throughput?
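The usual way to get an IPC comparison out of SPEC numbers is to divide each score by the core's clock frequency and compare the results. Below is a minimal sketch of that normalization. The helper names are hypothetical, and no actual M5 scores or clocks are filled in. Score per GHz is only a rough proxy for IPC, because memory latency does not scale with core clock.

```python
def score_per_ghz(spec_score: float, clock_ghz: float) -> float:
    """Per-clock proxy: SPEC score divided by clock frequency in GHz."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return spec_score / clock_ghz


def relative_ipc(score_a: float, ghz_a: float,
                 score_b: float, ghz_b: float) -> float:
    """Per-clock performance of core A relative to core B (1.0 = identical)."""
    return score_per_ghz(score_a, ghz_a) / score_per_ghz(score_b, ghz_b)
```

A ratio near 1.0 for the INT subtests, combined with a ratio clearly above 1.0 for FP, is the pattern described above for M-cores versus E-cores.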

My takeaway is that M-cores are indeed an evolution of E-cores tuned for higher clock and larger execution windows. Nothing really unexpected here.
 