Apple Silicon 16KB page size benefits

leman

Heh. I was writing a followup about this and you beat me to it. ;)

Please feel free to do it anyway! To be quite honest, I only started learning about caches recently and I still struggle with some basic concepts like tags and indices. I've read through a bunch of introductory materials, but the terminology subtly differs from what I am used to as a software guy, so I find it confusing. This might be a good addition to @Cmaier's cache primer...
 

mr_roboto

I thought you covered it quite well, so there's no need. The terminology does get a bit confusing, and with how long it's been since my university classes I have to look it up when writing about it...
 

Yoused

If a cache line is 4 words (32 bytes, 2 read cycles), then 5 of the 14 page-offset bits of a 16K page select the byte within the line, which leaves 9 untranslated bits to quick-test (512 sets). Times 8-way, at 32 bytes per line, that would be 128K. The L1 can test the 8 lines of a set for possible hits before translation has finished and determine that none of the entries match, or that one or two might hit. If it finds one possible hit, it can spec-load that entry and recover later if the upper part of the tag comes in as a mismatch.
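
To make that arithmetic easy to check, here is a rough sketch in Python. The function name `vipt_geometry` and its parameters are made up for illustration, and the 32-byte line is just the hypothetical figure from the post above, not a claim about what Apple's L1 actually uses. It only computes how large a cache can get if the set index has to come entirely from the untranslated page-offset bits.

```python
def vipt_geometry(page_size: int, line_size: int, ways: int) -> dict:
    """Largest cache whose set index fits inside the page offset.

    All three arguments are hypothetical inputs; page_size and
    line_size are in bytes and must be powers of two.
    """
    for value in (page_size, line_size):
        if value <= 0 or value & (value - 1):
            raise ValueError("sizes must be powers of two")
    if line_size > page_size:
        raise ValueError("line cannot be larger than a page")

    page_offset_bits = page_size.bit_length() - 1   # log2(page_size)
    line_offset_bits = line_size.bit_length() - 1   # log2(line_size)
    index_bits = page_offset_bits - line_offset_bits
    sets = 1 << index_bits

    return {
        "index_bits": index_bits,
        "sets": sets,
        "capacity_bytes": sets * ways * line_size,
    }


if __name__ == "__main__":
    for page in (4 * 1024, 16 * 1024):
        g = vipt_geometry(page_size=page, line_size=32, ways=8)
        print(page, g)
```

With the numbers from the post (16K pages, 32-byte lines, 8 ways) this gives 9 index bits, 512 sets and 131072 bytes (128K). The same call with 4K pages gives 7 index bits and 32K, which is the page size benefit this thread is about.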

Of course, if it is a vector that happens to span two lines, across two pages, the situation could become more fraught. Presumably, a really good compiler would strive to avoid fraught data composition whenever possible, but ARMv8 does have gather/scatter load/store ops (in the optional SVE extension) that might make the compiler's job harder.
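
For anyone who wants to see when an access falls into that case, a minimal sketch follows. The helper name `crosses` and the example addresses are invented for illustration, and the 32-byte line and 16K page are the same assumed sizes as in the paragraph above.

```python
def crosses(addr: int, size: int, granule: int) -> bool:
    """True if the byte range [addr, addr + size) touches more than
    one granule-sized block (a cache line or a page).

    Hypothetical helper; granule is in bytes and size must be >= 1.
    """
    if size < 1 or granule < 1:
        raise ValueError("size and granule must be positive")
    first_block = addr // granule
    last_block = (addr + size - 1) // granule
    return first_block != last_block


if __name__ == "__main__":
    LINE, PAGE = 32, 16 * 1024      # assumed sizes, see lead-in
    VECTOR = 16                     # a 128-bit vector access

    for addr in (0x3FE0, 0x1018, 0x3FF8):
        print(hex(addr),
              "line split:", crosses(addr, VECTOR, LINE),
              "page split:", crosses(addr, VECTOR, PAGE))
```

Only the last address, which starts 8 bytes before a 16K boundary, splits across both a line and a page. The middle one splits a line but stays inside one page, so a single translation still covers it.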
 

dada_dave

Another interesting discussion on the cost of TSO (total store ordering) in Geekbench 5 vs. 6:

[Image: 1684644072933.png]

[Image: 1684644662934.png]

With a bit of a joke later on wherein x86S comes with a mode to emulate ARM memory ordering:


:)
 