I'm not sure I've ever understood associativity, honestly. I've looked up explanations a few times and felt like I understood, but none of it has ever properly stuck; no explanation I've read has made me really get it rather than just temporarily understand it.
I was going to tell you about how a very handsome man once wrote his PhD dissertation on caches and discussed associativity in detail with pictures and stuff, but when I went looking for the link, I realized that sometime in the last few months my alma mater finally took down my website. It had been there since 1996! Bastards.
Broke some links in some of the articles I wrote here, too.
Anyway…
Associativity is all about *where* data corresponding to a given address is allowed to be placed in the cache.
The cache is smaller than the memory it is caching (obviously). So it’s not a 1:1 mapping. If I want to cache the contents of memory address 0001, let’s say, where do I put that in the cache?
In a “direct mapped” cache, there is only one possible place to put it. I may have very simple circuitry that says “anything that matches the pattern 000? goes in the first row of my cache.” So 0001, 0009, and 000F all go in the first row. The first row can only hold one of them at a time, so if I’m storing 0001 and there’s a memory access to 0003, I have to discard 0001.
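A toy model of that direct-mapped behavior, following the pattern example above (everything matching 000? shares row 0, 001? shares row 1, and so on). The row count and names here are invented for illustration:

```python
# Toy direct-mapped cache: each address maps to exactly one row.
NUM_ROWS = 16

def row_for(address):
    # Drop the last hex digit, then wrap to the number of rows,
    # so 0x0001, 0x0003, and 0x000F all land in row 0.
    return (address >> 4) % NUM_ROWS

cache = {}  # row -> the one address currently stored in that row

def access(address):
    row = row_for(address)
    if cache.get(row) == address:
        return "hit"
    cache[row] = address   # a collision evicts whatever was there
    return "miss"

print(access(0x0001))   # miss: row 0 now holds 0x0001
print(access(0x0003))   # miss: 0x0003 evicts 0x0001 from row 0
print(access(0x0001))   # miss again: it was discarded
```

The ping-ponging in the last three lines is exactly the collision problem associativity exists to soften.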
In a 2-way set associative cache, each row of the cache can store the contents of two different memory addresses. So in the first row I could store both 0001 and 0003. This complicates the circuitry and slows the cache, because if I already have 0001 and 0003, and there’s a memory access to 000F, I have to decide which of 0001 and 0003 to discard (I usually discard the one that is “Least Recently Used,” so I need to keep track of that). I also have to have some way to find where, within each set, each address is stored, so I store some of the address bits along with each entry. This is called the “tag.” I have to do that with a direct mapped cache too, but here I have to do two comparisons: one to find the correct row of the cache, and one to find the correct entry within the set.
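Extending the toy model to 2-way with LRU eviction (again, sizes and names are made up; each row keeps its entries ordered most-recently-used first, which stands in for the LRU bookkeeping a real cache does in hardware):

```python
# Toy 2-way set associative cache with LRU eviction.
NUM_ROWS = 16
WAYS = 2

# Each row holds up to WAYS addresses, most-recently-used first.
cache = {row: [] for row in range(NUM_ROWS)}

def access(address):
    row = (address >> 4) % NUM_ROWS   # same row selection as before
    entries = cache[row]
    if address in entries:            # compare against both ways
        entries.remove(address)
        entries.insert(0, address)    # mark it most recently used
        return "hit"
    if len(entries) == WAYS:
        entries.pop()                 # evict the least recently used
    entries.insert(0, address)
    return "miss"

print(access(0x0001))   # miss
print(access(0x0003))   # miss: row 0 now holds both 0x0001 and 0x0003
print(access(0x0001))   # hit  (a direct-mapped cache would miss here)
print(access(0x000F))   # miss: evicts 0x0003, the least recently used
```

Note the cost the paragraph above describes: a lookup now has to check both entries in the row, not just one.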
An 8-way set associative cache works the same way, allowing you to store the contents of 8 addresses that otherwise would collide, all in the same cache row.
In a *fully associative* cache, the contents of any memory address can go anywhere in the cache. These are fun to model, but are rarely seen in the wild.
Typically you are using an N-way set associative cache, where N > 1. You use the rightmost address bits to determine which row of the cache to use, then use the next set of bits as the tag, to keep track of what is in each position within the set. (You ignore the very rightmost bits, because those select a position within a word, and you are addressing words, not bytes.)
so:
[ignored][tag][row][0000]
or whatever (depends on your addressable data chunk size)
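The field-carving above can be sketched with shifts and masks. The widths here are invented for illustration (4 ignored offset bits, 4 row bits, an 8-bit tag, anything above that ignored):

```python
# Split an address into the fields sketched above:
# [ignored][tag][row][offset]
OFFSET_BITS = 4   # position within a word/line; not used for placement
ROW_BITS = 4      # picks one of 16 cache rows
TAG_BITS = 8      # stored alongside each entry to identify it

def split(address):
    offset = address & ((1 << OFFSET_BITS) - 1)
    row = (address >> OFFSET_BITS) & ((1 << ROW_BITS) - 1)
    tag = (address >> (OFFSET_BITS + ROW_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, row, offset

# 0x1234 -> tag 0x12, row 0x3, offset 0x4
print(split(0x1234))
```

A lookup then uses `row` to index the cache and compares `tag` against each entry stored in that row.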