What would be your idea? I mean, you do need to read the 4 bits for each 16-byte region to validate the pointer. Those 4 bits have to be stored somewhere. I suppose you could store them adjacent to the data, but how would the CPU know where to look? It has no concept of allocated regions. To me it looks like a per-page system, possibly with some hashing mechanism, likely stored as part of the page descriptor. At least that’s how I’d design the system.
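One way to picture the "where would the CPU look" part, purely as a sketch and not a description of the real hardware: if the tags live in their own region and the tag's location is computed from the address alone, no concept of an allocation is needed. Everything named below (tag_store, set_tag, get_tag, ARENA_SIZE) is made up for illustration, and the flat array indexed by byte offset is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE    16u      /* bytes covered by one 4-bit tag */
#define ARENA_SIZE 4096u    /* hypothetical size of the tagged region */

/* Two 4-bit tags per byte: 1 byte of tag storage per 32 bytes of data. */
static uint8_t tag_store[ARENA_SIZE / GRANULE / 2];

/* Store a 4-bit tag for the granule containing byte offset `off`. */
static void set_tag(size_t off, uint8_t tag) {
    assert(off < ARENA_SIZE && tag < 16);
    size_t g = off / GRANULE;               /* granule index from the address alone */
    unsigned shift = (g & 1u) ? 4u : 0u;    /* odd granules use the high nibble */
    tag_store[g / 2] = (uint8_t)((tag_store[g / 2] & ~(0xFu << shift))
                                 | ((unsigned)tag << shift));
}

/* Read back the 4-bit tag for the granule containing byte offset `off`. */
static uint8_t get_tag(size_t off) {
    assert(off < ARENA_SIZE);
    size_t g = off / GRANULE;
    unsigned shift = (g & 1u) ? 4u : 0u;
    return (uint8_t)((tag_store[g / 2] >> shift) & 0xFu);
}
```

In this model, every byte offset within the same 16-byte granule maps to the same tag slot, so the lookup needs nothing but the address.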
So, as I said, I need to read up on it more because I don't fully understand the concepts yet, but with the understanding I have right now:
When you make an allocation
malloc(sizeof(size_t)*4)
in addition to the memory allocator figuring out where in the heap to give you a pointer (moving the program break or whatever else it wants to do), it also adds a tag to all the memory it allocated. Assuming the tag is going to live in the top four bits of the pointer, pseudo code along the lines of this:
tag = tag_value << 60;   // shift the 4-bit tag into place
for (address in allocation_range) {
    address |= tag;
}
Then, at the CPU hardware level, ignore the tag bits for MMU operations, but check the tag on every memory access and trap to the kernel if it doesn't match.
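A rough software model of that check, just to make the idea concrete. It follows the pseudo code above in assuming the tag occupies the four bits at a shift of 60 in a 64-bit pointer; real hardware may well place it elsewhere. The function names (with_tag, tag_of, strip_tag, access_ok) are hypothetical, and actual hardware would trap instead of returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_SHIFT 60
#define TAG_MASK  ((uint64_t)0xF << TAG_SHIFT)

/* Place a 4-bit tag into the tag bits of an address. */
static uint64_t with_tag(uint64_t addr, uint8_t tag) {
    assert(tag < 16);
    return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Extract the tag carried by the pointer itself. */
static uint8_t tag_of(uint64_t ptr) {
    return (uint8_t)(ptr >> TAG_SHIFT);
}

/* What the MMU would see: the address with the tag bits ignored. */
static uint64_t strip_tag(uint64_t ptr) {
    return ptr & ~TAG_MASK;
}

/* The per-access check: pointer tag against the tag stored for the memory. */
static bool access_ok(uint64_t ptr, uint8_t memory_tag) {
    return tag_of(ptr) == (memory_tag & 0xF);
}
```

For example, with_tag(0x1000, 0xA) gives 0xA000000000001000; strip_tag recovers 0x1000 for translation, and access_ok succeeds only if the memory behind it was also tagged 0xA.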
The effects would be that when I grab a pointer to a list of 4 entries
int* firstElement = startOfList;
I have the tag loaded from that allocation chunk. I can now keep accessing elements with arithmetic operations and the tag remains stable
int* secondElement = firstElement+1;
int* thirdElement = secondElement+1;
int* fourthElement = thirdElement+1;
All of these are valid pointers. But going outside the range
int* outOfBounds = fourthElement+1;
I now have a pointer that still carries the original tag but points into a different memory region. That region may belong to a different allocation and thus have a different tag, or may not be allocated yet and have no tag. In either case, the pointer arithmetic is only valid within the region where the tag matches.
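To play that scenario through in a toy model: the sketch below assumes the 16-byte granule mentioned earlier in the thread, a 4-byte int, and a tag at a shift of 60 as in the pseudo code. The names (granule_tag, toy_alloc, access_ok) are invented for the example, and "pointers" are just byte offsets into a four-granule heap with the tag packed on top.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GRANULE   16u
#define TAG_SHIFT 60

/* Toy heap of four granules; granule_tag[i] covers bytes 16*i .. 16*i+15. */
static uint8_t granule_tag[4];

/* "Allocate" n granules starting at granule g; returns a tagged offset. */
static uint64_t toy_alloc(unsigned g, unsigned n, uint8_t tag) {
    assert(g + n <= 4 && tag < 16);
    for (unsigned i = 0; i < n; i++) granule_tag[g + i] = tag;
    return ((uint64_t)tag << TAG_SHIFT) | (uint64_t)(g * GRANULE);
}

/* True if the pointer's tag matches the tag of the granule it points into. */
static bool access_ok(uint64_t ptr) {
    uint64_t off = ptr & ~((uint64_t)0xF << TAG_SHIFT);
    if (off / GRANULE >= 4) return false;   /* outside the toy heap */
    return (uint8_t)(ptr >> TAG_SHIFT) == granule_tag[off / GRANULE];
}
```

With a list at toy_alloc(0, 1, 0x5) and a neighbouring allocation at toy_alloc(1, 1, 0x9), adding 4 bytes per element keeps the tag 0x5 intact. The first four elements pass access_ok, while the fifth lands in granule 1 and fails because 0x5 does not match 0x9. Note that in this model the overflow is only caught because the neighbour has a different tag; with 4 bits there are just 16 possible values.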
But as I said, I haven't read up on this nearly enough. The above is just me spinning ideas based on a loose grasp of the concepts, and I may even be conflating the purpose with PACs a bit here.
Regarding granularity: in the post, Apple specifically says the tagging system is used when page-level granularity is not enough. When page-level granularity is enough, they often don't use the tag system at all.