I’m sure you could write an app to fill up memory and then time the accesses, though you’d have to make sure you are reading randomly to avoid caching (and prefetching).
I would bet it doesn’t add too much latency. The actual memory read takes a very long time, so a few extra cycles in each memory controller get dwarfed by that. That assumes no contention, of course, which is the part I’d have to think about. In other words, if chip A wants chip B to fetch something and send it back, chip A may have to wait because chip B is already busy reading memory for itself (or for chip C or D). So the question becomes: how many accesses can each chip service simultaneously? That question gets more complicated because the answer is probably “it depends.” Memory is segmented into banks, and there are separate memory controllers, so reading 4 addresses from different parts of memory may be no problem, while 4 from the same part of memory may take 4 times as long.
I just don’t know enough about what Apple did here. My focus early in my career was memory hierarchies, so it’s near and dear to my heart, though.