M4 Mac Announcements

Because maybe what breaks the old platform is a security fix in the hypervisor, or the processor/Secure Enclave/etc.
I don't really buy that. It should be possible to virtualize any arbitrary OS, regardless of what it does.

I mean, I'm not saying you're wrong about what actually happened. I'm saying, it should be something they can fix without touching the guest OS.
 
The high mobility of threads on M4 (contrasting strongly with earlier generations), for example.
This [article] was very interesting indeed. When running only half as many threads as cores, it seems threads are periodically switched from one cluster to the other, every ~1.3s or so. He speculates later that this may be done to improve cooling, by spreading the hotspot over a larger surface area.

I'm... surprised it's worth it. Doesn't moving all the threads to another cluster like that trash cache and stuff? I know it's a relatively 'large' timespan (over a second), but it contrasts starkly with the general advice about minimizing context switches. Plus I wouldn't have thought the cores are far apart enough for this to significantly improve cooling. Obviously, if they're doing it, it must be worth it. I'm just surprised it is.
 
Surprised me too. It does kill the local caches, but depending on the cache topology (I’m not actually sure if it’s an eviction cache or what), the data will at least remain in the SoC-level cache (SLC/L3), and over a second is like a billion years.

But yeah, generally context switching is expensive.
 
I don't really buy that. It should be possible to virtualize any arbitrary OS, regardless of what it does.

I mean, I'm not saying you're wrong about what actually happened. I'm saying, it should be something they can fix without touching the guest OS.
Possible, if you bother to trap the instructions the hardware doesn't support.

If you don't bother because you don't care (Apple), or don't know yet due to lack of documentation on the new CPU vs. the old OS (Parallels, VMware perhaps)... then nope.
 
This [article] was very interesting indeed. When running only half as many threads as cores, it seems threads are periodically switched from one cluster to the other, every ~1.3s or so. He speculates later that this may be done to improve cooling, by spreading the hotspot over a larger surface area.

I'm... surprised it's worth it. Doesn't moving all the threads to another cluster like that trash cache and stuff? I know it's a relatively 'large' timespan (over a second), but it contrasts starkly with the general advice about minimizing context switches. Plus I wouldn't have thought the cores are far apart enough for this to significantly improve cooling. Obviously, if they're doing it, it must be worth it. I'm just surprised it is.
That was exactly my reaction. Though I'm more surprised that moving the hotspot a tiny bit matters than that it's worth paying the price in cache thrashing.

Now I wonder what it looks like if all cores are loaded roughly the same, constantly - do threads still move, or are they smart enough not to bother? Or are they smarter than me, and it's still worth moving them? In that situation, I'd think the only thing making it worthwhile to move threads would be memory locality, which shouldn't be a factor except on Ultras, which don't exist in this gen so far... but maybe I'm missing something.
 
Migration like this doesn’t surprise me at all. Hot spots on silicon are very localized. Not much lateral spread unless you thin the wafer and metallize the back surface or the like.

I know of certain other chips that do this, and I was surprised Apple hasn’t done it previously.
 
generally context switching is expensive

If you do it in software. Apple processors seem to keep track of register usage. If the switch were done in hardware, the original core could be put in source mode, at which point it would forward its register usage map to the destination core, along with PC, r31 and r30. The destination core could then request registers as needed until it has satisfied the usage map, at which point it would signal the source core to switch to idle and invalidate its context.

For example, if an instruction is add r5, r10, r11, the incoming core would request r10 and r11 and mark off r5, since its old value will not be needed. Run a hundred instructions and you could have most of the context transferred behind the scenes.
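To make the idea concrete, here is a toy simulation of that lazy hand-off. It is only a sketch of the speculation above, not a documented Apple mechanism: the function name `migrate_lazily`, the usage-map representation and the register values are all made up for illustration, and every instruction is treated as a three-operand add.

```python
# Toy model of the speculated lazy, hardware-driven context hand-off.
# Everything here is hypothetical; no real CPU interface is being described.

def migrate_lazily(source_regs, usage_map, instructions):
    """Simulate a destination core pulling registers from the source on demand.

    source_regs:  dict of register name -> value still held by the source core
    usage_map:    set of register names the source core reports as in use
    instructions: list of (dest, src1, src2) tuples, each treated as an add
    Returns (dest_regs, fetched, skipped, pending).
    """
    pending = set(usage_map)  # registers the source core still owns
    dest_regs = {}            # register file of the destination core
    fetched = []              # registers actually requested from the source
    skipped = []              # registers overwritten before ever being read

    for dest, src1, src2 in instructions:
        # Inputs that still live on the source core must be requested first.
        for src in (src1, src2):
            if src in pending:
                dest_regs[src] = source_regs[src]
                pending.discard(src)
                fetched.append(src)
        dest_regs[dest] = dest_regs.get(src1, 0) + dest_regs.get(src2, 0)
        # The old value of the output is dead, so it never has to be moved.
        if dest in pending:
            pending.discard(dest)
            skipped.append(dest)

    return dest_regs, fetched, skipped, pending


# The example from the post: add r5, r10, r11
source = {"r5": 1, "r10": 20, "r11": 22, "r12": 7}
regs, fetched, skipped, pending = migrate_lazily(
    source, set(source), [("r5", "r10", "r11")]
)
```

After that single instruction, r10 and r11 have been requested, r5 has been marked off without a transfer, and r12 is still pending. In this model the source core could go idle once `pending` is empty.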

I could imagine a situation in which a core observes a shift toward Neon/SVE in concert with a nearly-empty FP/Vector register file and might find it advantageous to move the work to a different core. Theoretically, the swap could be set up to occur entirely without software intervention, under the appropriate circumstances.

It sounds a bit beyond the pale, perhaps, but given the increasing complexity of contemporary processor design, I would guess that such a pattern is not entirely unrealistic.
 
Migration like this doesn’t surprise me at all. Hot spots on silicon are very localized. Not much lateral spread unless you thin the wafer and metallize the back surface or the like.

I know of certain other chips that do this, and I was surprised Apple hasn’t done it previously.
This surprised me, since silicon's thermal conductivity is relatively high—about half-way between iron and aluminum (and the little bit of data I could find seems to indicate this also applies to the doped silicon used in chips).

Unless there's something specific to etched silicon chips that significantly reduces their thermal conductivity relative to silicon blanks (the etching causing air gaps?), I'm guessing the issue is that a lot of thermal energy is generated within a very small volume, so the surface area for outgoing heat flow is relatively small.
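A rough back-of-envelope check of both points, as a sketch only. The conductivities are approximate room-temperature textbook values. The hotspot numbers (5 W, a 1 mm wide core, a 0.1 mm thick die, 2 mm of lateral distance) are invented for illustration, and the 1-D steady-state formula ignores the heat that leaves vertically through the package, so it only shows why sideways conduction alone can't do the job.

```python
# Approximate room-temperature thermal conductivities in W/(m*K).
K_IRON = 80.0
K_SILICON = 150.0
K_ALUMINUM = 237.0


def lateral_delta_t(power_w, distance_m, width_m, thickness_m, k=K_SILICON):
    """1-D steady-state conduction: dT = P * L / (k * A), with A = width * thickness."""
    area_m2 = width_m * thickness_m
    return power_w * distance_m / (k * area_m2)


# "About half-way between iron and aluminum": the midpoint is close to silicon.
midpoint = (K_IRON + K_ALUMINUM) / 2

# Hypothetical hotspot: 5 W forced sideways through a 1 mm x 0.1 mm
# cross-section of silicon over a distance of 2 mm.
delta_t = lateral_delta_t(
    power_w=5.0, distance_m=2e-3, width_m=1e-3, thickness_m=1e-4
)
```

With these made-up numbers the temperature difference needed to push the heat sideways comes out at several hundred kelvin, even though the conductivity itself is decent. That fits the guess that the small cross-section is the bottleneck rather than the material.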
 
That plus the fact that you have lots of polysilicon around. Crystalline conducts heat better than poly. (Though heavily doped poly isn’t too bad.) But neither gets anywhere near what you need to dissipate heat fast enough to compensate for the heat you are generating in dense circuits at today’s current densities. There’s just too much heat being generated and it can’t spread fast enough unless you provide very high thermal-k paths for it to do so. (Like by providing massive copper heat pillars on the top side connecting to a heat sink.)

Also, some people use SOI wafers, where there is an electrical insulator on the back side (to prevent substrate currents and allow higher circuit density because of no need for grounding big p-wells). These also tend to not conduct heat well.

Note that you really wouldn’t want to use iron as a heat sink either.
 
If you do it in software.
Even in software, it's a matter of scale. What's it going to cost you to flush registers to cache? A few hundred cycles? Maybe a bit more if you're going out to SLC (which I think you are if you're switching between clusters, though they could presumably do some type of core-to-core thing in the NoC if they thought it was worth it). That's out of 5+ billion cycles in the reported 1.3-second period. It's not nothing, but it's just a tiny fraction of a percent, and if it lets you keep clocks up instead of lowering them even just 10% to keep heat in check, well, that's clearly a massive win.
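Putting rough numbers on that, as a sketch under stated assumptions: the 4.4 GHz clock is an assumed figure that matches the "5+ billion cycles" above, the 500-cycle cost stands in for "a few hundred cycles", and the 1,000,000-cycle case is a deliberately pessimistic guess to cover refilling cold caches. None of these are measurements.

```python
def migration_overhead(cost_cycles, clock_hz, period_s):
    """Fraction of the cycles in one migration period spent on the migration itself."""
    return cost_cycles / (clock_hz * period_s)


CLOCK_HZ = 4.4e9   # assumed P-core clock, not a measured value
PERIOD_S = 1.3     # migration interval reported in the article

cycles_per_period = CLOCK_HZ * PERIOD_S

# "A few hundred cycles" to flush and reload registers.
register_overhead = migration_overhead(500, CLOCK_HZ, PERIOD_S)

# Pessimistic guess that also charges for refilling cold caches.
pessimistic_overhead = migration_overhead(1_000_000, CLOCK_HZ, PERIOD_S)

# Alternative being avoided: dropping clocks by 10% to keep heat in check.
throttle_loss = 0.10

print(f"cycles per period:    {cycles_per_period:.3g}")
print(f"register-only cost:   {register_overhead:.3%}")
print(f"pessimistic cost:     {pessimistic_overhead:.3%}")
print(f"10% throttle, for comparison: {throttle_loss:.0%}")
```

Even the pessimistic case stays hundreds of times smaller than a 10% clock reduction, which is the point being made.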
 