Finally a non-Apple post by me 
So it looks like Tensor has 3 tiers of core, with the most performant being a pair of Cortex-X1s? That puts them behind the A14 (Firestorm), let alone the A15 (Avalanche). 8-wide MOP dispatch instead of 7 would seem to be the only advantage, but the X1 has only 4 integer ALUs (plus 4 FP pipes, instead of 3), so it's unlikely to dispatch 8 very often. Not to mention that fetch is only 5 instructions wide anyway (or 8 MOPs), so it's likely tough to keep the register renamer/scheduler busy unless the instruction stream is very "complex."
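To make the width argument concrete, here's a toy steady-state model: sustained throughput can't exceed the narrowest stage, whether that's the fetch-side MOP supply or the execution ports. The widths and the MOPs-per-instruction figures are illustrative assumptions based on the numbers discussed above, not anything from Arm's spec.

```python
def sustained_mops(fetch_instrs, mops_per_instr, fetch_mop_cap,
                   dispatch_width, exec_ports):
    """Upper bound on MOPs retired per cycle in steady state:
    the minimum of front-end supply, dispatch width, and ports."""
    front_end = min(fetch_instrs * mops_per_instr, fetch_mop_cap)
    return min(front_end, dispatch_width, exec_ports)

# Simple code (~1 MOP per instruction): a 5-instruction fetch feeds
# only 5 MOPs/cycle, so the 8-wide dispatch sits starved.
print(sustained_mops(5, 1.0, 8, 8, 8))  # 5.0

# "Complex" code (~1.6 MOPs per instruction) saturates the 8-MOP
# fetch cap -- but all 4 ALUs + 4 FP pipes must then be usable
# in the same cycle to actually retire 8.
print(sustained_mops(5, 1.6, 8, 8, 8))  # 8.0
```

In other words, the 8-wide dispatch only pays off when both the cracking ratio and the instruction mix cooperate, which is the point above.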
Major cache differences, too, but I would tend to think Apple's use of a large shared cache is more efficient than the X1's smaller dedicated caches, since some cores may be running very memory-intensive threads while others aren't; it's unlikely they're all pegging memory at the same time.
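A back-of-the-envelope way to see the shared-cache argument: under uneven demand, private caches strand capacity on the idle core while the busy core spills. The sizes and working sets below are made-up numbers purely to illustrate the point, not real Tensor or A15 figures.

```python
def fitted(working_set_mb, capacity_mb):
    """Fraction of a working set that fits in a given cache capacity."""
    return min(working_set_mb, capacity_mb) / working_set_mb

# One core runs a memory-hungry thread, the other a light one.
heavy, light = 6.0, 1.0  # MB, hypothetical working sets

# Two private 4 MB caches: the heavy thread spills (~2/3 fits),
# while 3 MB sits unused next to the light thread.
private = (fitted(heavy, 4.0), fitted(light, 4.0))

# One shared 8 MB cache: 7 MB of combined working set fits entirely,
# because capacity flows to whichever thread needs it.
shared = (fitted(heavy, 8.0 - light), fitted(light, 8.0 - heavy))

print(private)
print(shared)
```

Same total SRAM in both cases; the shared arrangement only wins because the demand is lopsided, which is exactly the "unlikely all cores peg memory at once" scenario.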
Also curious whether Google did their own physical design, or just used hard IP from Arm.
