I've been busy, so I have two pages of new posts to cover. Instead of quoting, I'll just use "headings"...
Pipeline Stalls and Optimization
I remember the official P6 Optimization Manual mentioning that any instruction longer than 7 bytes would cause a pipeline stall. Since that was a long time ago, I'm guessing it's no longer that severe.
Interestingly, a lot of the instructions that were once used for speed are somewhat slow nowadays. Take these, for example:
ADD EAX, 1
INC EAX
In theory they do the same thing, but INC used to be faster, because on older CPUs every byte counted (in 32-bit code, INC EAX encodes as a single byte, while ADD EAX, 1 needs three). Not only because of RAM restrictions, but because, as a rule of thumb, every byte took another cycle to process.
In practice they are not the same, because INC sets the condition flags slightly differently: ADD writes all of the arithmetic flags, while INC leaves the carry flag untouched, so a later instruction reading CF still depends on whatever wrote it earlier. The effect is that the general instruction (ADD) is implemented in an optimized form, while the more specific instruction (INC) can need extra handling for that partial flags update.
I'll leave it to Cliff or others to correct me, because my knowledge might be totally outdated.
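To make the flags difference concrete, here is a toy Python sketch of the semantics (my own illustration, not actual CPU behavior or microcode): both operations produce the same 32-bit result, but ADD rewrites the carry flag while INC leaves whatever value was already there.

```python
MASK = 0xFFFFFFFF  # 32-bit register width

def x86_add(eax, imm, flags):
    """ADD r32, imm: writes the full set of arithmetic flags (CF shown here)."""
    raw = eax + imm
    flags["CF"] = raw > MASK          # carry out of bit 31
    result = raw & MASK
    flags["ZF"] = result == 0
    return result

def x86_inc(eax, flags):
    """INC r32: updates ZF (and others), but deliberately does NOT touch CF."""
    result = (eax + 1) & MASK
    flags["ZF"] = result == 0         # CF is left as-is
    return result

# Start with CF already set, e.g. by a preceding SUB that borrowed:
flags = {"CF": True, "ZF": False}
x86_add(5, 1, flags)
print(flags["CF"])   # → False: ADD cleared the stale carry

flags = {"CF": True, "ZF": False}
x86_inc(5, flags)
print(flags["CF"])   # → True: INC left the old carry in place
```

The second case is exactly the partial flags situation: after INC, the carry flag is a leftover from an older instruction, which the CPU has to track.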
Wide Microarchitectures
Wide microarchitectures might be somewhat new in the ARM world, but they have been used before in the Apple world.
NetBurst (the Pentium 4 architecture) went deep, i.e. it had a very deep pipeline to reduce the complexity of each stage and thus allow for a higher overall clock. I guess Intel thought they could go up to 10 GHz, but then they hit a wall somewhere between 4 and 5 GHz.
The G4 (PowerPC 74xx) on the other hand was very wide instead. I think it was even wider than the G5 by comparison (the G5 had fewer AltiVec units, IIRC).
RISC and Designs
I think David Patterson made a small error with the RISC acronym, because a lot of people expect a compact instruction set when they hear that it stands for "Reduced Instruction Set Computer".
The author of the book Optimizing PowerPC Code wrote that "Reduced Instruction Set Complexity" might be a better explanation for the acronym.
I think for teaching, the basic MIPS instruction set is much easier to understand than x86, so that's definitely a plus. One could argue that RISC-V might be a better pick, though, since it is newer and open source.
As for designs, no matter what any x86 fan might tell you: RISC has won, because any up-to-date x86 processor uses a RISC-like implementation internally, decoding the x86 instructions into simpler micro-ops; otherwise it would not be able to compete in terms of performance.
Hyperthreading
Didn't DEC experiment with SMT on the Alpha in the end?
Talking of Alpha, I remembered that they predicted a 1000-fold speed increase: 10 times by increasing the clock, 10 times through super-scalar architecture, and 10 times through CPU clusters.
Alder Lake
When Intel announced it as a Core and Atom hybrid, I knew it was a hack and I was surprised that they actually produced it. The big.LITTLE concept only makes sense when the ISAs of the different cores are the same (the Atom-derived E-cores lack AVX-512, which is presumably why Intel ended up disabling it on the P-cores). But I guess Intel got desperate and didn't have the time to design a proper E-core.
Nothing against Atom; I believe the CPUs are better than their reputation, and they are probably not pushed by Intel, because otherwise they might poach some of the more lucrative Core market. But the idea behind Alder Lake is a bit of a joke. I'm surprised it runs as well as it does. Makes one hell of a heater, though.