As I recall, conditional branches specified a CR field number (0-7), but most operations that set flags used CR0, so the other six fields were kind of vestigial, in that you had to manually move results into them. It was a pretty silly feature.
IIRC, CR0 was implicitly set by integer instructions and CR1 by floating-point instructions, while the remaining fields could be selected explicitly by comparison instructions.
The branch instructions could then select one field to check against.
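As a sketch of how I remember that working (the 32-bit CR split into eight 4-bit fields with LT/GT/EQ/SO bits; the helper names here are mine, not real mnemonics):

```python
# Rough model of PowerPC's 32-bit condition register: eight 4-bit fields
# (CR0-CR7), each holding LT/GT/EQ/SO bits. A compare writes one field;
# a conditional branch tests one bit of one field.
LT, GT, EQ, SO = 8, 4, 2, 1  # bit weights within a 4-bit field

def compare(cr, field, a, b):
    """Model of cmpw crN, rA, rB: set the chosen field from a vs. b."""
    bits = LT if a < b else GT if a > b else EQ
    shift = (7 - field) * 4  # CR0 sits in the most-significant nibble
    return (cr & ~(0xF << shift)) | (bits << shift)

def branch_taken(cr, field, mask):
    """Model of a conditional branch testing one bit of the chosen field."""
    return bool((cr >> ((7 - field) * 4)) & mask)

cr = compare(0, 5, 3, 7)        # like: cmpw cr5, rA, rB  with rA=3, rB=7
assert branch_taken(cr, 5, LT)  # like: blt cr5, target  -- taken here
```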
I guess the thought was that a single condition register could become a bottleneck for super-scalar implementations.
I think the MIPS solution of eliminating the condition register and using GPRs instead is more elegant, although they introduced one for the FPU later on. Alpha, on the other hand, eliminated condition registers entirely.
I guess nowadays a single condition register isn't an issue anymore due to register renaming. I'm sure Cmaier will correct me if I'm wrong.
I never really thought of the 6502 as having a pipeline, but it certainly used far fewer clock cycles per instruction (2 to 7, I think, with the Z80 and 68000 needing at least 4 for even the most primitive instruction).
According to some of the stories I've read, the good latency of the 6502 is one of the reasons that ARM exists today. When Acorn wanted to design a successor to their BBC Micro (which featured a 6502), nothing on the market matched the latency they were used to, so when they came across Berkeley RISC (which later became SPARC) and Stanford MIPS (which was later commercialized as MIPS), they decided that if two universities could design their own RISC CPU, they could too.
Talking of pipelines, looking back now it's amusing that the MIPS R4000 was marketed as "super-pipelined" because it had a whopping 8 stages. I wonder what they'd call a pipeline with over 20 stages...
Regarding Transmeta vs Itanium:
First of all, I'm not sure if VLIW was a good idea to begin with. Are any current architectures still using it?
I might be wrong, but I think one of the initial issues with Itanium was that it isn't simply VLIW, but EPIC (Explicitly Parallel Instruction Computing). The "explicitly" here means that the compiler tells the CPU exactly how to execute the instructions, so the first versions of Itanium did not contain a hardware scheduler (which brings us back to one of Cmaier's topics).
I always thought that this was strange, because if they ever shipped a wider super-scalar CPU in the future, the software would have to be recompiled to actually make full use of it.
The other issue was that none of the templates for the 3-instruction bundles featured more than one floating-point instruction, IIRC. For floating-point-heavy code you might end up with a sequence of 128-bit bundles that each contain only one 41-bit FP instruction and two NOPs.
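To put rough numbers on that (slot widths as I remember them, so treat this as a sketch rather than a spec quote):

```python
# Rough model of an IA-64 instruction bundle, from memory: a 128-bit bundle
# holds a 5-bit template field plus three 41-bit instruction slots.
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS = 3

assert TEMPLATE_BITS + SLOTS * SLOT_BITS == 128

# If no template allows more than one FP slot, FP-heavy code can end up
# burning the other two slots on NOPs: only 41 of 128 bits do useful work.
useful_fraction = SLOT_BITS / 128
print(f"{useful_fraction:.0%} of the bundle bits doing useful work")
```

So in the worst case you'd be fetching roughly three bits of encoding for every useful one.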
What always irked me about AMD64 (x86-64) was the fact that it kept the two-address structure of x86, along with legacy quirks like variable shifts still using register CL implicitly.
But I guess Microsoft might be partly to blame for that, given one of Cmaier's comments above.