x86 CPUs from AMD & Intel current/future releases.

I am not aware of any announcement of an actual product to use these instructions. At any rate, even if it becomes a reality it will be at least a decade until these instructions are widely supported.
Yep. But I would like those additional 16 GPRs for my hobbyist assembly programming :P
 
The d = s + a design is not a feature of RISC processors, though, it is a consequence. Since individual instructions can be somewhat bulky as 32-bit words each, the fixed-length coding scheme strives to make each op as maximally useful as possible, at essentially no cost in performance. Adding it to x86 is a desperate move that ends up making the coding scheme more complex and does not really gain much code density, if any.
 
I may have lost track of the discussion a bit and I'm not sure exactly what in the thread it relates to. What do you mean by d = s + a? I assume d is destination, s could be source, and a is just any operand? Do you just mean the ability to assign the evaluation of an expression (s+a) to any destination register? Unlike the pattern of, for example, add rax, r8, where rax is both an argument in the calculation and the destination register? (Or flipped order, or with % in front, or whatever, depending on your assembler syntax preferences.)
 
The d = s + a design is not a feature of RISC processors, though, it is a consequence. Since individual instructions can be somewhat bulky as 32-bit words each, the fixed-length coding scheme strives to make each op as maximally useful as possible, at essentially no cost in performance. Adding it to x86 is a desperate move that ends up making the coding scheme more complex and does not really gain much code density, if any.

The way I understand it, non-destructive operations are about performance rather than code density. Fewer moves, fewer instructions, could help improve the IPC.

I may have lost track of the discussion a bit and I'm not sure exactly what in the thread it relates to. What do you mean by d = s + a? I assume d is destination, s could be source, and a is just any operand?

Non-destructive operation (destination can be different from the sources).
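
To make that concrete, here is a rough sketch in Intel/NASM-style syntax (register choices are mine, purely for illustration). Computing d = s + a on classic two-operand x86 costs an extra mov if you want to keep s around, whereas a three-operand encoding does it in one instruction:

    mov rdx, rsi          ; d = s          (classic x86: destination is also a source)
    add rdx, rcx          ; d = d + a      (rsi survives only because we copied it first)
    lea rdx, [rsi + rcx]  ; closest legacy x86 gets to d = s + a (addition only, no flags)
    ; add rdx, rsi, rcx   ; RISC-style / APX-style non-destructive form: one instruction, no copy

The last line is the d = s + a pattern under discussion; it's left as a comment because the exact syntax for APX's non-destructive encodings depends on assembler support.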
 
The way I understand it, non-destructive operations are about performance rather than code density. Fewer moves, fewer instructions, could help improve the IPC.
Yes. Accumulator or stack-style instructions require fewer instruction bits because you implicitly know one of the operands just from the opcode. This was a big deal back before instruction caches, and when memory was a limited resource. When you guess wrong on a branch or have some other pipeline hiccup, these ISAs also have an advantage in that it’s cheaper to flush the registers (because there are far fewer of them). And implementation is simpler because you have far fewer registers to keep track of.
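
A leftover of that era is still visible in x86's own encodings, if I'm reading the opcode tables right: the accumulator (AL/AX/EAX) is implied by a short-form opcode, so the same operation encodes smaller on it than on any other register.

    add al, 5    ; 04 05      - 2 bytes, AL is implicit in the opcode
    add bl, 5    ; 80 C3 05   - 3 bytes, needs a ModRM byte to name BL

A pure stack machine takes it further: both operands are implicit in the stack, so an add needs no register fields at all.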

The disadvantages, of course, are that you may (almost certainly will) need movs or loads/stores to keep all your variables around. In the real world there have been a million studies of this, and more registers are better (up to a point of diminishing returns). And particularly in RISC, where you don't have a lot of multi-step instructions whose intermediate values live in non-architectural registers, you end up keeping intermediate values of calculations in architectural registers more often, so you want more registers.

SPARC, with its register windows, was maybe the most interesting ISA with respect to playing with how far you want to go with register count.
 
IA-64 looked at SPARC register windows and said "we all know this feature didn't work out, so hold my beer, I'm gonna double down on it".

IA-64 also took predication (i.e., the conditional execution of almost any instruction, not just branches), which only ever worked well with hand-coded ARM assembly, not with compilers.
There are likely more examples where they picked something for IA-64 that didn't work well with compilers, and designed it in a way that made the compiler responsible for everything, including instruction scheduling.
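
For comparison, the only slice of that idea mainstream x86 ever got is cmov. A quick sketch (my own register choices) of max(rdi, rsi), branchy versus flag-predicated:

    mov   rax, rdi
    cmp   rax, rsi
    jge   .done          ; branchy version: a misprediction flushes the pipeline
    mov   rax, rsi
.done:

    mov   rax, rdi
    cmp   rax, rsi
    cmovl rax, rsi       ; predicated version: always executes, nothing to mispredict

IA-64 (and classic 32-bit ARM) generalized this so that nearly any instruction, not just a register move, could be squashed by a predicate.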

Someone claimed that side-channel attacks would be much harder in IA-64, because it apparently had a lot of security features. But I never dove into it deeply enough to be able to confirm or deny that claim.
 
I dunno about that take! The IA-64 ISA was designed in the late 1990s, which was long before modern threat models had developed. I haven't looked into its security features either, but would not be shocked if they're naive and dated by today's standards.

Also probably overcomplicated. That's a theme whenever I've looked at any IA-64 feature in depth. The rotating register files are a case in point. IA-64 has 128 (yes, 128!) GPRs and 128 FPRs. The first 32 are "static", the remaining 96 participate in rotation. Rotation can happen at a sort of global level for the procedure call stack, and there are also instructions to locally allocate chunks of registers for manual rotation within a local loop. There's even a Register Stack Engine to automatically spill and fill as procedure calls and returns rotate the register file.

The Itanium architects claimed they were making hardware simpler and faster by punting things they thought were hard (like instruction scheduling) to the compiler. But somehow this also led them down the path of doing all kinds of other things in hardware to "assist" the compiler. I think this added all the hardware complexity back in, and then some. Itaniums were never small, simple chips. Instead, they were always huge, hot, low-clocked, and usually very late to market because they were so difficult to design and debug.
 
In spite of it not panning out, I must admit I like the theoretical idea of the register window/rotation thing. Avoiding push/pops for all the call-clobbered registers on every function call seems pretty valuable. Of course, for code in a single compilation module or with LTO the compiler can also reduce a lot of that when it knows that, for example, the callee doesn't touch R14 or whatever. But there's still a lot of push/pop that could be avoided.
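
For what it's worth, the boilerplate that windows/rotation try to make disappear looks roughly like this on x86-64 (a sketch with made-up register choices; which registers actually get saved depends on the ABI and what the function uses):

my_func:
    push rbx             ; save the callee-saved registers we intend to use
    push r12
    push r13
    ; ... actual work using rbx/r12/r13 ...
    pop  r13             ; restore in reverse order
    pop  r12
    pop  rbx
    ret

With SPARC-style windows, as I understand it, save/restore just slide the window instead, and the equivalent spills only happen (via a trap) once the hardware runs out of windows.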
 

Are you familiar with the https://millcomputing.com/ project? They have some interesting ideas, at least on paper.
 
These are just sketchy Internet rumors as far as I can tell, only given slightly more credence because of Intel’s struggles, but Arrow Lake Refresh and the Beast Lake part of Royal Core (but not earlier Nova Lake) might be cancelled:

 


Eighteen. It took eighteen designers to invent x86-64 and design Opteron.

Instead of cancelling projects, Intel should lay off most of its designers and make the rest work their ass off until things rebound. (And they should be working their ass off competing with Qualcomm and Nvidia, not AMD, at this point.)
 
For the Arrow Lake Refresh cancellation, I have no basis for judging whether the people they cite have a decent reputation for having inside sources (they might!), since historically I haven’t paid much attention to the rumor mill on the PC side. I almost didn’t post it because of that, but since notebookcheck picked it up I thought it was at least worth noting.

The Royal Core cancellation comes from MLID, whom I have heard of and see … varying opinions of, but it also seems generally agreed that he may actually have sources:


And according to him, the Royal Core cancellation apparently has to do with Gelsinger not liking the project and canning it because Intel won’t need a high-performance CPU core in the future.

I have to admit the rentable unit idea as laid out here sounds kind of neat if it worked:

According to MLID’s description of Rentable Units, each RU module could act as both a single big core and be split into smaller cores when needed. So, with two threads per RU module, the big P-cores in Beast Lake could be split into two smaller cores each when needed.
Interestingly, the description is quite similar to what RedGamingTech revealed back in early 2023 where the leaker reported that the Royal Core project could do away with individual P and E cores in favor of tiles. These tiles were said to have the ability to operate in both “Cove” and “Mont” modes to simulate P and E cores respectively.
And according to the leak, it was working but Gelsinger killed it anyway:
So, the reason Beast Lake and Beast Lake Next are allegedly canned is not because the Royal Core project was having development troubles. But because Intel’s CEO doesn’t think that the company needs high-performance cores when, per MLID’s source, CPU cores are only going to be used to connect GPUs. Essentially, Pat Gelsinger is betting on AI/server chips and not on high-performance cores for desktop clients.

Again, who knows right? 🤷‍♂️ All of this, from the description to the cancellation to the reason behind it, could be BS. But it’s the only piece of information we have, and at least the description of rentable units sort of makes sense with what had been leaked before, maybe independently (maybe not, if it’s the same source simply giving the same info to two people).
 

Yeah, who needs high-performance cores, right? It’s not like single-core performance makes a difference. I mean, that’s why Apple has achieved so much success, focused solely on multi-core … wait…

Scratch all that.

BTW, this sounds like a very Gelsinger thing to do: “We don’t actually have an advantage in GPUs, so let’s focus on making our CPUs shittier.”

BTW, I was thinking about this the other day given the Qualcomm rumors. Instead of Qualcomm, I think it would be more fun if Nvidia bought Intel. Intel would have access to good GPUs, and Nvidia would be able to compete with Qualcomm’s CPU cores.

Spin off the fabs and sell them to GlobalFoundries.
 
I took a look, and to paraphrase Mike Tyson, everyone’s got a computer architecture until they get punched in the nose (with solid-state physics).
So far, Mill has been a paper tiger. Long on grandiose claims, short on demonstrating them. Seems to basically be a shoestring budget operation that's hoping to attract VC money.
 