Intel’s new APX/AVX10 extensions

So basically your code won't run on any older machine. Forced hardware upgrades = may as well review which platform to go with...
 
So basically your code won't run on any older machine. Forced hardware upgrades = may as well review which platform to go with...
Knowing Intel, it will be like it was with ARMv8: you'll be able to run Athlon-type code in one compatibility mode, 80386-type code in another compatibility mode, and 8086-type code in yet another compatibility mode.

It will be awesome (/s).

Additionally... x86 on ARM works pretty well these days.

Well, x86 can be translated to run very well on Apple Silicon because of AS features designed to assist with that. Vanilla Cortex-X does not translate it quite as smoothly, lacking those enhancements.
 
So basically your code won't run on any older machine. Forced hardware upgrades = may as well review which platform to go with...

This has already happened a number of times (SSE, AVX…), but every time it took years until the new instructions really became widely used. ISA upgrades in the x86 world happen from the perspective of backward compatibility: software only uses new instructions if one can be certain that virtually every customer has a CPU supporting them. AVX, for example, has been available since 2011 and software still doesn't assume it's there. So if Intel ships these new instructions with new Xeons in 2024 and new consumer chips in 2024/2025, they won't really become mainstream until at least 2025…

And with vector stuff like AVX, software could at least compile different versions of functions and use run-time dispatch: code that uses these instructions tends to be compact and well localized. With APX you can't really do that, aside from shipping multiple versions of your binary and invoking the correct one at application startup.
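For anyone who hasn't seen the run-time dispatch pattern mentioned above, here's a minimal x86-only sketch in C using GCC/Clang's __builtin_cpu_supports(); the function names (sum_scalar, sum_avx2, sum) are made up purely for illustration:

#include <stddef.h>

static float sum_scalar(const float *v, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

__attribute__((target("avx2")))
static float sum_avx2(const float *v, size_t n) {
    /* With target("avx2") the compiler may auto-vectorize this loop
       using AVX2 instructions, without the whole binary requiring AVX2. */
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

float sum(const float *v, size_t n) {
    /* One CPU check at run time picks the appropriate version. */
    return __builtin_cpu_supports("avx2") ? sum_avx2(v, n) : sum_scalar(v, n);
}

Because the AVX2 code is confined to one well-localized function, this kind of per-function dispatch is cheap; the point above is that APX-style changes (more registers, new encodings everywhere) don't localize like this.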
 
Well, x86 can be translated to run very well on Apple Silicon because of AS features designed to assist with that. Vanilla Cortex-X does not translate it quite as smoothly, lacking those enhancements.

While the additional hardware features Apple added certainly help with emulation (optional TSO ordering plus certain flags), a lot of people working on the platform (e.g. the Asahi folks) have said straight up that the main reasons AS is so good at x86 emulation are that the techniques for (especially static) emulation have gotten so good and that the M-series processors are so performant in general. They are just damn good processors and would be good at this even without the specialized additions Apple put in there (though again, those additions do help; they aren't a waste or anything).

The first reason, the progress of emulation in general, is why the MS emulator is also deemed pretty good now (despite running on far weaker Qualcomm hardware), and why the Asahi team was able to choose among multiple options to best emulate x86 for their purposes, going with FEX rather than Rosetta 2 (again, best for their purposes).
 
This has already happened a number of times (SSE, AVX…), but every time it took years until the new instructions really became widely used.
Yeah, I guess that's my point. If they release these instructions today, it will be 3-5 years before they can be used widely, for fear of cutting off compatibility with the existing fleet of machines. It's not going to be a quick catch-up for Intel.

Intel may spin it as "merely an incremental change to the instruction decoder", and for Intel maybe it is that "simple". But the reality is that it won't make any real-world difference outside of niche applications for at least 3-5 years, as the existing fleet of machines needs to be supported by any software that might otherwise make use of this.

Of course, the application vendor could deploy two binaries, one compiled for this and one not... but that's more support burden.

Never mind all of the compiler vendors needing to update their development tools for this first as well.
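To make the "two binaries" idea above concrete, here's a minimal, purely illustrative launcher sketch in C. The binary names (app-apx, app-baseline) and the cpu_supports_apx() probe are hypothetical placeholders, not anything Intel or any toolchain actually ships today:

#include <stdio.h>
#include <unistd.h>

/* Hypothetical feature probe: a real launcher would query CPUID (or an
   OS API) once toolchains and operating systems expose an APX feature
   bit. Hard-wired to 0 here to keep the sketch honest. */
static int cpu_supports_apx(void) {
    return 0;
}

int main(int argc, char **argv) {
    (void)argc;
    const char *target = cpu_supports_apx() ? "./app-apx" : "./app-baseline";
    execv(target, argv);   /* replace the launcher with the chosen build */
    perror("execv");       /* only reached if exec failed */
    return 1;
}

Simple enough, but it means building, testing, and shipping two full copies of the application, which is exactly the support burden being complained about.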
 
Yeah, I guess that's my point. If they release these instructions today, it will be 3-5 years before they can be used widely, for fear of cutting off compatibility with the existing fleet of machines. It's not going to be a quick catch-up for Intel.

Intel may spin it as "merely an incremental change to the instruction decoder", and for Intel maybe it is that "simple". But the reality is that it won't make any real-world difference outside of niche applications for at least 3-5 years, as the existing fleet of machines needs to be supported by any software that might otherwise make use of this.

Of course, the application vendor could deploy two binaries, one compiled for this and one not... but that's more support burden.

Never mind all of the compiler vendors needing to update their development tools for this first as well.

If Intel is willing to transition its customer base to something new in 3-5 years, they picked the wrong new thing to transition to.
 
If Intel is willing to transition its customer base to something new in 3-5 years, they picked the wrong new thing to transition to.

There is a reason why Hennessy & Patterson called x86 the "golden handcuffs".
Yes, it makes Intel a lot of money (still), but it also shackles them to this almost antique ISA.
 
There is a reason why Hennessy & Patterson called x86 the "golden handcuffs".
Yes, it makes Intel a lot of money (still), but it also shackles them to this almost antique ISA.
They should have remembered Apple’s rule: you’re going to get disrupted, so better to do it to yourself.
 
I looked at the base 80386 instruction set with an eye toward how they could have built a practical 16-bit-opcode instruction set for the 32-bit architecture while maintaining an energy-efficient 8086 compatibility layer. It looks like they could have transitioned without too much difficulty: included a bit pattern that the fetcher could use to determine how long a variable-length op would be, eliminated all the prefix and segment nonsense as well as the string direction flag, and expanded to a 24-register set.

It would have taken considerable effort to redesign the ISA, but it would have put them in a very good position. No one runs 8086 code on newer systems, so it would have been just the right time to transition. But Intel is set in their ways and only interested in preserving their crap design forever. If you step into their building, check your foresight at front door security.
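Purely as an illustration of the "bit pattern the fetcher could use to determine how long a variable-length op would be" idea (nothing Intel ever shipped), here is one way a length tag in the first 16-bit opcode word could work, similar in spirit to how RISC-V encodes instruction length. The field position and the 16/32/48/64-bit sizes are my own assumptions for the sketch:

#include <stdint.h>

/* Hypothetical decoder: the top two bits of the first 16-bit opcode word
   act as a length tag, so the fetcher knows the instruction's size before
   decoding anything else. */
static unsigned insn_length_bytes(uint16_t first_word) {
    switch (first_word >> 14) {
        case 0:  return 2;   /* 16-bit op */
        case 1:  return 4;   /* 32-bit op */
        case 2:  return 6;   /* 48-bit op */
        default: return 8;   /* 64-bit op / longest form */
    }
}

The point is that instruction length falls out of a fixed field in the first fetch, instead of requiring the decoder to chew through an arbitrary pile of prefixes first.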
 
I looked at the base 80386 instruction set with an eye toward how they could have built a practical 16-bit-opcode instruction set for the 32-bit architecture while maintaining an energy-efficient 8086 compatibility layer. It looks like they could have transitioned without too much difficulty: included a bit pattern that the fetcher could use to determine how long a variable-length op would be, eliminated all the prefix and segment nonsense as well as the string direction flag, and expanded to a 24-register set.

It would have taken considerable effort to redesign the ISA, but it would have put them in a very good position. No one runs 8086 code on newer systems, so it would have been just the right time to transition. But Intel is set in their ways and only interested in preserving their crap design forever. If you step into their building, check your foresight at front door security.
Aye but we all know that the current situation is really the fault of those ne’er-do-wells who designed the x86-64 - not only effectively keeping x86 alive, but allowing it to grow to new heights. I mean fuck those guys! Amiright? 😬😉
 
I mean fuck those guys! Amiright?

Again, I looked at how x86 could be converted to a 16-bit ISA by embedding the 8-bit opcode byte into 16-bit opcode words with bits that explicitly signal opcode length. Kind of a weird obsession of mine, designing ISAs. In the end, the x86 ISA is inherently a mess, which worked ok-ish for the 8086 but really needed to be shed by the 80386. The design decisions were understandable, but wrong.

AArch64 has an organic simplicity that harkens back to AArch32. It simply does its thing. x86 will never be able to match the elegance of ARM. They seem to be able to keep pace with the performance, but for how long? Even MIPS RISC-V is posing a serious hazard to them. At some point, backward compatibility will simply not matter anymore, and Intel will have no cards left in their hand.
 
Again, I looked at how x86 could be converted to a 16-bit ISA by embedding the 8-bit opcode byte into 16-bit opcode words with bits that explicitly signal opcode length. Kind of a weird obsession of mine, designing ISAs. In the end, the x86 ISA is inherently a mess, which worked ok-ish for the 8086 but really needed to be shed by the 80386. The design decisions were understandable, but wrong.

AArch64 has an organic simplicity that harkens back to AArch32. It simply does its thing. x86 will never be able to match the elegance of ARM. They seem to be able to keep pace with the performance, but for how long? Even MIPS RISC-V is posing a serious hazard to them. At some point, backward compatibility will simply not matter anymore, and Intel will have no cards left in their hand.
Sorry, I was just joking around; it was an opportunity to rib @Cmaier about how the design he and his colleagues came up with for AMD64 was so successful that it not only allowed x86 to survive in the consumer market, it helped lead to its near dominance across computing, including servers, where it previously hadn't been dominant or even all that viable.

It wasn't really a response to your previous post, I just used it as a springboard to make a little joke*. I apologize if I wasn't clear on that point.

* [reaction GIF]
 
Sorry, I was just joking around; it was an opportunity to rib @Cmaier about how the design he and his colleagues came up with for AMD64 was so successful that it not only allowed x86 to survive in the consumer market, it helped lead to its near dominance across computing, including servers, where it previously hadn't been dominant or even all that viable.

It wasn't really a response to your previous post, I just used it as a springboard to make a little joke*. I apologize if I wasn't clear on that point.

Remarkably enough, there wasn't really time or capacity for extreme forethought with that design. Not sure when Fred Weber came up with the general idea, but it couldn't have been long before we had the famous meeting here (https://www.lepapillon.com/) and 15 or so of us just started designing stuff.
 
I did pick up on that, although I was under the impression that it was mostly not his design concept; he just drew the pictures to make it work.

I had a rather eclectic role on that stuff. I determined what the 64-bit integer instructions would be, and how many cycles each should take. At first we had no architect, so the design started out bottom-up. I did the initial design for the integer ALUs and the instruction scheduler. (For a while I was responsible for the floating point, but moved to the scheduler.) I was one of 3 or 4 people who worked on "globals," which covered things like floorplanning the chip, determining the standard cell architecture (we created our own cells), the metal grid (things like how power and ground were to be distributed, rules for routing wires, etc.), and things like the clock grid, clock gating technique, etc.

Pretty soon I was responsible for “methodology” (along with my friend Cheryl), which meant I handed off most of my design responsibilities and put together the tools everyone else used to design the chip.

It was very similar to a start-up - I had my hands in everything from circuit design to physical design (the polygons) all the way up to the instruction set architecture. Whatever needed to be done. Our team was ridiculously small.
 