New ARM extensions announced, including version 2.2 of SVE and SME. Dougall seems excited, particularly about the new Compare-and-branch and FEAT_CSSC:
Huh; NEON is on the list. I thought NEON was entirely superseded by SVE.
No, Neon sits alongside FP. FP is in the 1E encoding space (high order byte of the op word) while Neon is in the 5E space; SVE sits in the 04 space, completely separate from FP and Neon.
I mean, if compare-and-branch is a more efficient encoding of the pattern I can see it being a great addition. I only know how to write x86 well, but
cmp
jz
is a very common pattern in my assembly. Of course the conditional operations in x86 can avoid the branch a lot of times, like
cmp
cmovz
Not sure if AArch64 has that already. But regardless, if the encoding is more efficient, I can still see a cjmpz <compare reg1, compare reg2, jump destination> kind of instruction being useful.
I think it is very interesting that they decided to add explicit compare and branch. Most modern CPUs fuse cmp+branch sequences, and ARM already included common patterns like branch on zero and conditional move. Given how expensive these instructions are in terms of the encoding space, there must be some hard data demonstrating that the previous approach was insufficient.
Could you give examples? I tried reading up on the different compare/branch semantics and got a little lost.
Same question about the new compare with immediate and branch: only unsigned values between 0 and 63 are supported. What is so special about these values that would justify reserving almost 0.5% of the encoding space?
I think you're thinking about this the wrong way. The amount of encoding space is set by the instruction template they chose, which uses a total of 8 bits for opcode. That leaves 24 bits to encode everything else needed to make the instruction useful.
One bit gets burned on selecting 32-bit or 64-bit comparison. Three are used for the condition code. Five are used to encode the source register. No way to economize on these - at most I think you could shave one bit off the condition code field, if you were willing to make the instruction less capable.
The remaining 15 bits must be split between two values: the immediate that's compared to the source register, and the offset to add to the program counter if the branch is taken. They chose to go with a 9-bit offset, leaving 6 bits for the immediate value.
Would you like more than 9 bits of offset? Yes, absolutely. As Dougall comments, +/- 1KiB is enough to be useful, but still feels a little tight.
Would you like more than 6 bits for the immediate? Yes, absolutely. But 6 bits does encode zero and one, which are almost always the most frequently used immediate values in nearly every context, and by a fairly wide margin. So from a certain perspective, 6 is a luxury.
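To make the arithmetic concrete, here is a minimal Python sketch of the bit budget described above. The field widths are the ones quoted in this thread, not values checked against the Arm documentation. The code also assumes the offset is a signed count of 4-byte instructions, which is inferred from the +/- 1KiB figure.

```python
# Bit budget for the compare-with-immediate-and-branch template discussed above.
# Field widths come from this thread; treat them as assumptions, not spec values.
INSTRUCTION_BITS = 32
OPCODE_BITS = 8
SIZE_BITS = 1          # selects 32-bit or 64-bit comparison
CONDITION_BITS = 3
REGISTER_BITS = 5      # one of 32 source registers
OFFSET_BITS = 9
INSTRUCTION_BYTES = 4  # A64 instructions are fixed-width

# Bits left over for the immediate and the branch offset together.
remaining = (INSTRUCTION_BITS - OPCODE_BITS - SIZE_BITS
             - CONDITION_BITS - REGISTER_BITS)
immediate_bits = remaining - OFFSET_BITS

# Assumption: the offset is a signed instruction count (two's complement).
min_offset_bytes = -(1 << (OFFSET_BITS - 1)) * INSTRUCTION_BYTES
max_offset_bytes = ((1 << (OFFSET_BITS - 1)) - 1) * INSTRUCTION_BYTES
max_immediate = (1 << immediate_bits) - 1

print(f"bits shared by offset and immediate: {remaining}")
print(f"immediate: {immediate_bits} bits, values 0..{max_immediate}")
print(f"branch reach: {min_offset_bytes}..{max_offset_bytes} bytes")
```

Under those assumptions the numbers line up with the post: 15 shared bits, a 6-bit immediate, and a reach of roughly 1 KiB in either direction.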
I don't think you could go the other direction and keep these instructions useful - offset size does matter quite a lot. So, the question is, would it have been better to have a 5- or even 4-bit immediate to double or quadruple the offset range? I don't pretend to know, but presumably Arm based this decision on analysis. It was probably someone's project to implement compiler support for several different split options, compile a bunch of testcases (probably including the entire SPEC suite) with each, and collect data on how often each split forced the compiler to avoid emitting a compare-and-branch instruction.
Edit: that kind of analysis was also no doubt used in deciding whether these instructions were worthwhile at all.
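The trade-off between immediate width and branch reach can be tabulated with a short sketch. `split_options` is a hypothetical helper written for illustration. It assumes the 15 shared bits mentioned above and a signed offset counted in 4-byte instructions, and says nothing about how Arm actually evaluated the options.

```python
def split_options(shared_bits=15, instruction_bytes=4):
    """Return (immediate_bits, offset_bits, max_immediate, reach_bytes) per split.

    reach_bytes is the approximate reach in each direction, assuming a signed
    offset counted in whole instructions.
    """
    options = []
    for immediate_bits in range(4, 7):  # 4-, 5- and 6-bit immediates
        offset_bits = shared_bits - immediate_bits
        max_immediate = (1 << immediate_bits) - 1
        reach_bytes = (1 << (offset_bits - 1)) * instruction_bytes
        options.append((immediate_bits, offset_bits, max_immediate, reach_bytes))
    return options


for imm, off, max_imm, reach in split_options():
    print(f"{imm}-bit immediate (0..{max_imm}), "
          f"{off}-bit offset (about +/-{reach} bytes)")
```

Each bit moved from the immediate to the offset doubles the reach and halves the range of comparable constants, which is the choice the compiler experiments described above would have had to weigh.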