> And we had to keep our fingers crossed that Itanium would suck.
Intel sure did you guys a solid on that one!
> As was mentioned by others here earlier, x86 is an “accumulator”-style architecture. This means that when you want to do a math or logic operation, the destination register is the same as one of the source registers.
While x86 definitely started as an accumulator-style architecture, and some parts still have the implicit use of (E)AX, I would call your definition a 2-address architecture.
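A minimal Python sketch of the "2-address" reading, using a hypothetical toy register machine (not real x86 or 68000 semantics): every ALU instruction computes dst = dst OP src, so keeping a source value alive costs an extra register copy before the operation.

```python
def run_two_address(program, regs):
    """Execute a toy 2-address program; return (registers, instruction_count).

    Each instruction is (opcode, dst, src). ALU ops compute dst = dst OP src,
    so the destination is always also the first source operand.
    """
    regs = dict(regs)  # work on a copy so the caller's dict is untouched
    count = 0
    for op, dst, src in program:
        if op == "mov":
            regs[dst] = regs[src]
        elif op == "add":
            regs[dst] = regs[dst] + regs[src]
        elif op == "sub":
            regs[dst] = regs[dst] - regs[src]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        count += 1
    return regs, count


# r2 = r0 + r1 without destroying r0: copy first, then add in place.
PRESERVING_ADD = [("mov", "r2", "r0"), ("add", "r2", "r1")]

if __name__ == "__main__":
    final, n = run_two_address(PRESERVING_ADD, {"r0": 5, "r1": 7, "r2": 0})
    print(final, n)  # {'r0': 5, 'r1': 7, 'r2': 12} 2
```

The one-instruction alternative, `("add", "r0", "r1")`, is shorter but overwrites r0, which is exactly the trade-off being argued about.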
> They certainly had a “build it and they will come” attitude. It was a weird design process, anyway: it was almost as if nobody at Intel had any ideas, so they just let HP’s PA-RISC folks roam the halls and build a science project.
This Bob Colwell interview is mostly not about Itanium, but I have always found the parts that are about it quite enlightening.
> According to your definition the MC68000 would be an accumulator-style architecture as well, which I believe is not the common specification.
It really was, though. The way you can tell is that the 68000, like the x86, has an explicit register-to-register move opcode that does nothing but copy the contents of one register into another. I have done machine coding on a 68000 machine and on an 80186 machine, and the move opcode sees a fair amount of use, which ends up being a wasted cycle, really. On a three-operand architecture you almost never need a register move, because you can do the calculation and put the result exactly where it needs to be. With a large register set, a three-operand architecture is immensely more efficient. It does have a register-to-register move, but that is a pseudo-op, a rewording of OR Rd, Rs, Rs.
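The three-operand point can be sketched the same way. This is a hypothetical toy ISA in Python, not any real encoding: each instruction names its destination separately, so r2 = r0 + r1 is a single instruction, and the move pseudo-op is expanded by an assembler-like pass into OR Rd, Rs, Rs, since OR-ing a value with itself leaves it unchanged. PowerPC does this with its `mr` extended mnemonic, which assembles to `or`.

```python
def expand_pseudo_ops(program):
    """Rewrite the pseudo-op ("mov", rd, rs) as ("or", rd, rs, rs).

    OR-ing a value with itself returns the value unchanged, so this toy
    3-address ISA needs no dedicated move opcode.
    """
    out = []
    for instr in program:
        if instr[0] == "mov":
            _, rd, rs = instr
            out.append(("or", rd, rs, rs))
        else:
            out.append(instr)
    return out


def run_three_address(program, regs):
    """Execute toy (opcode, rd, rn, rm) instructions; return (regs, count)."""
    regs = dict(regs)  # leave the caller's register file untouched
    ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "or": lambda a, b: a | b,
    }
    program = expand_pseudo_ops(program)
    for op, rd, rn, rm in program:
        # The destination is independent of both sources.
        regs[rd] = ops[op](regs[rn], regs[rm])
    return regs, len(program)
```

For example, `run_three_address([("add", "r2", "r0", "r1")], {"r0": 5, "r1": 7, "r2": 0})` leaves r0 intact and takes one instruction, where the 2-address form needs a copy plus an add.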
> While x86 definitely started as an accumulator-style architecture, and some parts still have the implicit use of (E)AX, I would call your definition a 2-address architecture.
> According to your definition the MC68000 would be an accumulator-style architecture as well, which I believe is not the common specification.
> Of course accumulator architectures also always overwrite one of the operands with the result, because they typically have just one (6502) or two (68xx) accumulator registers.
Well, I won’t argue about what we call it as long as we agree on what it is doing. I’m not aware of any industry agreement on the words we use for such things.
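For contrast, a true accumulator machine funnels every arithmetic result through one register. The sketch below is loosely modeled on the 6502's CLC/LDA/ADC/STA (8-bit values, carry handled as on the 6502; the N, V and Z flags and decimal mode are ignored), and the function and program names are made up for illustration.

```python
def run_accumulator(program, memory):
    """Execute a toy accumulator program; return (memory, A, carry).

    Every arithmetic result lands in the single accumulator A; memory is
    the only other operand source. ADC adds the carry flag in, 6502-style.
    """
    mem = list(memory)  # copy so the caller's memory is untouched
    a = 0
    carry = 0
    for op, *arg in program:
        if op == "CLC":
            carry = 0
        elif op == "LDA":
            a = mem[arg[0]]
        elif op == "ADC":
            total = a + mem[arg[0]] + carry
            a, carry = total & 0xFF, total >> 8  # 8-bit result, carry out
        elif op == "STA":
            mem[arg[0]] = a
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return mem, a, carry


# mem[2] = mem[0] + mem[1]: load, add, store, all through A.
ADD_VIA_A = [("CLC",), ("LDA", 0), ("ADC", 1), ("STA", 2)]
```

Even a simple sum needs a load and a store around the add, because the accumulator is both a source and the only possible destination.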
> This Bob Colwell interview is mostly not about Itanium, but I have always found the parts that are about it quite enlightening.
Having interviewed there in 1991 or 1992 and received an offer, I can tell you their culture was pretty bad before Itanium.
(PDF) Colwell oral history complete transcript FINAL - PDFSLIDE.NET
Robert P. Colwell oral history: Oral history of Robert P. Colwell (1954- ), interviewed by Paul N. Edwards, Assoc. Prof., University of Michigan School of Information, … (pdfslide.net)
Seems like folks on the x86 side had plenty of ideas, and tried to warn management that the claims being made by Itanium people were dangerously unrealistic, but after Andy Grove stepped down as CEO, Intel's senior management and company culture took a turn for the worse.
Of course, Apple uses the large ARM register file to pass arguments between subroutines, where x86 typically uses the stack. Using registers instead of the stack reduces memory-access overhead, which is an especially big issue when you have 10 or 24 processor cores fighting over who gets to use the memory bus right now. L1 caches do help with that, but using registers is massively more efficient.
Not to get into the weeds too much, but one of the things Apple did with the larger register set in AMD64 was to move arguments off the stack and into registers: the first 6 integer arguments are passed in registers, while the return address remains on the stack.
Technical Note TN2124: Mac OS X Debugging Magic
TN2124 describes a large collection of Mac OS X debugging hints and tips. (developer.apple.com)
32-bit ARM, by comparison, passes the first 4 arguments in registers, and I expect 64-bit expanded this a bit. It’s been a while, so I don’t remember the exact number of arguments passed in registers, and sadly my bookmarks don’t have details on it either. But I think the point here is that, on Apple platforms, AMD64 is closer to ARM argument-passing semantics than 32-bit x86 is.
Technical Note TN2239: iOS Debugging Magic
TN2239 describes a large collection of iOS debugging hints and tips. (developer.apple.com)
I need to track down updated versions of these documents if they exist, since neither seems to be fully up to date anymore, but they were invaluable references on my old team whenever I had to debug without dSYMs (or with dSYMs that, for various reasons, weren’t loading properly in Xcode).
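The register-versus-stack split can be summarized as a small lookup table. This is a simplified Python sketch: it covers only integer or pointer arguments that each fit in one register, and it ignores floating-point registers, structs, 64-bit values on 32-bit targets and variadic calls (Apple's arm64 ABI, for instance, passes variadic arguments on the stack). The register lists follow the standard conventions: 32-bit x86 cdecl passes everything on the stack, x86-64 System V (used by macOS) uses six registers, 32-bit ARM AAPCS uses r0-r3, AArch64 uses x0-x7, and 32-bit PowerPC uses r3-r10. The table keys and helper name are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallingConvention:
    name: str
    int_arg_regs: tuple  # registers for the first integer/pointer args, in order
    return_address: str  # where the call instruction leaves the return address


CONVENTIONS = {
    "i386": CallingConvention("32-bit x86 (cdecl)", (), "stack (pushed by CALL)"),
    "x86_64": CallingConvention(
        "x86-64 System V",
        ("rdi", "rsi", "rdx", "rcx", "r8", "r9"),
        "stack (pushed by CALL)",
    ),
    "arm32": CallingConvention("32-bit ARM AAPCS", ("r0", "r1", "r2", "r3"), "lr (set by BL)"),
    "arm64": CallingConvention(
        "AArch64 AAPCS64", tuple(f"x{i}" for i in range(8)), "x30/lr (set by BL)"
    ),
    "ppc32": CallingConvention(
        "32-bit PowerPC", tuple(f"r{i}" for i in range(3, 11)), "lr (set by bl)"
    ),
}


def place_int_args(arch, n_args):
    """Return (registers used, number of arguments that spill to the stack)."""
    regs = CONVENTIONS[arch].int_arg_regs
    used = list(regs[:n_args])
    return used, max(0, n_args - len(regs))
```

For example, an 8-argument call on x86-64 fills all six registers and spills two arguments to the stack, while on arm64 all eight fit in x0-x7.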
> Not to get into the weeds too much, but one of the things Apple did with the larger register set in AMD64 was to move arguments off the stack and into registers, meaning the first 6 arguments were passed by register, while the return pointer remained on the stack.
Well, x86-64 has the same number of registers as ARM32, so there would be no practical reason not to pass as many arguments in registers. Of course, ARM32 has 2 architecturally dedicated registers (PC and LR; any register could, in theory, serve as SP), where x86-64 has just one, practically speaking. But using registers to pass arguments started on PPC, so the profile for ARM64 should be essentially the same.
I do understand the use of the call stack for transient variables, but it seems like a questionable practice in an architecture that uses true GPRs. If I were creating an OS, the stack would be no more than 32 KB, and all the variables in memory would go in a separate area, just for safety.
I'm surprised ARM32 had fewer arguments passed in registers. I figured it would be the same.
> Hey, welcome to the site!
Thanks. I came from "the other place", just under a different pseudonym.
> In one of the larger projects I've worked on, data locality was important enough that keeping stuff on the stack was preferred to heap objects, and we'd have stack frames that by themselves would be upwards of a kilobyte. I suppose you could split the stack up to protect the return pointers at least without giving up the data locality.
In one thread on the programming forum several years ago, a poster who worked at NASA was perplexed that a routine he wrote worked when he ran it on the main thread but not when he ran it on a secondary thread. It turned out he had a very large array as a temporary, and it was too big for the stack of any thread except the main one, because the main thread has something like 8 MB of stack space but other threads are given half a MB.
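Python does not put arrays on the C stack, so it cannot reproduce that crash directly, but the knob involved is the same: a thread's stack size is fixed when the thread is created. A minimal sketch using the standard `threading.stack_size()` call, which applies only to threads started afterwards (the C-level equivalent is `pthread_attr_setstacksize()` on the thread's attributes). `run_with_stack_size` is a hypothetical helper name, and the 8 MB / half-MB defaults mentioned above are platform settings that this code does not query.

```python
import threading


def run_with_stack_size(fn, stack_bytes, *args):
    """Run fn(*args) on a fresh thread created with a stack of stack_bytes.

    threading.stack_size() affects only threads created after the call,
    so the previous setting is restored as soon as the thread has started.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args)
        except BaseException as exc:  # hand the failure back to the caller
            outcome["error"] = exc

    previous = threading.stack_size(stack_bytes)  # returns the old setting
    try:
        thread = threading.Thread(target=worker)
        thread.start()
    finally:
        threading.stack_size(previous)  # later threads get the old size again
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

The design choice mirrors the C fix: rather than shrinking the work, ask for a bigger stack for the one thread that needs it, and leave the default alone for everything else.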
> It really was, though. The way you can tell is that the 68000, like the x86, has an explicit register-to-register move opcode that does nothing but move the contents of a register to another register.
Quoting the Wikipedia article on "Accumulator (computing)":
According to these definitions, 68K is a 2-address GPR architecture (not exactly, since it has separate data and address registers). Unless I'm mistaken, there are no 68K instructions with implicit registers; even division and multiplication explicitly include all operands.
> Also an explicit MOVE instruction is not really a good argument, since AArch32 has a dedicated move instruction. One reason is, of course, that you need it to execute shifts and rotations without other operations, since they only work on the second operand and there are no separate instructions. But I'm sure you wouldn't call it an accumulator architecture just because it has a dedicated move instruction.
Well, I might be overly pedantic. The move instruction in both x86 and 68k is simply a mode of the load/store operation that uses a register as the operand rather than a memory address. And the move operation on ARM is a shift operation with a shift count of zero. (I look at that backwards: not as a move that has a shift count, but as an immediate shift with a possibly separate Rd. As I recall, x86 and 68k did shifts on a register with the result always in the same register, unless x86-64 changed that.)
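A toy Python model of the shift point, assuming 32-bit registers (hypothetical helper names, not an emulator): the AArch32-style move takes a shifted copy of Rm into a separate Rd, so a shift count of zero is just a register move, while the classic x86 SHL overwrites the register it shifts.

```python
MASK32 = 0xFFFF_FFFF


def arm_mov_lsl(regs, rd, rm, shift):
    """AArch32-style MOV Rd, Rm, LSL #shift: shifted copy into a separate Rd.

    With shift == 0 this is the plain register-to-register move.
    """
    if not 0 <= shift <= 31:
        raise ValueError("immediate LSL amount must be 0..31")
    regs[rd] = (regs[rm] << shift) & MASK32  # Rm itself is left unchanged


def x86_shl(regs, r, count):
    """x86-style SHL r, count: the result overwrites the shifted register.

    For 32-bit operands x86 uses only the low 5 bits of the count.
    """
    regs[r] = (regs[r] << (count & 31)) & MASK32
```

With `regs = {"r0": 0, "r1": 0x8000_0003}`, `arm_mov_lsl(regs, "r0", "r1", 4)` leaves r1 untouched and puts 0x30 in r0; the x86-style helper has no separate destination to write to.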
> HT vs DMB and the performance gains that each has to offer.
I guess there are some other memory-ordering wrinkles, too. I hesitate because I like to simplify things, and we’re getting into some intricacies now.
Let me give it a shot, and you can clean up my mistakes.