CPU Design: Part 1 - Architecture

“Architecture” may mean different things at different companies, but at the companies I worked at it had a fairly consistent meaning. “Architecture” should be distinguished from “instruction set architecture” (ISA). The vast majority of the time, when we are designing a CPU, the instruction set architecture is largely already defined. It’s rare that CPU designers get an opportunity to even add a new instruction or two, let alone design an entire instruction set from scratch. AMD64 (now called x86-64) is an example of a new ISA I had the privilege to work on.

Architecture, on the other hand, assumes that the ISA is largely set. The job of the architect is to figure out, at a high level, how to implement the instruction set. This results in the “microarchitecture” of the design. The microarchitecture includes things like:

  • How the CPU is partitioned into high-level functional units
  • How many cores there are, and of what type
  • How many ALUs (pipelines) are in each core, and how many stages are in each pipeline
  • What functional units are present

These design choices are frequently represented on whiteboards and notepads as block diagrams that show how the logic might be partitioned, as well as the major data paths and control-signal flow through the design.

[Image: Block Diagram for the Exponential x704 PowerPC]
[Image: Block Diagram for the F-RISC/G GaAs Multi-chip Processor]

The architect typically writes a software model for the design in a high-level modeling language. Verilog and VHDL are traditional examples, but I’ve used C, C++, and extensions to those languages as well.

[Image: Sample Behavioral Verilog Code (cite: https://www.chipverify.com/verilog/verilog-tutorial)]

The model is a software program that can be executed to simulate the microprocessor. Sometimes there are two models - one model that is optimized for speed, and another model that is “cycle accurate.” The latter includes many more details of the structure of the microprocessor, and breaks computations into smaller steps. For example, the “fast” model may implement a multiplier like this:

Result := operandA * operandB

The cycle-accurate model may implement the multiplications with a Wallace tree that takes multiple cycles.

Sometimes the model is “timing accurate” but not “cycle accurate.” Instead of implementing the details of what gets done in each clock cycle, the model uses simple logic to perform the calculation, but accurately reflects how many cycles it takes to do so.

The point of these models is threefold:

1) They serve as the canonical definition of the behavior of the processor. As the design work progresses, the design is constantly compared to these models to ensure that it is functionally correct. (This can be done in at least two ways, which are topics for another day.)
2) They tell the logic designers what they are designing. The logic designers read the model to understand what each block they are designing is supposed to do.
3) In some design flows, they are used as inputs to automated “synthesis” tools that produce gate-level designs from the functional models. This is usually a bad idea for anything other than the least important logic.

These models describe the behavior of the processor, not its structure. This can get somewhat confusing, because languages such as Verilog and VHDL can also be used to define the structure of a logic circuit, and for some types of products, including even some CPUs, designers may skip the purely behavioral representation entirely. However, in my experience, for the most advanced and highest-performance processors, it is important to keep behavior and structure separate. Among other things, this simplifies the process of “verifying” the logical operation of the CPU.

Typically the architectural model is divided into different files and/or modules, each of which is intended to correspond, at least roughly, to a physical module. This is important because the module definition will declare each of the inputs and outputs (or, sometimes, “inouts”) that define the interface to the module. Because these decisions have physical consequences, it is important for the architect to work with the physical designers to understand what partitioning makes the most sense.

Once the architectural model is created and verified as functional, it may continue to change throughout the design process. Logic may have to be moved across module boundaries in order to enable timing requirements to be fulfilled, for example. Extra signals may need to be added to a module interface to take advantage of extra time available in one module or another. Or some other physical design problem may require a module to be redesigned in some way.

In a typical CPU core, the top-level modules include instruction fetch, instruction decode, instruction scheduling, integer execution, floating-point execution, the register file, caches, and load/store. Each module will likely have submodules; for example, the integer execution unit may have one or more ALU submodules.

Computer architecture is a huge topic, encompassing everything from cache sizing and organization to branch prediction algorithms to how many pipeline stages should be used. Hennessy and Patterson’s text is considered the bible in this field, in case you want to learn more. A free electronic copy of the fifth edition of their quantitative architecture book is here: http://acs.pub.ro/~cpop/SMPA/Computer Architecture A Quantitative Approach (5th edition).pdf

A free pdf of their computer organization and logic design book is here: https://ict.iitk.ac.in/wp-content/u...mputerOrganizationAndDesign5thEdition2014.pdf

I particularly recommend the latter as a starting point for those interested in computer design as a whole.

Next time I will address what my employers always called “logic design,” but which many companies would probably call “physical design.”
About author
Cmaier
Cliff obtained his PhD in electrical engineering with concentrations in solid state physics and computer engineering from Rensselaer Polytechnic Institute. Cliff helped design some of the world’s fastest CPUs, including Exponential Technology’s x704, Sun’s UltraSparc V, and many CPUs at AMD, including the original Opteron and Athlon 64.

Cliff’s CPU design experience ranges from instruction set architecture, including contributions to x86-64, to microarchitecture (especially memory hierarchy design), to logic and physical design (including ownership of floating and integer execution units, instruction schedulers, and caches). Cliff was also a member of AMD’s circuit design team, and was responsible for electronic design automation at AMD for a number of years in the Opteron era.

Cliff has designed both RISC and CISC microprocessors, using both GaAs and silicon, and helped design two different bipolar microprocessors before shifting to FET technology.

Comments

I appreciate the article, @Cmaier, this is a great primer for how you approached your design work. It was digestible enough for those of us without your level of expertise to understand the general approach to microarchitecture.

AMD64 (now called x86-64) is an example of a new ISA I had the privilege to work on.

Aren't you downplaying this, a bit? Correct me if I'm wrong, but if I recall correctly, didn't you write the draft for x86-64?
 
Only for the integer instructions. I was assigned the original design of the integer execution unit and scheduler for sledgehammer. We didn’t have an architect - there were only about 15 of us left to work on the chip. I asked Fred Weber how I was supposed to design the thing without knowing what it was supposed to do, and he told me I should just go ahead and draft up the instructions. The tricky part was figuring out how long each instruction should take, and what our flags would look like.

Weirdly, the thing I was most proud of was the multiplier - I had it doing 64 bit multiplies faster than the Athlon multiplier did 32 bit multiplies.

At some point I also owned the floating point unit, but not for long.

Pretty quickly on that design my role changed quite a bit. From the start, I was one of the few people who dealt with global issues - floorplanning, power and clock grids, standard cell architecture, etc. Then I was put in charge of design automation and design methodology and I handed over the execution unit to others (because writing all our cad tools and deciding what to automate and how to tie it all together took too much of my time).

I can’t remember who, but it very well may have been one of the founders of Nuvia who replaced me on the integer unit and scheduler design.
 
