Just curious, were you guys able to bring it back up to 533? And if so, do you remember what the power penalty was like?
Speaking of power. I remember that at the time, everyone expected Exponential's chips to be hotter than the surface of the sun, roughly speaking. It amuses me that by today's standards, it probably wasn't very hot at all. Not saying it was cool, but back then we thought 50W in a single chip was really hot, and these days 50W just isn't all that spicy.
We taped out the fix, and I believe we got a rocket lot back that hit target speed, but it’s been so long that I cannot recall those details anymore. (We went out of business before I’d been there a year, thanks to Steve Jobs).
As for power density, my recollection is that it was 100W total, but that could be wrong. Definitely air cooled - I wrote that more than once in the JSSC article. 100W sticks out in my mind because back then an average-sized processor had to be under 100W for air cooling. Of course this thing was allegedly “BiCMOS”, but not in the way people typically used that term. The logic circuits were all ECL or CML, and the on-chip memory was all CMOS (at least the memory cells). No CMOS in the logic circuits, or in the latches. Four-phase clock, which used latches instead of flip-flops - we were living dangerously.
Since I was talking about sizing, an example from the paper is below. Since these were ECL/CML gates, you had “core” transistors (which did the actual boolean switching), and then you'd often stick an “emitter follower” on the output to (1) shift the voltage to the correct level and (2) add drive strength to the cell. The sizing discipline was sort of like modern FinFETs. In CMOS you could draw an arbitrarily-sized-and-shaped polygon of “active” area, then draw an arbitrary polysilicon polygon above it for the gate, and you'd made yourself an arbitrarily-sized transistor. With FinFETs your fins are pre-sized and pre-shaped. That's how our bipolar design worked - you had pre-drawn transistors of different sizes, and you'd use what you needed.
One interesting thing is we did our design in C, but in a very special way. So if I said something like:
A = b || c && d; // 200,800
and if there was no standard cell that did this function, it would automatically create a standard cell on the fly, complete with layout and the specified drive strengths. (We actually put the LOCATION of the cells in the C comments too, which was a mistake that I remedied at AMD. The problem was that if I wanted to move a cell 2 microns to the left, that “touched” the source file and all sorts of steps had to be re-run. So at AMD I created separate files for placement. And AMD was already using Verilog instead of C, which made certain things simpler, though it made simulations much slower.) If a cell was used often enough, or was particularly important, we had a lady who would hand-optimize the physical design and replace the automatically-generated one.
Each statement in the C file had to correspond to a gate - this was not a logic synthesis situation. You were very precisely specifying the exact circuit based on the order of the arguments on the line (which determined where in the circuit tree the inputs connected), etc.
We used CML in the data paths - this is similar to ECL, but every signal is differential. An example is below, albeit with a non-differential B0 for some reason. I wonder if that is an error in the paper - there's no reason to use a reference voltage for the b input. “Data paths” meant anything that was sort of “regular”, like adders, comparators, shifters, etc. These standard cells had constant width but variable height, which is interesting in retrospect (data paths ran vertically, so each “bit” was a fixed-width column). “Random logic” mostly used ECL, and was fixed height, variable width (like most if not all standard cell designs today).