A18 Pro … your thoughts?

[However: keep in mind that if it costs $200 million to tape out A18, it does NOT cost another $200 million to tape out A18 Pro. A ton of the work on the one is applicable to the other.]
This. CPU cores, GPU cores, neural engine, and more look identical, as you'd expect - no need to do expensive manual design twice.

It looks like they use automated place & route (APR) for blocks that have lower performance requirements (I'm thinking media encode/decode engines and so on). Lots of that stuff looks quite different, so I suspect they let the tools redo some APR to re-flow low-performance blocks.
 
Annotated image from HighYield on Twitter.

A few things to note about this annotation:
  1. Each NPU block shown contains two cores, so the total is 16 cores, not 8
  2. There are four memory controllers, two on the top edge and two on the bottom, quite far from each other
  3. The A18 Pro die has more high speed I/O on the right edge than the A18, perhaps due to USB 2 vs USB 3 plus the extra camera sensor and lidar
 
It‘s (2). „Ft“ (sorry - my ipad has decided I am typing german for some reason) is the „toggle frequency“ of the transistor, which is the maximum speed with which it can toggle back and forth between on and off. This frequency increases as temperature decreases, largely due to decreases in the channel resistance. So exotic cooling allows you to increase the clock frequency (which reduces the cycle time) because the transistors in a given cycle need less time.
I finally remembered the term I was looking for - FO4. But I guess Ft is a more direct way of talking about the fundamental property I was asking about.
 

FO4 is not a property and doesn’t tell you about the latency or performance of anything. It’s a useful rule of thumb that informs a designer about how many logic gates should connect to the output of another logic gate.

If one inverter’s output “fans out” to connect to the inputs of four other inverters that are the same size as the first, that’s a “fan out” of 4. FO4 tells us that, assuming a bunch of things are true, that is the optimal number of connections in order to maximize performance. Similarly, one inverter’s output connected to the input of another inverter that is 4x the size of the first would be a fan out of 4.

“4” only works for inverters - you adjust the number based on “logical effort” of the logic gate (essentially how complicated it is).

And in the real world, parasitic capacitances and resistances on the wires mean you have to adjust further.

In any event, it’s a design principle that doesn’t change based on temperature, fab node, etc.
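A rough sketch of the sizing rule described above, using the standard logical-effort bookkeeping (stage effort = logical effort × fan-out). The function names, the capacitance units, and the target of 4 are illustrative assumptions, and wire parasitics are ignored entirely:

```python
# Toy logical-effort sizing sketch. Capacitances are in arbitrary "unit
# inverter input capacitance" units; wire R and C are ignored.

# Textbook logical-effort values for simple static CMOS gates.
LOGICAL_EFFORT = {"inv": 1.0, "nand2": 4.0 / 3.0, "nor2": 5.0 / 3.0}


def fanout(c_load, c_in):
    """Electrical fan-out: total load capacitance over driver input capacitance."""
    return c_load / c_in


def target_fanout(gate, target_effort=4.0):
    """Fan-out that gives the gate a stage effort (g * h) of target_effort."""
    return target_effort / LOGICAL_EFFORT[gate]


def driver_input_cap(gate, c_load, target_effort=4.0):
    """Input capacitance (i.e. size) the driver needs to hit target_effort."""
    return LOGICAL_EFFORT[gate] * c_load / target_effort


# One unit inverter driving four unit inverters, or one inverter 4x its size:
print(fanout(4 * 1.0, 1.0), fanout(1 * 4.0, 1.0))

# "4" only applies to inverters; more complex gates get a smaller fan-out.
for gate in LOGICAL_EFFORT:
    print(gate, round(target_fanout(gate), 2))
```

With these assumptions, an inverter driving a load of 16 units would itself want an input capacitance of about 4 units, while a 2-input NAND driving the same load would need to be somewhat larger.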
 
Huh. At the risk of getting us somewhat off-topic...

My minimal learning on this is many years old, but my vague recollection was that FO4 was used as a metric for the performance of a particular process. It measured the time for one inverter to drive four more, for that specific process. You could use that to normalize delay measurements for circuits implemented in two different processes so you could reasonably compare them. I figured that that would change similarly to the way Ft does. But you're saying it's invariant. Did I misunderstand (or misremember) what FO4 is?
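To make that normalization idea concrete, here is a minimal sketch. Every number in it is made up for illustration; none of them are measurements of a real process:

```python
# Express a path delay as a multiple of the process's FO4 inverter delay,
# so that circuits built on different processes can be compared.

def delay_in_fo4(path_delay_ps, fo4_delay_ps):
    """Path delay divided by the FO4 inverter delay of the same process."""
    return path_delay_ps / fo4_delay_ps


# Hypothetical: the same logic path on an older and a newer process.
old_process = delay_in_fo4(path_delay_ps=900.0, fo4_delay_ps=45.0)
new_process = delay_in_fo4(path_delay_ps=300.0, fo4_delay_ps=15.0)

# Both come out to the same number of FO4 delays, which would suggest the
# speedup came from the process rather than from a better circuit design.
print(old_process, new_process)
```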
 
Ah, OK. Well, I guess you *could* use it that way, but I’d ask: “What inverter? How is it laid out? What is its physical aspect ratio? How big are the power and ground rails? How *big* is the inverter?” On a given process node, there are an infinite number of possible inverter designs. Even putting aside layout variations, there are sizes. I would guess in this scheme you propose, we would be talking about what I would call an INX1, driving an INX4 (or driving 2 INX2s, or driving 4 INX1s), where an INX1 is the minimum possible drive strength - in CMOS this would be W/L=1 for the NFET, and in FinFETs I guess that would be a single fin.

I’ve never heard of anyone using this as a metric - certainly we never did - precisely because there are a lot of questions there. You have to make a lot of assumptions about what this inverter is, and it’s not very useful for extrapolating to other more complicated gates (because, for example, NFETs and PFETs scale quite differently, and a given process may have backside power, or SOI, or some other feature that makes it so that comparing inverters on two processes tells you very little about comparing two multiplexers or two NAND gates).

What FO4 has always referred to, in my experience, is the rule of thumb that designers use when deciding how to size gates. In other words, if you need to drive 4 INX4s, you probably want approximately 1 INX4 driving them (or 2 INX2s, or whatever).

I often say “I’m not sure” or “I’d have to think about it” re: questions here, but I can say the following additional thing with absolute authority, because I was the timing czar for a long time at AMD, wrote a lot of our timing tools, and worked with Synopsys to show them several mistakes in the way their timing tools worked, which they then fixed. Even on a single chip, fabricated on a single node, if you have an INX1 driving an INX4, it will almost never have the same gate delay as an INX4 driving an INX16. In fact, that INX1 driving an INX4 will not have the same gate delay as another INX1 driving a different INX4. There are just too many variations in lots of parameters. So to make FO4 a metric instead of a rule of thumb, you’d have to specify all sorts of things: input slew rate, physical gate size, which drive strength is being used, whether wire capacitance and resistance are assumed to be zero (or what R and C are), temperature, etc. Small changes in these parameters can have huge effects.
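A toy illustration of that last point. This is a first-order lumped RC estimate with invented constants, not how any real timing tool computes delay; `R_UNIT_OHM`, `C_UNIT_FF`, and `slew_factor` are hypothetical values chosen only to show the trend:

```python
import math

R_UNIT_OHM = 10_000.0  # hypothetical output resistance of an INX1
C_UNIT_FF = 1.0        # hypothetical input capacitance of an INX1, in fF


def gate_delay_ps(drive, load_size, c_wire_ff=0.0, input_slew_ps=0.0,
                  slew_factor=0.2):
    """Crude delay estimate for an inverter of strength `drive` driving a
    total load of `load_size` unit inverters through a wire of c_wire_ff."""
    r_drive = R_UNIT_OHM / drive            # bigger gate, lower resistance
    c_self = C_UNIT_FF * drive              # assumed self-loading of the driver
    c_total_ff = c_self + c_wire_ff + C_UNIT_FF * load_size
    rc_ps = r_drive * c_total_ff * 1e-3     # ohm * fF = 1e-15 s = 1e-3 ps
    return math.log(2) * rc_ps + slew_factor * input_slew_ps


# With zero wire load and ideal inputs, both FO4 cases look identical...
print(gate_delay_ps(1, 4), gate_delay_ps(4, 16))

# ...but the same 2 fF of wire hurts the small driver far more,
print(gate_delay_ps(1, 4, c_wire_ff=2.0), gate_delay_ps(4, 16, c_wire_ff=2.0))

# and a slower input edge changes the answer again.
print(gate_delay_ps(1, 4, c_wire_ff=2.0, input_slew_ps=40.0))
```

Even in this oversimplified model, "an inverter driving a fan-out of 4" only yields a single number once the wire load, input slew, and drive strength are all pinned down.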
 
After reading this I had to figure out how my understanding of FO4 could be so off.

There's plenty of stuff on the web that talks about it in a way that's compatible with my former understanding. For example, https://www.realworldtech.com/fo4-metric/ - which might even conceivably be one of the original places I read about FO4 20 years ago. There's also https://vlsiarch.eecs.harvard.edu/publications/fanout-4-inverter-delay-metric and a bunch of other stuff that's pretty easy to find using Google. Of course, now that I look at it again, it's all fairly old.

Do you think those sources are in error, or out of date?
 
So it looks to me like someone at Stanford was proposing that metric.

As I said, never heard of anyone actually treating FO4 as a metric, and we certainly never did.

We characterized a process by the Spice model that defined the current/voltage behavior of the transistors, then designed our inverters (and everything else) based on that. If we had to talk about a single figure-of-merit, it was Ft. The proposed metric wouldn’t be of any use to us for the reasons previously stated.
 