Thread: iPhone 15 / Apple Watch 9 Event

I was thinking of the tables showing that a smaller process is expected to give reduced power consumption at the same performance, everything else being equal. I took this to mean that, if you took the A16 architecture and were able to do an exact port from N4 to N3, it should show lower power consumption at the same frequency. That would be akin to what Intel did with a "tick": same architecture, newer process.

In this case, though, you have the confounding factor that the architecture isn't the same. So it seems the only way you'd get the same (rather than lower) power consumption at the same frequency on a smaller process would be if, for the same frequency, the new architecture itself caused an increase in power consumption.

And if so, is it typical that newer (i.e., more advanced) architectures draw more power for the same frequency? [For instance, when Intel did a "tock" (putting a newer architecture onto the existing process).]

Well it all gets pretty complicated. If you just scale an existing design (which is impossible nowadays), yeah, you would likely see reduced power consumption (but…)

The reason it’s impossible nowadays is that not all dimensions scale the same from node to node. Spacing may scale a different amount than minimum width, and metal may scale differently than transistors. And the wire heights seldom scale anywhere close to their widths. And voltage is seldom scaled down anymore. I’ve said it before and I’ll say it again - anytime TSMC or anyone says “this process is x % faster at the same power” (or vice versa), I respond with “thanks for your unhelpful information. Now give me the design rules and the layer stack and I will figure out what effect that has on my design.”
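To put the voltage point in numbers, here is a first-order sketch of classic (Dennard) scaling. It is my own idealized model, not a foundry figure, and the shrink factor is hypothetical. Shrink every linear dimension by a factor k and power density stays flat only because voltage drops by 1/k too; hold voltage constant and the same shrink raises power density by roughly k².

```python
# First-order sketch: classic (Dennard) scaling vs. scaling without voltage reduction.
# Everything is normalized to the old node (= 1.0). Idealized model, hypothetical k.


def dynamic_power(cap: float, vdd: float, freq: float, activity: float = 1.0) -> float:
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * cap * vdd ** 2 * freq


def power_density(k: float, scale_voltage: bool) -> float:
    """Relative power per unit area after a linear shrink by factor k (k > 1).

    Per transistor: capacitance falls by 1/k and frequency rises by k.
    Voltage falls by 1/k only if scale_voltage is True.
    Transistors per unit area rise by k^2.
    """
    cap = 1.0 / k
    freq = k
    vdd = 1.0 / k if scale_voltage else 1.0
    per_transistor = dynamic_power(cap, vdd, freq)
    return per_transistor * k ** 2


if __name__ == "__main__":
    k = 1.4  # hypothetical: roughly a 0.7x linear shrink
    print(f"with voltage scaling:    {power_density(k, True):.2f}x")
    print(f"without voltage scaling: {power_density(k, False):.2f}x")
```

With k = 1.4 this gives about 1.00x power density with voltage scaling and about 1.96x without it.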

If your transistors get 33% smaller but your wires get 20% less capacitive, then your transistors are effectively weaker, and you need to size them up to drive the load, or you need to spread the wires to reduce capacitance, but that increases some wire lengths which increases capacitance. And if you make your wires shorter and thinner to reduce capacitance, you increase their resistance. But if they get shorter, then resistance doesn’t need to increase. And you can make them shorter if the transistors are smaller and don’t need to be so far apart. But if spacing scales differently than widths, or if metal scales differently than transistors, then the whole virtuous cycle doesn’t work right.
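A rough way to see why wire behavior depends on which dimensions actually scale: a parallel-plate wire model. Fringing is ignored, resistivity and permittivity are assumed unchanged between nodes, and all dimensions are hypothetical and normalized. R = ρL/(WH); C is two sidewalls coupling to neighbors plus a bottom plate to the layer below.

```python
from dataclasses import dataclass

RHO = 1.0  # normalized resistivity (assumption: unchanged between nodes)
EPS = 1.0  # normalized dielectric permittivity (assumption: unchanged)


@dataclass(frozen=True)
class Wire:
    length: float      # L
    width: float       # W
    height: float      # H, metal thickness
    spacing: float     # S, gap to the neighboring wire on each side
    dielectric: float  # T, insulator thickness to the layer below

    def resistance(self) -> float:
        # R = rho * L / (W * H)
        return RHO * self.length / (self.width * self.height)

    def capacitance(self) -> float:
        # Parallel-plate approximation, fringing ignored:
        # two sidewalls coupling to neighbors + bottom plate to the layer below.
        sidewall = 2.0 * EPS * self.height * self.length / self.spacing
        bottom = EPS * self.width * self.length / self.dielectric
        return sidewall + bottom

    def rc(self) -> float:
        return self.resistance() * self.capacitance()


def scaled(w: Wire, length=1.0, width=1.0, height=1.0, spacing=1.0, dielectric=1.0) -> Wire:
    """Return a copy of w with each dimension multiplied by its own factor."""
    return Wire(w.length * length, w.width * width, w.height * height,
                w.spacing * spacing, w.dielectric * dielectric)


if __name__ == "__main__":
    old = Wire(length=100.0, width=1.0, height=2.0, spacing=1.0, dielectric=1.0)
    cases = {
        "everything x0.7 (ideal shrink)": scaled(old, 0.7, 0.7, 0.7, 0.7, 0.7),
        "same length, cross-section x0.7": scaled(old, 1.0, 0.7, 0.7, 0.7, 0.7),
        "heights and dielectric unscaled": scaled(old, 0.7, 0.7, 1.0, 0.7, 1.0),
    }
    for name, new in cases.items():
        print(f"{name:34s} R {new.resistance() / old.resistance():.2f}x  "
              f"C {new.capacitance() / old.capacitance():.2f}x  "
              f"RC {new.rc() / old.rc():.2f}x")
```

An ideal shrink that also shortens the wire leaves RC unchanged; shrinking the cross-section without shortening the wire roughly doubles R (and RC). Real extraction models are far more detailed; this only shows how sensitive the answer is to which dimensions move.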

The “but…” is that the equation and analysis I referred to is for dynamic power. Static power increases as you scale down, because you can’t generate enough electric field across the channel region to completely shut off the transistors. That is attacked by new transistor architectures (planar MOSFET -> FinFET -> GAAFET, etc.). N2 is where you’ll see that huge improvement for TSMC, I believe.
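For reference, the standard first-order textbook forms: switching (dynamic) power is P_dyn = α·C·V²·f, and leakage (static) power is roughly P_static = V·I_leak, paid whether or not anything switches. The sketch below uses hypothetical numbers and ignores the strong dependence of real leakage on threshold voltage and temperature; it only shows how the static share grows as leakage rises.

```python
def dynamic_power(activity: float, cap: float, vdd: float, freq: float) -> float:
    """Switching power: P_dyn = alpha * C * Vdd^2 * f (only charge that actually moves)."""
    return activity * cap * vdd ** 2 * freq


def static_power(vdd: float, leakage_current: float) -> float:
    """Leakage power: P_static = Vdd * I_leak, paid whether or not anything switches."""
    return vdd * leakage_current


def static_fraction(activity: float, cap: float, vdd: float, freq: float,
                    leakage_current: float) -> float:
    """Share of total power that is leakage."""
    p_dyn = dynamic_power(activity, cap, vdd, freq)
    p_static = static_power(vdd, leakage_current)
    return p_static / (p_dyn + p_static)


if __name__ == "__main__":
    # Hypothetical chip: 10 nF switched capacitance, 10% activity, 0.8 V, 3 GHz.
    for i_leak in (0.3, 0.6):  # amps, hypothetical
        share = static_fraction(0.1, 1e-8, 0.8, 3e9, i_leak)
        print(f"I_leak = {i_leak} A -> static share {share:.1%}")
```

With these made-up inputs, doubling leakage current from 0.3 A to 0.6 A takes the static share from about 11% to 20%.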

The real advantage to smaller transistors is you can fit a lot more of them in the same space.
 
It should also be pointed out that, even in TSMC’s own promotional material, N3B isn’t that much better than N4P in terms of perf/W. The 10-15% and 20-25% figures are versus N5, not N4P, which boasted 10%/20% vs N5. As @Cmaier pointed out, the primary gain is in transistor density (42% vs 6%), so you can fit more transistors in the same area.
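Since both sets of TSMC figures are quoted against N5, comparing N3B with N4P means dividing the ratios rather than subtracting the percentages. A quick sketch using only the numbers quoted in this post (reading the density figures as +42% and +6% over N5 is my assumption):

```python
def relative_gain(new_vs_base: float, old_vs_base: float) -> float:
    """Gain of one node over another when both claims are quoted vs. the same base node.

    Ratios compound, so divide them instead of subtracting percentages.
    """
    return new_vs_base / old_vs_base - 1.0


# Figures as quoted in this post, all relative to N5.
N3B_PERF = (1.10, 1.15)   # +10-15% performance at iso-power
N4P_PERF = 1.10           # +10%
N3B_POWER = (0.80, 0.75)  # 20-25% lower power at iso-performance
N4P_POWER = 0.80          # 20% lower
N3B_DENSITY, N4P_DENSITY = 1.42, 1.06  # assumption: +42% and +6% density vs N5

if __name__ == "__main__":
    perf = [relative_gain(p, N4P_PERF) for p in N3B_PERF]
    power = [relative_gain(p, N4P_POWER) for p in N3B_POWER]
    print(f"N3B vs N4P performance: {perf[0]:+.1%} to {perf[1]:+.1%}")
    print(f"N3B vs N4P power:       {power[0]:+.1%} to {power[1]:+.1%}")
    print(f"N3B vs N4P density:     {relative_gain(N3B_DENSITY, N4P_DENSITY):+.1%}")
```

On those figures, N3B over N4P works out to roughly 0 to 4.5% more performance, or 0 to about 6% less power, versus roughly 34% more density.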

I know @Cmaier doesn’t like these top-line numbers, but they do back up what he’s saying here. N3B increases transistor density a ton, but that doesn’t necessarily mean you’re getting lots of free performance or power savings. It comes with trade-offs.

Bottom line: some of us, including myself just to be clear, were wondering where the 3nm benefits went, but looking back at these numbers, some of these results make a lot of sense. I’m still intrigued by the GPU data and some of the power measurements. I’ve already liked two posts wishing we had Andrei back.
 
Indeed. They are wild! One minute it’s terrible, the next it’s brilliant. I do find the Geekerwan results curious, though.
Indeed. "Massive thermal throttling" FFS it was not that long ago when some people saw their MacBooks dropping from a base clock above 3GHz to ~800MHz due to throttling.

I do find some of the results curious as well. Too bad there will be no deep dive review from Andrei Frumusanu this year.
 
So, till 2026. That’s enough time for Intel to catch up. Hopefully they do; we need more fab competition.
A good fab can’t save Intel unless they also ditch CISC. But I’m sure they’d take Apple’s money to fab Arm chips if Apple wanted.
 
Indeed. "Massive thermal throttling" FFS it was not that long ago when some people saw their MacBooks dropping from a base clock above 3GHz to ~800MHz due to throttling.
To be fair, some of the thermal results appear toasty, if accurate. But yeah.
 
My biggest frustration is not knowing what is accurate. The rush to get the reviews out isn’t very helpful. I originally thought Geekerwan didn’t have a real review unit. It seems they did.
 
Vulkan is not more game oriented than Metal, each API has strengths and weaknesses but using Vulkan does not intrinsically net you any more performance than Metal.

Very suspicious of these results in general.
A common refrain of mine: a given graphics or compute API might not be better or worse than another overall, but how well a specific graphics/compute engine implements the API (and how good the drivers for that API are) may differ. Generally, GFXBench is a decent benchmark that corresponds roughly to what one would expect. But you can definitely see weirdness in, say, Geekbench compute (OpenCL), where Apple GPUs beat Nvidia GPUs they have no right to be anywhere near.
 
From what I understand, the Snapdragon GPU is very specialized compared to Apple’s GPUs. Where Apple has taken its GPUs more in the direction of general-purpose compute, Qualcomm continues to focus mainly on optimizing for games. This, IIRC, is reflected in the compute scores of, say, the M2 vs the Snapdragon. The M2 gets 26,000 on OpenCL while the Snapdragon only gets 8,000. Compared to the M2, the A17 offers about 61% of the GPU performance, so rough napkin math means the A17 would score around 16,000 on OpenCL (if such a benchmark existed).
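The napkin math above, spelled out. The scores and the 61% ratio are the figures quoted in that post; the linear-scaling assumption (that OpenCL scores track overall GPU performance) is mine, and it is exactly what the next reply pushes back on.

```python
# Figures as quoted above (Geekbench OpenCL); linear scaling is an assumption.
M2_OPENCL = 26_000
SNAPDRAGON_OPENCL = 8_000
A17_GPU_VS_M2 = 0.61  # "about 61% of the GPU performance"


def estimate_score(reference_score: float, relative_perf: float) -> float:
    """Naive estimate: assumes the score scales linearly with overall GPU performance."""
    return reference_score * relative_perf


if __name__ == "__main__":
    a17 = estimate_score(M2_OPENCL, A17_GPU_VS_M2)
    print(f"estimated A17 OpenCL score: ~{a17:,.0f}")
    print(f"ratio to Snapdragon:        ~{a17 / SNAPDRAGON_OPENCL:.1f}x")
```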
That might reflect more weirdness in the OpenCL test than some ground truth about the two GPUs. As I mentioned above, some of the tests depend heavily on how well a particular API is implemented by the driver rather than on what the hardware is capable of. You could argue the user doesn’t care if they depend on that API, but one cannot draw general conclusions about the capabilities of the two GPUs, including for compute.

OpenCL is basically a dead API, including on the Mac, and it has a bunch of weird results that make Apple look really good or really bad depending on the test being run. I remember when the M1 first came out, some trolls tried to use some OpenCL results to prove how bad the new GPU was relative to AMD, but those weren’t really representative of anything else.*

The Adreno 740 in the Snapdragon 8 gen 2 does boast some impressive stats, so maybe it is that good? I’m still suspicious of these power draw figures between devices but it’s not as out of bounds as when I first looked at them.

@Jimmyjames you said you’d seen other tests with the 740 being not as good as the Apple GPU?

*Edit: having looked again at the OpenCL GB results, they look a lot more rational than I remembered for most devices. However, it could still be a case of the OpenCL driver for the 740 being subpar rather than any hardware limitation with respect to compute.
 
Is static power what it consumes when idling, making dynamic power the main portion of the power consumed when working? If so, while I get that you're saying there's a very complex relationship between node dimensions, architecture and power draw, wouldn't it be the case that most architectures are designed to take maximal advantage of the process on which they'll be used? And given that, can't you at least make the following coarse-grained statement?:

"If you have two similar architectures (i.e., from the same family, and designed for the same usage—e.g., A16 and A17), and each was optimally designed for the process on which it was used, then, as a general rule, you can typically achieve lower power consumption at the same clock speed on the smaller process."

If that's not the case, then the statements we typically see about a newer process offering '≈X% more performance at the same energy consumption, or ≈Y% lower energy consumption at the same performance' have no substance whatsoever. And that in turn further raises the question of why even supposedly sophisticated sites like anandtech.com have been posting these without the caveat that they have no real meaning.

If, OTOH, that is the general expectation, then it makes sense why it would be a bit surprising that A16/N4 and A17/N3 have the same power draw at the same frequency (though, as I said earlier, the data he presented [at least in that slide] was to only one significant figure, so it really isn't telling us whether they are drawing the same power or not [relative to the expected differences of low 10's of percent]).
 
Is static power what it consumes when idling, making dynamic power the main portion of the power consumed when working?

Well, depends what you mean by “when working.” In any given cone of logic, there will always be some gates that switch (because their inputs switched and the logic function results in the output having to change) and some that don’t. Even within a single logic gate, typically some transistors (in a given cycle) switch and others don’t. The advantage of CMOS over its predecessor NMOS is that, in theory, in any given cycle a lot of logic gates consume no current, because no transistors switch (and hence no charge is moved). But that was back when transistors were huge. Now, even transistors that aren’t changing their values will pass a fairly large amount of ”static” current because the gate cannot completely shut off the channel. It’s like a drippy faucet. Back in the late ’90s, static power on high-end CPUs was approaching maybe 10% of total power. That steadily climbed, though things like silicon-on-insulator, and then FinFETs, helped prevent disaster.
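A toy illustration of the CMOS point, with hypothetical per-node energies rather than real values: switching energy is spent only on nodes whose value actually changes between cycles (counted here as the popcount of the XOR of the two cycles' states), while leakage is charged to every node every cycle, even when nothing toggles.

```python
# Toy model: switching energy only where a node toggles; leakage on every node.
# Per-node energies are hypothetical, arbitrary units.


def toggles(prev_state: int, curr_state: int) -> int:
    """Count nodes that changed value between two cycles (popcount of XOR)."""
    return bin(prev_state ^ curr_state).count("1")


def cycle_energy(prev_state: int, curr_state: int, n_nodes: int,
                 e_switch: float, e_leak_per_node: float) -> tuple[float, float]:
    """Return (dynamic, static) energy for one cycle."""
    dynamic = toggles(prev_state, curr_state) * e_switch
    static = n_nodes * e_leak_per_node  # the "drippy faucet": paid even if nothing toggles
    return dynamic, static


if __name__ == "__main__":
    n = 8
    busy = cycle_energy(0b10110010, 0b10010011, n, e_switch=1.0, e_leak_per_node=0.05)
    idle = cycle_energy(0b10110010, 0b10110010, n, e_switch=1.0, e_leak_per_node=0.05)
    print("2 of 8 nodes toggle:", busy)
    print("nothing toggles:   ", idle)
```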


"If you have two similar architectures (i.e., from the same family, and designed for the same usage—e.g., A16 and A17), and each is optimally designed for the process on which it will be used, then, as a general rule, you can typically achieve lower power consumption at the same clock speed on the smaller process."

Well, you get what you optimize for. You can always get lower power consumption at the same clock speed on a smaller process. Just give up IPC. Get rid of transistors. Your performance will suck, but you will lower power consumption at the same clock speed. I did a lot of “process shrinks.” We always managed to improve performance. But sometimes you do that by lowering clock speed and increasing IPC while keeping power constant. Sometimes you do it by maintaining clock speed and IPC but lowering voltage. And once in a while you can do it by raising clock speed, maintaining IPC, and keeping power the same or lower. Lots of possibilities. But it was always very very dependent on the details of the process shrink - it doesn’t matter if transistors get smaller if the fab has to stick an extra nitride layer in that screws everything up. Or if design-for-manufacturing means you need to spread the (smaller) transistors apart in order to get yield up.
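The three shrink strategies described above, sketched with a dynamic-power-only model (performance = IPC × f, power ∝ C·V²·f). Every number here is hypothetical, including the assumed 0.8x switched capacitance for the same logic on the new node. The model also assumes the design can close timing at the chosen voltage and frequency, which is exactly the part that depends on the process details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Normalized design point; all values relative to the old design (= 1.0)."""
    ipc: float
    freq: float
    vdd: float
    cap: float  # effective switched capacitance (more logic -> more capacitance)

    def perf(self) -> float:
        return self.ipc * self.freq

    def power(self) -> float:
        # Dynamic power only: C * V^2 * f (activity folded into cap)
        return self.cap * self.vdd ** 2 * self.freq

    def perf_per_watt(self) -> float:
        return self.perf() / self.power()


# Hypothetical: on the new node, the same logic switches ~0.8x the capacitance.
OLD = Design(ipc=1.0, freq=1.0, vdd=1.0, cap=1.0)
SHRINKS = {
    "lower clock, more IPC, ~same power": Design(ipc=1.25, freq=0.9, vdd=1.0, cap=1.1),
    "same clock and IPC, lower voltage": Design(ipc=1.0, freq=1.0, vdd=0.9, cap=0.8),
    "higher clock, same IPC": Design(ipc=1.0, freq=1.2, vdd=1.0, cap=0.8),
}

if __name__ == "__main__":
    for name, d in SHRINKS.items():
        print(f"{name:36s} perf {d.perf():.2f}x  power {d.power():.2f}x  "
              f"perf/W {d.perf_per_watt():.2f}x")
```

In this toy setup all three routes improve perf/W over the old design, but they land in very different places on performance and power, which is the point being made.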


If that's not the case, then the statements we typically see about a newer process offering '≈X% more performance at the same energy consumption, or ≈Y% lower energy consumption at the same performance' have no substance whatsoever. And that in turn further raises the question of why even supposedly sophisticated sites like anandtech.com have been posting these without the caveat that they have no real meaning.

To me, as a CPU designer who designed CPUs using custom circuits, they’re almost meaningless. To someone designing simple ASICs using off-the-shelf libraries, maybe they do have meaning. But never - not once - when we were planning new fab processes (at AMD) or choosing processes (other places I worked) did we ever ask for, receive, or even run across those sorts of numbers. We were entirely focused on details of the process. For each layer (poly, M0, M1, M2, etc.), tell me its thickness, minimum width, minimum spacing, etc. And tell me the layer stack: how thick are the interlayer dielectrics? What is their dielectric constant? Etc. That’s what we wanted to know. That’s what we plugged into our planning - we would often “build” the old chip using the new technology (hacking it) to see what, approximately, it would do for power and timing.
 
If, OTOH, that is the general expectation, then it makes sense why it would be a bit surprising that A16/N4 and A17/N3 have the same power draw at the same frequency (though, as I said earlier, the data he presented [at least in that slide] was to only one significant figure, so it really isn't telling us whether they are drawing the same power or not [relative to the expected differences of low 10's of percent]).
Just taking the stated numbers at face value, the difference in power draw could be as low as a couple of percent, even according to TSMC. We’re talking about a range of ~2.5-6.6%.
 
You can always get lower power consumption at the same clock speed on a smaller process. Just give up IPC. Get rid of transistors. Your performance will suck, but you will lower power consumption at the same clock speed.
Sure, but I thought it would be understood that's not what I meant by "optimally designed....for the same usage".

In looking at your examples, in every case it seems you are saying you managed to leverage the smaller process to improve performance/watt. Yes, you're saying that how you managed to achieve that varied enormously, but even with that variation there seems to be an underlying consistency in result (though I acknowledge the specific example of improved IPC wouldn't be directly due to the smaller process, or, to the extent the smaller process helped there, it would simply be by enabling you to add more circuitry):
I did a lot of “process shrinks.” We always managed to improve performance. But sometimes you do that by lowering clock speed and increasing IPC while keeping power constant. Sometimes you do it by maintaining clock speed and IPC but lowering voltage. And once in a while you can do it by raising clock speed, maintaining IPC, and keeping power the same or lower. Lots of possibilities. But it was always very very dependent on the details of the process shrink - it doesn’t matter if transistors get smaller if the fab has to stick an extra nitride layer in that screws everything up. Or if design-for-manufacturing means you need to spread the (smaller) transistors apart in order to get yield up.

So, rephrasing, can't one make this as a general coarse-grained statement:

"If you're designing for the same use case on two different processes then, as a general rule, you can typically achieve a higher performance/watt value on the smaller process without giving up performance."

After all, that is what you did here:
We always managed to improve performance. But sometimes you do that by lowering clock speed and increasing IPC while keeping power constant
And also what you did here:
And once in a while you can do it by raising clock speed, maintaining IPC, and keeping power the same or lower.

I know I'm pressing you on this, but the reason I am is that it seems that improved power/performance with smaller processes is, historically, the general trend.

I understand you have good reason to not be comfortable with assertions of the form '≈X% more performance at the same energy consumption, or ≈Y% lower energy consumption at the same performance', but I don't care about the quantitative amounts which, I understand, are indeed meaningless. I'm trying to see if one can make a qualitative statement about what designers are generally able to do with smaller processes, based purely on the physics of the process being smaller (i.e., not simply because of the increased circuitry one can add), and for the same qualitative type of process technology.
 
"If you're designing for the same use case on two different processes then, as a general rule, you can typically achieve a higher performance/watt value on the smaller process without giving up performance."

That is usually the case, but it is not always the case. So, as a general rule, I suppose. It would come closer to a rule if you are talking about two processes from the same fab (as we are here). But a given 3nm process could be worse than a given 5nm process if the leakage currents in the 3nm process are much higher than in the 5nm process.
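A sketch of how a leakier "smaller" process can lose: two hypothetical processes in arbitrary units, where the newer one has 20% less switched capacitance but twice the leakage current. Using the first-order model (dynamic α·C·V²·f plus static V·I_leak), the newer process wins when activity is high and loses when the logic is mostly idle.

```python
def total_power(activity: float, cap: float, vdd: float, freq: float, i_leak: float) -> float:
    """First-order total power: dynamic (activity*C*V^2*f) + static (V*I_leak)."""
    return activity * cap * vdd ** 2 * freq + vdd * i_leak


# Hypothetical processes, arbitrary units: the newer one has less switched
# capacitance but twice the leakage current.
OLDER = {"cap": 1.0, "i_leak": 0.2}
NEWER = {"cap": 0.8, "i_leak": 0.4}
VDD, FREQ = 0.8, 3.0

if __name__ == "__main__":
    for activity in (0.1, 0.5):
        p_old = total_power(activity, OLDER["cap"], VDD, FREQ, OLDER["i_leak"])
        p_new = total_power(activity, NEWER["cap"], VDD, FREQ, NEWER["i_leak"])
        winner = "newer" if p_new < p_old else "older"
        print(f"activity {activity}: older {p_old:.3f}, newer {p_new:.3f} -> {winner} wins")
```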


I know I'm pressing you on this, but the reason I am is that it seems that improved power/performance with smaller processes is, historically, the general trend.
I agree that’s the general trend. But it’s punctuated. Things stop improving for a time (one of the reasons Intel got stuck on +++ processes for a long time is because the next process would have been worse!), and then you change something fundamental and progress moves along again. In the past, we had things like copper wire, low-K dielectric, silicon-on-insulator, and FinFET to reset things for the next performance/watt ramp-up. Gate-all-around will be the next thing, because we are now at transistor sizes where leakage is a very real problem, and making transistors smaller makes that problem worse.

I understand you have good reason to not be comfortable with assertions of the form '≈X% more performance at the same energy consumption, or ≈Y% lower energy consumption at the same performance', but I don't care about the quantitative amounts which, I understand, are indeed meaningless. I'm trying to see if one can make a qualitative statement about what designers are generally able to do with smaller processes, based purely on the physics of the process being smaller (i.e., not simply because of the increased circuitry one can add), and for the same qualitative type of process technology.

Well, I would agree with the statement that “smaller processes are generally better.” After all, just because a process is smaller doesn’t mean I have to make my transistors smaller.

BUT, the hidden question there is “what does it mean to be a smaller process?” If the minimum gate length goes down, we call that a “smaller process,” generally. But if other things don’t scale along with it (if metal width doesn’t shrink, or there is a polygon minimum area rule that doesn’t shrink as the square of the linear gate shrink, or if the layer thicknesses don’t decrease, etc.), then what we have is not really a “smaller process” of the sort I was referring to in the immediately preceding paragraph. In the old days, when we shrunk the process, we shrunk the process. Pretty much everything scaled. This hasn’t been the case in a while. And because of that, for the last 5-10 years (depending on which fabs you are talking about) we have been running into situations where processes are better in some ways and worse in others, and whether you can leverage the changes to improve performance/watt may depend on what you’re willing to give up to do that.
 
So, I have quickly skimmed through the Geekerwan video. Unfortunately, I don't speak Chinese, so a lot of details might have been lost on me. A few preliminary takeaway points for me:

- The SPEC2017 scores are insane, essentially on par with the 13900K
- Higher power consumption at peak performance, with perf/watt very similar to A14/A15
- Very good system power efficiency for demanding sustained workloads like gaming

My impression is that the new u-arch appears to be more aggressively performance-focused than the 5nm products. Instead of trying to improve CPU performance at the same power draw, they go for maintaining perf/watt and opening up the upper range. We could be seeing a desktop-focused design here. I am less worried about the increase in power consumption at peak, as the phone appears to manage the total energy levels just fine and overall system performance looks much better with comparable or lower power draw in sustained scenario (e.g. the gaming benchmark).

At this point it's probably safe to conclude that the dramatic IPC increase many of us had hoped for didn't happen. One could spin a number of stories from here, including a pessimistic take that Apple is losing their edge, etc. For now, I prefer to adopt a more pragmatic story, where Apple is tweaking their power consumption levels in order to better utilise the thermal capabilities of the desktop platforms.
 
@Jimmyjames you said you’d seen other tests with the 740 being not as good as the Apple GPU?
IIRC, GFXBench scores show the 740 behind Apple, as do Geekbench scores. I haven’t seen any 3DMark tests for the A17. I forget if Geekerwan did those.
 