Apple only “major” device maker on 3nm in 2023?

When they say “N3 is 15% performance increase at the same power” what is that even supposed to mean, though? On N3 I redesign my circuit completely, to take advantage of the N3 design rules. Are they taking that into account? And, if so, what kind of circuit are they talking about?

Until I entered the outside world, I never heard of such things when comparing nodes. When I was designing CPUs, the only metrics we cared about were “x% reduction in minimum spacing on poly, y% on M1…, x% pitch reduction on layer __, wire heights decrease by z…”. We determined how much faster the next CPU would be, not the fab. You can’t predict these things from just one or two data points.

Absolutely, that’s fair. From what I can gather, it’s the expected outcome of doing a basic redesign of a given circuit, which often appears to be a good rule of thumb for most chip designers.
 
Sure you can get that … but that is also unlikely to happen (even for Mac level power/cooling). Who knows? Maybe Apple will pull a rabbit out of their hat, but that bunny would be one of these:


Okay maybe a Mac Pro but even then

N3 is a 15% performance increase at the same power. To make a 60% overall ST improvement reasonable with a reasonable “oh darn it’s too hot for an iPhone” power draw, the IPC gains would have to be huge. Possible? Yes. Likely? No.

I mean, I am hoping for and expecting big gains - Apple is probably going to be cramming in quite a few ideas they’ve been working on while N3 was delayed, but that would still be amazing and way outside Apple’s performance per generation cadence, which has been pretty steady. I’m expecting it to be on the high end given the circumstances, but that’s way beyond even my bullish expectations.
There's a current rumor that Apple is considering upping the max clock on a forthcoming M2 Ultra Mac Pro to 4.2 GHz. Based on the 2741 GB6 SC score for the 3.66 GHz M2 Max listed on Primate's site, that would give 2741*4.2/3.66 = 3145. To get to 3986 on an M3 at the same clock, you'd only need an additional 3986/3145 => 27% improvement from process + architecture, and the latter seems well within the realm of possibility.

Of course they're not going to clock a production A-series that high—if the 3986 score is legit, they were just testing the A17 cores for clock scaling. But they could certainly put chips with those clocks in a Mac Pro—as well as their smaller Macs, if they limit those speeds to at most one or two cores (which would be a great approach, since most programs are single-threaded anyways) (essentially, they'd be adopting Intel's "turbo" model).
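
For anyone who wants to redo the scaling arithmetic above with different inputs, here is a minimal Python sketch. It assumes the GB6 single-core score scales linearly with clock speed, which is an optimistic simplification (it ignores memory stalls and the like). The function names are only illustrative.

```python
def scale_score(score, base_ghz, target_ghz):
    """Scale a benchmark score linearly with clock speed (simplifying assumption)."""
    return score * target_ghz / base_ghz


def residual_gain(target_score, scaled_score):
    """Fractional improvement still needed once clock scaling is accounted for."""
    return target_score / scaled_score - 1.0


# Figures quoted in the post: M2 Max at 3.66 GHz scores 2741, rumored 4.2 GHz clock,
# and the alleged 3986 result as the target.
m2_max_at_4_2 = scale_score(2741, 3.66, 4.2)
needed = residual_gain(3986, m2_max_at_4_2)

print(f"Scaled M2 Max score: {m2_max_at_4_2:.0f}")
print(f"Remaining gain needed: {needed:.0%}")
```

Swapping in another baseline score and clock is a one-line change.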
 
That would run single core power consumption through the roof. For the Mac Pro? Sure I guess I could buy that. Studio maybe. For the smaller Macs unlikely, especially anything on a battery, and it wouldn’t just be “too hot for an iPhone” - that’s like saying the sun’s nuclear core gets a tad warm. That’s my point.
 
But does more φ always = better performance? Apple runs their SoCs comparatively slowly, which yields better perf/W because it can maintain an even pace with memory, et al. You ramp up too much and the processor core ends up in frequent idle cycles waiting to be fed. Caches do help, and Ms have really really big caches, but Apple is being not conservative but sensible with the clock. And their architecture shows competitive scores at lower speeds, so why dick around with "turbo"?
 
That would run single core power consumption through the roof. For the Mac Pro? Sure I guess I could buy that. Studio maybe. For the smaller Macs unlikely, especially anything on a battery, and it wouldn’t just be “too hot for an iPhone” - that’s like saying the sun’s nuclear core gets a tad warm. That’s my point.
Well, that's pretty hyperbolic ;). Let's try running some numbers through a Fermi calculation:

Suppose that the M3 P-cores consume 5 WPC (watts per core) when running at 3.66 GHz (the current max clock for the M2). Even if we generously (to get a rough upper bound) estimate that the power consumption goes as (clock speed)^4, for 4.2 GHz that's only 5 WPC x (4.2/3.66)^4 = 8.7 WPC => an extra 3.7 WPC.

By 'smaller Macs than the Mac Pro', I had in mind the Studio, Mini, and 16" MBP (not the Air (!), or even the 14" MBP). That's a pretty decent proportion of the line.

If we allow this "turbo boost" for two cores in the 16" MBP and Mini, that's an extra 7.4W. [You could have this as a high power mode that you could automatically turn off when you're on battery, as is the option for the 16" MBP's current high power mode.] Maybe for the Studio we could allow four cores, for an extra 15 W.

Feel free to do this with your own starting WPC value and chosen scaling exponent.

And you could also limit it to when the GPU isn't being heavily used, such that the max TDP of the device is actually no more (or not much more) than it would be without this mode. I can think of many use cases in which I could benefit from a much higher CPU SC speed but am not stressing the GPU (like when using Mathematica -- or even for office work, like this single-threaded Adobe Acrobat task I posted here, which currently takes an M1 ≈ 35–45 s to complete: https://techboards.net/threads/request-for-adobe-acrobat-pro-benchmarking.3965/).
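
To make it easy to plug in your own starting WPC and exponent, here is a rough Python sketch of the Fermi estimate above. The power-law model (power proportional to clock raised to some exponent) is the same assumption used in the post. The 5 W starting value is a guess, not a measured figure, and the function names are made up for illustration.

```python
def power_per_core(base_watts, base_ghz, target_ghz, exponent):
    """Estimate per-core power at target_ghz, assuming power ~ clock ** exponent."""
    return base_watts * (target_ghz / base_ghz) ** exponent


def extra_package_power(base_watts, base_ghz, target_ghz, exponent, boosted_cores):
    """Extra watts drawn when only boosted_cores cores run at the higher clock."""
    per_core = power_per_core(base_watts, base_ghz, target_ghz, exponent)
    return boosted_cores * (per_core - base_watts)


# Assumed inputs: 5 W per core at 3.66 GHz, boosted to 4.2 GHz.
for exponent in (2, 3, 4):
    per_core = power_per_core(5.0, 3.66, 4.2, exponent)
    extra_two = extra_package_power(5.0, 3.66, 4.2, exponent, boosted_cores=2)
    print(f"exponent {exponent}: {per_core:.1f} W per core, "
          f"+{extra_two:.1f} W with two boosted cores")
```

With exponent 4 this reproduces the roughly 8.7 W per core figure. Lower exponents give correspondingly smaller increases.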
 
But does more φ always = better performance? Apple runs their SoCs comparatively slowly, which yields better perf/W because it can maintain an even pace with memory, et al. You ramp up too much and the processor core ends up in frequent idle cycles waiting to be fed. Caches do help, and Ms have really really big caches, but Apple is being not conservative but sensible with the clock. And their architecture shows competitive scores at lower speeds, so why dick around with "turbo"?
cmaier made a prediction we'd be seeing AS clock speeds around 4 GHz, which presumably indicates he, at least, thinks Apple can address your concerns:
Further to the above, i am predicting a clock speed increase. Around 4GHz on the performance cores.
 
Well, if they are going to 4GHz, the L3 is going to be bigger than all the memory I had in my 7200.
 

It’s certainly going past the knee of the efficiency curve, but if you are a user who can benefit from X% more performance even if it costs you 2X% more power, the Mac Pro is the box where Apple would be willing to address that need.

As for memory bandwidth, I just don’t know what Apple’s cache hit rates look like or what typical MacOS instruction traces look like, so I have no idea if they’d end up stalling a lot. I would assume that the vast majority of memory reads happen from the cache, and that the caches are write-back so that they don’t saturate memory on writes very often, but I have no idea.
 
Well, that's pretty hyperbolic ;). Let's try running some numbers through a Fermi calculation:

Suppose that the M3 P-cores consume 5 WPC (watts per core) when running at 3.66 GHz (the current max clock for the M2). Even if we generously (to get a rough upper bound) estimate that the power consumption goes as (clock speed)^4, for 4.2 GHz that's only 5 WPC x (4.2/3.66)^4 = 8.7 WPC => an extra 3.7 WPC.

By 'smaller Macs than the Mac Pro', I had in mind the Studio, Mini, and 16" MBP (not the Air (!), or even the 14" MBP). That's a pretty decent proportion of the line.

If we allow this "turbo boost" for two cores in the 16" MBP and Mini, that's an extra 7.4W. [You could have this as a high power mode that you could automatically turn off when you're on battery, as is the option for the 16" MBP's current high power mode.] Maybe for the Studio we could allow four cores, for an extra 15 W.

Feel free to do this with your own starting WPC value and chosen scaling exponent.

And you could also limit it to when the GPU isn't being heavily used, such that the max TDP of the device is actually no more (or not much more) than it would be without this mode. I can think of many use cases in which I could benefit from a much higher CPU SC speed but am not stressing the GPU (like when using Mathematica -- or even for office work, like this single-threaded Adobe Acrobat task I posted here, which currently takes an M1 ≈ 35–45 s to complete: https://techboards.net/threads/request-for-adobe-acrobat-pro-benchmarking.3965/).
Wait a minute though 3986 is supposedly for the A17 SC score which represented a 60% improvement. You quoted a starting point of a 45% improvement from 2741 to 3986. 45% is indeed much more believable if clocks are increased too.
 
For what it might be worth

max clock vs process

Chip          Max clock   Process
A7            1.4 GHz     28nm (Samsung)
A8            1.5 GHz     20nm (TSMC)
A9            1.85 GHz    16nm (TSMC) / 14nm (Samsung)
A10 Fusion    2.34 GHz    N16FFC (TSMC)
A11 Bionic    2.38 GHz    N10 (TSMC)
A12 Bionic    2.49 GHz    N7
A13           2.65 GHz    N7P
A14           3.1 GHz     N5
A15           3.23 GHz    N5P
A16           3.46 GHz    N4P

N3 is a significant change in the process, with its "FINFlex" layout.
 

Indeed. In addition to the normal power reduction that comes from shrinking the node, the change in transistor architecture means static power leakage will go down substantially. That frees up Apple to use that power budget in other ways.
 
Wait a minute though 3986 is supposedly for the A17 SC score which represented a 60% improvement. You quoted a starting point of a 45% improvement from 2741 to 3986. 45% is indeed much more believable if clocks are increased too.
Ah, sorry, I'm used to thinking in Mac terms, so I was considering what it would take to get 3986 on an M3 Mac (plus if it's going to be anywhere, that's where you'd find it, not on an iPhone, as you'd agree).

But if we instead do calculation using the iPhone 14 Pro as the baseline, we find it doesn't change things that much: Primate lists it with a GB6 score of 2504 @ 3.46 GHz. In that case, if we assume they were doing the scaling experiment to the same 4.2 GHz clock, then we have:

2504*4.2/3.46 = 3040, which means we'd need a 3986/3040 => 31% increase from IPC. That's not much different from the 27% I got when scaling from the 3.66 GHz M2 in the 16" MBP.

EDIT: Upon further consideration, an ~30% generational increase in IPC may be unlikely. But that just means that, if this result is real, Apple was experimenting with even higher clocks than 4.2 GHz.
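
To see what "even higher clocks" might mean, here is a small Python sketch that turns the calculation around: given an assumed IPC gain, what clock would be needed to reach 3986 from the iPhone 14 Pro baseline? It assumes score is proportional to clock times IPC, and the IPC gains in the loop are hypothetical values chosen only for illustration.

```python
def required_clock_ghz(target_score, base_score, base_ghz, ipc_gain):
    """Clock needed to reach target_score, assuming score ~ clock * IPC."""
    return base_ghz * target_score / (base_score * (1.0 + ipc_gain))


# Baseline from the post: 2504 at 3.46 GHz. The IPC gains below are guesses.
for ipc_gain in (0.10, 0.15, 0.20, 0.31):
    clock = required_clock_ghz(3986, 2504, 3.46, ipc_gain)
    print(f"IPC +{ipc_gain:.0%}: about {clock:.2f} GHz needed")
```

With a 31% IPC gain this lands back at roughly 4.2 GHz, consistent with the figure above. Smaller IPC gains push the required clock well past that.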
 

To be fair, it’s 31% from IPC + node, which is doable, but now that is double the SC power using your ^4 equation. I dunno.
 
1) Initially I was writing clock speed + IPC + process (aka node), but I realized if we're calculating performance improvement, it's just clock speed & IPC, right? I.e., for a given clock speed and architecture, changing the process wouldn't provide any speed improvement....what a smaller process does is to allow higher clocks at the same power; and also room for more transistors and thus fancier architecture, which could in turn lead to a higher IPC. I.e., the smaller process enables higher clocks and IPC, but doesn't itself give faster performance for a given clock and IPC (and yes, it also allows room for fancier coprocessors that could also speed task completion, but that improvement is, like IPC, also due to the improved architecture, not the process per se). Thus counting both process and IPC is essentially "double counting". Or am I wrong?

2) My ^4 equation is (hopefully) an overly-generous upper bound for the additional power required.
 
1) Initially I was writing clock speed + IPC + process (aka node), but I realized if we're calculating performance improvement, it's just clock speed & IPC, right? I.e., for a given clock speed and architecture, changing the process wouldn't provide any speed improvement....what a smaller process does is to allow higher clocks at the same power; and also room for more transistors and thus fancier architecture, which could in turn lead to a higher IPC. I.e., the smaller process enables higher clocks and IPC, but doesn't itself give faster performance for a given clock and IPC (and yes, it also allows room for fancier coprocessors that could also speed task completion, but that improvement is, like IPC, also due to the improved architecture, not the process per se). Thus counting both process and IPC is essentially "double counting". Or am I wrong?
I guess it depends. For a given clock speed you’re right. But the 4.2 GHz was chosen due to the rumor that the M2 Ultra Mac Pro would already be clocked as such. Then you could argue with superior process node the M3 Ultra Mac Pro can be clocked higher with no extra increase in power/heat which in the end is the factor you most want to limit. Thus the remaining 31% could be a mixture of higher IPC and even higher clocks enabled by the process node.

2) My ^4 equation is (hopefully) an overly-generous upper bound for the additional power required.

I defer to those more knowledgeable. But power/heat is the key limiting factor here - provided of course that Apple’s core/circuit design allows for such increases stably which I know @Cmaier has surmised was a key differentiator between M1 and M2.

Edit: While an M3 Ultra/Extreme SC score may indeed reach such heights since we don’t even know yet what the corresponding M2 chip will score yet, I still find this “leak” about a massively upclocked A17 from MaxTech dubious.
 
I guess it depends. For a given clock speed you’re right. But the 4.2 GHz was chosen due to the rumor that the M2 Ultra Mac Pro would already be clocked as such. Then you could argue with superior process node the M3 Ultra Mac Pro can be clocked higher with no extra increase in power/heat which in the end is the factor you most want to limit. Thus the remaining 31% could be a mixture of higher IPC and even higher clocks enabled by the process node.
But that wasn't the argument I was making. I was trying to say how much extra % improvement, beyond that due to a clock speed increase from 3.66 to 4.2 GHz, you'd need to get that 3986 score. I was the one who chose the 4.2 GHz figure as a plausible target for the M3, because it was rumored to already have been tested on the M2. I was left with ≈30%. The only way you could get that extra 30% would be from architectural improvements*. Again, for a given architecture and clock speed, I don't see how improving the process could change performance. All a process improvement does for performance is to facilitate changes in what actually determines performance, namely the architecture you have and how fast you run it. [*Yes you'd need to make sure that other stuff (memory, etc.), isn't a bottleneck.]

I.e., I concluded it's inconsistent to write the improvement is due to a combination of clock speed & (IPC + process). If you want to include process, then you'd need to say that improvement is due to a combination of (clock speed + process) & (IPC + process), since process helps both. But no one writes (clock speed + process), because it's understood that clock speed already subsumes what the process enables. It's the same thing with IPC. So, logically, it should be clock speed & IPC, not clock speed & (IPC + process).

Of course, I could be wrong, but to be wrong it would have to be the case that, for a given architecture and clock speed, a change in process by itself would increase computational speed. I don't believe that's the case, but I'd be interested to learn otherwise.
 
Then you could argue with superior process node the M3 Ultra Mac Pro can be clocked higher with no extra increase in power/heat

Power = CV^2f + Leakage

If you do absolutely nothing to the design other than shrink all the transistors and wires by exactly the same scale factor, C goes down, V may go down, and leakage goes up. Leakage varies as a percentage of the overall power, but let’s say it’s around 20% of the total power consumption.

We know that N3 does not scale uniformly because they say that SRAM doesn’t scale as well as logic. That tells me that some design rules are not scaling the same as others - I don’t know if it’s different spacing rules on different layers, minimum area rules, or what.

We also know that leakage should go down a LOT. And C will go down, for sure, even if it’s not by the scale factor.

To me, the new transistor architecture that reduces leakage a lot is likely to be the biggest gain here. That lets Apple add more C (e.g. by adding a lot more transistors and wires) or increase f, or both. And the gain will be more than a normal shrink, because a normal shrink increases leakage whereas this shrink will decrease it (I believe).
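
As a rough illustration of how reduced leakage frees up budget in the Power = CV^2f + Leakage model above, here is a small Python sketch. It takes the 20% leakage share mentioned above and asks how much f could rise at constant total power if C and V stayed fixed. The 50% leakage reduction is a made-up number, and holding V fixed is optimistic, since higher clocks usually need more voltage.

```python
def frequency_headroom(leakage_share, leakage_cut):
    """Factor by which f could rise at constant total power, with C and V fixed.

    leakage_share: fraction of total power that is leakage (e.g. 0.2)
    leakage_cut:   fraction of that leakage removed (hypothetical)
    """
    if not 0.0 <= leakage_share < 1.0:
        raise ValueError("leakage_share must be in [0, 1)")
    if not 0.0 <= leakage_cut <= 1.0:
        raise ValueError("leakage_cut must be in [0, 1]")
    dynamic_share = 1.0 - leakage_share
    freed_share = leakage_share * leakage_cut
    # Dynamic power is C * V^2 * f, so with C and V fixed it scales linearly with f.
    return (dynamic_share + freed_share) / dynamic_share


factor = frequency_headroom(leakage_share=0.2, leakage_cut=0.5)
print(f"Possible clock increase at the same total power: {factor - 1.0:.1%}")
```

The same freed budget could instead go toward more C (more transistors and wires) at the same f.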
 
That comes from a MaxTech video. I watched it (on double speed, so I wouldn't have to waste too much time :D ). MaxTech's source claimed Apple got that score by experimenting with high clocks, resulting in too much heat to be usable in an iPhone (but which would perhaps be OK for a Mac). MaxTech claims the source is credible but, of course, who knows....
As fascinating as this discussion may be, the origin for this leak is an anonymous source that sent Vadim these screenshots on Twitter. Max Tech did absolutely no background check or verification.

The original Max Tech video title was: "Prototype A17 Chip Leaks: Massive Performance!"
An hour later, after Vadim got called out on it: "Prototype A17 Chip Leaks: I was fooled! (Don't Watch!)"
When they realized that a title like that would hurt their business: "Prototype A17 Chip Leaks: I was fooled! (REAL OR FAKE?)"

Max Tech had three titles within about four hours. Even though the video was obvious nonsense, they've received 57K hits since then, well above the average for most of their videos. They've also got us, and every other Apple rumor site, chatting about it, despite it being a forgery.

How do we know that these are fake? These screenshots were from an alleged A17 running on a prototype iPhone. The font used in these leaked screenshots is almost certainly Roboto, which is made by Google, and commonly used in Android. It's not just a fake, but a bad one, at that.
 
But that wasn't the argument I was making. I was trying to say how much extra % improvement, beyond that due to a clock speed increase from 3.66 to 4.2 GHz, you'd need to get that 3986 score. I was the one who chose the 4.2 GHz figure as a plausible target for the M3, because it was rumored to already have been tested on the M2. I was left with ≈30%. The only way you could get that extra 30% would be from architectural improvements*. Again, for a given architecture and clock speed, I don't see how improving the process could change performance. All a process improvement does for performance is to facilitate changes in what actually determines performance, namely the architecture you have and how fast you run it. [*Yes you'd need to make sure that other stuff (memory, etc.), isn't a bottleneck.]

I.e., I concluded it's inconsistent to write the improvement is due to a combination of clock speed & (IPC + process). If you want to include process, then you'd need to say that improvement is due to a combination of (clock speed + process) & (IPC + process), since process helps both. But no one writes (clock speed + process), because it's understood that clock speed already subsumes what the process enables. It's the same thing with IPC. So, logically, it should be clock speed & IPC, not clock speed & (IPC + process).

Of course, I could be wrong, but to be wrong it would have to be the case that, for a given architecture and clock speed, a change in process by itself would increase computational speed. I don't believe that's the case, but I'd be interested to learn otherwise.
Yes the better node means that you can increase clock speed again and that’s how you get more speed from process. If 4.2 GHz is what you can achieve on a worse node, you can achieve higher on a better node. The M3 variant of this hypothetical super chip can be clocked higher.
The problem is we’re dealing with a hypothetical successor chip to a hypothetical chip. If the M2 Ultra Mac Pro chip were real and had the specs you proposed, then it would feel fine to propose an M3 variant with a 15% generational IPC uplift and a 15% clock speed boost. Those would be entirely plausible numbers. But it feels weird because you’re already proposing a clock speed increase that doesn’t exist yet for an M2 chip that doesn’t exist, so you’re resisting adding more clock speed for the M3 version that doesn’t yet exist :) despite that being one of the primary advantages of moving to a new node (or lower power for the same clock speed, but this is the Mac Pro, so not the direction they’d go in).

Edit: Here’s another way to think about it. You’re right that it’s inconsistent to write that improvement is based on higher clock speed +IPC/node. Here’s how I would write it:

Improvement is higher clock speed (more power/better node) + better IPC. You’ve posited that they’ll already increase core power to achieve a higher clock in the M2 Mac Pro chip. Now they can, without increasing core power, increase clock speeds again for the M3 variant by virtue of manufacturing it on a better node.
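
To put a number on the 15% IPC plus 15% clock example, here is a tiny Python sketch. It assumes single-core performance is simply clock times IPC, so the two gains multiply rather than add. That is a simplification, and the function name is only illustrative.

```python
def combined_speedup(clock_gain, ipc_gain):
    """Overall fractional speedup when performance ~ clock * IPC."""
    return (1.0 + clock_gain) * (1.0 + ipc_gain) - 1.0


total = combined_speedup(clock_gain=0.15, ipc_gain=0.15)
print(f"15% clock + 15% IPC gives about {total:.1%} overall")
```

Under that assumption the two 15% gains compound to a little over 32%, which is in the same range as the roughly 30% gap discussed earlier in the thread.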

As fascinating as this discussion may be, the origin for this leak is an anonymous source that sent Vadim these screenshots on Twitter. Max Tech did absolutely no background check or verification.

The original Max Tech video title was: "Prototype A17 Chip Leaks: Massive Performance!"
An hour later, after Vadim got called out on it: "Prototype A17 Chip Leaks: I was fooled! (Don't Watch!)"
When they realized that a title like that would hurt their business: "Prototype A17 Chip Leaks: I was fooled! (REAL OR FAKE?)"

Max Tech had three titles within about four hours. Even though the video was obvious nonsense, they've received 57K hits since then, well above the average for most of their videos. They've also got us, and every other Apple rumor site, chatting about it, despite it being a forgery.

How do we know that these are fake? These screenshots were from an alleged A17 running on a prototype iPhone. The font used in these leaked screenshots is almost certainly Roboto, which is made by Google, and commonly used in Android. It's not just a fake, but a bad one, at that.

Yeah that’s what I figured
 
To me, the new transistor architecture that reduces leakage a lot is likely to be the biggest gain here.
What exactly is different? "FinFlex" is still FinFET, just with gate-by-gate control over the 2-1 / 2-2 / 3-2 configuration. It is Samsung that is attempting GAAFET on its 3nm node.
 