M3 core counts and performance

I am very pleased with the thermal improvement on the iPad Pro. I want Apple to focus on cooling on iPhones and MacBooks going forward.

No one likes hot and noisy laptops; see the M3 Max 14". More copper heat pipes, please.
 
I thought they said the A17 pulled 14 watts single core? I may have misremembered.

That was multi core if I remember correctly.

Sure, unless you average the test results, in which case it's about 5% "IPC gains" or so... and people tend to like numbers above 10%, high teens especially (and remember, SME doesn't count!).

It's easy to get double-digit IPC improvements if your IPC is in the cellar. I really get confused when everyone is praising AMD for achieving 10% higher IPC while Apple is stuck with 3-5% improvements... like dude, M4 has around 60-70% higher IPC than Zen4, what are you talking about?
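For anyone wanting to sanity-check that ratio, the usual back-of-envelope in these threads is score-per-GHz. A quick sketch using ballpark Geekbench 6 single-core figures and peak clocks (these specific numbers are my assumptions from public results, not measurements cited in this thread):

```python
# Rough perf/GHz ("IPC proxy") comparison. Inputs are ballpark GB6 ST
# scores and peak clocks -- illustrative assumptions, not exact values.
m4_score, m4_clock = 3700, 4.4        # assumed M4 GB6 ST score / peak GHz
zen4_score, zen4_clock = 2950, 5.7    # assumed Zen 4 (7950X) score / peak GHz

m4_ppg = m4_score / m4_clock          # score per GHz for M4
zen4_ppg = zen4_score / zen4_clock    # score per GHz for Zen 4

advantage = m4_ppg / zen4_ppg - 1     # fractional perf/GHz advantage
print(f"{advantage:.0%}")             # lands in the ~60-70% range claimed
```

With these placeholder inputs the advantage comes out around 62%, consistent with the 60-70% figure above; different score/clock assumptions shift it a few points either way.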

Are you roger_k over there?

Maybe :D

I have to say I am not fond of this extreme cooling that they seem to be doing lately. As far as I can see, it skews results both in terms of the score and the power. I just want analysis of the M4 at the default power levels. It feels like a move aimed at increasing clicks on YouTube, which is fair if that’s their business, but it’s a shame there is seemingly little room for sober analysis.

There is an advantage to this in that you can fix the clock. Especially if you want to estimate the IPC, it is really important to understand whether you are running at peak clock or not. Every MHz matters. But the danger is in extrapolating their results to normal usage, which they unfortunately do. "Oh it uses more power, that's bad!" — of course it's using more power; you are forcing it into an unrealistic situation. Measure the power consumption under normal use.
 
Also, why do so many seem underwhelmed in that thread? Are they wrong or am I missing something?
Well, aside from the complaining that Apple isn't giving the same gains they used to, that place has the distinct problem of taking rumoured AMD gains too seriously. So we have people looking at Apple's 'measly' IPC gains compared to AMD's rumoured 30-40% IPC gains and acting accordingly. They also really don't like GB6 there.
 
Well, aside from the complaining that Apple isn't giving the same gains they used to, that place has the distinct problem of taking rumoured AMD gains too seriously. So we have people looking at Apple's 'measly' IPC gains compared to AMD's rumoured 30-40% IPC gains and acting accordingly. They also really don't like GB6 there.
I know @leman has documented that each generation of Apple Silicon tends to increase by about 200-400 points, but does anyone know roughly what the IPC increases have been?
 
I have to say I am not fond of this extreme cooling that they seem to be doing lately. As far as I can see, it skews results both in terms of the score and the power. I just want analysis of the M4 at the default power levels. It feels like a move aimed at increasing clicks on YouTube, which is fair if that’s their business, but it’s a shame there is seemingly little room for sober analysis.
I think it’s useful insofar as it shows what the chip can do in principle, but if it’s pushing it past what a laptop could do, then yeah, it’s dumb.
Yup, people value gains, recent gains. And yeah, we should see what actually gets released, of course. AMD and Intel have a long, long way to go, Intel more so. Qualcomm is starting off much closer, because well, of course they are, and we'll see what their V2 cores are like. ARM can conceivably speed up with the X5. But even if any of these eventually "catch" Apple as it were, so far there's no evidence of a chip design team that is actually going to do to Apple what Apple did to Intel/AMD. This may age poorly, but that's because there is no evidence to believe otherwise yet. If such evidence comes up, I'll happily change my mind!
Intel and AMD will get closer on IPC, but not really on power. And even with Arm’s IP we’ve seen a struggle in this vein, despite Arm being *vastly* closer to achieving good, high-IPC, efficient designs than AMD and Intel are.

For example: the X4 is near A14-caliber on IPC, and they still can’t get power as low for the same performance. N3E would help for the X5, but the X5 and its caches will have to get power down, in parallel with some IPC gains, to really compete.

Intel and AMD being ahead on ST isn’t really a big deal; I expect that to be normal now that they’re getting more competitive but still jacking clocks as much as they can.


A current Intel Meteor Lake 155H part is hitting around 2356 in GB6 at 4.75GHz, which works out to about 496 perf/GHz.

Lunar Lake, Intel’s upcoming “low power” M-class chip for the 10-30W segment, has been leaked to hit 4.9GHz on N3B; for N3B that’s a bit high. But even with a 20% IPC boost (and that’s possibly generous, apparently) they’d be hitting about 2,916 in GB6 at 4.9GHz, and there’s no guarantee power is actually great; I doubt it will be M3-caliber at that performance. It’s also just as big as the M3 in terms of area, which frankly is just sad.
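The projection above is just perf/GHz arithmetic; here it is spelled out, using the Meteor Lake figures quoted earlier. The 20% uplift and 4.9GHz are the leaked assumptions from the post, not confirmed numbers:

```python
# Perf-per-GHz projection from the thread's Meteor Lake numbers.
meteor_lake_score = 2356       # GB6 ST score quoted above
meteor_lake_clock_ghz = 4.75   # clock it hit that score at

perf_per_ghz = meteor_lake_score / meteor_lake_clock_ghz  # ~496

ipc_uplift = 1.20              # assumed (possibly generous) Lion Cove gain
lunar_lake_clock_ghz = 4.9     # leaked peak clock on N3B

projected = perf_per_ghz * ipc_uplift * lunar_lake_clock_ghz
print(round(perf_per_ghz), round(projected))  # 496 2916
```

Same shape as the numbers in the post: ~496 perf/GHz today, ~2,916 GB6 projected under those assumptions.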
 
I am very pleased with the thermal improvement on the iPad Pro. I want Apple to focus on cooling on iPhones and MacBooks going forward.

No one likes hot and noisy laptops; see the M3 Max 14". More copper heat pipes, please.

I was going to report on this. My M2 iPad Pro definitely gets warm and toasty. So far the M4, doing the same mixed workload, is always cool to the touch.
 
One thing lost in all the whining and boasting: the E-cores have still seen substantial gains in both clocks and architecture since the A14. Power has gone up, but in absolute terms it’s so low that it’s hard to complain about a doubling since the A13/A14; 0.6-0.8W now is no big deal. Efficiency has still massively improved. They’re fantastic, and I understand why Apple went for 4+6 in the M4.
 
It’s also just as big as the M3 in terms of area, which frankly is just sad.
Wouldn’t that actually be really good for them? I admit the last time I looked at core sizes was back in the Firestorm era, and Intel’s cores were simply massive comparatively. Like 3x the die area of a Firestorm. Or were you referring to something else by size? Like chip size?
 
Wouldn’t that actually be really good for them? I admit the last time I looked at core sizes was back in the Firestorm era, and Intel’s cores were simply massive comparatively. Like 3x the die area of a Firestorm.
Ya, but that was on a terrible node, density-wise. And two other things:

Lion Cove is synthesizable across foundries finally, because it’s being used on N3B and 20A both.

With that, since they’re using the core in
Arrow Lake on N3B and 20A (mobile and desktop parts, with higher clocks and more cores)
Lunar Lake on N3B — a re-engineered 4+4 part designed to reduce idle power draw and get closer to what QC and Apple can do for battery life — you’d think they’d try to make it denser, but I guess they really need/want the extra clocks and the yields, so the rumor from a die analysis is a core around 4.5-4.8mm^2.

That’s like twice the size of Apple’s logic/L1-only core in the A17 or M3.
 
Ya, but that was on a terrible node, density-wise. And two other things:

Lion Cove is synthesizable across foundries finally, because it’s being used on N3B and 20A both.
If they’re synthesizing, then please allow me to make news by predicting how bad they are going to suck, like I did for Bulldozer.
 
Ya, but that was on a terrible node, density-wise. And two other things:

Lion Cove is synthesizable across foundries finally, because it’s being used on N3B and 20A both.

With that, since they’re using the core in
Arrow Lake on N3B and 20A (mobile and desktop parts, with higher clocks and more cores)
Lunar Lake on N3B — a re-engineered 4+4 part designed to reduce idle power draw and get closer to what QC and Apple can do for battery life — you’d think they’d try to make it denser, but I guess they really need/want the extra clocks and the yields, so the rumor from a die analysis is a core around 4.5-4.8mm^2.

That’s like twice the size of Apple’s logic/L1-only core in the A17 or M3.

If they’re synthesizing, then please allow me to make news by predicting how bad they are going to suck, like I did for Bulldozer.
What does synthesize mean in this context and why is it bad?
 
Wouldn’t that actually be really good for them? I admit the last time I looked at core sizes was back in the Firestorm era, and Intel’s cores were simply massive comparatively. Like 3x the die area of a Firestorm. Or were you referring to something else by size? Like chip size?
Oh, the full Lunar Lake die size itself isn’t impressive either, in light of what we think we know about Lion Cove and the overall thing. Basically, 140mm^2 for the main die that has the cores, GPU, and other stuff.

M3 is speculated to be 135 to 150mm^2, probably on the lower end, and reportedly roughly the same size-ish.

Yes, M4 is out, before someone says that, I know.
But the point here is you’re going to get the chance of a lifetime to see Intel’s glorious architecture and design on a nearly perfect comparison with an Apple part. It’s almost entirely an architectural comparison. Similar node, similar die size (slightly more in favor of Intel due to the IO tile on N6), similar purpose - a 4+4 part for low power, no hyperthreading, etc. It even has an 8MB system cache just like M3.
 
If they’re synthesizing, then please allow me to make news by predicting how bad they are going to suck, like I did for Bulldozer.
lol

Yeah, apparently they’re trying to get off their over-reliance on bespoke, Intel-foundry-only EDA and hand layouts. My understanding is this is supposed to be a good thing, Intel becoming modern, but we’ll see if they blow it.

Cliff — is synthesizable bad? Apparently one reason Intel’s cores have been so big in the past is a refusal to use modern EDA, doing way too much stuff manually, but now this is changing because they’re using TSMC for a core, and because the foundry is trying to get up to standard for future customers too.
 
What does synthesize mean in this context and why is it bad?
There are lots of ways to design chips, and almost everyone uses synthesis, CPUs included. But the people who design the best CPUs tend to use it for as little as they can get away with.

The idea is that your architects write a description of the chip’s logical behavior in a synthesizable language (typically Verilog, but other possibilities include VHDL, various C++ or C varieties, etc.)

So they may say something like:

A[7:0] = B[7:0] + C[7:0]

to create an 8-bit adder. Then, with synthesis, your logic designers would plunk all that into a “synthesis tool” (almost always from a company called Synopsys) to dump out a netlist (which lists the logic gates to be used and how they are wired together).

Then your physical designer takes that netlist and lays it out to create the chip mask. (Typically using automated place-and-route tools, often from a company called Cadence)

That’s not at all how I ever did it at any of the three companies I worked at.

We always “hand-synthesized” the logic. And hand placed the gates. And hand-routed at least some of the wires.

Because we were real men (even the women!) :) I mean, one time I literally loaded the mask into vi and edited two wires on the day of tapeout. Did I mention I’m very manly?
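For anyone following along, here is roughly what the two ends of that flow look like. The RTL is a minimal synthesizable sketch of the adder example above; the netlist fragment is purely illustrative, with made-up cell and module names, not output from any real tool run:

```verilog
// Synthesizable RTL: the architect describes behavior only,
// with no say in which gates get used.
module add8 (
    input  wire [7:0] b,
    input  wire [7:0] c,
    output wire [7:0] a
);
    assign a = b + c;  // an 8-bit adder; synthesis picks the gates
endmodule

// Roughly what a synthesis tool emits: a gate-level netlist naming
// specific library cells and their wiring (cell names invented here),
// which place-and-route then turns into mask geometry.
//
// module add8 (b, c, a);
//   XOR2_X1 u0 (.A(b[0]), .B(c[0]), .Z(a[0]));
//   AND2_X1 u1 (.A(b[0]), .B(c[0]), .Z(carry0));
//   ...
// endmodule
```

The hand-design approach described in the post skips or overrides the middle step: engineers choose the gates, drive strengths, and placement themselves instead of accepting the tool's output.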
 
There are lots of ways to design chips, and almost everyone uses synthesis, CPUs included. But the people who design the best CPUs tend to use it for as little as they can get away with.

The idea is that your architects write a description of the chip’s logical behavior in a synthesizable language (typically Verilog, but other possibilities include VHDL, various C++ or C varieties, etc.)

So they may say something like:

A[7:0] = B[7:0] + C[7:0]

to create an 8-bit adder. Then, with synthesis, your logic designers would plunk all that into a “synthesis tool” (almost always from a company called Synopsys) to dump out a netlist (which lists the logic gates to be used and how they are wired together).

Then your physical designer takes that netlist and lays it out to create the chip mask. (Typically using automated place-and-route tools, often from a company called Cadence)

That’s not at all how I ever did it at any of the three companies I worked at.

We always “hand-synthesized” the logic. And hand placed the gates. And hand-routed at least some of the wires.

Because we were real men (even the women!) :) I mean, one time I literally loaded the mask into vi and edited two wires on the day of tapeout. Did I mention I’m very manly?
Makes sense, thanks!
 
What does synthesize mean in this context and why is it bad?
"Synthesis" is machine translation of code written in a high level hardware description language (HDL) like Verilog to a gate netlist. There are many ways you can create chips. A gross oversimplification into just three bins:
  • Lowest performance / worst area / minimum engineering time: synthesis + automated place & route
  • Medium perf/etc: synthesis + hand tweaks of gate selections in netlist + some hand layout
  • Highest perf / min area / max engineering effort: direct engineer gate choices (sometimes by tweaking HDL source to make the synthesizer do what you want), hand selection of most gate drive strengths, lots of manual layout
Basically, the more automation there is in the process, the more you're relying on software which uses healthy safety margins to make sure the end result will function correctly. Humans can almost always do a better optimization job.

A good analogy: if you could build an entire modern OS in hand optimized assembly code, it would probably be a lot faster than the alternative, but is less maintainable and takes way more effort to implement features. The big differences are that chips are projects with much smaller overall system complexity than a big software project, and software tooling is much better at approaching the limits of human ability.
 
Those of you lucky enough to have your hands on the iPad Pro M4, can you tell me if the screen quality improvements are significantly better relative to the older M1/M2 iPad Pro mini-LED screens?
 
Also, @leman, over at the other place you made a very interesting post that I was half tempted to ask you about there, requesting you expand a little more on your thought process (but I decided against it and hoped you wouldn’t mind continuing the thought over here).
You had mentioned that it is going to become more difficult to squeeze out performance per watt going forward on pure process node reduction alone (I agree), and that different packaging methods will matter more (e.g. mixed-node designs; I’m assuming similar to what AMD did with their RDNA3 chiplet GPUs, but at a more advanced level of packaging reflective of 2024/2025+ technologies).
However the bit that caught my attention was when you mentioned that because Apple doesn’t sell to the open market and only buys for themselves, that this puts Apple at an advantage paradoxically.

Do you mind walking me through your thought process on this one as to why you think this puts Apple at an advantage, I’d like to understand a little more?
I’d have thought that having higher-volume orders in general (even for mixed-node orders) would give Apple’s competitors a cost advantage, as they would have economies of scale on their side relative to the lower-volume orders of, say, Apple etc…

Do you mind joining the dots on this one for me please? I’m tired and just off a long-haul flight, so perhaps my brain is stuck in first gear.

Appreciate your time :)
 