M3 core counts and performance

exoticspice1 · May 15, 2024

I am very pleased with the thermal improvement on the iPad Pro. I want Apple to focus on cooling on iPhones and MacBooks going forward.

No one likes hot and noisy laptops, see M3 Max 14". More copper heatpipes please.

Jimmyjames · May 15, 2024

exoticspice1 said:
x.com

GPU test: Wild Life Extreme (3DMark)

Thanks. Seems more stable. Perhaps the copper pad they introduced?

leman · May 15, 2024

Jimmyjames said:
I thought they said the A17 pulled 14 watts single core? I may have misremembered.

That was multi core if I remember correctly.

dada_dave said:
Sure unless you average the test results in which case it's about 5% "IPC gains" or so ... which people tend to like numbers above 10, high teens especially (and remember SME doesn't count!).

It's easy to get double-digit IPC improvements if your IPC is in the cellar. I really get confused when everyone is praising AMD for achieving 10% higher IPC while Apple is stuck with 3-5% improvements... like dude, M4 has around 60-70% higher IPC than Zen4, what are you talking about?

dada_dave said:
Are you roger_k over there?

Maybe

Jimmyjames said:
I have to say I am not fond of this extreme cooling that they seem to be doing lately. As far as I can see, it skews results both in terms of the score, and the power. I just want analysis of the M4 at the default power levels. It feels like a move aimed at increasing clicks in YouTube, which is fair if that’s their business, but it’s a shame there is seemingly little room for sober analysis.

There is an advantage to this: you can fix the clock. If you want to estimate IPC, it is really important to know whether you are running at peak clock or not. Every MHz matters. The danger is in extrapolating their results to normal usage, which they unfortunately do. "Oh, it uses more power, that's bad!" Of course it's using more power; you are forcing it into an unrealistic situation. Measure the power consumption under normal use.

thenewperson · May 15, 2024

Jimmyjames said:
Also, why do so many seem underwhelmed in that thread? Are they wrong or am I missing something?

Well, aside from the complaining that Apple isn't giving the same gains it used to, that place has the distinct problem of taking rumoured AMD gains too seriously. So we have people comparing Apple's 'measly' IPC gains with AMD's rumoured 30-40% IPC gains and acting accordingly. They also really don't like GB6 there.

Jimmyjames · May 15, 2024

thenewperson said:
Well, aside from the complaining that Apple isn't giving the same gains it used to, that place has the distinct problem of taking rumoured AMD gains too seriously. So we have people comparing Apple's 'measly' IPC gains with AMD's rumoured 30-40% IPC gains and acting accordingly. They also really don't like GB6 there.

I know @leman has documented the fact that each generation of Apple Silicon tends to increase by about 200-400, but does anyone know roughly what the IPC increases have been?

Artemis · May 15, 2024

Jimmyjames said:
I have to say I am not fond of this extreme cooling that they seem to be doing lately. As far as I can see, it skews results both in terms of the score, and the power. I just want analysis of the M4 at the default power levels. It feels like a move aimed at increasing clicks in YouTube, which is fair if that’s their business, but it’s a shame there is seemingly little room for sober analysis.

I think it’s useful insofar as it shows what the chip can do in principle but if it’s pushing it past what a laptop could do, then yeah it’s dumb.

dada_dave said:
Yup, people value gains, recent gains. And yeah we should see what actually gets released of course. AMD and Intel have a long, long way to go. Intel more so. Qualcomm is starting off much closer, because well of course they are, and we'll see what their V2 cores are like. ARM can conceivably speed up with the X5. But even if any of these eventually "catch" Apple as it were, so far there's no evidence of a chip design team that is actually going to do to Apple what Apple did to Intel/AMD. This may age poorly, but that's because there is no evidence to believe otherwise yet. If such evidence comes up, I'll happily change my mind!

Intel and AMD will get closer on IPC, but not really on power. And even with Arm’s IP we’ve seen a struggle in this vein, despite Arm being *vastly* closer to good high-IPC, efficient designs than AMD and Intel are.

For example, the X4 is near A14-caliber on IPC and they still can’t get power as low for the same performance. N3E would help for the X5, but the X5 and its caches will have to get power in order, alongside some IPC gains, to really compete.

Intel and AMD being ahead on ST isn’t really a big deal. I expect that to be normal now that they’re getting more competitive but still jacking clocks as much as they can.

ASUSTeK COMPUTER INC. ASUS Zenbook 14 UX3405MA_UX3405MA - Geekbench

Benchmark results for an ASUSTeK COMPUTER INC. ASUS Zenbook 14 UX3405MA_UX3405MA with an Intel Core Ultra 7 155H processor.

browser.geekbench.com

A current Intel Meteor Lake 155H part is hitting about 2356 at 4.75GHz, which works out to 496 points/GHz.

Lunar Lake, Intel’s upcoming “low power” M-class chip for 10-30W devices, has been leaked to hit 4.9GHz on N3B. For N3B that’s a bit high. But even with a 20% IPC boost, which is apparently on the generous side, they’ll be hitting about 2,916 in GB6 at 4.9GHz. There’s no guarantee power is actually great, and I doubt it will be M3-caliber at that performance. It’s also just as big as the M3 in terms of area, which frankly is just sad.
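For anyone who wants to rerun the arithmetic above, here is a minimal Python sketch. It treats GB6 points per GHz as a rough stand-in for IPC and scales it by the rumoured 20% gain and the leaked 4.9GHz clock. The function and constant names are made up for illustration, and the inputs are just the figures quoted in the post, not new measurements.

```python
# Figures quoted in the post above; names are illustrative only.
METEOR_LAKE_SCORE = 2356        # GB6 single-core result cited for the 155H
METEOR_LAKE_CLOCK_GHZ = 4.75    # clock cited for that result


def perf_per_ghz(score: float, clock_ghz: float) -> float:
    """Benchmark points per GHz, a rough proxy for IPC on this benchmark."""
    return score / clock_ghz


def projected_score(base_ppg: float, ipc_gain: float, clock_ghz: float) -> float:
    """Scale points/GHz by an assumed IPC gain, then by the target clock."""
    return base_ppg * (1.0 + ipc_gain) * clock_ghz


base = perf_per_ghz(METEOR_LAKE_SCORE, METEOR_LAKE_CLOCK_GHZ)
lunar_lake = projected_score(base, ipc_gain=0.20, clock_ghz=4.9)

print(f"{base:.0f} points/GHz")            # 496 points/GHz
print(f"{lunar_lake:.0f} projected GB6")   # 2916 projected GB6
```

Changing `ipc_gain` or `clock_ghz` shows how sensitive the projection is to either rumour. It says nothing about power, which is the part the post is actually sceptical about.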

Cmaier · May 15, 2024

exoticspice1 said:
I am very pleased with the thermal improvement on the iPad Pro. I want Apple to focus on cooling on iPhones and MacBooks going forward.

No one likes hot and noisy laptops, see M3 Max 14". More copper heatpipes please.

I was going to report on this. My M2 iPad Pro definitely gets warm and toasty. So far the M4, doing the same mixed workload, is always cool to the touch.

Artemis · May 15, 2024

One thing lost in all the whining and boasting: the E cores have still seen substantial gains in both clocks and architecture since the A14. Power has gone up, but in absolute terms it’s so low that it’s hard to complain about a doubling since A13/A14; 0.6-0.8W now is no big deal. Efficiency has still massively improved. They’re fantastic, and I understand why Apple went for 4+6 in the M4.

dada_dave · May 15, 2024

Artemis said:
It’s also just as big as the M3 in terms of area, which frankly is just sad.

Wouldn’t that actually be really good for them? I admit the last time I looked at core sizes was back in the Firestorm era, and Intel’s cores were simply massive comparatively, like 3x the die area of a Firestorm. Or were you referring to something else by size, like chip size?

Artemis · May 15, 2024

dada_dave said:
wouldn’t that actually be really good for them? I admit the last time I looked at core sizes was back in firestorm era and Intel’s cores were simply massive comparatively. Like 3x the die area of a firestorm.

Ya but that was on a terrible node density wise. And two other things are:

Lion Cove is synthesizable across foundries finally, because it’s being used on N3B and 20A both.

With that, since they’re using the core in:
Arrow Lake on N3B and 20A (mobile and desktop parts, with higher clocks and more cores)
Lunar Lake on N3B, a re-engineered 4+4 part designed to reduce idle power draw and get closer to what QC and Apple can do for battery life
they really should try to make it denser. But I guess they need or want the extra clocks and the yields, so the rumor, from a look at the die, is a 4.5-4.8mm^2 core.

That’s like twice the size of Apple’s logic/L1-only coll core in A17 or M3.

Cmaier · May 15, 2024

Artemis said:
Ya but that was on a terrible node density wise. And two other things are:

Lion Cove is synthesizable across foundries finally, because it’s being used on N3B and 20A both.

If they’re synthesizing, then please allow me to make news by predicting how bad they are going to suck, like I did for Bulldozer.

dada_dave · May 15, 2024

Artemis said:
Ya but that was on a terrible node density wise. And two other things are:

Lion Cove is synthesizable across foundries finally, because it’s being used on N3B and 20A both.

With that, since they’re using the core in:
Arrow Lake on N3B and 20A (mobile and desktop parts, with higher clocks and more cores)
Lunar Lake on N3B, a re-engineered 4+4 part designed to reduce idle power draw and get closer to what QC and Apple can do for battery life
they really should try to make it denser. But I guess they need or want the extra clocks and the yields, so the rumor, from a look at the die, is a 4.5-4.8mm^2 core.

That’s like twice the size of Apple’s logic/L1-only coll core in A17 or M3.

Cmaier said:
If they’re synthesizing, then please allow me to make news by predicting how bad they are going to suck, like I did for Bulldozer.

What does synthesize mean in this context and why is it bad?

Artemis · May 15, 2024

dada_dave said:
wouldn’t that actually be really good for them? I admit the last time I looked at core sizes was back in firestorm era and Intel’s cores were simply massive comparatively. Like 3x the die area of a firestorm. Or were you referring to something else by size? Like chip size?

Oh the full Lunar Lake die size itself isn’t impressive either in light of what we think we know about Lion Cove and the overall thing. Basically, 140mm^2 for the main die that has the cores, GPU, and other stuff.

https://x.com/_wildc/status/1745244534978592784?s=46

M3 is speculated to be 135 to 150mm^2, probably on the lower end, so reportedly about the same size.

Yes, M4 is out, before someone says that, I know.
But the point here is you’re going to get the chance of a lifetime to see Intel’s glorious architecture and design on a nearly perfect comparison with an Apple part. It’s almost entirely an architectural comparison. Similar node, similar die size (slightly more in favor of Intel due to the IO tile on N6), similar purpose - a 4+4 part for low power, no hyperthreading, etc. It even has an 8MB system cache just like M3.

Artemis · May 15, 2024

Cmaier said:
If they’re synthesizing, then please allow me to make news by predicting how bad they are going to suck, like I did for Bulldozer.

lol

Yeah, apparently they’re trying to get off their over-reliance on bespoke, Intel-foundry-only EDA and on hand layouts. My understanding is that this is supposed to be a good thing, and Intel becoming modern, but we’ll see if they blow it.

Cliff — is synthesizable bad? Apparently a reason in the past Intel’s cores have been so big is refusing to use modern EDA and doing way too much stuff manually, but now this is changing because they’re using TSMC for a core, and because the foundry is trying to get up to standard for future customers too.

Artemis · May 15, 2024

But idk, maybe there’s a time and a place for this and this will backfire!

Cmaier · May 15, 2024

dada_dave said:
What does synthesize mean in this context and why is it bad?

There are lots of ways to design chips, and almost everyone uses synthesis. CPUs are the partial exception: almost everyone uses synthesis there too, but the people who design the best chips tend to use it for as little as they can get away with.

The idea is that your architects write a description of the chip’s logical behavior in a synthesizable language (typically Verilog, but other possibilities include VHDL, various C++ or C varieties, etc.).

So they may say something like:

A[7:0] = B[7:0] + C[7:0]

to create an 8-bit adder. Then, with synthesis, your logic designers would plunk all that into a “synthesis tool” (almost always from a company called Synopsys) to dump out a netlist (which lists the logic gates to be used and how they are wired together).

Then your physical designer takes that netlist and lays it out to create the chip mask. (Typically using automated place-and-route tools, often from a company called Cadence.)
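To make the "behavioral description in, netlist out" step concrete, here is a toy Python sketch. It is not how any real synthesis tool works: it hard-codes one possible expansion of an 8-bit add (a ripple-carry chain of 2-input XOR/AND/OR gates) and then simulates the resulting gate list to check it against the behavioral `B + C`. The net names, the `Gate` tuple layout, and both function names are invented for illustration. A real tool would also do technology mapping, timing-driven optimization, and drive-strength selection.

```python
from typing import Dict, List, Tuple

# (gate type, output net, input net 1, input net 2) -- illustrative format
Gate = Tuple[str, str, str, str]


def synthesize_adder(width: int) -> List[Gate]:
    """Expand 'A = B + C' into a ripple-carry netlist of 2-input gates."""
    netlist: List[Gate] = []
    carry = "zero"  # constant-0 carry-in for bit 0
    for i in range(width):
        b, c = f"B{i}", f"C{i}"
        netlist += [
            ("XOR", f"p{i}", b, c),                 # propagate = b ^ c
            ("XOR", f"A{i}", f"p{i}", carry),       # sum bit
            ("AND", f"g{i}", b, c),                 # generate = b & c
            ("AND", f"t{i}", f"p{i}", carry),       # incoming carry passed on
            ("OR", f"c{i + 1}", f"g{i}", f"t{i}"),  # carry-out of this bit
        ]
        carry = f"c{i + 1}"
    return netlist


OPS = {
    "XOR": lambda x, y: x ^ y,
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
}


def simulate(netlist: List[Gate], width: int, b: int, c: int) -> int:
    """Evaluate the netlist gate by gate and return A as an integer."""
    nets: Dict[str, int] = {"zero": 0}
    for i in range(width):
        nets[f"B{i}"] = (b >> i) & 1
        nets[f"C{i}"] = (c >> i) & 1
    for kind, out, in1, in2 in netlist:  # gates are emitted in dependency order
        nets[out] = OPS[kind](nets[in1], nets[in2])
    return sum(nets[f"A{i}"] << i for i in range(width))


netlist = synthesize_adder(8)
print(len(netlist), "gates")  # 40 gates
assert simulate(netlist, 8, 100, 55) == 155
assert simulate(netlist, 8, 200, 100) == (200 + 100) % 256  # carry-out dropped
```

The ripple-carry structure is just the simplest choice. Which adder structure, which gates, and which drive strengths get picked is where an automated flow and a hand-done design would differ.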

That’s not at all how I ever did it at any of the three companies I worked at.

We always “hand-synthesized” the logic. And hand placed the gates. And hand-routed at least some of the wires.

Because we were real men (even the women!)

I mean, one time I literally loaded the mask into vi and edited two wires on the day of tapeout. Did I mention I’m very manly?

Artemis · May 15, 2024

Cmaier said:
There are lots of ways to design chips, and almost everyone uses synthesis. CPUs are the partial exception: almost everyone uses synthesis there too, but the people who design the best chips tend to use it for as little as they can get away with.

The idea is that your architects write a description of the chip’s logical behavior in a synthesizable language (typically Verilog, but other possibilities include VHDL, various C++ or C varieties, etc.).

So they may say something like:

A[7:0] = B[7:0] + C[7:0]

to create an 8-bit adder. Then, with synthesis, your logic designers would plunk all that into a “synthesis tool” (almost always from a company called Synopsys) to dump out a netlist (which lists the logic gates to be used and how they are wired together).

Then your physical designer takes that netlist and lays it out to create the chip mask. (Typically using automated place-and-route tools, often from a company called Cadence.)

That’s not at all how I ever did it at any of the three companies I worked at.

We always “hand-synthesized” the logic. And hand placed the gates. And hand-routed at least some of the wires.

Because we were real men (even the women!) I mean, one time I literally loaded the mask into vi and edited two wires on the day of tapeout. Did I mention I’m very manly?

Makes sense, thanks!

mr_roboto · May 15, 2024

dada_dave said:
What does synthesize mean in this context and why is it bad?

"Synthesis" is machine translation of code written in a high level hardware description language (HDL) like Verilog to a gate netlist. There are many ways you can create chips. A gross oversimplification into just three bins:

Lowest performance / worst area / minimum engineering time: synthesis + automated place & route
Medium perf/etc: synthesis + hand tweaks of gate selections in netlist + some hand layout
Highest perf / min area / max engineering effort: direct engineer gate choices (sometimes by tweaking HDL source to make the synthesizer do what you want), hand selection of most gate drive strengths, lots of manual layout

Basically, the more automation there is in the process, the more you're relying on software which uses healthy safety margins to make sure the end result will function correctly. Humans can almost always do a better optimization job.

A good analogy: if you could build an entire modern OS in hand optimized assembly code, it would probably be a lot faster than the alternative, but is less maintainable and takes way more effort to implement features. The big differences are that chips are projects with much smaller overall system complexity than a big software project, and software tooling is much better at approaching the limits of human ability.

tomO2013 · May 15, 2024

Those of you lucky enough to have your hands on the iPad Pro M4, can you tell me whether the screen is significantly better than the older M1/M2 iPad Pro mini-LED screens?

tomO2013 · May 15, 2024

Also @leman, over at the other place you made a very interesting post. I was half tempted to ask you there to expand on your thought process, but decided against it and hoped you wouldn’t mind continuing over here.
You had mentioned that it is going to become more difficult to squeeze out performance per watt from process node reduction alone (I agree), and that different packaging methods will come into play (e.g. mixed node; I’m assuming similar to what AMD did with their RDNA3 chiplet GPUs, but at a more advanced level of packaging reflective of 2024/2025+ technologies).
However, the bit that caught my attention was when you mentioned that, because Apple doesn’t sell to the open market and only buys for itself, this paradoxically puts Apple at an advantage.

Do you mind walking me through why you think this puts Apple at an advantage? I’d like to understand a little more.
I’d have thought that having higher-volume orders in general (even for mixed-node orders) would give Apple’s competitors a cost advantage, as they would have economies of scale on their side relative to the low-volume orders of, say, Apple.

Do you mind joining the dots on this one for me, please? I’m tired and just off a long-haul flight, so perhaps my brain is stuck in first gear.

Appreciate your time
