M3 core counts and performance

Cmaier · May 15, 2024

tomO2013 said:
Those of you lucky enough to have your hands on the iPad Pro M4 can you tell me if the screen quality improvements are significantly better relative to older M1/M2 iPad Pro mini-led screens?

In my opinion, not really. Primarily I notice a difference when viewing movies with light on-dark, like the star wars crawling opening text. No blooming!

Color rendition seems more or less the same. Brightness during the day didn’t seem any different to me -I literally had both ipads on my kitchen table for a few hours while I was setting them up. I was also working on my mac. A few times when I looked up from my mac I had trouble telling the two ipads apart, and had to look at the keyboards to remind myself which was which. Brightness at night, weirdly enough, seems much brighter (at max) on the new one, but, of course, max brightness at night isn’t all that useful.

Keep in mind my vision is not what it was even 5 years ago, so I could be missing something that someone with better vision might appreciate.

leman · May 15, 2024

tomO2013 said:
However the bit that caught my attention was when you mentioned that because Apple doesn’t sell to the open market and only buys for themselves, that this puts Apple at an advantage paradoxically.

Do you mind walking me through your thought process on this one as to why you think this puts Apple at an advantage, I’d like to understand a little more?
I’d have thought that having higher volume orders in general (even for mixed node orders) would give Apples competitors a cost advantage as they would have economies of scale on their side relative to the low volume orders of say Apple etc…

Thanks for initiating this conversation, it's an interesting one!

My argument is very simple: if making high-performance chips is going to become more expensive in the future, Apple is in advantage because they cut out the middleman. They can easily afford spending hundreds of $$$ per chip on design and manufacture, whereas Intel and AMD need to sell chips with profits. Another factor is that Apple devices are expensive, and users buy them. Nobody bats an eye anymore at a $3000-4000 MacBook Pro. A Dell XPS at that price level won't sell. Not to mention that the nature of the business is very different. Most of the chips Intel sells are low-performance, budged CPUs. Most of the chips Intel makes money with are high-end, enthusiast or server CPUs which are sold at tremendous margin. This is a difficult business model to balance, and this is why Intel uses binning so aggressively, selling the same CPU dies at very different price points. That is very unlike Apple, which value performance consistency across their CPUs.

Regarding volumes, I think you might be underestimating how much chips Apple actually produces. Intel has reportedly shipped around 50 million CPUs in 2023. Nvidia and AMD have shipped around 70 million GPUs to the customers. But Apple has shipped over 200 million (!!!) in iPhones alone. It is no surprise that they routinely book entire production capacity of TSMC. So I see no financial disadvantage on the count of low volume orders. If anything, they have to apply very careful production management to make sure there is enough production to satisfy their needs (which is likely the main reason why we only see M4 on the iPad now).

And finally, new developments in SoC packaging can enable performance improvements that Apple with their deep pockets can easily access, and which might be less economically viable for other companies. For example, Apple currently has a disadvantage in GPU performance, since faster GPUs require larger dies. Nvidia can afford making a huge die filled with GPU compute logic. But Apple also needs to fit the CPU, I/O and other IP blocks. As die size approaches reticle limit and costs increase, this imposes a hard boundary of what Apple can do. However, if they split the IP blocks into multiple dies and stack them together on a single package, they can break past that limit. I have little doubt that Apple will be the first one to build a high-performance 3D stacked SoC. They have been working on that technology for years, their patents in that field are the most advanced, and they closely work with TSMC to make this happen. I think we will see chips like these in a few years.

casperes1996 · May 16, 2024

Cmaier said:
I was going to report on this. My M2 ipad pro definitely gets warm and toasty. So far the M4, doing the same mixed workload, is always cool to the touch.

Huh; I would've expected it to be hotter externally to facilitate it being cooler internally. A very thin device with a copper Apple logo and more cooling system to spread the heat across the chassis of the device. I was under the impression that the way they improved cooling was to spread the heat more throughout the exterior of the device's body and frame.
I wouldn't be surprised if it can also just operate at lower heat levels, but when asked o work hard I would get the total surface area would get warmer though perhaps less concentrated than M2.
My A12X iPad Pro mainly gets warm below the camera when I use it intensively. Opposite site barely heats up at all

Yoused · May 21, 2024

mr_roboto said:
A good analogy: if you could build an entire modern OS in hand optimized assembly code, it would probably be a lot faster than the alternative

On that point specifically, I will respectfully disagree. I recall using THINK Pascal in the early 90s, looking at the object code and saying geez. There were two lines of code where one value was modified and then used in the second line: the machine code modified the value, stored it to memory, then reloaded it for the next line. Ridiculous. Then I looked at the way GCC worked in XCode and saw that you could write incredibly verbose code and then not be able to examine the value of one of the variables because the compiler decided it was unnecessary so it was never even created in the object code.

Compilers have become really, really good. I am not convinced that you could hand-optimize the output code for smaller, better performance for more than about 5% of what you were doing. You would be wasting your time trying to tweak everything the way compilers are these days.

Which is not to say that the same applies to circuit design. But, at some point, it could. Good quality ML fuzzy logic might be able to save engineers a lot of effort.

Artemis · May 21, 2024

Yoused said:
On that point specifically, I will respectfully disagree. I recall using THINK Pascal in the early 90s, looking at the object code and saying geez. There were two lines of code where one value was modified and then used in the second line: the machine code modified the value, stored it to memory, then reloaded it for the next line. Ridiculous. Then I looked at the way GCC worked in XCode and saw that you could write incredibly verbose code and then not be able to examine the value of one of the variables because the compiler decided it was unnecessary so it was never even created in the object code.

Compilers have become really, really good. I am not convinced that you could hand-optimize the output code for smaller, better performance for more than about 5% of what you were doing. You would be wasting your time trying to tweak everything the way compilers are these days.

Which is not to say that the same applies to circuit design. But, at some point, it could. Good quality ML fuzzy logic might be able to save engineers a lot of effort.

Agree with this

Cmaier · May 21, 2024

Yoused said:
On that point specifically, I will respectfully disagree. I recall using THINK Pascal in the early 90s, looking at the object code and saying geez. There were two lines of code where one value was modified and then used in the second line: the machine code modified the value, stored it to memory, then reloaded it for the next line. Ridiculous. Then I looked at the way GCC worked in XCode and saw that you could write incredibly verbose code and then not be able to examine the value of one of the variables because the compiler decided it was unnecessary so it was never even created in the object code.

Compilers have become really, really good. I am not convinced that you could hand-optimize the output code for smaller, better performance for more than about 5% of what you were doing. You would be wasting your time trying to tweak everything the way compilers are these days.

Which is not to say that the same applies to circuit design. But, at some point, it could. Good quality ML fuzzy logic might be able to save engineers a lot of effort.

I don’t know how good the tools are today, but when I was in charge of EDA we did some experiments and had the cadence and synopsis and other tool providers each have their own employees design a specific block of ours using their tools. We then compared that to our own design, done by a mid-level engineer using a combination of in-house and commercial tools that we modified and bent to our will to essentially allow hand-design with near instant feedback on the effect of design choices. We tried this several times over the years. Every single time, the block that they provided was 20% worse. It was 20% too big, or 20% too slow, or consumed 20% too much power, or some combination of those. That didn’t even include other problems like static design rule violations that they didn’t check for.

The fully automated tools may be better now, but Maier’s Law says it will always be 20%, because the “hand assist” tools also improve at the same rate.

Artemis · May 22, 2024

From a Geekerwan post on M4. Look at how low that power can go for MT (thanks to scaling the E cores and all).

Artemis · May 22, 2024

Basically matching the M1 MT at half the watts. Probably less judging by thr MT powers here. Just crazy.

Nycturne · May 22, 2024

Cmaier said:
I was going to report on this. My M2 ipad pro definitely gets warm and toasty. So far the M4, doing the same mixed workload, is always cool to the touch.

This might get me to reconsider an M4 iPad Pro. One reason I stopped using it with the pencil was how warm the thing gets while trying to use it for handwritten notes.

amonduin · May 22, 2024

Nycturne said:
This might get me to reconsider an M4 iPad Pro. One reason I stopped using it with the pencil was how warm the thing gets while trying to use it for handwritten notes.

This is actually something I've noticed with my 2018 iPP when using it in bed while taking handwritten notes.

Jimmyjames · May 22, 2024

Not sure whether to post this here or in the pinned iPad event thread.

I noticed something in the Geekerwan M4 video that highlights the problems with benchmark scores and the (understandable) lack of information around them in reviews. To be clear, I don’t think there is any bad intention or laziness here. It’s just really difficult to benchmark accurately.

Here is Geekerwan result for the M4 iPad Spec benchmark. Pay attention to the Spec Int score for the M3:
M3 P core at 4.01ghz = 9.86

Now here is the result for the M3 in a widely re-retweeted benchmark of many cpus using Spec Int:
M3 P core, 4.05 ghz = 11.8

Im quite sure that the score here are the result of compilation options. I cant imagine the small gap in frequency would result in such a difference. Without knowing these details though, it’s hard to treat these scores as more than entertainment.

I’d be interested in opinions.

amonduin · May 22, 2024

Jimmyjames said:
Not sure whether to post this here or in the pinned iPad event thread.

I noticed something in the Geekerwan M4 video that highlights the problems with benchmark scores and the (understandable) lack of information around them in reviews. To be clear, I don’t think there is any bad intention or laziness here. It’s just really difficult to benchmark accurately.

Here is Geekerwan result for the M4 iPad Spec benchmark. Pay attention to the Spec Int score for the M3:
M3 P core at 4.01ghz = 9.86
View attachment 29585
Now here is the result for the M3 in a widely re-retweeted benchmark of many cpus using Spec Int:
M3 P core, 4.05 ghz = 11.8
View attachment 29586

Im quite sure that the score here are the result of compilation options. I cant imagine the small gap in frequency would result in such a difference. Without knowing these details though, it’s hard to treat these scores as more than entertainment.

I’d be interested in opinions.

Given that Geekerwan probably compiles both the same way I would assume that the same compiler options should lead to similar gaps even if the floor of the M3 is raised?

Jimmyjames · May 22, 2024

amonduin said:
Given that Geekerwan probably compiles both the same way I would assume that the same compiler options should lead to similar gaps even if the floor of the M3 is raised?

I would hope! I think it reveals the problems with benchmarks though.

Cmaier · May 22, 2024

Nycturne said:
This might get me to reconsider an M4 iPad Pro. One reason I stopped using it with the pencil was how warm the thing gets while trying to use it for handwritten notes.

I just got my M1 ipad pro replacement and spent some time using them side by side again. Old ipad pro definitely gets noticeably warm, and M4 does not, at all. (also: after a week now, i continue to maintain that the pencil pro is fantastic. I use pencil an awful lot, despite my left brain brain.)

mr_roboto · May 23, 2024

Jimmyjames said:
Im quite sure that the score here are the result of compilation options. I cant imagine the small gap in frequency would result in such a difference. Without knowing these details though, it’s hard to treat these scores as more than entertainment.

I’d be interested in opinions.

Yes, SPEC scores can swing quite a lot based on compiler flags. In some subtests, organizations who want to post high scores also pull tricks like linking the binary against a different implementation of malloc()/free() than the standard system one. A program which spends a lot of time allocating and freeing memory can benefit a lot from a tuned allocator.

IIRC, AnandTech is pretty good about this - they decide on a standard compiler and a set of not-so-aggressive compiler flags, use standard system-provided libraries, and disclose what they do. You can sensibly compare AT SPEC scores to AT SPEC scores. Cross-comparisons with Geekerwan, though? Not likely to be meaningful.

You always have to remember that SPEC was a product of the 1990s UNIX wars. At the start of that decade, everyone serious about building UNIX workstations and servers had their own proprietary (or nearly so) RISC ISA, their own proprietary compiler, and their own proprietary UNIX derivative. That's why there's such a focus on compilers and compiler flags in SPEC: lots of the people interested in comparing alternative UNIX platforms had in-house C or Fortran applications, and they needed to guesstimate the effects of porting their own code to a different proprietary UNIX flavor with a different ISA, compiler, and system libraries.

Today there is much less effective diversity in compilers and CPU ISAs. By "effective", I mean that there are probably more choices out there than ever before, but in usage terms things have converged on just two important ISAs (and let's face it, we were down to just one for a bit there!) and a slightly larger number of compilers, most of which are open source. Furthermore, personal computing has truly spread out to the masses, who just run binaries provided by someone else because they don't know how to compile their own code (nor should they). So there's a lot of ways in which SPEC is a bit archaic.

Geekbench seems to be an attempt at making a SPEC-inspired benchmark suite more in tune with the times. It's provided in compiled binary form only, and its choices of benchmarks shift focus towards the kind of compute intensive tasks average users do with their computers and phones today. It's still got some technical UNIX workstation roots, e.g. it measures compiler performance, but it also measures things like analyzing your photo library to detect pets or faces.

NotEntirelyConfused · Jun 4, 2024

Intel announced at Computex today that Lunar Lake's compute chiplet is being fabbed on TSMC N3B. This is interesting for several reasons, but the one that stands out for me is that this will give us Intel's latest and greatest core, versus Apple's not-quite latest (M3) core, on the same exact process. Comparisons of PPA will be much more informative!

Cmaier · Jun 5, 2024

NotEntirelyConfused said:
Intel announced at Computex today that Lunar Lake's compute chiplet is being fabbed on TSMC N3B. This is interesting for several reasons, but the one that stands out for me is that this will give us Intel's latest and greatest core, versus Apple's not-quite latest (M3) core, on the same exact process. Comparisons of PPA will be much more informative!

I’m very worried about the industry now that nobody can keep up with TSMC. Even when Intel had the best fabs, IBM and AMD were within spitting distance (and IBM generally had better transistors than Intel, and we had better interconnect than Intel, for most of that time). If intel really had faith they were going to catch up, they wouldn’t do this TSMC stuff.

dada_dave · Jun 5, 2024

Cmaier said:
I’m very worried about the industry now that nobody can keep up with TSMC. Even when Intel had the best fabs, IBM and AMD were within spitting distance (and IBM generally had better transistors than Intel, and we had better interconnect than Intel, for most of that time). If intel really had faith they were going to catch up, they wouldn’t do this TSMC stuff.

Intel claims that they’ll be completely on their own, naturally better than TSMC, process next year*, that this is a one time thing - and their upcoming desktop Arrow Lake CPUs will be shipping on Intel’s own process.

*Next year’s Panther Lake, the Lunar Lake replacement will be on a 1.8 (see better than N2! It’s 1.8!) process and will have powervia (backside power) first. What’s actually open for 3rd party foundry is less obvious, to me anyway.

Cmaier · Jun 5, 2024

dada_dave said:
Intel claims that they’ll be completely on their own, naturally better than TSMC, process next year*, that this is a one time thing - and their upcoming desktop Arrow Lake CPUs will be shipping on Intel’s own process.

*Next year’s Panther Lake, the Lunar Lake replacement will be on a 1.8 (see better than N2! It’s 1.8!) process and will have powervia (backside power) first. What’s actually open for 3rd party foundry is less obvious, to me anyway.

they claim lots of things. But why would you invest hundreds of millions of dollars into a ”one time thing” where you can’t leverage it for future projects. I don’t believe them.

NotEntirelyConfused · Jun 5, 2024

Cmaier said:
they claim lots of things. But why would you invest hundreds of millions of dollars into a ”one time thing” where you can’t leverage it for future projects. I don’t believe them.

Your worry is plausible but there are other possible ways to look at this.

If I were Pat Gelsinger and I wanted to light a real fire under my foundry people's asses, this is exactly what I'd do. Give them a real reason to worry about their jobs *today*, not in some vaguely-discernable future that's years off because the current processes are good enough to keep Intel's market share from withering too rapidly.

Alternatively, if I had confidence that my foundry people *were* on the right track, but needed a little more time, I might do this to let my core architects off the leash. Allow them to put their best foot forward, so that even if I don't have the process crown right now, I can crow about how good our designs are.

I'm not especially bullish on Intel. I think their future is uncertain (they might be turning things around... or not, it's too soon to tell). But I dont think this tells us much one way or the other.

M3 core counts and performance

Site Master

Elite Member

Site Champ

up

Site Champ

Site Master

Site Champ

Site Champ

Elite Member

Active member

Elite Member

Active member

Elite Member

Site Master

Site Champ

Power User

Site Master

Elite Member

Site Master

Power User

Similar threads