Nuvia: don’t hold your breath

With the caveat that there are only just over 100 Geekbench entries for the 8 Elite (OnePlus PJZ110), and that I am comparing phone SoCs against laptop ones, I thought it might be interesting to take a look at some charts showing areas of improvement between the 8 Elite, its predecessor, and the competition.
Nice work!

But I really have to wonder... if you get to remove Object Detection from the A18P, why not remove Photo Filter from the SD4?

I don't actually want you to do that - my point is that this whole pussyfooting around the butthurt who cry foul because Apple implemented SME is ridiculous. The scores are what the scores are. If QC wants to bench higher on that subtest, they can damn well implement SME too.

It's valuable being able to see subtest scores, as that helps understand what's happening. Removing them though, that's a mug's game. We shouldn't be changing the goalposts.
 
I have a great deal of sympathy with this viewpoint. In my case I did it both ways, just in case people wanted to know. In a situation where we want to know what the IPC change is, removing SME probably has some value. Other than that, I agree: the device is what it is and should be treated as such.

Edit: I also find Qualcomm testing the 8 Elite against the iPhone 16 using Geekbench 6.2 a bit shady for this reason: 6.3 came out months before the iPhone's release. Why should it be tested with an older version?
 
With the caveat that there are only just over 100 Geekbench entries for the 8 Elite (OnePlus PJZ110), and that I am comparing phone SoCs against laptop ones, I thought it might be interesting to take a look at some charts showing areas of improvement between the 8 Elite, its predecessor, and the competition.

This first chart shows the iso-clock ratio of the 8 Elite vs the 8 Gen 3 (OnePlus PJZ110 vs OnePlus CPH2583, AKA OnePlus 13 vs OnePlus 12), with Photo Filter on the left and without it on the right. No idea why Photo Filter has had such a huge jump. Working out the geomean, it's about an 11% improvement with Photo Filter and about 9% without.
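If anyone wants to replicate this, the calculation is roughly the following. This is just a sketch: the scores below are made-up placeholders rather than real database medians, the subtest list is truncated, and I'm assuming peak clocks of 3.30 GHz for the 8 Gen 3 and 4.32 GHz for the 8 Elite.

Python:
from statistics import geometric_mean

# Iso-clock ratio per subtest: (score_new / clock_new) / (score_old / clock_old).
# Placeholder scores for illustration only; clocks are assumed peak frequencies in GHz.
scores_8e = {"Photo Filter": 4400, "Object Detection": 2900, "Clang": 3300}
scores_8g3 = {"Photo Filter": 3000, "Object Detection": 2500, "Clang": 2700}
clock_8e, clock_8g3 = 4.32, 3.30

ratios = {name: (scores_8e[name] / clock_8e) / (scores_8g3[name] / clock_8g3)
          for name in scores_8e}

print("geomean, all subtests:", geometric_mean(ratios.values()))
print("geomean, without Photo Filter:",
      geometric_mean([r for n, r in ratios.items() if n != "Photo Filter"]))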

Next, comparing the 8 Elite to the X Elite (X1E84100). Again some areas of improvement, but an overall geomean improvement of only 1.5-2%.

[Chart: 8 Elite vs X Elite iso-clock subtest ratios]

Lastly we can compare the A18 to the 8 Elite and the X Elite. We see an 11% difference in favour of the A18 vs the 8 Elite if we include Object Detection, and 9% without it (with Object Detection on the left, without it on the right).
A18 vs X Elite, with Object Detection on the left and without it on the right: the A18 leads by 13% with OD and by 11% without it.


So overall, it does seem like there has been a very big uplift for the 8 Elite vs the 8 Gen 3. Perhaps less so vs the X Elite, which makes me question how much changed between the two, although areas like Photo Filter have improved quite a bit.

Versus the A18, it seems like a smaller improvement to me at least: perhaps 2% better than the X Elite managed.

Obviously nothing in this data tells us much about efficiency, which is clearly a big area of improvement for the 8 Elite.

Edit: Renamed iPhone17 to the correct name and added a note that the top and bottom 5% were removed from the results.

Very nice work! Do you happen to have downloaded a set of A14 scores? It would be interesting to compare the 8 Elite's IPC to Apple's Firestorm.
 
Edit: I also find Qualcomm testing the 8 Elite against the iPhone 16 using Geekbench 6.2 a bit shady for this reason: 6.3 came out months before the iPhone's release. Why should it be tested with an older version?
6.2 doesn't use SME. It was 6.3 that added SME support.

They chose 6.2 to make it a 'fair comparison', because 8 Elite doesn't have SME.

PS: My first post in this forum ig. Yes, I am Mahua from the other place, and I have many other names besides, in various other forums I visit.
 
Welcome to This Place
 
PS: My first post in this forum ig. Yes, I am Mahua from the other place, and I have many other names besides, in various other forums I visit.

Welcome!

6.2 doesn't use SME. It was 6.3 that added SME support.

They chose 6.2 to make it a 'fair comparison', because 8 Elite doesn't have SME.

I know.

Why is it fair? As @NotEntirelyConfused said above, perhaps Qualcomm should put SME in their CPU.
I have to admit that I'm in multiple minds about this. On one hand, SME is brand new, so the amount of 3rd party software taking advantage of it is likely to be relatively small. On the other hand, the AMX units have been in Apple devices for a while, and anyone using the Accelerate framework has been reaping the advantages; I would surmise that a non-trivial percentage of software uses it, especially of course Apple's 1st party software, which matters a lot if you are using an Apple device! After all, even the specific benchmark, Object Detection, is something Apple does quite a bit in its Photos software, which Apple users rely on daily even if they don't realize it. And finally, you have different philosophies of benchmarking: should the benchmark reflect the current state of 3rd party software, or what the chip is capable of once such software is adapted to it? Most recent and up-to-date software eventually will be.

Therefore, on one hand you can argue that Primate Labs jumped the gun in supporting SME basically as soon as the first (and, as far as I know, only) consumer chip supported it. On the other hand, you can argue that not supporting it would have undercounted the M4's performance; indeed, even with its inclusion, 6.3 continues to undercount the M1-M3's performance on software that is not just native but optimized for the Apple ecosystem.

So I lean towards including 6.3 and Object Detection in the results ... buuuut I'm a stickler for including subtest scores whenever possible, since I'm not a fan of averages over subtests anyway, except as a very high-level, quick-and-dirty comparison point. For both SPEC and GB, my philosophy is that if you're getting down into the weeds of performance modeling, it should be by individual subtest whenever possible. The average just isn't that intrinsically meaningful; subtests should be viewed individually and discussed in their individual contexts, for both software and hardware.
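To put a rough number on why I don't love the averages: with, say, ten subtests at parity and a single one doubled (e.g. via an accelerator), the geomean only moves about 7%, which is easy to misread in either direction.

Python:
from statistics import geometric_mean

# Nine hypothetical subtests at parity, one doubled (say, by an SME-accelerated kernel).
ratios = [1.0] * 9 + [2.0]
print(geometric_mean(ratios))  # ~1.072: a 2x subtest win reads as a ~7% overall gain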

EDIT: Oh, and because of the high degree of run-to-run variance, I also prefer the violin plots that @leman started and that you and I make as well when possible. I just wish we had a better understanding of what the clock-speed measurements in the GB database actually are. For the record, I never got a response from Primate Labs to my query about this.
 
Yeah. I'm not opposed to removing Object Detection when we're trying to understand general IPC improvements. When we're talking about overall system performance, however, it seems fair to include it, just as it's fair to take media accelerators etc. into account when discussing an overall system.

Disappointing to see no response from Primate Labs.
 
I'd even say it's fair game to leave it in when discussing IPC improvements, as long as one makes it clear that this is part of the results and that they should be interpreted with that in mind. Removing it, or using 6.2, is okay as long as it isn't being used to make a ... statement, shall we say, as we saw from various corners when the M4 first came out. I'd still rather include it than not, as to me the accelerator's presence is only slightly different from AVX/SVE/NEON (especially AVX-512!).
 
Very nice work! Do you happen to have downloaded a set of A14 scores? It would be interesting to compare the 8 Elite's IPC to Apple's Firestorm.
I just downloaded an equal number of iPhone 12/Pro/Max (A14) scores: 121 for each. I will say that with such a small sample, the results can shift with just a few more scores.

The 8 Elite has about a 4% higher IPC than the A14 according to these results, as calculated by the geomean of individual tests, which I can provide if anyone is interested.
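In case it's useful, here's the shape of that calculation, sketched in Python. The score loading is left out, the clocks are the commonly quoted peaks (~4.32 GHz for the 8 Elite, ~3.0 GHz for the A14), and in the charts this is done per subtest rather than on the aggregate as shown here.

Python:
import numpy as np

def trimmed_score_per_ghz(scores, clock_ghz, trim=0.05):
    """Mean score per GHz after dropping the top and bottom 5% of runs."""
    s = np.sort(np.asarray(scores, dtype=float))
    k = int(len(s) * trim)
    s = s[k:len(s) - k]
    return s.mean() / clock_ghz

# e.g. IPC-proxy ratio of the 8 Elite to the A14:
# ipc_ratio = (trimmed_score_per_ghz(elite_scores, 4.32)
#              / trimmed_score_per_ghz(a14_scores, 3.0))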

Also, I just noticed that the worst-performing subtest here is Object Detection, which received the biggest improvement with the A18. Lol. I think @dada_dave mentioned something about this before.

[Chart: 8 Elite vs A14 (iPhone 12) iso-clock subtest ratios]
 
TBDR helps, but I doubt it can overcome a potential 20% FP32 throughput deficit (maybe match it, depending), assuming 12 cores with 128 FP32 units per core at 1 GHz (Dimensity/Adreno) vs 6 cores with 128 FP32 units per core at 1.6 GHz (A18 Pro).
FYI, the GPU in the Dimensity 9400 clocks at 1.6 GHz (1612 MHz, to be exact).

Adreno clocks lower compared to Mali/Immortalis and the Apple GPU.

And from where did you get the 1.6 GHz figure for the GPU clock speed of A18 Pro? I have seen 1.5 GHz for it.
[Screenshots from Geekerwan's Dimensity 9400 review: GPU clocks and efficiency curves]

The improvement in the efficiency curve is certainly spectacular. Despite a ~25% clock bump over the Dimensity 9300 (1612 MHz vs 1300 MHz), the power consumption has hardly increased at all.

Source: Geekerwan's review of the Dimensity 9400.

Engineering device


Retail device [Oppo Find X8 Pro]


With Apple and Qualcomm taking the top two places, I would say Mediatek is the 3rd most innovative mobile SoC vendor. They have done some interesting things in the past few years, such as dropping the Cortex A5xx cores and replacing them with Cortex A7xx cores as the efficiency cores of the Dimensity 9300, a design choice that has been very successful:

Dr. Ian Cutress' video about the Dimensity 9300/9400.

[Screenshot from Dr. Ian Cutress' video]
The Dimensity 9300 uses 12T SRAM cells for its L1 cache, whereas the industry norm is 6T cells.
 
FYI, the GPU in the Dimensity 9400 clocks at 1.6 GHz (1612 MHz, to be exact).
Yes, I've seen that, but I have a couple of problems with it. In practice I wonder how often the Dimensity actually hits those GPU clock speeds, since, unless the shader-cores-per-core count is wrong, that would make the GPU more than double the iPhone's TFLOPs and just over 60% more than the Adreno GPU's TFLOPs. But the performance and power characteristics from benchmarks are nowhere near that in the tested phones, and frankly I don't see how they could be, especially in normal phones without active cooling; vapor chambers only get you so far. It's on the same lithography node as Apple/Qualcomm, with more cores than Apple and the same as Qualcomm, and most importantly at higher clock speeds than either. That last part is non-linear. How could they cool it if it's actually running at that clock? So for performance plus power/cooling, I don't see how they sustain 1.6+ GHz for any length of time in a phone. Unless they don't have 128 shader cores per core (it is 12 cores, right?), but I was pretty sure that's what they used.
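To make the arithmetic explicit, here's the napkin math under the configuration assumed upthread (12 cores x 128 FP32 lanes for the Dimensity and the Adreno, 6 x 128 for the A18 Pro; these are my assumptions, not confirmed specs), counting an FMA as 2 FLOPs per lane per clock:

Python:
def tflops(cores, lanes_per_core, clock_ghz, flops_per_lane_clock=2):
    # FMA counts as 2 floating-point ops per lane per clock; result in TFLOPs
    return cores * lanes_per_core * flops_per_lane_clock * clock_ghz / 1000

print(tflops(12, 128, 1.612))  # Dimensity 9400 @ 1612 MHz -> ~4.95
print(tflops(6, 128, 1.55))    # A18 Pro @ ~1.55 GHz -> ~2.38 (less than half)
print(tflops(12, 128, 1.0))    # Adreno assumption @ 1 GHz -> ~3.07 (~60% below 4.95's lead)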

Adreno clocks lower compared to Mali/Immortalis and the Apple GPU.

And from where did you get the 1.6 GHz figure for the GPU clock speed of A18 Pro? I have seen 1.5 GHz for it.
I was rounding up from 1.55, although from your other post you have lower clock speeds than I've seen referenced for the A17/M3.
[Screenshots from Geekerwan's Dimensity 9400 review, as above]
The improvement in the efficiency curve is certainly spectacular. Despite a ~25% clock bump over the Dimensity 9300 (1612 MHz vs 1300 MHz), the power consumption has hardly increased at all.

Source: Geekerwan's review of the Dimensity 9400.

Engineering device


Retail device [Oppo Find X8 Pro]


With Apple and Qualcomm taking the top two places, I would say Mediatek is the 3rd most innovative mobile SoC vendor. They have done some interesting things in the past few years, such as dropping the Cortex A5xx cores and replacing them with Cortex A7xx cores as the efficiency cores of the Dimensity 9300, a design choice that has been very successful:

Dr. Ian Cutress' video about the Dimensity 9300/9400.

[Screenshot from Dr. Ian Cutress' video]
The Dimensity 9300 uses 12T SRAM cells for its L1 cache, whereas the industry norm is 6T cells.

Yeah, I saw that video, it was cool. Very much looking forward to the rumored Nvidia/Mediatek collaboration. If I ever get healthy again (not talking about the head cold), I'd love an Nvidia-Mediatek SoC to continue my project. Access to truly unified memory on an SoC that doesn't cost $30,000, plus CUDA? YAY! (Oh, and plus ARM.) I only hope they have a small desktop option so I can also use a dGPU to develop with both.
 
I told you guys they'd come out with something decent; there was no way they wouldn't at those power figures and frequencies. The turnaround is very impressive.
[Images: 8 Elite benchmark results]



N3E is only worth so much, so this also confirms my suspicions they had more in the tank.
 
Qualcomm is still behind Apple, but this just shows that, at least on the current trajectory, the team is the real deal. Maybe it ends up like the old custom cores, but look, like I said: this is cheaper than buying Arm Cortex cores and significantly better right now.

And both Arm and QC are generations ahead of AMD and Intel.
 
This also shows that whatever they ship in laptops come 2026 on N3P or N2 will have at minimum a GB6 ST score of 3200+, and frequency will probably be even higher than it currently is, but I expect IPC to be a bigger boost.

For context, I seriously doubt AMD and Intel will have any laptop chip in 2026 doing even around 3200-3500 at 7-8W (at the high end that would be an extra 10% performance at the same power for their next laptop chip, which I think is really too low given arch improvements and N3P combined).
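Napkin math for that projection; the gains here are my guesses, not anything announced:

Python:
base_st = 3150        # roughly where 8 Elite GB6 ST lands today
clock_gain = 1.05     # guessed clock bump from N3P/N2
ipc_gain = 1.10       # guessed architectural IPC gain
print(base_st * clock_gain * ipc_gain)  # ~3640, comfortably past 3200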
 
You don’t think the next generation Snapdragon laptops will come out until 2026?! Or are you talking 3rd generation? They have the cores done, surely it’ll be 2025 right?
 
I had to gloat, but :).

FWIW, I literally have an iPhone 16 Pro. Apple is still ahead, and the A18 and A18 Pro are still more impressive engineering, but it's closer than ever before.


The main reason I am excited, again, is that I don't think Arm can get it done with their designs to the extent Apple can, and I expect that momentum will continue on Qualcomm's end for Windows and for phones.

They have area efficiency (Apple does too, actually), scalability, and performance per watt, and in the next generation I think they'll get a lot closer on IPC. With their clocking and efficiency, much like Apple's, I expect that means huge ST for Windows laptops without blowing up power or area as much as AMD and Intel do, and next time they'll have real E cores.

The X Elite goes toe-to-toe with Lunar Lake on multiple fronts despite multiple disadvantages and being a rushed product, and at a lower cost. AMD and Intel have nothing that can slot into $700-900 laptops with great ST and great battery life, but QC does, and it's all about architecture.

Think about the next one lol.



People are still underrating WoA vs AMD and Intel for those reasons, I think.
 
You don’t think the next generation Snapdragon laptops will come out until 2026?! Or are you talking 3rd generation? They have the cores done, surely it’ll be 2025 right?
Well, conflicting rumors. You may be right now, though.



Some reports say it's still H1 2026 and with Oryon V2, but that IP doesn't make sense. Oryon V2 is already shipping today (well, very soon) at mass production. If it launched at the same time the phones did or after, it would be a generation behind.

My guess is that it’s either early 2025 or summer 2025 and V2, or it’s V3 and late 2025/early 2026.
 
So now the 8 Elite is out, would it be fair to summarise that it has improved on efficiency and not so much on top-end performance?

When the X1E84100 was previewed last year, QC advertised a Geekbench single-core score of ~3200 at a clock speed of 4.3 GHz.

[Slide: X1E84100 preview, ~3200 GB single-core at 4.3 GHz]

When released in June this year, the max frequency advertised had dropped to 4.2 GHz, and the highest GB scores seen have been ~2900-3000.

[Slide: X1E84100 release specs, 4.2 GHz max frequency]


Now the 8 Elite is coming, and again the max frequency is 4.3 GHz, with GB scores around 3100-3200 (shenanigans notwithstanding).


I see at the other place that someone has posted a slide from the recent event showing significant efficiency improvements, which can't be accounted for by the smaller node. Yet there are similarities at the high end.
[Slide: 8 Elite efficiency improvements]


Is that a fair summary?
Yes. The high end was because of Linux on the old chip, running thermally free, and now it's Android (Linux), though with more constraints and scheduling stuff.

The node can't account for that difference. It probably explains why they were able to clock the chips higher and yield it, as I said earlier, but the power improvement itself is clearly architecture, and very big: N3E is probably worth +5-10% performance iso-power, or -15-20% power iso-performance, over N4P, maybe? But they got much more than that, which is what I expected, because 4.3 GHz in a phone is very high, too high for QC to be BSing about power with the same platform and only small node gains. And sure enough.

Though even I am surprised: I didn't expect that much of a gain on the X925 in general integer stuff, but it's there in terms of perf/W.


Also, as we've discussed, QC was shady about the M2 Max comparison, because they used platform power, where the Max has a bus 4x the size and more overhead (so it does consume something like 15-20W ST), and they also used Linux, which gives chips a 3-10% boost due to scheduling stuff.
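FWIW, score per GHz makes the "similar high end" point pretty clearly (rough numbers taken from the post quoted above):

Python:
# (approx. GB6 ST score, peak GHz) from the figures quoted above
chips = {
    "X1E84100 preview": (3200, 4.3),
    "X1E84100 release": (2950, 4.2),
    "8 Elite":          (3150, 4.3),
}
for name, (score, ghz) in chips.items():
    print(f"{name}: {score / ghz:.0f} points/GHz")  # all land within ~5% of each other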
 
Well, conflicting rumors. You may be right now, though.



Some reports say it's still H1 2026 and with Oryon V2, but that IP doesn't make sense. Oryon V2 is already shipping today (well, very soon) at mass production. If it launched at the same time the phones did or after, it would be a generation behind.

My guess is that it’s either early 2025 or summer 2025 and V2, or it’s V3 and late 2025/early 2026.
GWIII also expressly hinted that they have another IPC upgrade coming, which they didn't really get to a big extent with this one, FWIW. I could be wrong though, and it's this:

V3 has the big upgrade for phones next fall

But at the same time Oryon V2 ships in laptops in the next chips next fall.

I could see it. It might explain why 4.8-5 GHz is rumored. It'd still be great, just not ideal. Even a 10-20% IPC upgrade while keeping frequency constant on N3P, without increasing power too much, would be killer for Windows laptops.
 