Nuvia: don’t hold your breath

Their claim is "Matches competitor [M2] peak performance at 30% less power", displayed next to a single core score comparison. If they are misleadingly hand-picking figures from different scenarios and actually meant something like "matches M2 peak performance while using 30% less power during regular use", they may as well have said "15% faster than M2 peak performance while using 30% less power", since it looks like it can beat the M2 in single core.

In Cinebench R23 the M2 Max was measured to peak at 34-36W [source]. Let's say 40W, and use that to get a very conservative upper bound for the M2's peak per-core power: 40W / 8 P cores -> 5W/core (it's actually less, because at least some of those 40W go to the E cores).
Now for the Snapdragon X Elite. If its cores can match the M2 core's performance at 30% less power, that's at most 5W * 0.7 -> 3.5W/core while matching the score of an M2 core. The Snapdragon X Elite has 12 cores, so 3.5W * 12 -> 42W total multicore CPU consumption (upper bound) with all cores matching the score of an M2 core.

Now let's try to match the above with their other claim, "50% faster peak multi-threaded performance vs. M2". The M2 has 4P+4E cores, and we know the E cores reach up to ~25% of the peak performance of the P cores, so the whole CPU has the "combined performance" of 5 P cores. However, the Snapdragon X Elite has 12 cores, and each of them should be able to match the performance of an M2 core using 30% less power. Therefore, using just 42W, it should beat the M2 by (12 Snapdragon X Elite cores @ M2 perf level - 5 M2 P cores) / (5 M2 P cores) = 140%. How is it just 50% faster? 🤨 This is so far off I had to re-check the slides to ensure they weren't talking about M2 Pro/Max.

And it's actually even worse, because the 140% faster projected figure assumes the CPU can't use more than 3.5W/core (just enough to match the M2 core performance). But if, as we know from other figures, the Snapdragon X Elite can reach up to 80W for the CPU alone, and if the "matches M2 peak performance at 30% less power" claim is true, it means that each core should be faster than an M2 core (either that, or the performance/power curve goes downwards 😂). And even if it's only a tiny bit faster than the M2 per core at this power, it should destroy the M2 in benchmarks even further, because it has 12 of them.
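Putting the same back-of-the-envelope numbers into a quick script (every input below is a rough estimate from this post, not a measurement):

```python
# Rough sketch of the arithmetic above; all inputs are estimates from this post.

m2_max_package_peak_w = 40                  # generous Cinebench R23 package peak
m2_p_cores = 8
m2_w_per_p_core = m2_max_package_peak_w / m2_p_cores   # ~5 W/core, conservative upper bound

# Qualcomm's claim: matches an M2 core's peak performance at 30% less power
oryon_w_per_core = m2_w_per_p_core * 0.7                # ~3.5 W/core
oryon_cores = 12
oryon_total_w = oryon_w_per_core * oryon_cores          # ~42 W with every core at M2-core level

# M2 multicore budget: 4P + 4E, each E core worth ~25% of a P core
m2_p_core_equivalents = 4 + 4 * 0.25                    # ~5 "P-core equivalents"

expected_mt_gain = (oryon_cores - m2_p_core_equivalents) / m2_p_core_equivalents
print(f"{oryon_w_per_core:.1f} W/core, {oryon_total_w:.0f} W total, "
      f"expected multicore advantage over M2: {expected_mt_gain:.0%}")  # ~140%, not 50%
```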

Strong agree, matches exactly with my own back-of-the-envelope calculations that I alluded to here:

Given what they quoted for single core results, a 12 core Oryon CPU at 50 watts should be closer to 50% faster than an M2 Max than 50% faster than an M2. That’s a massive discrepancy.



None of this makes any sense to me. I also find the claim that they can use 80% less GPU power for the same performance compared to the Ryzen 9 7940HS dubious; I didn't think AMD's GPUs were particularly power hungry, but I don't know much about non-Apple hardware.

We’ve had similar issues with Adreno results in mobile. It seems to get fantastic results in graphics benchmarks with high scores and low power and positively abysmal results in compute given those scores. @Jimmyjames reported the latest results here:

Geekbench Compute for the Adreno 750 (8 Gen 3 Gpu)

12017. Equal to the…A13. Hmmmm

Seems under Vulkan it matches an A14. Good job.

Now some of this may be due to the fact that comparing across Geekbench compute APIs is not great and OpenCL is a zombie API on many platforms, but @leman had a theory, backed up by what little he could find about the Adreno GPU, that the Adreno GPU may effectively fake its FP32 results in graphics applications and actually run them as FP16, which doesn't work in compute applications. I can't find his post, he might be able to link it. The only caveat is that their official FP32 TFLOPs figure, as measured by a program whose name I can't remember, is higher than Apple's mobile GPU at lower power. If that code is actually enforcing FP32, which it may not be, that wouldn't be the case under @leman's hypothesis. That said, his hypothesis is the only one that makes sense so far and is backed up by the rather poor compute performance. There was another user who likewise suggested Adreno GPUs were somehow specialized for gaming at the expense of compute.

Basically more data/information is needed here. People complain about how little Apple shares, but (partly due to the time people have spent reverse engineering their stuff) a lot more seems to be known about Apple hardware than Qualcomm's.

But why would they? Other than the power management unit stuff, I can’t see a reason why they would be forced to clock their cores lower, other than power or heat, and neither should be an issue if its cores are in fact similar to Firestorm.

The final possibility, other than the claims being completely misleading, is the one that you and I both mooted: that something goes very wrong in feeding the cores in multicore workloads.


"The multi-core benchmark tests in Geekbench 6 have also undergone a significant overhaul. Rather than assigning separate tasks to each core, the tests now measure how cores cooperate to complete a shared task. This approach improves the relevance of the multi-core tests and is better suited to measuring heterogeneous core performance. This approach follows the growing trend of incorporating “performance” and “efficient” cores in desktops and laptops (not just smartphones and tablets)."

Basically, GB5 used to run a separate task on each core, GB6 runs a single multi threaded task across all cores. The former scaled almost perfectly but wasn't representative of typical multi threaded workloads.
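A toy sketch of the difference (just the threading model, nothing to do with Geekbench's actual workloads):

```python
# Toy illustration only: GB5-style = one independent task per core,
# GB6-style = one shared task that all cores cooperate on (and whose
# results still need to be merged). Not Geekbench's actual workloads.
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import cpu_count

def independent_task(n):
    # GB5-style: every worker runs its own self-contained job
    return sum(i * i for i in range(n))

def shared_chunk(bounds):
    # GB6-style: every worker computes one slice of a single big job
    start, stop = bounds
    return sum(i * i for i in range(start, stop))

if __name__ == "__main__":
    n, workers = 2_000_000, cpu_count()
    with ProcessPoolExecutor(workers) as pool:
        gb5_style = list(pool.map(independent_task, [n] * workers))
        slices = [(i * n // workers, (i + 1) * n // workers) for i in range(workers)]
        gb6_style = sum(pool.map(shared_chunk, slices))  # the merge step is serial
```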

Thanks! I could see how that might impact things for many-core CPUs; similar to how, in a GPU benchmark, if you don’t supply a big enough workload the performance scaling of larger GPUs seems to drop off. But I would hope that 12 single threaded CPU cores wouldn’t be enough to hit that limit in GB6 regardless of how fast they are. Certainly not enough to cause the massive disconnect between single core and multicore values we’re seeing, taking all the claims at face value.

Which as @Andropov, @leman, and @Cmaier have all suggested may not be a wise thing to do even beyond standard marketing shenanigans. I’m hoping they confined themselves to standard marketing shenanigans like cherry picking and not something worse.
 
But why would they? Other than the power management unit stuff, I can’t see a reason why they would be forced to clock their cores lower, other than power or heat, and neither should be an issue if its cores are in fact similar to Firestorm.
Heat could be quite different even with identical cores, because they could be putting cores too close together, they could have an inferior thermal solution in the packaging, etc.
 
While we shouldn't expect perfect scaling (since they are using GB6, which penalises many-core architectures), I'd expect the 12-core Oryon to perform closer to AMD's 12-core 7900X (which is closer to being 70-80% faster than the M2). Also note how Qualcomm avoids mentioning any concrete numbers for the multi-core scores, even though they were more than eager to provide a very specific number for their single core benchmark. Overall, I think it means that they are running the cores at lower frequencies than they would like to admit.
I wasn't aware of the changes in Geekbench 6. Though as they don't mention which software they used for the comparison, it's hard to know if that even applies here.

We’ve had similar issues with Adreno results in mobile. It seems to get fantastic results in graphics benchmarks with high scores and low power and positively abysmal results in compute given those scores. @Jimmyjames reported the latest results here:
I think with GPUs there are more things that can explain a difference in performance, especially when discussing rasterization vs compute.

Heat could be quite different even with identical cores, because they could be putting cores too close together, they could have an inferior thermal solution in the packaging, etc.
But if they had issues with heat like having the cores too close together, one would expect the CPU to have a low peak power consumption to prevent it, right? Instead it seems like they can get the CPU alone to use more than 70W.


There's one way of twisting the numbers I could think of after seeing many of their slides (hopefully I'm wrong). One could simply compare the minimal power figure available for their SoC (core power draw or cluster power draw) against a different power figure for the competition (full package power, or even wall power).
Using this methodology, it'd be quite easy to beat the competition's power consumption in single core, as you could include a bunch of extra things (like RAM power) as part of the competition's power figures, while for multicore this measuring method would be way less advantageous, as power contributions not coming from the CPU would be dwarfed. Think of this completely made up example:

[Attached screenshot: the made-up example]
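The same made-up idea as a few lines of Python (every number below is invented purely for illustration):

```python
# Hypothetical accounting trick: report core/cluster power for your own chip,
# but package (or wall) power for the competitor. All numbers invented.

def reported_savings(own_core_w, rival_core_w, rival_uncore_w):
    # the "uses X% less power" figure you could print on a slide
    rival_reported = rival_core_w + rival_uncore_w   # package power for them
    return 1 - own_core_w / rival_reported           # core-only power for you

# Single core: uncore/RAM/etc. dwarf the one busy core, so the trick helps a lot
print(f"ST: {reported_savings(own_core_w=4, rival_core_w=5, rival_uncore_w=10):.0%} 'less power'")

# Multicore: the same uncore power is now a small slice of the total,
# so the same trick buys far less
print(f"MT: {reported_savings(own_core_w=40, rival_core_w=40, rival_uncore_w=10):.0%} 'less power'")
```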


Same thing for GPU power. Isn't it a bit surprising that their GPU uses 80% less power than a Ryzen 9 7940HS GPU, which is also on 4nm?

Also, something I read in the comments of the AnandTech article: if their own numbers are true, the reportedly 50W Snapdragon X Elite would lose against the (35W) M2 Pro in both peak multicore performance and power consumption, despite their claim of having a core design that is both more powerful and more efficient.

FWIW, I believe the 3,227 Geekbench 6 number will prove real. Everything else, I'm going to cautiously wait until it gets tested in the wild.
 
Over at the other place, someone posted that there are three reference platforms: 12 W, 23 W and 45 W. I’m not entirely sure what is meant by this. Is it the same SoC running at three different power levels, or does each of the reference platforms have a slightly different SoC? I have not found any evidence of these three platforms anywhere else, so treat it as suspicious.

If true, I wonder if this could explain the weird results?

Edit: It’s mentioned here, but it does seem like it’s just one SoC.

Thought it might be useful to browse AnandTech’s forums given they have had great technical reviewers on that site. Boy, do the forum members not live up to that. I was surprised to find out that the M3 is a failure, and the A17 is a failure also. The idea of percentage increases outside the context of the actual performance increases has damaged people’s perceptions. People are obsessed with “only 10% performance increase” without considering that the A17 is actually a pretty big improvement. Only a quick peruse, so I may be judging them unfairly. This forum has by far the most knowledgeable discussions and members that I am aware of.
 
Here https://www.qualcomm.com/products/mobile/snapdragon/pcs-and-tablets/snapdragon-x-elite it says Clock Speed: 12 cores up to 3.8 GHz. So maybe 3.8 GHz when power-unconstrained? But the frequency will probably vary.
And one other catch: in one of their slides they say that, compared to the 13800H, it uses 65% less power for the same performance, which is about 33W. But the base M1 Pro at full power uses about 21W and has about the same performance as the 13800H in the Razer Blade 15 in Geekbench 6.2. So it looks worse than the M1 even when running at 35W, or when the cores would be running at a lower frequency: about 1/1.6 of the X Elite's max CPU performance (roughly 2.4 GHz, assuming it runs all cores at the same frequency and that performance is a linear function of frequency).
And it needs about 17W for performance below what the base M1 Pro delivers at 12W (low power mode).
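Here is the quick calculation I'm doing, as a script (the inputs mix their slide numbers with my own rough estimates, so treat everything as approximate):

```python
# Rough sketch of the comparison above; slide figures plus my own estimates.

claimed_x_elite_w   = 33   # ~65% less power than the 13800H at the same MT performance
m1_pro_base_w       = 21   # base M1 Pro package power at full MT load (estimate)
m1_pro_base_lp_w    = 12   # base M1 Pro in low power mode (estimate)
x_elite_low_point_w = 17   # X Elite power for performance below the base M1 Pro (from the slide)

# If the 13800H-matching point is ~1/1.6 of the X Elite's peak, and performance
# scales roughly linearly with frequency, that point sits around:
x_elite_max_ghz = 3.8
print(f"~{x_elite_max_ghz / 1.6:.1f} GHz all-core")   # ~2.4 GHz

print(f"Same performance: X Elite ~{claimed_x_elite_w} W vs base M1 Pro ~{m1_pro_base_w} W")
print(f"Below base M1 Pro perf: X Elite ~{x_elite_low_point_w} W vs M1 Pro LPM ~{m1_pro_base_lp_w} W")
```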
 

Attachments: IMG_1910.jpeg, IMG_1911.jpeg
Honestly, I would like to see Nuvia et al. flourish somewhat. I dislike MS Windows a lot, but nowhere near as much as I detest x86. I want to see x86 become more marginalized than, say, the Amiga. They can make all the BS claims they want, but the proof will be in the custard. When 90% of non-Mac personal computers are running on ARM, people will be able to see, straight up, that Macs offer a better experience, and no slideshow in the world will be able to change that.
 
Here https://www.qualcomm.com/products/mobile/snapdragon/pcs-and-tablets/snapdragon-x-elite it says Clock Speed: 12 cores up to 3.8 GHz. So maybe 3.8 GHz when power-unconstrained? But the frequency will probably vary.
And one other catch: in one of their slides they say that, compared to the 13800H, it uses 65% less power for the same performance, which is about 33W. But the base M1 Pro at full power uses about 21W and has about the same performance as the 13800H in the Razer Blade 15 in Geekbench 6.2. So it looks worse than the M1 even when running at 35W, or when the cores would be running at a lower frequency: about 1/1.6 of the X Elite's max CPU performance (roughly 2.4 GHz, assuming it runs all cores at the same frequency and that performance is a linear function of frequency).
And it needs about 17W for performance below what the base M1 Pro delivers at 12W (low power mode).
The M1 Pro base is >20% slower than 13800H. The full M1 Pro is roughly 10% slower. The full M2 Pro is faster for slightly more power than what they seem to claim.

[Attached Geekbench comparison screenshots]


Of course they can go faster for even more power but as we’ve already said it’s nowhere near what their performance should be compared to their single core performance claims.
 
Now some of this may be due to the fact that comparing across Geekbench compute APIs is not great and OpenCL is a zombie API on many platforms, but @leman had a theory, backed up by what little he could find about the Adreno GPU, that the Adreno GPU may effectively fake its FP32 results in graphics applications and actually run them as FP16, which doesn't work in compute applications. I can't find his post, he might be able to link it.
Qualcomm optimisation manuals recommend using a "native" version of operations for best performance, and they explicitly state that these "native" operations are suitable for graphics and other tasks where numerical precision is less important. They also explicitly state that Adreno can execute FP16 operations at a higher rate than FP32 ones. I also found at least one mention that Adreno does FP32 math at 24-bit precision in the graphics pipeline.

The thing is, all of these are very valid optimisation techniques if mobile graphics is your focus. And lower ALU precision is not the only possible optimisation. You can ship smaller register files, lower precision texture filters, slower advanced functions, etc., and your users won't notice any of this because the shader complexity of mobile games is fairly low (no idea whether Qualcomm uses any of these optimisations). So if that's your goal, you can build a fairly fast GPU that's also small and power efficient. But this GPU will suck at general-purpose computing or complex applications. Which is exactly what we see in the case of Qualcomm.
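Just to illustrate why running nominally-FP32 work at reduced precision is fine for shading but falls apart in compute, a tiny Python example (this is only about the number formats themselves, not about what Adreno actually does internally):

```python
import numpy as np

# float16 has a 10-bit mantissa: above 2048 it cannot even represent
# consecutive integers. Fine for pixel colours, fatal for compute kernels.
a = np.float16(2048.0)
print(a + np.float16(1.0) == a)                                       # True: the +1 is rounded away
print(np.float32(2048.0) + np.float32(1.0) == np.float32(2048.0))     # False: float32 still resolves it

# A running accumulation stalls once the sum outgrows float16's resolution.
s = np.float16(0.0)
for _ in range(4096):
    s += np.float16(0.1)
print(s)    # stalls around 256 instead of reaching ~409.6
```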
 
The M1 Pro base is >20% slower than 13800H. The full M1 Pro is roughly 10% slower. The full M2 Pro is faster for slightly more power than what they seem to claim.


Of course they can go faster for even more power but as we’ve already said it’s nowhere near what their performance should be compared to their single core performance claims.
When I consider the device used, it is about the same: https://browser.geekbench.com/search?k=v6_cpu&q=I7+13800h+razer+blade&utf8=✓

Yes, I agree that their claims do not make sense. I just tried to use their multi-core claims at different power levels and see how the efficiency compares to the M1. Not that it would explain why their single core and multi core score claims don't make sense.

My speculation is just that their cores are less efficient than Firestorm. I tried to ignore their single core claim because it says something very different from everything else, and looked at their multi-core power curve instead. It looks like they are less efficient than the M1 at every power level. I do not know if that moves our theories further. Someone who knows more than me can maybe say whether it looks like they have problems with packaging, the power controller, … I just want to add more data from their slides to help with the theories.
 
When I consider the device used, it is about the same: https://browser.geekbench.com/search?k=v6_cpu&q=I7+13800h+razer+blade&utf8=✓

Yes, I agree that their claims do not make sense. I just tried to use their multi-core claims at different power levels and see how the efficiency compares to the M1. Not that it would explain why their single core and multi core score claims don't make sense.

My speculation is just that their cores are less efficient than Firestorm. I tried to ignore their single core claim because it says something very different from everything else, and looked at their multi-core power curve instead. It looks like they are less efficient than the M1 at every power level. I do not know if that moves our theories further. Someone who knows more than me can maybe say whether it looks like they have problems with packaging, the power controller, … I just want to add more data from their slides to help with the theories.
Absolutely. And you’re right, that’s the reference laptop they tested against. But they also claim it was tested with “no thermal limitations”, which it clearly operates under normally in that laptop. So it's hard to know which of the two Intel scores, the ideal or the practical, actually reflects their comparative performance. If it’s the latter then you’re right that their multicore claims are even worse!
 
Qualcomm optimisation manuals recommend using a "native" version of operations for best performance, and they explicitly state that these "native" operations are suitable for graphics and other tasks where numerical precision is less important. They also explicitly state that Adreno can execute FP16 operations at a higher rate than FP32 ones. I also found at least one mention that Adreno does FP32 math at 24-bit precision in the graphics pipeline.

The thing is, all of these are very valid optimisation techniques if mobile graphics is your focus. And lower ALU precision is not the only possible optimisation. You can ship smaller register files, lower precision texture filters, slower advanced functions, etc., and your users won't notice any of this because the shader complexity of mobile games is fairly low (no idea whether Qualcomm uses any of these optimisations). So if that's your goal, you can build a fairly fast GPU that's also small and power efficient. But this GPU will suck at general-purpose computing or complex applications. Which is exactly what we see in the case of Qualcomm.
I wonder if that’s what they did for the SD X Elite as well then? If so, they’re going to get hammered in non gaming benchmarks as reviewers will test for that in a laptop. I know Qualcomm is going to allow dGPUs but not all laptops will have them.
 
Over at the other place, someone posted that there are three reference platforms: 12 W, 23 W and 45 W. I’m not entirely sure what is meant by this. Is it the same SoC running at three different power levels, or does each of the reference platforms have a slightly different SoC? I have not found any evidence of these three platforms anywhere else, so treat it as suspicious.

If true, I wonder if this could explain the weird results?

Edit: It’s mentioned here, but it does seem like it’s just one SoC.

Thought it might be useful to browse AnandTech’s forums given they have had great technical reviewers on that site. Boy, do the forum members not live up to that. I was surprised to find out that the M3 is a failure, and the A17 is a failure also. The idea of percentage increases outside the context of the actual performance increases has damaged people’s perceptions. People are obsessed with “only 10% performance increase” without considering that the A17 is actually a pretty big improvement. Only a quick peruse, so I may be judging them unfairly. This forum has by far the most knowledgeable discussions and members that I am aware of.
It’s been awhile since I visited but sadly no I think you’ve got the right sense of the Anandtech forums. Occasionally you’ll get really good posters that will be very knowledgeable or at least reasonable (probably a higher percentage than on most tech forums), but there are a lot of zealots and trolls.
 
I wonder if that’s what they did for the SD X Elite as well then? If so, they’re going to get hammered in non gaming benchmarks as reviewers will test for that in a laptop. I know Qualcomm is going to allow dGPUs but not all laptops will have them.

Hard to say since they are very vague about the specs. But 4.6TFLOPs is rather weak for that chip. And the new Adreno in 8 gen 3 barely matches A14 in compute (and A14 had throttled FP32…)
 
It’s been awhile since I visited but sadly no I think you’ve got the right sense of the Anandtech forums. Occasionally you’ll get really good posters that will be very knowledgeable or at least reasonable (probably a higher percentage than on most tech forums), but there are a lot of zealots and trolls.
I remember giving the AnandTech forum a try a few years before Apple Silicon was announced. I was hoping to find some good discussion about the possibility Apple would transition the Mac to its own silicon. Seemed like a safe bet - I thought “AnandTech attracts a lot of industry insiders, and the audience is typically more interested in the tech than fanboyism, should be good”. It was soooo disappointing to find that any mention of “Apple silicon” was met with a flood of “stupid fruit brand”, “Apple doesn’t have the experience, they can’t match Intel!”, “mobile is one thing, but there’s no way they can manage a PC SoC. It’s soooooo hard!” etc. Just total head-in-the-sand nonsense totally detached from reality.

I’ve learned that basically everyone is inclined to behave like an internet fanboy/troll on some level. I had a brief interaction with a hardware exec at work who reacted to the mention of a competitor with a 14-year-old-fanboy-on-YouTube-style diss, like “pls dont mention us and them together, they are trash” etc. 🙄
 
Hard to say since they are very vague about the specs. But 4.6TFLOPs is rather weak for that chip. And the new Adreno in 8 gen 3 barely matches A14 in compute (and A14 had throttled FP32…)
Where do we check compute performance of 8 gen 3 GPU?
 
Where do we check compute performance of 8 gen 3 GPU?
Geekbench Compute for the Adreno 750 (8 Gen 3 Gpu)

12017. Equal to the…A13. Hmmmm

Seems under Vulkan it matches an A14. Good job.
 
You can't compare Vulkan and Metal scores. It does seem both the A17 Pro and the 8 Gen 3 are around an RX 560 in terms of compute, with the A17 being a bit better.

In gaming, however, the 8 Gen 3 has a 40% leg up.
 
You can't compare Vulkan and Metal scores.

Why not? They solve the same problem. I bet the shader code is almost identical too (wouldn’t be surprised if they write everything in GLSL/HLSL and cross-compile to Metal)

It does seem both the A17 Pro and the 8 Gen 3 are around an RX 560 in terms of compute, with the A17 being a bit better.

If by a bit better you mean two times better, yes.

In gaming, however, the 8 Gen 3 has a 40% leg up.

Which is not at all surprising if Qualcomm sacrifices desktop-class features for more performance in mobile games.
 
You can't compare Vulkan and Metal scores. It does seem both the A17 Pro and the 8 Gen 3 are around an RX 560 in terms of compute, with the A17 being a bit better.
The maker of the benchmark claims you can compare results. Why wouldn’t we be able to?
In gaming, however, the 8 Gen 3 has a 40% leg up.
How do you know we can compare gaming results? The 8 Gen 3 uses Vulkan while the A17 uses Metal.
 