Nuvia: don’t hold your breath

Hah! My dumb guess had some merit. Probably explains why their multi-core benchmark doesn’t scale so well: instead of running all cores at a sustainable clock, they run half of them at boost.

I also suspect their single core benchmark won’t map great to the real world for similar reasons.
It would explain why the performance doesn’t scale, but does it also explain why the Oryon CPU seemingly goes from more power efficient than the M2 in single core to less power efficient in multicore?
 
I know the slides can be fatiguing, but I just saw this one. It’s a Cinebench 2024 test where the Elite X uses nearly 80 watts! What’s going on here? It seems like they allow huge amounts of power to be used.

 
I’m guessing it’s something like:
80W with unconstrained boost (all cores 3.8GHz)
50W with power limits in place (cores below 3.8GHz)

But yeah… which power limits apply to which claims?

The single-thread claim was 15% better performance at 30% less power. If the chip is using 80W with all cores at 3.8GHz, the per-core power must be around 5W, right (allowing some watts for uncore)?

So a core at 4.3GHz must be using more than 5W, surely? M2 Max cores only use around 7W peak… how does this 30% less power claim work? 🤨🤷‍♂️

Edit: maybe making too many assumptions here, but this is the problem with ambiguous claims 🤦‍♂️
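A quick sketch of the arithmetic in the post above. The 80W package figure, the 12-core count and the ~7W M2 Max peak are from the thread; the 20W uncore allowance is a hypothetical number standing in for "some watts for uncore", not something Qualcomm has stated.

```python
# Back-of-envelope check of the per-core power figures discussed above.
PACKAGE_W = 80.0    # Cinebench 2024 slide figure (all cores loaded)
UNCORE_W = 20.0     # HYPOTHETICAL uncore allowance, not from any slide
CORES = 12          # Snapdragon X Elite core count

# Per-core power with every core running at ~3.8 GHz.
all_core_per_core_w = (PACKAGE_W - UNCORE_W) / CORES

# "30% less power" than an M2 Max core peaking at roughly 7 W.
M2_MAX_PEAK_CORE_W = 7.0
claimed_single_core_w = M2_MAX_PEAK_CORE_W * (1 - 0.30)

print(f"all-core load, per core: {all_core_per_core_w:.1f} W")
print(f"implied single core at 4.3 GHz: {claimed_single_core_w:.1f} W")
# The puzzle: the 4.3 GHz boosted core would have to draw no more
# than a core running at 3.8 GHz in the all-core case.
```

Shrinking the uncore allowance only raises the all-core per-core figure, which makes the gap to the single-core claim wider, not narrower.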
 
No idea tbh.

The M2 gets 555 CB 2024 points in the multi-core test. So this gets around 1100 points if the slide is accurate.
 
This is my confusion. Even ignoring the potential for 15% faster in single core, it goes from matching the M2 Max at significantly lower power in single core to only beating it by 3-5% in multicore with 50% more P-cores albeit no E-cores and requiring 25% more power. A priori, that seems very odd.
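The swing described above can be put as perf-per-watt ratios. This is a minimal sketch using only the post's numbers, relative to the M2 Max: equal single-core performance at ~30% less power, and a multicore lead of 3-5% (4% midpoint used here) at 25% more power.

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Efficiency of chip A relative to chip B, given A/B performance and A/B power."""
    return perf_ratio / power_ratio

# Single core: matches the M2 Max (1.0x performance) at ~30% less power (0.7x).
single = perf_per_watt_ratio(1.00, 0.70)
# Multicore: ~3-5% faster (1.04x midpoint) while drawing ~25% more power (1.25x).
multi = perf_per_watt_ratio(1.04, 1.25)

print(f"single-core perf/W vs M2 Max: {single:.2f}x")
print(f"multi-core perf/W vs M2 Max:  {multi:.2f}x")
```

Going from well above the M2 Max's efficiency in single core to well below it in multicore is the part that extra cores alone don't obviously explain.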
I can't think of any way for all the claims to be true at the same time. Either some are exaggerated or they're underpromising on the others.
 

Only thing I can think of is they ordinarily run at low frequencies with cores off, but on the benchmarks they run full bore (either because they are cheating or because they can do that in short bursts corresponding to benchmark test length), and then they are mixing and matching numbers from both situations. Or something.
 
It is odd, isn’t it? That’s what makes me suspect that they might be measuring package power and not core power. But even then it’s weird. Let’s see if we will have more info in the coming days.
I dunno … I’m pretty sure they hired Andrei from Anandtech to do these analyses. While everyone makes mistakes I can’t believe he of all people would screw up like that and even then as you say that only gets you part of the explanation.
Only thing I can think of is they ordinarily run at low frequencies with cores off, but on the benchmarks they run full bore (either because they are cheating or because they can do that in short bursts corresponding to benchmark test length), and then they are mixing and matching numbers from both situations. Or something.
Again, given the people involved, I hope they wouldn’t do that. But I’ve got to admit that all these competing claims from different slides are hard to square otherwise. Unless something is going very wrong when turning on all the cores and trying to feed data to them. Very, very wrong.

Sure, but I thought it was 50 watts max before, now it’s 80. It’s confusing to me at least.
As @Aaronage said, I think this is just representing different power environments. I think they mentioned that they designed this one SoC to run in a variety of different systems with different thermal targets. But that still doesn’t explain some of the results in those different power envelopes. Given what they quoted for single-core results, a 12-core Oryon CPU at 50 watts should be closer to 50% faster than an M2 Max than 50% faster than an M2. That’s a massive discrepancy.
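A rough sketch of where that expectation comes from, under loudly stated assumptions: perfect multicore scaling, power ignored, the M2 Max counted as 8 P-cores plus 4 E-cores, each E-core worth about 25% of a P-core (a figure used later in this thread), and each Oryon core about 15% faster than an M2-generation P-core as the single-core slide suggests.

```python
def p_core_equivalents(p_cores: int, e_cores: int, e_core_weight: float = 0.25) -> float:
    """Multicore throughput in units of one P-core, assuming perfect scaling."""
    return p_cores + e_cores * e_core_weight

M2_MAX = p_core_equivalents(8, 4)   # 8P + 4E
M2 = p_core_equivalents(4, 4)       # 4P + 4E

ORYON_CORES = 12
ORYON_PER_CORE = 1.15               # ASSUMED ~15% per-core lead over an M2 P-core
oryon = ORYON_CORES * ORYON_PER_CORE

print(f"vs M2 Max: {oryon / M2_MAX - 1:+.0%}")
print(f"vs M2:     {oryon / M2 - 1:+.0%}")
print(f"vs M2 Max at equal per-core speed: {ORYON_CORES / M2_MAX - 1:+.0%}")
```

Under these assumptions the ~50% lead lands against the M2 Max rather than the base M2, which is the discrepancy described above.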
 
Oh it’s definitely strange.

They are boasting about beating the M2 at 80 watts in Cinebench, which I’m pretty sure is close to the power level of the M2 Ultra CPU. The problem is that the Ultra gets a 70% higher score at that power level.

Now it’s very possible that it will be in much cheaper machines than the Max or Ultra, so it might be a good value proposition. It’s just the inconsistency and the marketing boasts about crazy power levels that seem strange.
 
Reading the fine print on the slides at the launch event, it states: “CPU Peak performance is based on the geometric mean of 100 runs with 60 seconds in between runs utilising Geekbench 6.2.1…”
Also: “Power consumption reflects power as measured on both instrumented devices while running at the peak CPU performance of the <competitor>”

Does “instrumented” mean powermetrics in the case of macOS? Not sure if that tells anything meaningful!

The single-core comparison to the i7-13800H is weird too. The Elite X beats the i7 by 35 points (basically nothing). And they emphasise it matches the i7 at 70% less power. It seems like it would beat the i7 at pretty much 70% less power given how close the scores are. Why not just say “we beat it AND it’s using 68% less power”? Why bother with the iso power comparison? Is that a common means of comparison that I’m just unaware of? Super weird.

Ok, they do say it uses 68% less power than the i7 shortly after! Disregard.


OK…which is it?
 
Also: “Power consumption reflects power as measured on both instrumented devices while running at the peak CPU performance of the <competitor>”

A lot of room for mischief there. How do they know they are “running at the peak CPU performance of the competitor”? They turn down the frequency until they get the same benchmark score?
 
ISO power just means you’ve held the power steady between tests, but not necessarily what that power level was. For instance, let’s say in comparing multicore scores to an i7, you ran the i7 at 50 watts and the Oryon at 50 watts. That would be comparing at ISO power. So would running them both at 80 watts. Depending on the power curve of each chip, you’ll get very different results, so without knowing what power draw the two chips were held at, it’s hard to know exactly what’s going on, and we’ve already discussed how some of these results appear to be extreme. Anyway, typically you try to measure them at a meaningful point, like the maximum power draw of one of the two chips, and say “at ISO power we are x% better”. I suppose you could in theory measure at multiple points, but then you might as well just give the power curves.
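To make the iso-power idea concrete, here is a small sketch that reads two chips' performance off their power curves at a fixed wattage using linear interpolation. The curves are made-up toy numbers, not measurements of any real chip; the point is only that the chosen power level decides the headline.

```python
from bisect import bisect_left

def perf_at_power(curve: list[tuple[float, float]], watts: float) -> float:
    """Linearly interpolate a score at `watts` from (watts, score) points sorted by watts."""
    powers = [p for p, _ in curve]
    if not powers[0] <= watts <= powers[-1]:
        raise ValueError("power level outside the measured curve")
    i = bisect_left(powers, watts)
    if powers[i] == watts:
        return curve[i][1]
    (p0, s0), (p1, s1) = curve[i - 1], curve[i]
    return s0 + (s1 - s0) * (watts - p0) / (p1 - p0)

# Toy curves, NOT measurements of any real chip: (package watts, benchmark score).
chip_a = [(10, 400), (30, 800), (50, 1000), (80, 1100)]
chip_b = [(10, 300), (30, 700), (50, 950), (80, 1150)]

for watts in (30, 50, 80):
    a, b = perf_at_power(chip_a, watts), perf_at_power(chip_b, watts)
    print(f"iso-power @ {watts} W: A is {a / b - 1:+.0%} vs B")
```

With these toy curves chip A "wins" at 30 W and "loses" at 80 W, so an iso-power claim means little without the wattage it was taken at.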

You can of course also measure at ISO performance, which means you try to change the power draw by adjusting frequency until you match your competitor’s performance and then you measure the power of each device.

A lot of room for mischief there. How do they know they are “running at the peak CPU performance of the competitor”? They turn down the frequency until they get the same benchmark score?

My understanding is that’s pretty much exactly what happens both for “ISO” tests and full power curves.
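The iso-performance procedure described here (lower the clock until the score matches the competitor, then read the power) amounts to a search over frequency. Below is a minimal sketch using bisection, assuming the score rises monotonically with clock. The score and power models and the competitor score are hypothetical toy values, not data for any real chip.

```python
from typing import Callable

def iso_performance_point(
    score_at: Callable[[float], float],
    power_at: Callable[[float], float],
    target_score: float,
    f_min: float,
    f_max: float,
    tol: float = 1e-4,
) -> tuple[float, float]:
    """Bisect on clock (GHz) until score_at(f) meets target_score; return (f, power_at(f)).

    Assumes score_at increases monotonically with frequency.
    """
    if not score_at(f_min) <= target_score <= score_at(f_max):
        raise ValueError("target score is not reachable in [f_min, f_max]")
    lo, hi = f_min, f_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score_at(mid) < target_score:
            lo = mid      # too slow: the match lies at a higher clock
        else:
            hi = mid      # fast enough: try a lower clock
    f = (lo + hi) / 2
    return f, power_at(f)


# Toy single-core model, NOT real chip data.
def toy_score(ghz: float) -> float:
    return 700.0 * ghz            # score assumed linear in clock

def toy_power(ghz: float) -> float:
    return 0.1 * ghz ** 3         # power assumed to grow ~cubically (voltage rises with clock)

COMPETITOR_SCORE = 2660.0          # hypothetical competitor peak score
f, watts = iso_performance_point(toy_score, toy_power, COMPETITOR_SCORE, 1.0, 4.3)
print(f"matches at ~{f:.2f} GHz drawing ~{watts:.2f} W "
      f"(vs {toy_power(4.3):.2f} W at full 4.3 GHz boost)")
```

Because power climbs much faster than clock near the top of the curve, backing off just enough to tie the competitor can save a large share of the power, which is why an iso-performance figure can look far better than a peak-to-peak one.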
 
Only thing I can think of is they ordinarily run at low frequencies with cores off, but on the benchmarks they run full bore (either because they are cheating or because they can do that in short bursts corresponding to benchmark test length), and then they are mixing and matching numbers from both situations. Or something.
Their claim is "Matches competitor [M2] peak performance at 30% less power", displayed next to a single-core score comparison. If they are misleadingly hand-picking figures from different scenarios and they meant something like "Matches M2 peak performance while using 30% less power during regular use", they may as well say "15% faster than M2 peak performance while using 30% less power", as it looks like it can beat the M2 in single core.

At Cinebench R23 the M2 Max was measured to peak at 34-36W [source]. Let's say 40W. And let's use that to get a very conservative upper bound for the peak power of an M2 core: 40W / 8 P-cores -> 5W/core (it's actually less, because at least some of those 40W go to the E-cores).
Now for the Snapdragon X Elite. If its cores can match the M2 core's performance at 30% less power, that's at most 5W * 0.7 -> 3.5W/core while matching the score of an M2 core. The Snapdragon X Elite has 12 cores, so 3.5W * 12 -> 42W total multicore CPU consumption (upper bound) with all cores matching the score of an M2 core.

Now let's try to match the above with their other claim, "50% faster peak multi-threaded performance vs. M2". The M2 has 4P+4E cores, and we know the E-cores reach up to ~25% of the peak performance of the P-cores, so the whole CPU has the "combined performance" of 5 P-cores. However, the Snapdragon X Elite has 12 cores, and each of them should be able to match the performance of an M2 core using 30% less power. Therefore, using just 42W, it should beat the M2 by (12 Snapdragon X Elite cores @ M2 perf level - 5 M2 P-cores) / (5 M2 P-cores) = 140%. How is it just 50% faster? 🤨 This is so far off I had to re-check the slides to ensure they weren't talking about M2 Pro/Max.

And it's actually even worse, because the 140% faster projected figure assumes the CPU can't use more than 3.5W/core (enough to match the M2 core performance). But if, as we know from other figures, the Snapdragon X Elite can reach up to 80W for the CPU alone, and if the "matches M2 peak performance at 30% less power" claim is true, then each core should be faster than an M2 core at that power (either that or the performance/power curve goes downwards 😂). And even if it's only a tiny bit faster than the M2 per core at this power, it should destroy the M2 in benchmarks even further, because it has 12 of them.
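The upper-bound chain above can be reproduced in a few lines. Every input is a figure quoted in the post (40W for the M2 Max CPU, 30% less power per core, 12 cores, E-cores worth ~25% of a P-core); perfect multicore scaling is assumed, as in the post.

```python
# Reproduces the upper-bound reasoning above; all inputs are figures from the post.
M2_MAX_CPU_W = 40.0          # rounded-up Cinebench R23 figure for the M2 Max
M2_MAX_P_CORES = 8
m2_core_w_upper = M2_MAX_CPU_W / M2_MAX_P_CORES      # generous: E-cores share the 40 W

x_elite_core_w = m2_core_w_upper * 0.7               # "30% less power" at M2-core performance
X_ELITE_CORES = 12
x_elite_total_w = x_elite_core_w * X_ELITE_CORES     # all 12 cores at M2-core performance

# Base M2: 4 P-cores + 4 E-cores, each E-core worth ~25% of a P-core.
m2_p_equiv = 4 + 4 * 0.25
projected_gain = (X_ELITE_CORES - m2_p_equiv) / m2_p_equiv

print(f"per-core bound: {x_elite_core_w:.1f} W, total: {x_elite_total_w:.0f} W")
print(f"projected multicore gain vs M2: {projected_gain:.0%} (claimed: 50%)")
# If all 80 W went to the cores (ignoring uncore), each core would get far more than 3.5 W.
print(f"per-core budget at 80 W: {80 / X_ELITE_CORES:.1f} W")
```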

None of this makes any sense to me. I also find the claim that they can use 80% less GPU power for the same performance compared to the Ryzen 9 7940HS dubious. I didn't think AMD's GPUs were particularly power-hungry, but I don't know much about non-Apple hardware.
 
Only way I could imagine all claims being true is with a massive bandwidth limitation issue that caused abysmal performance scaling with multiple cores. But I think it's just that they're messing around with the numbers, I think some claims will prove false or at best misleading once the chip is launched.
 
@Andropov I think there are just too many factors in play, which makes it all too easy to get lost in speculation and interpretation.

Trying to look beyond the marketing BS, it seems to me that Oryon is very similar to Firestorm (at least when it comes to IPC), and the claim of "matches the performance at 30% lower power" is consistent with the improvements of N4 over N5 plus maybe a few % from the microarchitecture. Not really surprising if you consider that the people who designed Oryon are the same people who designed Firestorm. Anyway, for all intents and purposes Oryon runs at clocks of up to 3.8GHz (similar to M2 Max), but it can also operate in a "turbo mode" where up to two cores are overclocked by about 13% to 4.3GHz (these cores must be located in different clusters, and presumably the rest of the cores in the same cluster are turned off). Qualcomm has been silent about the power or opportunity cost of such operation.

Regarding the "30% lower power at same core performance", this indeed leaves a lot of questions open. While we shouldn't expect perfect scaling (since they are using GB6, which penalises many-core architectures), I'd expect the 12-core Oryon to perform closer to AMD's 12-core 7900X (which is closer to being 70-80% faster than M2). Also note how Qualcomm avoids mentioning any concrete numbers for the multi-core scores, even though they were more than eager to provide a very specific number for their single-core benchmark. Overall, I think it means that they are running the cores at lower frequencies than they would like to admit.

There were some earlier reports that Qualcomm has had problems with power management hardware for the platforms. I don't know how credible these reports are, but the short version was that Qualcomm corporate insisted on using their own mobile power management units, which are designed for much lower power brackets than required in a high-performance laptop. The report claimed that this would a) significantly increase the cost of the mainboard and b) decrease the efficiency at higher power draw. Maybe the latter is what we are observing here. Maybe Oryon is inherently more power efficient than Firestorm, but due to power controller limitations the system wastes power under multicore operation. Or maybe Qualcomm is simply measuring the power consumption in the single-core scenario to make themselves look better. Who knows.
 
While we shouldn't expect perfect scaling (since they are using GB6 which penalises many-core architectures),

Could you expand on this? I know GB5 on Windows had severe problems with many core processors but I was given to understand that those were fixed.
 
Overall, I think it means that they are running the cores at lower frequencies than they would like to admit.
But why would they? Other than the power management unit stuff, I can’t see a reason why they would be forced to clock their cores lower, other than power or heat, and neither should be an issue if its cores are in fact similar to Firestorm.
 
Could you expand on this? I know GB5 on Windows had severe problems with many core processors but I was given to understand that those were fixed.

"The multi-core benchmark tests in Geekbench 6 have also undergone a significant overhaul. Rather than assigning separate tasks to each core, the tests now measure how cores cooperate to complete a shared task. This approach improves the relevance of the multi-core tests and is better suited to measuring heterogeneous core performance. This approach follows the growing trend of incorporating “performance” and “efficient” cores in desktops and laptops (not just smartphones and tablets)."

Basically, GB5 used to run a separate task on each core, whereas GB6 runs a single multi-threaded task across all cores. The former scaled almost perfectly but wasn't representative of typical multi-threaded workloads.
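One way to see why a shared-task benchmark scales worse than independent per-core tasks is Amdahl's law. This is a swapped-in textbook model, not how Geekbench actually constructs its workloads, and the 5% serial fraction is a hypothetical value chosen only for illustration.

```python
def independent_tasks_speedup(cores: int) -> float:
    """GB5-style: each core runs its own copy of the work, so throughput scales ~linearly."""
    return float(cores)

def shared_task_speedup(cores: int, serial_fraction: float) -> float:
    """GB6-style, modelled with Amdahl's law: one task split across cores,
    where `serial_fraction` of the work cannot be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

SERIAL = 0.05   # HYPOTHETICAL serial fraction, not a Geekbench figure
for n in (4, 8, 12, 16):
    print(f"{n:2d} cores: independent {independent_tasks_speedup(n):5.1f}x, "
          f"shared {shared_task_speedup(n, SERIAL):5.2f}x")
```

The takeaway is only qualitative: with any non-zero serial fraction, going from 8 to 12 cores buys much less than 50% more throughput, so a 12-core chip may not pull as far ahead in GB6 as core counts alone suggest.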
 