Nuvia: don’t hold your breath

Of all the bans I've gotten over there, I have to say I'm most surprised by this one.

I guess maybe they didn't like it when I described his posts as un-earned arrogance? But I think that was completely accurate - minghold was being very condescending towards people who didn't accept his claim that Apple is doing evil things with all your private data, but he didn't bring anything real to back that up.
Oh I agree with you.
 
Lol okay the X elite reviews so far just look okay in context. I might take an L on some of this, I expected a bit more.
 
Notebookcheck reviews:



Very balanced reviews - ignore the Intel/AMD fanboys in the comments section at the end accusing NBC of faking everything or hiding the truth. However, I think the lack of efficiency cores is really hurting Qualcomm here, especially since it needs active cooling to get its multicore scores, and the effect of the lack of efficiency cores on multicore scores is possibly compounded by the lack of RAM bandwidth/SLC cache. I was surprised how much better the M2 core was in single core efficiency, so it's also possible that its low multicore score given the 12 P-cores is just heat/TDP limiting multicore clock speed. Then again, the higher clocked Qualcomm SKU got better single core efficiency than the lower one ... maybe substantially better binning, so that even though the clock is higher the 80 is still more efficient than the 78? The multicore performance, power draw, and need for active cooling are why comparisons to the Air aren't great for this class of device. Get this SOC some E-cores.

Overall, it is a pity for Qualcomm that they didn't release this last year. It is still a very good first generation product, but it would've made a much bigger impact then. Version 2 needs to add efficiency cores or substantially improve the primary core architecture. The GPU needs improvement too - given Qualcomm's performance in the mobile space I was a little surprised here, but some of this may also be a case of driver issues (and in some cases Prism issues for games) that will get fixed over time. But if Qualcomm is going to play in the PC space long term, they're going to need a better GPU, or to partner up and get dGPUs.
 
Yeah, good reviews. The question for me is will they be able to make a dent given the upcoming Intel/amd releases?
 
it's also possible that its low multicore score given the 12 P-cores is just heat/TDP limiting multicore clock speed

The other possibility is that they just cannot keep them fed. A really fast core that gets starved on both ends is probably worse than a slower core that can get its data and code at an appropriate speed. Qualcomm probably chose the all-P configuration because it was a little simpler to implement, but one of the SoC advantages to putting in E-cores is that, since they are so physically small, that leaves you with more real estate for GPU cores and specialty logic.
 
Yeah, good reviews. The question for me is will they be able to make a dent given the upcoming Intel/amd releases?
Yup would've had bigger impact end of last year which is, from Dell's marketing leak, clearly what they were targeting.

The other possibility is that they just cannot keep them fed. A really fast core that gets starved on both ends is probably worse than a slower core that can get its data and code at an appropriate speed. Qualcomm probably chose the all-P configuration because it was a little simpler to implement, but one of the SoC advantages to putting in E-cores is that, since they are so physically small, that leaves you with more real estate for GPU cores and specialty logic.
Aye that's what my comment about the RAM bandwidth/SLC cache is getting at. When Apple runs 8 to 12 P-cores in M2/M3, they do so in an SOC with larger SLC cache and higher RAM bandwidth. My three non-mutually-exclusive guesses for the reason(s) underpinning the lower than expected MT scores and overall power behavior are that there are no E-cores, their P-cores aren't quite as good as M2 P-cores, and 12 of them running full tilt would get starved anyway.
 
They don’t have a consistent methodology for measuring power consumption, which makes the efficiency comparisons hard to reason about. I think the “real” efficiency of the Oryon platform is higher.

However, I am very surprised about the performance. The 4.2GHz SKU can barely match the 3.7GHz M2 in Cinebench - what is happening here? These are supposed to be designs with quite similar architecture, is Oryon running lower frequency than advertised or is there some other problem? Also, the multi-core performance is alarming IMO. With 3x as many performance cores Oryon only manages 60% better performance than M3 and is practically equivalent to the 8+4 M2 Pro. I thought it was supposed to be a server core with very good performance at lower power consumption? Is there an issue with Cinebench running on the platform?
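To put that scaling complaint in numbers, here is a rough back-of-envelope sketch. It takes the figures quoted above at face value (3x the performance cores, about 60% more multicore performance than M3); the function name is just a placeholder, and ignoring M3's efficiency cores is an assumption that flatters the M3.

```python
def per_core_ratio(core_ratio: float, perf_ratio: float) -> float:
    """Relative multicore throughput per P-core, given how many times more
    P-cores a chip has and how many times higher its multicore score is."""
    return perf_ratio / core_ratio


# Figures as quoted in the post: 3x the P-cores, ~60% higher MT score.
# Assumption: M3's E-cores are ignored, so all of M3's score is credited
# to its P-cores.
ratio = per_core_ratio(core_ratio=3.0, perf_ratio=1.6)
print(f"Each Oryon core contributes about {ratio:.0%} of an M3 P-core under MT load")
```

On those numbers each Oryon core is delivering only a bit over half of what an M3 P-core does in the multicore run, which is why the result looks like a clock or bandwidth problem rather than a core-count problem.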
 
Hard to say. Not a good source usually, but MaxTech tested one of the models with HWiNFO reporting the frequencies during some tests. It does seem the X Elite lowers frequency quite soon and continues at that speed or drops further.
 
They don’t have a consistent methodology for measuring power consumption, which makes the efficiency comparisons hard to reason about. I think the “real” efficiency of the Oryon platform is higher.

To be fair I think it is the same as their usual methodology, just with CB R24 instead of CB R15/23, no? And they did rerun CB R24 on the other processors listed with this method, which is unfortunately just a subset of the ones they have performance scores for, so it should be comparable for the ones they have in their graph. That said, it is a single benchmark from a small set of devices with no repeats. Always a concern that a review site gets a lemon or a statistical outlier in either direction by random chance.
However, I am very surprised about the performance. The 4.2GHz SKU can barely match the 3.7GHz M2 in Cinebench - what is happening here? These are supposed to be designs with quite similar architecture, is Oryon running lower frequency than advertised or is there some other problem?
For the CB R24 single core right? I think they tested the 4.0 GHz Snapdragon (it's the 80, they're still waiting on the 84) though I think the M2 Pro is actually 3.5GHz. As far as I can tell the 3.7GHz was only on full M2 Max and M2 Ultra SOCs (basically Apple doing a small bump like say for a desktop SOC 🙃). M2 Pros and binned M2 Maxes seem to be 3.5 GHz. And I think that's why the SC efficiency score is so bad relative to the M2 Pro. This is what Qualcomm told people to expect though, so it's not like the tested SOC is a lemon or some statistical outlier in terms of performance. That said, yeah it could be a problem with an unoptimized benchmark for the SOC ... we see the GB6.2 SC score looking much more reasonable here. My guess is GB 6 is much closer to the truth for most workloads, which also makes the efficiency estimate based on that score unreliable and admittedly also makes the CB R24 MT benchmark a little suspect too.

Also, the multi-core performance is alarming IMO. With 3x as many performance cores Oryon only manages 60% better performance than M3 and is practically equivalent to the 8+4 M2 Pro. I thought it was supposed to be a server core with very good performance at lower power consumption? Is there an issue with Cinebench running on the platform?

Unlike with SC, it's not like GB6.2 tells a different story here. True it has its own (justifiable) quirks with MT scaling, but ... the scores are what they are. To really nail it down, we would need to get another test, maybe GB5, to confirm what is going on (or GB6 multicore sub-scores known to scale linearly with core count). If it is true, this is where I get into my 3 reasons above for why Qualcomm's multicore scores are underwhelming. I suppose I should include an unoptimized CB R24 benchmark as a fourth, but it was CB R23 that was really bad, though? At least on Macs CB R24 behaves reasonably, and the part that we suspected was the culprit for CB R23, NEON, is shared between the Mac and the Qualcomm CPUs.

Hard to say. Not a good source usually, but MaxTech tested one of the models with HWiNFO reporting the frequencies during some tests. It does seem the X Elite lowers frequency quite soon and continues at that speed or drops further.
That was on battery though, and for the multithreaded test too, I think. Basically it undercuts the "the performance doesn't change on battery" line he kept repeating over and over again: single-threaded performance might not change, but multithreaded performance clearly was affected by being on battery. His battery MT score is much lower than NBC's for the same Surface device. Also the video had other problems. In fairness to the Snapdragon, unlike what Vadim claimed in the video, the Adreno GPU has ray tracing, and Ryan Shrout in his (admittedly paid) preview had Solar Bay working, and we know Solar Bay works on Android devices with Adreno. That said, both these reviews and others have mentioned frequent crashes and instability issues with benchmarks, but most of those as far as I could tell were with emulated programs. Unless Solar Bay, unlike the rest of 3DMark supposedly, isn't native on Windows on ARM, it should've worked.

Basically another brammer of a video from MaxTech.
 
However, I am very surprised about the performance. The 4.2GHz SKU can barely match the 3.7GHz M2 in Cinebench - what is happening here? These are supposed to be designs with quite similar architecture, is Oryon running lower frequency than advertised or is there some other problem? Also, the multi-core performance is alarming IMO. With 3x as many performance cores Oryon only manages 60% better performance than M3 and is practically equivalent to the 8+4 M2 Pro. I thought it was supposed to be a server core with very good performance at lower power consumption? Is there an issue with Cinebench running on the platform?
But how do you optimize primarily for lower power consumption? The usual answers mean that fully server-optimized cores don't scale well to higher frequencies. Instead, they're all about low-ish voltage and frequency operating points. Because they're working with a core that may have been optimized for that kind of thing, Qualcomm may need to boost voltages (and therefore per-core power) quite a bit to hit 4.2 GHz.
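A rough illustration of why that voltage bump matters, using the standard first-order model for dynamic CMOS power (P proportional to C·V²·f). The voltages below are made-up placeholders, not measured Oryon operating points; only the 3.4 and 4.2 GHz clocks come from this thread.

```python
def relative_dynamic_power(v_low: float, f_low: float,
                           v_high: float, f_high: float) -> float:
    """Ratio of dynamic power at the high operating point to the low one,
    using P ~ C * V^2 * f with switched capacitance held constant."""
    return (v_high / v_low) ** 2 * (f_high / f_low)


# Hypothetical voltages (assumption for illustration only).
# Clocks are the two SKU frequencies discussed in the thread.
power_ratio = relative_dynamic_power(v_low=0.85, f_low=3.4,
                                     v_high=1.05, f_high=4.2)
freq_ratio = 4.2 / 3.4
print(f"{freq_ratio - 1:.0%} more clock for {power_ratio - 1:.0%} more dynamic power")
```

With these placeholder voltages, roughly a quarter more clock costs close to double the dynamic power per core, and that ignores leakage, which also rises with voltage.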

Note that QC has mentioned the peak numbers like 4.2 GHz are available only when running at most 1 to 2 threads. Any more and clocks have to roll back. All the data so far suggests that the amount they roll back tends to be quite a bit.

To be clear, Apple plays this game too. Their P clusters can only run at max frequency if at most 1 thread is active. Furthermore Apple's scheduler optimizes for power over performance. That means that even if you use an M-series chip with two P clusters, you will never observe peak frequency on a 2-thread load, since both threads will get scheduled to the same P cluster so that the other P cluster can be asleep. (Big power savings potential there - they are likely able to power gate it so it draws literally 0.0mW.)

That said, usually Apple's dropoff from peak ST clock speed is mild. That's an advantage in heavily threaded loads - it means Apple is able to be competitive with far fewer cores.
 
My overall impression is it’s fine. Pretty good in some aspects. The web browsing speed is the closest anyone has come to Apple Silicon. I think QC over-promised, but I’d wager that Intel and AMD have also done that with their releases.

I hope, if anything, these new releases force Apple to increase RAM and SSD capacity and to speed up their OLED rollout.

I also hope it ends the idea that GW took all the talent with him.
 
But how do you optimize primarily for lower power consumption? The usual answers mean that fully server-optimized cores don't scale well to higher frequencies. Instead, they're all about low-ish voltage and frequency operating points. Because they're working with a core that may have been optimized for that kind of thing, Qualcomm may need to boost voltages (and therefore per-core power) quite a bit to hit 4.2 GHz.

That is exactly what I meant - I’d expect Oryon to deliver very good performance at lower wattage. These particular tests paint a very different picture. Qualcomm claimed that Oryon can match M2 at lower power draw. Here we see Oryon barely holding out against M2 despite massive core count advantage and probably higher power consumption.
 
I’ll be honest guys (won’t hurt my feelings or anything if others say similar lol)

I don’t think it looks like what they probably intended. I am starting to suspect that Demerjian and others had some points about the last-minute PMIC screwery, or more likely that they just had a lot of respins and the power, while good vs Intel/AMD in ST, just wasn’t what it was supposed to be. I didn’t expect it to beat Apple’s latest by any means, but I would’ve liked to see some real Firestorm/Avalanche-territory power on ST, and the ability to clock down with a steeper curve. Way better than AMD/Intel on that front, yes, but the curves don’t look steep enough, and it’s a bit worse than the graph suggests because the X1E-78 is the 50-75%ile stuff, whereas the X1E-80/84 are above that and more efficient, just pushed more.



I know this seems like straight up cope, and it is, but part of it is also honest.

We’ll also know if there’s truth to this — that there’s a physical design issue somewhere they just really didn’t have ready or messed up — or the PMIC story also — later on, for two reasons.

#1: the 8 Gen 4 is the smartphone using Oryon on N3E and is rumored to clock to 3.8-4.3GHz…. Even if the microarchitecture is slightly faster and different in performance terms - like say 5% IPC — that, a larger SLC cache, N3E and a smaller DRAM bus alone are NOT enough to take this thing to being competitive at say 3.4GHz much less 3.8-4.3GHz. N3E will get them roughly 15-20% lower power from N4P, more SLC could help too, another say 10% off, smaller DRAM we’ll leave off for now but add 5%.


Current X1E-78 at 3.4GHz is doing like 10W normalized. Porting Oryon and taking 35-40% off to get it to 6W for phone packages just absolutely wouldn’t be enough: they’d have a chip not far from the 8 Gen 3 with an X4, except the X4 there is on N4P and probably smaller.
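The arithmetic in point #1 can be sketched roughly as below. The percentages are the loose guesses from the post (taking the optimistic end for N3E), and whether the savings should be summed or compounded is an assumption either way, so both are shown. The function name is only illustrative.

```python
def stacked_power(base_watts: float, savings: list[float],
                  compound: bool = True) -> float:
    """Apply a list of fractional power savings to a baseline.
    compound=True multiplies the remaining fractions together;
    compound=False simply sums the savings."""
    if compound:
        remaining = 1.0
        for s in savings:
            remaining *= 1.0 - s
    else:
        remaining = 1.0 - sum(savings)
    return base_watts * remaining


BASE_W = 10.0                    # X1E-78 at 3.4GHz, "like 10W normalized"
SAVINGS = [0.20, 0.10, 0.05]     # N3E, larger SLC, smaller DRAM bus (guesses)

for label, compound in (("compounded", True), ("summed", False)):
    watts = stacked_power(BASE_W, SAVINGS, compound)
    print(f"{label}: {watts:.1f} W at the same 3.4GHz")
```

Either way the estimate lands somewhere in the 6 to 7W range at an unchanged 3.4GHz, which is the point being made: process and cache gains alone leave no headroom for the rumored 3.8-4.3GHz clocks in a phone.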

So what gives? Well, rumors are rumors but I’ll say this, if the 8 Gen 4 P cores hit 3.7-4.3GHz in the 4-7W range, basically around that 2800-3200 ST at tolerable phone profiles (last one is upper bound, 7W is too much but it’d still be impressive engineering wise), we know something was fucked up with the X Elite.
 
#2: this doesn’t prove much because it rests on the assumption QC with Nuvia really can replicate the work at Apple within closer margins on power. But it’s interesting still.

Oryon = Firestorm at 3.4GHz (slightly lower IPC higher clocks) with a very very similar structure and architecture from top down.

Does anyone believe that if Apple shipped a 12C part with three 4C clusters of Firestorms, 12MB of L2 for each cluster, and an 8MB SLC (Qualcomm’s is 6 but same thing) as an M1X with a 128-bit bus, the perf/W would be within ±5% of Oryon here for the 2-3.4GHz range in both ST and MT?

I doubt it. The Apple part would have a lead on battery life and MT at most power points. Firestorm, even on a worse node, with worse LPDDR and eerily similar L2 & SLC sizes, was performing at the 3.2GHz point at like 5-7W total package power minus idle for general integer stuff. And the phone part was around 4W flat.

I mean, we do need to see more designs and tests, but it seems like Oryon at 3.4GHz is running at substantially higher power, maybe even 50-75% higher, than an M1 with a 128-bit bus.

Alternatively, in similar wall power tests an M2 looks like it’s doing 8-10W pre-idle removal, or like 5W ranging to 8W for ST stuff. So still winning, but with 10% more ST, and a slightly inferior node.



M3 and M4 make this look even worse and I won’t even touch those because N3 anyways and these are more important comparisons academically speaking for now.
 
Also. The lack of E Cores I wrote off as a big deal for now, but yeah I think it’s true that is actually hurting them.

But it doesn’t concern me as much as this does — because this is the foundation lol, and if they were more efficient (unless I’m missing something here), they’d still post bigger leads in battery life I think.

But yeah if they want true M-class battery they need E cores. Just think they could get a good bit closer with more efficient cores like Apple’s. Feels like they’re in between so awkwardly with this.
 
#1: the 8 Gen 4 is the smartphone using Oryon on N3E and is rumored to clock to 3.8-4.3GHz….
The 8 gen 4 is 2 by X, 6 by 725. Odd that they have no 500-series cores in a phone – maybe those are going into specialized SoCs, or migrating toward R and M chips.
 
That is exactly what I meant - I’d expect Oryon to deliver very good performance at lower wattage. These particular tests paint a very different picture. Qualcomm claimed that Oryon can match M2 at lower power draw. Here we see Oryon barely holding out against M2 despite massive core count advantage and probably higher power consumption.
Tbc and I’ll explain this once again:

Qualcomm claimed they could match M2 Max, which was a shady move IMO. The reason they picked it is that (and again, people don’t like hearing it, but it’s true, and they still get fine battery life!) the Mx Max chips consume way more power in ST even for similar performance, probably because of DRAM or other odd overhead. That’s how they got “30% lower power than M2 Max *iso-perf*”: besides Linux being used, it was “we can do their ST at 30% less, or beat them by 10% or whatever”.

Which checks out. M2 Max from Notebookcheck is running from the wall like 18-20W before idle normalizing. If Qualcomm was taking say 16-17W for idle normalizing (just for the normalization of background stuff even) and took 30% off, they’d be at like 10-11W to match Apple’s M2 ST — and this wasn’t their peak either, just probably around 4GHz, and on Linux.



The smaller M2 has nearly the same ST, but is running about 5.5-8W depending on workload (rough estimate from several sources).

They ain’t matching that.

It was shady for sure, but from the get-go they were not able to match the M stuff; they just fell a bit more short than I expected, to a curious degree.
 
The 8 gen 4 is 2 by X, 6 by 725. Odd that they have no 500-series cores in a phone – maybe those are going into specialized SoCs, or migrating toward R and M chips.


Qualcomm has already confirmed Oryon is coming to 8 Gen 4 though. They’ll have E Cores too for sure and according to all the leaks/rumors.

2 Oryon Big
6 Oryon little
 
Honestly, I am wondering what we are getting with these performance gains. For the vast majority of workloads, the difference is vanishing. Who truly cares or notices, outside of a few engineers and corner-case users? Serious work gets done in the EP modules – GPU and NPU/Tensor – improving CPU core performance is an exercise in diminishing returns.
 