Nuvia: don’t hold your breath

Well, with say M4 vs M1, if you held clocks constant I’d wager you’re looking at 17-20% more integer performance and still probably 40% lower power. Technically the wider arch would use slightly more power, so you might have to lower clocks a bit more, but still, that’s all else equal.

Node alone would take you ~25-35% down in power from N5 for the CPU, but the extra L2, newer LPDDR5(x?), and other physical design improvements which we know they’ve seen probably mean it’s higher than that in total for the package.

Idk, that seems really good to me? Or doing it at similar power and 30% faster, even if you don’t notice, is still an efficiency gain on top of what was already the king of energy efficiency.
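
To put rough numbers on the guesses above, here is a back-of-the-envelope sketch. It only turns the percentages wagered in this post into a relative perf/W figure; none of it is measured data, and the function name `perf_per_watt_gain` is made up for illustration.

```python
def perf_per_watt_gain(perf_gain, power_change):
    """Relative perf/W versus the baseline chip.

    perf_gain: fractional performance increase (0.17 means +17%).
    power_change: fractional power change (-0.40 means 40% lower power).
    """
    return (1.0 + perf_gain) / (1.0 + power_change)


# Iso-clock guess from the post: +17-20% integer performance at ~40% lower power.
iso_clock_low = perf_per_watt_gain(0.17, -0.40)
iso_clock_high = perf_per_watt_gain(0.20, -0.40)

# Alternative guess from the post: similar power, ~30% faster.
iso_power = perf_per_watt_gain(0.30, 0.0)

print(f"iso-clock perf/W: {iso_clock_low:.2f}x to {iso_clock_high:.2f}x")
print(f"iso-power perf/W: {iso_power:.2f}x")
```

Under those assumptions the iso-clock case works out to roughly double the perf/W, while the iso-power case is about 1.3x, so either way it is a real efficiency gain.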



The E Core improvements are arguably even more important/impressive in some ways
 
By the way, in the GB6 multicore Ray Tracer test, Oryon shows a good lead over M2 Pro/Max, as expected. Since it uses the same RT library as Cinebench, this is a surprising discrepancy. Maybe the Cinebench workload is somehow less-than-optimal for Oryon, or maybe there is some other issue?
 
I assume you meant the M2 has a lead over Oryon?

Edit: or not. I’m confused. If the Oryon core leads how can CB be less than optimal for it?
 
Oryon is faster in the GB6 multicore ray tracing tests, but about the same speed in the Cinebench tests reported by Notebookcheck. The GB6 results are consistent with Qualcomm's marketing; the Cinebench results are not.
 
I haven't had time to check, but he means that in CB and overall GB6.2, the multicore scores for the Oryon Snapdragon are disappointingly similar to the M2/M3 Pro. However, in GB6.2 ray tracing the Oryon Snapdragon pulls ahead, as it frankly should with 12 P-cores. We know that GB6.2 overall is weighted against high-core-count systems, but CB R24 should behave better and it doesn't.

Given the similarities between the M2 and Oryon cores, the three most likely possibilities are: some sort of bad interaction between CB24 and (ARM) Windows; an effect of the 10-minute length of the test (Oryon not being able to maintain clocks even on battery); or some small difference in the architecture that makes a big difference here. Normally I'd favor the middle one, but I think Tom's Hardware ran CB multiple times in a row and the middle runs scored the highest. That doesn't fit if thermals or hotspots were the issue.
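
The reasoning about back-to-back runs can be sketched as a quick check: if heat were the limit, scores should drift down from the first run onward rather than peak in the middle. This is only an illustration; the function name `declines_from_first_run`, the 1% noise tolerance, and both score lists are hypothetical placeholders, not the actual Tom's Hardware numbers.

```python
def declines_from_first_run(scores, tolerance=0.01):
    """Return True if each back-to-back run scores at or below the previous one
    (within a small relative tolerance for noise) and the last run is clearly
    lower than the first, which is the pattern expected from heat soak.
    """
    if len(scores) < 2:
        return False
    for prev, cur in zip(scores, scores[1:]):
        if cur > prev * (1.0 + tolerance):
            return False  # a later run got meaningfully faster
    return scores[-1] < scores[0] * (1.0 - tolerance)


# Hypothetical placeholder scores, NOT real benchmark results.
heat_soak_pattern = [1000, 960, 930, 925]       # steady decline
peak_in_middle_pattern = [950, 1000, 1010, 960]  # best runs in the middle

print(declines_from_first_run(heat_soak_pattern))
print(declines_from_first_run(peak_in_middle_pattern))
```

A peak-in-the-middle sequence fails the check, which is why the repeated-run results argue against a simple thermal explanation.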



Compare the Ray tracing multicore

Having said that, here is Geekbench 5:



Haven't had time to actually parse these results, so I'm not sure yet how to interpret them.

Then again, CB R23 on Apple Silicon scored worse than other ray tracing workloads that also used Intel's Embree (GB, SPEC, and others I saw). So irrespective of Embree, CB always seems to be the most finicky. Sadly, Andrei is probably even less able to respond now than he was back when it was CB R23 and AS, but there seems to be something going on.

Qualcomm's marketing included CB results too, but I'd say the GB6 ray tracing lead makes more sense just given the core count. Then again, the GB5 results don't look great, but there may be something I'm missing there.
 
Thanks. Appreciate the clarification.
 
Honestly, I am wondering what we are getting with these performance gains. For the vast majority of workloads, the difference is vanishingly small. Who truly cares or notices, outside of a few engineers and corner-case users? Serious work gets done in the EP modules (GPU and NPU/Tensor); improving CPU core performance is an exercise in diminishing returns.
That is one thing noted in several reviews: the Qualcomm SoC didn’t perform great on certain workloads that often have dedicated accelerators, like video encoding and extraction/compression:


Not sure what the primary source is for the data presented, though. Might be TechPowerUp itself? But it’s written more as a roundup.

Bear in mind that the M2 GPU does not have hardware RT like the M3. I could be mistaken, but it seems like the M3 GPU is parsecs ahead of the M2 GPU.

The ray tracing here is being done on the CPU, maybe a less relevant workload these days as you allude to above. But it stresses the CPU and CPU vector processing and scales with core counts so still good for similar types of tasks even if CPU ray tracing is less relevant for most users aside from some production houses (and even those might move to GPU processors soon).
 
Qualcomm has already confirmed Oryon is coming to 8 Gen 4 though. They’ll have E Cores too for sure, according to all the leaks/rumors.

2 Oryon Big
6 Oryon little
That's what I thought too, but now I can't find where I saw that. Someone on another forum found a rumor that the little cores were actually going to be ARM cores, and I found another that claimed the "little" cores weren't actually little, just downclocked P-cores. But I could've sworn I saw an article saying that they were planning on making a dedicated Oryon E-core. Now I'm not sure. Do you have a link?
 
That is exactly what I meant: I’d expect Oryon to show very good performance at lower wattage. These particular tests paint a very different picture. Qualcomm claimed that Oryon can match M2 at lower power draw. Here we see Oryon barely holding its own against M2 despite a massive core count advantage and probably higher power consumption.
Ah, I mistook what you wrote and didn't look at the low watt test numbers.
 
There is a link in this post:
The 8 gen 4 is 2 by X, 6 by 725. Odd that they have no 500-series cores in a phone – maybe those are going into specialized SoCs, or migrating toward R and M chips.
725 are mid-range ARM cores, possibly modified by QC.
 
Right … that’s for the rumor that the middle cores will be ARM Cortex. But that site also claims that they won’t be using Oryons but Cortex X925 for the P-cores which I’m pretty sure is wrong. That would mean they wouldn’t be using their own cores for mobile and also it states they’ll be using P-cores clocked at 4.26GHz which is above the max clock speed allowed by ARM (3.8GHz) for X925. But maybe there’s wiggle room there.

What I was looking for was a link saying that the middle core is not going to be an A725 at all but a new custom Qualcomm E-core. I could’ve sworn I saw that somewhere, but I am now unable to find it. I can now only find links saying that it’ll be A725s or downclocked gen 2 Oryons.
 
Hey guys, I just wrote this incredibly lengthy post on MacRumors about x86 vs ARM and why the latter currently has an advantage in performance/watt over the former. I'd appreciate any comments to clean it up, as I may use it as a reference going forward. :) The section on pipes and decode feels half-baked, but I don't know how to explain it better without making the post even longer, and it is already so long I'm not sure anyone will read it. Obviously, if there is anything wrong or something you would disagree with, any corrections would be appreciated.


(The context is that a user posted a review where the reviewer power-limited a 16-thread 8840U to the same wattage as a 12-thread Snapdragon X Elite, ran multithreaded CB R24, and concluded that x86 could be just as power efficient as ARM. I'm trying to explain to the user why that result doesn't actually mean that and how it relates to the larger topic of x86 and ARM chips. There is also a previous post that I reference in the one above.)
 
The number 4 machine on the TOP500 supercomputer list is Fugaku, which is an ARM core thingy. Every other machine in the top 10 is x86, except for one Power9 installation, but those other machines rely on GPGPU cards to do the EP work. Fugaku relies entirely on SVE (not SVE2), at 128-bit-wide vectors.

Granted, Fugaku is not the most power efficient machine on the list. But it leaves one wondering, what if there was an ARM type installation like that, running SVE2, on wider vectors. It would be interesting to see.
 
Fugaku uses 512-bit vectors, if I recall correctly. Anyway, it’s closer to a GPU than a general-purpose CPU. These are specialized processors, built to solve particular problems in science. I don’t think these designs inform anything about general-purpose processing.
 
Ars Technica testing two Microsoft Surface products with Snapdragon X Elite:


Conclusion: It's way faster than previous Microsoft products with ARM, but still behind (current) Apple Silicon.

One interesting comment was that, despite the improvements in Prism, non-native products were just annoying enough to warrant the search for native ports.
From my recollection, the only really annoying non-native application on my MacBook Air M1 was a web browser, because it needed loads of RAM and due to the double-translation of JavaScript it felt quite slow. This definitely improved a lot with the native port.
While I replaced everything else with native builds as soon as possible, because I wanted to reduce CPU usage, other non-native applications never felt annoying to me.
 
Chipsandcheese on the Oryon cores:


I haven't had time to read the whole thing yet but given the title of the thread I thought the following snippet might be amusing!

Oryon arrives nearly five years after Nuvia hit the news, and almost eight years after Qualcomm last released a smartphone SoC with internally designed cores. For people following Nuvia’s developments, it has been a long wait.
 