Geekerwan’s Review of the Snapdragon 8 Gen 3 (Xiaomi 14).

Amazing coming from QC themselves with those slides.. I mean, they MAY be talking about the entire device power consumption? But even then, 70W for that kind of performance is terrible.
That slide does seem to correspond to Notebookcheck's scores for it: it looks roughly 1.6x the score of the Asus Zenbook 14 in the chart. The only thing I can think of is that the fine print of Qualcomm's charts mentions the Asus Zenbook was running unconstrained? So maybe both scores at 70W would be higher than below? Notebookcheck does say the max score recorded for the 155H was about 1024 points. Otherwise a score of 1220 at 70W is really, really bad. The M2 Pro scores roughly the same as the M3 Pro (a little lower). I think we discussed this last time but a 12 P-core chip drawing 70W should be much, much better than an 8+4 chip drawing half that (less?). Even compared to the 23W score it's 3x the power for 30% more performance? That shouldn't be the case, so maybe the 80W score here is not really the 70W from the charts? That would put it at about 1500-1600 at 70 watts, which is not great for 12 M2-like P-cores, but not awful?
Screenshot 2024-04-24 at 7.15.42 PM.png

It’s all very confusing tbh. Here's a slide from last October. I don’t know why Andrei obfuscates, but he should speak to Qualcomm's marketing department! And to be clear, I’m not saying the GPU reaches 80W, but the SoC appears to at least.
View attachment 29136
For what it's worth, the October slide actually seems to support this better score at 70W. The 13800H is also pretty close to the M2 Pro's score (again a little lower, at 976). The best score of the Elite looks better here than just 25% above the 13800H? Again maybe 1.5-1.6x?

~1500/1600 is not great at 70W, but it's a damn sight better than 1220 ...
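To put the gap in points per watt, here is a quick sketch using only the scores floated above. The 70W figure is the thread's reading of Qualcomm's slide, not a measured value, and the labels are just for illustration.

```python
# Cinebench 2024 multi-core points per watt for the scores discussed above.
# ASSUMPTION: 70 W is the thread's reading of Qualcomm's slide, not a measurement.
POWER_W = 70


def points_per_watt(score: float, watts: float) -> float:
    """Return benchmark points delivered per watt of power."""
    return score / watts


# Hypothetical labels for the three candidate scores at 70 W.
candidates = {
    "low reading": 1220,
    "estimate (low)": 1500,
    "estimate (high)": 1600,
}

efficiency = {
    label: points_per_watt(score, POWER_W) for label, score in candidates.items()
}

for label, value in efficiency.items():
    print(f"{label}: {value:.1f} pts/W")
```

So the 1220 reading works out to roughly 17 points per watt against 21 to 23 for the 1500-1600 estimate.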
 
Less than half.. The M3 Pro tops out around 29W on the CPU side from memory? M2 Pro around 34W?
Yeah M2 Pro was the same as M2 Max (at 12 cores) and I think it was ~35W CPU and ~41W total package? Something like that? Obviously the M3 is just better than the Oryon ... but I think maybe given the node and the overall design the M2 is the better comparison in terms of where Qualcomm is. To me the Oryon is answering the question what would happen if you took 12 M2 P-cores with no E-cores and ramped the clock speed up? Obviously the fabric and SOC cache design is different too and Apple's may be better here. For the Oryon, I'm going with a CB24 score of 1500-1600 at 70W because I can't believe it's only 1220. But maybe it really is that bad ...

Let's say the top 8+4 M2 Pro (eq. to M2 Max) was 35/41W CPU/package. Base clocks in Oryon are about 15% higher in their top model, with no E-cores but 4 more P-cores. For simplicity assume the 4 Apple E-cores are roughly 1 Apple P-core in power and performance (it's better than that but never mind), and let's do 3.8/3.3 GHz * 12/9 cores * power = 54/63W CPU/package. For CB24 performance that would be 3.8/3.3 GHz * 12/9 cores * 1059 = 1625. My estimate for power is a little low and my performance estimate seems a little high, probably because of my oversimplification of the E-core contribution to power and performance. But overall it tracks. If the Oryon SoC's CB24 performance is actually 1220 "at 70W" on the other hand ... well then this shows that something got majorly fucked somewhere ...
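The back-of-envelope scaling above can be written out as a short sketch. It assumes, as the post does, that power and score both scale linearly with clock and with P-core-equivalent count; real power generally rises faster than linearly with clock, so treat the wattage as a floor. The function and variable names are made up for illustration.

```python
# Naive linear scaling of M2 Pro figures to a 12 P-core part at higher clocks.
# ASSUMPTIONS (from the post): 4 E-cores count as 1 P-core, and both power
# and Cinebench 2024 score scale linearly with clock and core count.

def scale(base: float, clock_ratio: float, core_ratio: float) -> float:
    """Scale a baseline figure by clock and core-count ratios."""
    return base * clock_ratio * core_ratio


CLOCK_RATIO = 3.8 / 3.3  # Oryon top-model clock vs. the M2 Pro clock used above
CORE_RATIO = 12 / 9      # 12 P-cores vs. 8 P-cores + (4 E-cores ~ 1 P-core)

cpu_watts = scale(35, CLOCK_RATIO, CORE_RATIO)      # M2 Pro CPU power
package_watts = scale(41, CLOCK_RATIO, CORE_RATIO)  # M2 Pro package power
cb24_score = scale(1059, CLOCK_RATIO, CORE_RATIO)   # M2 Pro CB24 score

print(f"CPU power:     {cpu_watts:.0f} W")
print(f"Package power: {package_watts:.0f} W")
print(f"CB24 score:    {cb24_score:.0f}")
```

The combined scale factor is about 1.54x, which is where the 54/63W and roughly 1625-point figures come from.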

Also good to point out that the name "All core turbo" for the clock speeds as reported in Anandtech implies those are not the clock speeds attainable under thermal constraints (i.e. the "23W" chassis). In addition to the different tiers of processors, it's unclear what their base clocks actually are. So a lot of the thin and lights aren't going to be hitting these clocks for very long.
 
Well maybe I’m wrong but I don’t see how they get to 80W. It certainly ain’t from the GPU, which, topping out at 4.6TFLOPs, is the same size as some of Qualcomm’s phone GPUs and only moderately bigger than an M3 base GPU. So if they are hitting 80W from their SoC alone, it’s because they are riding the CPU hard on GHz.

Edit: 4.6 not 4.2

A note on that: 4.6TFLOPs certainly sounds impressive, but since it’s the same GPU as in their high-end smartphones that excels at synthetic graphical benchmarks and fails badly in compute… I wouldn't expect too much. The current GB6 compute entries are between A14 and base M1.
 
but I think maybe given the node and the overall design the M2 is the better comparison in terms of where Qualcomm is. To me the Oryon is answering the question what would happen if you took 12 M2 P-cores with no E-cores and ramped the clock speed up?

I’ve said it before and I’ll say it again - to me it seems that Oryon is a rebuild of Firestorm at N4. They can run it at M2 clocks with slightly lower power, but it takes a huge efficiency hit if the clocks are pushed any further. I am still puzzled by Qualcomm claims that a single Oryon core outperforms an Avalanche while using less power yet 12 Oryon cores have difficulty keeping up with 8 Avalanche cores while consuming more power. No idea whether it’s the power system or some other factor, the scaling is really bad though.
 
A note on that: 4.6TFLOPs certainly sounds impressive, but since it’s the same GPU as in their high-end smartphones that excels at synthetic graphical benchmarks and fails badly in compute… I wouldn't expect too much. The current GB6 compute entries are between A14 and base M1.
Absolutely, in fact my point was that for the 80W form factor a 4.6TFLOP GPU isn't impressive at all. That power level is M2/M3 Max territory, and the M2/M3 Max GPU is nearly 3x that, isn't it? They'll need a dGPU. Even without their compute troubles that level of GPU won't cut it except for maybe a coding/development machine which is otherwise unconcerned with GPU performance beyond driving a nice screen or two ... or three.
I’ve said it before and I’ll say it again - to me it seems that Oryon is a rebuild of Firestorm at N4. They can run it at M2 clocks with slightly lower power, but it takes a huge efficiency hit if the clocks are pushed any further. I am still puzzled by Qualcomm claims that a single Oryon core outperforms an Avalanche while using less power yet 12 Oryon cores have difficulty keeping up with 8 Avalanche cores while consuming more power. No idea whether it’s the power system or some other factor, the scaling is really bad though.
Since Avalanche is a rebuild of Firestorm at N4 (though maybe a better one) I think we're on the same page. :) And yes the scaling is weird, especially if the 1220 CB24 score is the score at 70W, then something went incredibly wrong. If the CB24 score is anywhere around 1500 at 70W, it's only a little worse than it should be.
 
M2 was N5P I thought? No idea what the difference is.

You’re right, it’s N5P. According to this, how they compare depends on whether it’s N4 or N4P. Regular N4 might have slightly worse characteristics than N5P while N4P might be slightly better:


1714028354590.jpeg


Avalanche did introduce some tweaks to the buffer sizes etc.

I think Andrei said something about floor plan layout shifting.
 
@leman posted over at Macrumors a recent chipsandcheese article that is highly relevant!


They appear to conclude that a lot of Qualcomm’s difficulties in compute are a result of cache (or lack thereof). I’ll need a more thorough reading.
They issued a correction, though nothing concerning the caches and compute issue we were discussing:


I wrote about Qualcomm iGPUs in three articles. All three were difficult because Qualcomm excels at publishing next to no information on their Adreno GPU architecture.

I originally started writing articles because I felt tech review sites weren’t digging deep enough into hardware details. Even when manufacturers publish very little info, they should try to dig into CPU and GPU architecture via other means like microbenchmarking or inferring details from source code. A secondary goal was to figure out how difficult that approach would be, and whether it would be feasible for other reviewers.

Adreno shows the limits of those approaches.
 
Seen courtesy of Xiao_Xi at Macrumors, newest chipsandcheese article on the Snapdragon X's iGPU:


It confirms Anandtech's deep dive article that GMEM cache can be flexibly used for more than just tiled graphics tasks, including as scratchpad memory, but it is unclear if this is a recent change or if the previous Adreno GPUs could do this and they simply couldn't get it working. That said, cache issues remain a problem and, overall, the cache hierarchy seems very complicated, especially compared to more modern GPUs. 64-bit support is either poor (Int) or non-existent (FP), and drivers and software are massive problems as well: OpenCL drivers are awful, graphics drivers are, to put it kindly, subpar, and the process for updating drivers is rudimentary.

Basically Qualcomm has *a lot* of work to do here to make a truly PC-centered iGPU. This iGPU is everything naysayers (incorrectly) claimed Apple's GPUs would be.
 