Geekerwan’s Review of the Snapdragon 8 Gen 3 (Xiaomi 14).

mr_roboto

Site Champ
Posts
288
Reaction score
464
Btw, I just realized that not everybody will get the reference. Is this how memes are born?
Nah, I feel that memes have to be broad and understandable, otherwise they don't spread. This is definitely an in-joke; if you weren't one of the relative handful of people reading the right threads in the right subforum of the Other Place, you just aren't gonna get it.

Hell, almost all non-chess players (and even lots of casual players) are going to get hung up on "Stockfish". It's not a name that tells you what it is!
 

Jimmyjames

Site Champ
Posts
675
Reaction score
763
Interesting tidbit. Apparently the Adreno 750, the GPU in the Qualcomm 8 Gen 3, has a peak ALU throughput of 5.7 TFLOPS at FP32. That seems high, and probably higher than the A17. Utilization must be low given the results we’ve seen.


Edit: the A17 is just over 2 TFLOPS. What’s going on here? Are Qualcomm cooking the books, or just unable to feed the GPU enough data, or something else?
 
Last edited:

dada_dave

Elite Member
Posts
2,163
Reaction score
2,148
Interesting tidbit. Apparently the Adreno 750, the GPU in the Qualcomm 8 Gen 3, has a peak ALU throughput of 5.7 TFLOPS at FP32. That seems high, and probably higher than the A17. Utilization must be low given the results we’ve seen.


Edit: the A17 is just over 2 TFLOPS. What’s going on here? Are Qualcomm cooking the books, or just unable to feed the GPU enough data, or something else?
I’ve been wondering that myself. @leman’s dive into its structure seems to fit its performance in real-world titles, but the FP32 throughput has been measured and is supposedly quite high, so it can’t just be that Qualcomm is prioritizing FP16. Here’s @leman’s explanation of the discrepancy from the other thread:

Qualcomm optimisation manuals recommend using a "native" version of operations for best performance, and they explicitly state that these "native" operations are suitable for graphics and other tasks where numerical precision is less important. They also explicitly state that Adreno can execute FP16 operations at a higher rate than FP32 ones. I also found at least one mention that Adreno does FP32 math at 24-bit precision in the graphics pipeline.

The thing is, all of these are very valid optimisation techniques if mobile graphics is your focus. And lower ALU precision is not the only possible optimisation. You can ship smaller register files, lower-precision texture filters, slower advanced functions, etc., and your users won't notice any of this because the shader complexity of mobile games is fairly low (no idea whether Qualcomm uses any of these optimisations). So if that's your goal, you can build a fairly fast GPU that's also small and power efficient. But this GPU will suck at general-purpose computing or complex applications. Which is exactly what we see in the case of Qualcomm.

It sounds reasonable, but programs that purport to measure FP32 throughput should not allow lower-precision shenanigans. I dunno, something is odd. There are other pipelines, like the complex pipelines (sine, log, etc.), that can impact real-world performance too …

In terms of feeding the GPU, if my memory serves, they’re using more advanced memory than the iPhone, and other benchmarks show them gaining on the iPhone as resolution increases. I believe they likely have better bandwidth than the iPhone.

In the comments on the video, the author said that at least some of Genshin Impact’s performance issues may be due to low-quality initial drivers. The iPhone 15 suffered from thermal issues at launch that Apple eventually got under control; it’s possible future updates may improve Qualcomm’s performance. Obviously that’s unknown. There were also questions over exactly what resolution the game was being rendered at. Everyone agreed that the iPhone was rendering at 740p, but some defensive fanboys claimed that the Adreno was rendering in the 800s. This was shot down in the comments by the author, who said the standard tools were misreporting and that it was actually rendering at 720p on the Adreno. I would assume he knows what he’s doing. Finally, Genshin Impact may be better optimized for iOS than Android.

Bottom line: these Qualcomm GPUs are really fucking confusing. I hope we get more detailed information at the Oryon SoC launch, with more comprehensive benchmarks and maybe architectural details, so that we get answers to some of these questions.
 

Yoused

up
Posts
5,623
Reaction score
8,942
Location
knee deep in the road apples of the 4 horsemen
It sounds reasonable, but programs that purport to measure FP32 throughput should not allow lower-precision shenanigans.
leman says FP32 is done at 24-bit precision: how is that surprising? FP32 has a 23+1-bit mantissa, so 24-bit precision is pretty much what one would expect. At most I would expect FP32 calculations to reach maybe 27-bit precision, with a few low-order bits tacked on, and even that would be barely observable.
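If anyone wants to see that concretely, here’s a trivial Python/numpy illustration of the binary32 format (nothing Adreno-specific, just the IEEE layout):
Code:
import numpy as np

# IEEE-754 binary32 ("FP32") stores 23 mantissa bits plus one implicit
# leading bit, i.e. roughly 24 bits of significand precision.
print(np.finfo(np.float32).nmant)   # 23 stored mantissa bits
print(np.finfo(np.float32).eps)     # 1.1920929e-07, i.e. 2**-23
# Anything below the 24th significand bit is simply rounded away:
print(np.float32(1.0) + np.float32(2.0**-24) == np.float32(1.0))   # True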
 

dada_dave

Elite Member
Posts
2,163
Reaction score
2,148
leman says FP32 is done at 24-bit precision: how is that surprising? FP32 has a 23+1-bit mantissa, so 24-bit precision is pretty much what one would expect. At most I would expect FP32 calculations to reach maybe 27-bit precision, with a few low-order bits tacked on, and even that would be barely observable.
I assumed he meant 24 bits including the exponent (sort of an automatic fast-math); otherwise that would indeed be unremarkable.
 
Last edited:

leman

Site Champ
Posts
641
Reaction score
1,196
I really don’t see 5.7 TFLOPS FP32 on the 750 happening, at least not in any way that’s meaningful. First, that number alone is ridiculously high: it would require 2048 FP32 pipelines running at 1.4 GHz to get there. Are they claiming that the 8 Gen 3 has a GPU as wide as an RX 7600? I mean, sure, if they use very wide SIMD and sacrifice precision, they could get there, but the shader utilization in real-world work would be terrible. Look at the GB compute results: they have difficulty competing with older Apple designs. Does that look like a 5+ TFLOPS GPU to you?
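Back-of-the-envelope, counting an FMA as 2 FLOPs per lane per clock:
Code:
# FP32 lanes needed to hit 5.7 TFLOPS at 1.4 GHz (FMA = 2 FLOPs per lane per clock)
lanes = 5.7e12 / (2 * 1.4e9)
print(round(lanes))   # ~2036, i.e. on the order of 2048 lanes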

Where Qualcomm has a big advantage is memory bandwidth. Then again, Apple probably has more cache.
 

leman

Site Champ
Posts
641
Reaction score
1,196
I assumed he meant 24 bits including the exponent (sort of an automatic fast-math); otherwise that would indeed be unremarkable.

They could be using a narrower mantissa; that would be a cheap way to reduce the SIMD footprint without introducing significant errors for mobile games.
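A toy numpy sketch of what that would cost in accuracy (the bit counts here are made up for illustration; I have no idea how many bits Adreno actually keeps):
Code:
import numpy as np

def truncate_mantissa(x, kept_bits):
    # zero out the low (23 - kept_bits) mantissa bits of float32 values
    bits = x.astype(np.float32).view(np.uint32)
    mask = np.uint32((0xFFFFFFFF << (23 - kept_bits)) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)

x = np.random.uniform(0.5, 2.0, 1_000_000).astype(np.float32)
for kept in (23, 16, 10):
    err = np.max(np.abs(truncate_mantissa(x, kept) - x) / x)
    print(f"{kept}-bit mantissa: max relative error ~ {err:.1e}")
# 23 bits -> 0, 16 bits -> ~1.5e-05, 10 bits -> ~1e-03:
# invisible in a shaded pixel, but a real problem for general-purpose compute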
 

dada_dave

Elite Member
Posts
2,163
Reaction score
2,148
I really don’t see 5.7 TFLOPS FP32 on the 750 happening, at least not in any way that’s meaningful. First, that number alone is ridiculously high: it would require 2048 FP32 pipelines running at 1.4 GHz to get there. Are they claiming that the 8 Gen 3 has a GPU as wide as an RX 7600? I mean, sure, if they use very wide SIMD and sacrifice precision, they could get there, but the shader utilization in real-world work would be terrible. Look at the GB compute results: they have difficulty competing with older Apple designs. Does that look like a 5+ TFLOPS GPU to you?

Yeah I don’t get it. Something is very off.

Where Qualcomm has a big advantage is memory bandwidth. Then again, Apple probably has more cache.

Yup.

They could be using a narrower mantissa; that would be a cheap way to reduce the SIMD footprint without introducing significant errors for mobile games.

Definitely possible, but if so I’m surprised that the programs that purport to measure such things let that fly. The default should be to report IEEE-compliant calculations. I mean, if I turn fast-math on in the compiler flags while running a simulation, I know what I’m doing, what I’m measuring, and the tradeoffs that entails, but that shouldn’t be the default, and at the very least it should be reported. Admittedly, in my own GPU paper I had fast-math on, but I told people that’s what I did, and I showed that I still got good results.
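For what it’s worth, here’s the sort of trivial guard I’d want a throughput benchmark to include (a hypothetical check of my own; I’m not claiming GB or GFXBench does or doesn’t do anything like this):
Code:
import numpy as np

# On IEEE-compliant FP32 the small increment survives the round trip;
# hardware that silently kept fewer mantissa bits would flush it away.
one = np.float32(1.0)
tiny = np.float32(2.0**-20)      # exactly representable, needs 20 mantissa bits
probe = (one + tiny) - one
print(probe == tiny)             # True here on the CPU
print(probe)                     # 9.536743e-07, i.e. 2**-20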
 

Jimmyjames

Site Champ
Posts
675
Reaction score
763
I’ve been wondering that myself. @leman’s dive into its structure seems to fit its performance in real-world titles, but the FP32 throughput has been measured and is supposedly quite high, so it can’t just be that Qualcomm is prioritizing FP16. Here’s @leman’s explanation of the discrepancy from the other thread:



It sounds reasonable, but programs that purport to measure FP32 throughput should not allow lower-precision shenanigans. I dunno, something is odd. There are other pipelines, like the complex pipelines (sine, log, etc.), that can impact real-world performance too …

In terms of feeding the GPU, if my memory serves, they’re using more advanced memory than the iPhone, and other benchmarks show them gaining on the iPhone as resolution increases. I believe they likely have better bandwidth than the iPhone.

In the comments on the video, the author said that at least some of Genshin Impact’s performance issues may be due to low-quality initial drivers. The iPhone 15 suffered from thermal issues at launch that Apple eventually got under control; it’s possible future updates may improve Qualcomm’s performance. Obviously that’s unknown. There were also questions over exactly what resolution the game was being rendered at. Everyone agreed that the iPhone was rendering at 740p, but some defensive fanboys claimed that the Adreno was rendering in the 800s. This was shot down in the comments by the author, who said the standard tools were misreporting and that it was actually rendering at 720p on the Adreno. I would assume he knows what he’s doing. Finally, Genshin Impact may be better optimized for iOS than Android.
Very interesting, thanks.
Bottom line: these Qualcomm GPUs are really fucking confusing. I hope we get more detailed information at the Oryon SoC launch, with more comprehensive benchmarks and maybe architectural details, so that we get answers to some of these questions.
Lol. Glad it’s not just me that finds it confusing!
 

Jimmyjames

Site Champ
Posts
675
Reaction score
763
I really don’t see 5.7 TFLOPS FP32 on the 750 happening, at least not in any way that’s meaningful. First, that number alone is ridiculously high: it would require 2048 FP32 pipelines running at 1.4 GHz to get there. Are they claiming that the 8 Gen 3 has a GPU as wide as an RX 7600? I mean, sure, if they use very wide SIMD and sacrifice precision, they could get there, but the shader utilization in real-world work would be terrible. Look at the GB compute results: they have difficulty competing with older Apple designs. Does that look like a 5+ TFLOPS GPU to you?

Where Qualcomm has a big advantage is memory bandwidth. Then again, Apple probably has more cache.
I would normally have dismissed these claims as mistaken, or incorrectly interpreted, but some of the contributors to that thread are trustworthy. Longhorn in particular is really knowledgeable and has always been honest afaik.


They seem very bullish on these GPUs. There is clearly a disconnect I’m missing. It may be that they are limiting their enthusiasm to mobile games, in which case your description would explain how they are getting those numbers.
 

dada_dave

Elite Member
Posts
2,163
Reaction score
2,148
Very interesting, thanks.

Lol. Glad it’s not just me that finds it confusing!
Also, in the Genshin Impact video I believe they reported that the Qualcomm GPU was seeing higher utilization than the A17 GPU at obviously worse graphical settings, possibly a lower resolution but maybe not, and definitely lower frame rates. Of course that’s a single title. We know how difficult it can be to get a level playing field for CPUs (I still see people quoting CB23 numbers to compare M-series processors and x86); GPUs are even harder, and an individual game or benchmark can be very misleading. But even so … huh … according to the raw TFLOPS, the Adreno GPU is supposedly almost 3x as powerful … something doesn’t track.

I would normally have dismissed these claims as mistaken, or incorrectly interpreted, but some of the contributors to that thread are trustworthy. Longhorn in particular is really knowledgeable and has always been honest afaik.


They seem very bullish on these GPUs. There is clearly a disconnect I’m missing. It may be that they are limiting their enthusiasm to mobile games, in which case your description would explain how they are getting those numbers.

Indeed. I dunno.
 
Last edited:

Jimmyjames

Site Champ
Posts
675
Reaction score
763
Also, in the Genshin Impact video I believe they reported that the Qualcomm GPU was seeing higher utilization than the A17 GPU at obviously worse graphical settings, possibly a lower resolution but maybe not, and definitely lower frame rates. Of course that’s a single title. We know how difficult it can be to get a level playing field for CPUs (I still see people quoting CB23 numbers to compare M-series processors and x86); GPUs are even harder, and an individual game or benchmark can be very misleading. But even so … huh … according to the raw TFLOPS, the Adreno GPU is supposedly almost 3x as powerful … something doesn’t track.
Great points. It’s true that there are many variables. The person who posted that video will apparently be posting more soon. We’ll see how other games perform.
Indeed. I dunno.
This sums up my understanding!
 

dada_dave

Elite Member
Posts
2,163
Reaction score
2,148
Interesting tidbit. Apparently the Adreno 750, the GPU in the Qualcomm 8 Gen 3, has a peak ALU throughput of 5.7 TFLOPS at FP32. That seems high, and probably higher than the A17. Utilization must be low given the results we’ve seen.


Edit: the A17 is just over 2 TFLOPS. What’s going on here? Are Qualcomm cooking the books, or just unable to feed the GPU enough data, or something else?


Also, the claim is that ARM’s Immortalis-G720 has even more TFLOPS, at 5.9!

Surely we know more about that architecture, since it’s ARM? What are its benchmarks like? I haven’t had time to look yet myself.
 

Jimmyjames

Site Champ
Posts
675
Reaction score
763
Also, the claim is that ARM’s Immortalis-G720 has even more TFLOPS, at 5.9!

Surely we know more about that architecture, since it’s ARM? What are its benchmarks like? I haven’t had time to look yet myself.
It is (once again) weird.
GB scores are similar to the Adreno 750

Wildlife Extreme looks similar to the Adreno

GFXBench looks similar

They all look…similar. Are we sure the Adreno and the Immortalis aren’t the same? Lol.
 

dada_dave

Elite Member
Posts
2,163
Reaction score
2,148
So I was able to find some information on the Immortalis-G720 MC12: 12 cores, as the name implies, at 1.3 GHz, which is definitely substantial. There are apparently 192 execution units per core, which gives 2 × 192 × 12 × 1.3 GHz ≈ 5.99 TFLOPS. The math for the Immortalis-G720 MC11 works out similarly: 2 × 192 × 11 × 0.85 GHz ≈ 3.6 TFLOPS.


There’s a small typo where the MC12 is listed with 10 execution units when obviously it’s 12. So @leman, they do claim to be that wide. However, this may be “peak” TFLOPS; in practice the clocks on a phone probably never reach that for very long, and it’s likely operating well below that most of the time. After all, those are laptop specs, not phone specs. I bet it doesn’t actually hit that clock speed in practice.

Other info: the warp size is 16. The only weird part is that I came across a claim that the FMA count per core per clock for the G715/G720 is 256, but it should be 384 given the above. Unsure about the discrepancy.


I’ve seen this repeated as well by an ARM engineer, along with a claim that the number of FP32 units is actually 128, not 192. Again, not sure what to make of it unless I’m misunderstanding something.
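Here’s the same math as a quick script, plus what the 128-lane figure would imply instead (just arithmetic on the numbers above, so take the lane counts and clocks with the appropriate grain of salt):
Code:
# peak FP32 TFLOPS = cores * lanes_per_core * 2 (FMA = 2 FLOPs) * clock_GHz / 1000
def peak_tflops(cores, lanes_per_core, clock_ghz):
    return cores * lanes_per_core * 2 * clock_ghz / 1000

print(peak_tflops(12, 192, 1.3))    # MC12, 192 lanes/core -> ~5.99
print(peak_tflops(11, 192, 0.85))   # MC11, 192 lanes/core -> ~3.59
print(peak_tflops(12, 128, 1.3))    # MC12 if the 128-lane figure is right -> ~3.99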
 

Jimmyjames

Site Champ
Posts
675
Reaction score
763
So I was able to find some information on the Immortalis-G720 MC12: 12 cores, as the name implies, at 1.3 GHz, which is definitely substantial. There are apparently 192 execution units per core, which gives 2 × 192 × 12 × 1.3 GHz ≈ 5.99 TFLOPS. The math for the Immortalis-G720 MC11 works out similarly: 2 × 192 × 11 × 0.85 GHz ≈ 3.6 TFLOPS.


There’s a small typo where the MC12 is listed with 10 execution units when obviously it’s 12. So @leman, they do claim to be that wide. However, this may be “peak” TFLOPS; in practice the clocks on a phone probably never reach that for very long, and it’s likely operating well below that most of the time. After all, those are laptop specs, not phone specs. I bet it doesn’t actually hit that clock speed in practice.

Other info: the warp size is 16. The only weird part is that I came across a claim that the FMA count per core per clock for the G715/G720 is 256, but it should be 384 given the above. Unsure about the discrepancy.


I’ve seen this repeated as well by an ARM engineer, along with a claim that the number of FP32 units is actually 128, not 192. Again, not sure what to make of it unless I’m misunderstanding something.
Good sleuthing!
 

dada_dave

Elite Member
Posts
2,163
Reaction score
2,148
Good sleuthing!
Thanks. From what I can tell (and @leman can confirm or refute), the Apple A17 GPU cores run at 1.4 GHz with 128 units and a warp size of 32. With 6 cores that’s 2.15 TFLOPS FP32. Like the ARM part it claims double the FP16 TFLOPS, though it’s unclear whether the ARM cores have separate FP16 pipes like Apple does or run FP16 at 2x rate through the FP32 pipes. I’m also still confused by the 128 vs 192 discrepancy in the ARM data.
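Same back-of-the-envelope for the Apple figures (assuming an FMA per lane per clock; the clock and lane counts are the ones floating around this thread, not official Apple specs):
Code:
def peak_tflops(cores, lanes_per_core, clock_ghz):
    return cores * lanes_per_core * 2 * clock_ghz / 1000  # FMA = 2 FLOPs

print(peak_tflops(6, 128, 1.4))       # A17 Pro FP32 -> ~2.15
print(2 * peak_tflops(6, 128, 1.4))   # x2 for FP16, if it really runs at double rate -> ~4.30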


CPU Monkey also has an error: obviously the A17 Pro has hardware ray tracing.

Edit: just noticed Qualcomm’s Adreno 750 TFLOPS listed here:


Vs



Another variant? Which is in the actual phones?

Edit2: and here’s CPU Monkey’s listing for the Immortalis


Look at the clock speed and TFLOPS.
 
Last edited:

Jimmyjames

Site Champ
Posts
675
Reaction score
763
Thanks. From what I can tell (and @leman can confirm or refute), the Apple A17 GPU cores run at 1.4 GHz with 128 units and a warp size of 32. With 6 cores that’s 2.15 TFLOPS FP32. Like the ARM part it claims double the FP16 TFLOPS, though it’s unclear whether the ARM cores have separate FP16 pipes like Apple does or run FP16 at 2x rate through the FP32 pipes. I’m also still confused by the 128 vs 192 discrepancy in the ARM data.


CPU Monkey also has an error: obviously the A17 Pro has hardware ray tracing.

Edit: just noticed Qualcomm’s Adreno 750 TFLOPS listed here:


Vs



Another variant? Which is in the actual phones?

Edit2: and here’s CPU Monkey’s listing for the Immortalis


Look at the clock speed and TFLOPS.
Hmmm that is a discrepancy.
 