Nuvia: don’t hold your breath

This is, so far, pretty much a replay of what happened with their first attempt. They are using more silicon to compete (and win) on multicore against a smaller Apple chip.

The 12-core X2E has comparable multicore to the base M5 (17k GB6, ~1200 CB2024), and its die size is also presumably similar to the base M5's.
Please don't (implicitly) misattribute quotes. Not that it was terribly important in this case.

To the point, you may be right, but that's not relevant to the marketing that we were talking about, where they were comparing the 18-core to the M5.
 
I know of no WoA native games. Potentially Minecraft? Certainly a tiny list.

Though at reasonable resolution and quality settings you'll be more GPU- than CPU-bound anyway, and I don't think CPU translation should impact that very much. HLSL will still compile through the GPU driver to native GPU code in the end, I assume.
I dunno; for instance, when CP 2077 became Mac native, there were certainly resolutions and quality settings where the native and Xover versions had practically identical performance, but for most of the actually playable settings there was quite a difference.


Some games may show less of a difference, others may show more, but there is a reason, beyond convenience, that people generally prefer a (good) native port to Xover when available (the "good" being important). Xover/Wine is great, don't get me wrong, but a full native port is going to be much more performant.

And even when the translation layer was just Rosetta 2, you could see performance differences, though admittedly smaller. Early translation layers for WoA were likewise blamed for poor gaming performance, though I believe Prism is thought to be much better. Yes, the GPU matters much more, but if the CPU is being hamstrung ... that can still matter.
 
they were comparing the 18-core to the M5

It would seem that Apple may be somewhat more aggressive with the memory bandwidth. If you have 18 Oryon cores (large, high throughput), you have to find a clock frequency φ that keeps them from stalling out under load. Base φ is 4.0GHz, but once you get more than three or four going at once, the clock is going to drop off just to keep the cores fed. If MC tests are using all those cores at once, it is difficult for me to imagine more than 1.5GHz. Heat is not really even the question; starving cores would just be silly.
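
For flavor, here's a toy calculation of that reasoning; the total bandwidth and per-core demand figures are pure assumptions for illustration, not measured Oryon or M5 numbers:

# Toy model: how far the clock must drop so N bandwidth-hungry cores
# don't outrun the memory system. Every number here is an assumption.

TOTAL_BW_GBPS = 135.0    # assumed total package memory bandwidth, GB/s
DEMAND_PER_GHZ = 5.0     # assumed GB/s of DRAM traffic one busy core generates per GHz
BASE_CLOCK_GHZ = 4.0     # quoted base clock

def sustainable_clock_ghz(active_cores: int) -> float:
    """Clock at which aggregate demand just matches total bandwidth."""
    bw_limited = TOTAL_BW_GBPS / (active_cores * DEMAND_PER_GHZ)
    return min(BASE_CLOCK_GHZ, bw_limited)

for n in (1, 4, 8, 18):
    print(f"{n:2d} active cores -> ~{sustainable_clock_ghz(n):.2f} GHz")

With those assumed numbers, a lone core holds base clock, but 18 streaming cores land right around 1.5GHz.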
 
It would seem that Apple may be somewhat more aggressive with the memory bandwidth. If you have 18 Oryon cores (large, high throughput), you have to find a clock frequency φ that keeps them from stalling out under load. Base φ is 4.0GHz, but once you get more than three or four going at once, the clock is going to drop off just to keep the cores fed. If MC tests are using all those cores at once, it is difficult for me to imagine more than 1.5GHz. Heat is not really even the question; starving cores would just be silly.
Not sure what you're trying to say. Total bandwidth is known. What's available to each core, and to each core cluster, is not yet known for the M5 Pro/Max (as far as I know, anyway); it's extra interesting in this generation because they are presumably all across the fusion bridge from the memory controllers.

But in any case, the number you're talking about is going to be totally dependent on cache hit rates, and different for every test. Many MC tests are not especially sensitive to memory bandwidth, while others are extremely sensitive.
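
To put rough numbers on that (all assumed, purely illustrative): the DRAM traffic a core generates swings by orders of magnitude with hit rate, which is why one MC test can be bandwidth-bound and another barely touched.

# Toy model: DRAM traffic per core as a function of cache hit rate.
# Assumed numbers, purely illustrative.

LINE_BYTES = 64          # cache line size
ACCESSES_PER_SEC = 1e9   # assumed memory accesses per second for one busy core

def dram_traffic_gbps(hit_rate: float) -> float:
    """Only misses go past the cache hierarchy to DRAM."""
    return (1.0 - hit_rate) * ACCESSES_PER_SEC * LINE_BYTES / 1e9

for hr in (0.999, 0.99, 0.90):
    print(f"hit rate {hr:.1%}: ~{dram_traffic_gbps(hr):.2f} GB/s per core")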
 
I dunno; for instance, when CP 2077 became Mac native, there were certainly resolutions and quality settings where the native and Xover versions had practically identical performance, but for most of the actually playable settings there was quite a difference.


Some games may show less of a difference, others may show more, but there is a reason, beyond convenience, that people generally prefer a (good) native port to Xover when available (the "good" being important). Xover/Wine is great, don't get me wrong, but a full native port is going to be much more performant.

And even when the translation layer was just Rosetta 2, you could see performance differences, though admittedly smaller. Early translation layers for WoA were likewise blamed for poor gaming performance, though I believe Prism is thought to be much better. Yes, the GPU matters much more, but if the CPU is being hamstrung ... that can still matter.
But that also has to go through GPU translation of the shaders. WoA is DirectX and HLSL all the way.

Although a lot can also be done to optimize for TBDR and GPU architectures, which may also be relevant for the Snapdragon GPU.
 
But that also has to go through GPU translation of the shaders. WoA is DirectX and HLSL all the way.
Yes, I had an entire second paragraph that addressed that even just WoA translation can cause performance issues. To expand on what I wrote: when it was just WoA, pre-Prism translation was poor enough that Macs running games through Xover did better. Prism has improved things of course, but even when all that was required was Rosetta 2, you could still see the impact. CPU translation takes a hit. How much depends on the game and on how much translation is required, this latter bit also being my main point about the comparison between the platforms to begin with. To sum up:

1. Yes, Asus is of course going to choose examples that benefit it. It's advertising. They aren't required to make it fair, though particularly egregious examples abound which I do believe cross ethical lines (some of the Intel crap when the M1 was first released comes to mind). Reporting Diablo 3 scores doesn't cross that line because of #3.

2. macOS likely has many more native games than WoA.
a) native games are going to, on average, be more performant than translated games, even when it is just translating x86 to ARM and nothing else. While I always look askance at CPU makers advertising a 5% improvement in gaming over their competitor as some great win, taking a 25% or more hit in ST performance is going to matter for frame rates and the graphics quality you can set while keeping acceptable frame rates, some games more than others of course (the toy model after this list sketches the effect). And of course some ports are bad enough that the translated game is actually faster. Unfortunate, but it does happen.

3. the majority of games will require translation layers for both, and WoA will have the advantage of requiring fewer translation layers, but I think it is fair to also point out that there is a difference in workload even when the average consumer doesn't care.
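
Here's the toy frame-time model I mentioned under 2a; the millisecond timings and the 25% penalty are assumptions chosen purely for illustration:

# Toy model: the slower of CPU and GPU sets the frame time, so a CPU-side
# translation penalty only shows up once the GPU stops being the bottleneck.
# All timings below are made-up assumptions.

CPU_MS = 10.0                # assumed native CPU time per frame
TRANSLATION_PENALTY = 1.25   # assume ~25% extra CPU time when translated

def fps(gpu_ms: float, translated: bool) -> float:
    cpu_ms = CPU_MS * (TRANSLATION_PENALTY if translated else 1.0)
    return 1000.0 / max(cpu_ms, gpu_ms)

for label, gpu_ms in (("4K/ultra (GPU-bound)", 25.0), ("1080p/medium", 8.0)):
    print(f"{label}: native {fps(gpu_ms, False):.0f} fps, "
          f"translated {fps(gpu_ms, True):.0f} fps")

At the GPU-bound setting both come out at 40 fps; at the lighter setting the translated run drops from 100 to 80 fps, which is the pattern I was describing with CP 2077.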

For instance, I do not include Qualcomm chips in my analysis of GPUs, because I'm pretty sure they're not actually as inefficient as the data shows; I'm fairly sure the NBC data comes from release, pre-Prism:


And even post-Prism, for Elite 2 GPUs, I still might not include them because they'll be at a disadvantage that no other GPU is at (unless they are really good despite that disadvantage, in which case I'll admit that's worth noting). Now, I'm attempting to do hardware analysis, not advertising, nor even "here's what the average consumer should expect" analysis. So I fully recognize that's different, but that's where I'm coming from.

Although a lot can also be done to optimize for TBDR and GPU architectures, which may also be relevant for the Snapdragon GPU.

Likely.
 
Yes, I had an entire second paragraph that addressed that even just WoA translation can cause performance issues. To expand on what I wrote: when it was just WoA, pre-Prism translation was poor enough that Macs running games through Xover did better. Prism has improved things of course, but even when all that was required was Rosetta 2, you could still see the impact. CPU translation takes a hit. How much depends on the game and on how much translation is required, this latter bit also being my main point about the comparison between the platforms to begin with. To sum up:

1. Yes, Asus is of course going to choose examples that benefit it. It's advertising. They aren't required to make it fair, though particularly egregious examples abound which I do believe cross ethical lines (some of the Intel crap when the M1 was first released comes to mind). Reporting Diablo 3 scores doesn't cross that line because of #3.

2. macOS likely has many more native games than WoA.
a) native games are going to, on average, be more performant than translated games, even when it is just translating x86 to ARM and nothing else. While I always look askance at CPU makers advertising a 5% improvement in gaming over their competitor as some great win, taking a 25% or more hit in ST performance is going to matter for frame rates and the graphics quality you can set while keeping acceptable frame rates, some games more than others of course.

3. the majority of games will require translation layers for both, and WoA will have the advantage of requiring fewer translation layers, but I think it is fair to also point out that there is a difference in workload even when the average consumer doesn't care.

For instance, I do not include Qualcomm chips in my analysis of GPUs, because I'm pretty sure they're not actually as inefficient as the data shows; I'm fairly sure the NBC data comes from release, pre-Prism:


And even post-Prism, for Elite 2 GPUs, I still might not include them because they'll be at a disadvantage that no other GPU is at (unless they are really good despite that disadvantage, in which case I'll admit that's worth noting). Now, I'm attempting to do hardware analysis, not advertising, nor even "here's what the average consumer should expect" analysis. So I fully recognize that's different, but that's where I'm coming from.



Likely.

I fully agree with all of this, and I believe we've collectively (in this thread as a whole) captured a lot of the nuance at play here :)
 
In case anyone has missed it: while they're not playing in Apple's market the way QC wants to, the new Nvidia ARM chips are looking pretty interesting. They appear to be completely custom designs with a 10-wide decoder and a correspondingly large back end. The only version being made has 88 cores, so this is definitely a server-only offering.
 
In case anyone has missed it: while they're not playing in Apple's market the way QC wants to, the new Nvidia ARM chips are looking pretty interesting. They appear to be completely custom designs with a 10-wide decoder and a correspondingly large back end. The only version being made has 88 cores, so this is definitely a server-only offering.


Nvidia is of course set to release consumer hardware with MediaTek based on off-the-shelf ARM cores, so it wouldn't surprise me, if that venture shows promise, if Nvidia were to eventually release a consumer chip with the Olympus cores (or a derivative/descendant). Worth keeping an eye on for sure.

Which in turn is why, as we learn more about Olympus's architecture, it's notable that NVIDIA's implementation of simultaneous multithreading on Olympus does not just schedule multiple threads on a single CPU core. Rather, it fully partitions the CPU core. Dubbed spatial multithreading, the approach foregoes the timesharing nature of traditional SMT in favor of giving each thread a fixed and reduced set of resources. The resulting trade-off for system operators, then, is whether to allow more threads at reduced throughput (but perhaps better overall utilization of the hardware) or fewer threads moving through the Olympus cores as fast as the hardware can take them.

This sounds extremely similar to Intel's (possibly abandoned) Rentable Unit idea from their (probably abandoned) Royal Core project spearheaded by Jim Keller.
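
If I'm reading the article's description right, the contrast with traditional SMT could be sketched like this (the slot counts are made-up assumptions; this is my reading of the idea, not anything NVIDIA has published):

# Toy contrast: traditional (timeshared) SMT vs the described "spatial" SMT.
# Slot counts are assumptions purely for illustration.

CORE_SLOTS = 10  # assume a 10-wide core's worth of issue slots

def traditional_smt(threads_ready: list[bool]) -> dict[int, int]:
    """Timeshared SMT: whichever threads have work share the whole core."""
    active = [i for i, ready in enumerate(threads_ready) if ready]
    share = CORE_SLOTS // max(len(active), 1)
    return {i: share for i in active}   # one thread alone gets all 10 slots

def spatial_smt(threads_ready: list[bool]) -> dict[int, int]:
    """Spatial SMT: the core is statically partitioned; each thread keeps
    its fixed slice whether or not its sibling has work."""
    fixed = CORE_SLOTS // len(threads_ready)
    return {i: fixed for i, ready in enumerate(threads_ready) if ready}

# When one thread stalls, timeshared SMT hands the other the whole core;
# spatial SMT leaves the stalled thread's slice idle.
print(traditional_smt([True, False]))  # {0: 10}
print(spatial_smt([True, False]))      # {0: 5}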

EDIT: Huh ... I just realized STH is where Ryan Smith from Anandtech ended up. I was wondering where he went.
 
This sounds extremely similar to Intel's (possibly abandoned) Rentable Unit idea
Really? Because it sounded to me like they didn't want to work too hard at SMT so they did a minimal version of it.

I wouldn't expect Olympus itself in any consumer gear, but a descendant? Definitely.

I just realized STH is where Ryan Smith from Anandtech ended up. I was wondering where he went.
Yes, that's pretty new. And Shilov is at TH. It's funny, he started out really clueless. Now he's the gold standard and head and shoulders above the other reporters there.
 
Really? Because it sounded to me like they didn't want to work too hard at SMT so they did a minimal version of it.

I wouldn't expect Olympus itself in any consumer gear, but a descendant? Definitely.


Yes, that's pretty new. And Shilov is at TH. It's funny, he started out really clueless. Now he's the gold standard and head and shoulders above the other reporters there.

Aye, I knew about Anton at TH, but I don't read STH much (not a comment on their quality, just not my focus, though maybe I should), so I missed Ryan joining; it looks like that was January last year? Andrei of course went to Qualcomm, and Ian has his own channel/business (and is a part of chipsandcheese). There were some other hardware guys doing things like hard drives and other stuff at AT; hopefully they also landed on their feet at other places. That loss still sucks. :(

As for "RU" vs "spatial SMT": yeah, I'm sure there are differences, but from a top level it sounds similar to me, anyway. But maybe I'm way off base. I know @Cmaier wasn't terribly impressed by the RU idea.
 