Nuvia: don’t hold your breath

Artemis · May 25, 2024

casperes1996 said:
Talking from the perspective of a software engineer I wholeheartedly disagree with calling a test like this "concurrency" testing. Concurrency and parallelism are have very specific meanings, and single-threaded JavaScript can still be concurrent with use of continuations and futures. A single core machine can also operate concurrently - that's what timeslicing with pre-emptive multitasking does afterall. In the middle of operations, the state is saved, yanked to other "concurrent work" and can be put back in its old state later while both task perform, concurrently, on the same core. That is, concurrent in the sense that both are in progress at the same time, but not necessarily that process is being made simultaneously; That's what parallelism is. Simultaneous progress being made.

But the GB example is explicitly parallel to one program, and one process yes? There has to be a better way to distinguish what I’m saying surely (also you’re right about concurrent, it’s been a while since I thought about this as a developer but yes obviously context switching and concurrency are related)

casperes1996 said:
This is all fair too. Biggest advantage I see to adding it n the overview of GB6 is that a lot of data gets collected for that. But if the data becomes some microbenchmark somewhere it might get harder and harder to find the info for a large suite of chips for architectural insights. But more of a hypothetical at this point

casperes1996 · May 25, 2024

Altaic said:
You should be able to parallelize linking each of the 3 architectures, but maybe I’m missing something?

Right, so yes. Those steps should be possible to parallelise, but gradle is weird about it. I can't really tell why but sometimes it does do them in parallel but most of the time it doesn't. Regardless, there's at least a sequential element to linking (per architecture) relative to compiling which is per compilation unit generally

casperes1996 · May 25, 2024

Artemis said:
But the GB example is explicitly parallel to one program, and one process yes? There has to be a better way to distinguish what I’m saying surely (also you’re right about concurrent, it’s been a while since I thought about this as a developer but yes obviously context switching and concurrency are related)

I mean one program in the sense that there's a single host program, sure but there's no reason the host program can't spawn a million instances of other programs and await their completion for result calculation or something. I would not really think there's a meaningful enough distinction between many processes and many threads outside of the required kernel work for process isolation anyway

mr_roboto · May 25, 2024

casperes1996 said:
Regardless, there's at least a sequential element to linking (per architecture)

There are recent developments which change this. A few years ago, one developer who was bothered by the slowness of linking large binaries decided to look into traditional UNIX style ld programs and see what he could do to optimize them. First he worked on LLVM's lld to add some parallelism to it, then he wrote his own new open source linker, mold.

GitHub - rui314/mold: Mold: A Modern Linker 🦠

Mold: A Modern Linker 🦠. Contribute to rui314/mold development by creating an account on GitHub.

github.com

He says it's not just parallelism, he also did a bunch of other optimization work that helps make it much faster than traditional linkers.

If link speed is limiting your productivity, you should evaluate whether you can switch to LLD or mold.

Andropov · May 26, 2024

Artemis said:
No, it’s still very real world, it’s just that massively parallel is the wrong word for it. More like concurrent, as I kind of tried to explain

I agree with and am well aware of the synchronization issues and why the current way GB6 runs it is fine, FWIW. I didn’t say I wanted to get rid of it. But A PC will have hundreds of processes running at any one time, it makes sense people often have multiple cores to take advantage of this for background cores and such more than direct MT workloads. The E Cores barely add anything to GB6 MT but they are crucial to MacOS as we enjoy our now especially wrt responsiveness.

MT workload (doing as they do now) + concurrent scaling (for multitasking) would be the right way to put this, I’d be fine with them doing both.

In truth the former is going to be more important in regard to how *fast* the CPU is — for sure. But as long as a that separation is made clear, a role for the multitasking model here is fine to me.

I agree that the ability of a system to multitask effectively is important too, and that it's not covered by any of the current Geekbench scores. But I think the background processes for a typical user are so lightweight (compared to the full processing powers of all cores combined) that any system with a good Geekbench multicore score is not going to have problems with multitasking. For example, the CPU utilization of my computer right now (writing this in Safari, with Xcode + Calendar + Mail + Terminal + Notes + Activity Monitor open) is below 10%.

Artemis said:
Edit: @Andropov okay sure I see what you mean now wrt Cache sharing across concurrent loads, I’m not sure I really agree with that in principle given even phones were benefitting from massively concurrent cores in Android with Chrome in like, 2018 — and again that wasn’t anything to do with an MT coordination for performance, just multiple tabs pinned to cores and such. In principle you have a point though!

Maybe I’m wrong and there’s really no reason to measure concurrent perf, idk. Wondering what you think!

I've no clue about how Android or web browsers in general work so I can't really counter argue this.

casperes1996 said:
This is fair. And if your tasks are that parallel in nature there also comes a point where GPGPU makes more sense anyway. With some exceptions of course

Yeah this was also something I was thinking about but ultimately I left it out to avoid making my post too long. If something really is massively parallel in nature, the GPU is likely going to be faster by several orders of magnitude.

Jimmyjames · Jun 5, 2024

Is there anything it can’t be configured with?

https://Twitter or X not allowed/aschilling/status/1798398262103437762

Jimmyjames · Jun 5, 2024

In response to who will have the IPC lead by the end of 2025.

dada_dave · Jun 5, 2024

Jimmyjames said:
In response to who will have the IPC lead by the end of 2025.
View attachment 29800

Gauntlet thrown … though I’ll repeat my previous statement: IPC leader but in which workload? When it’s close that matters …

Jimmyjames · Jun 5, 2024

dada_dave said:
Gauntlet thrown … though I’ll repeat my previous statement: IPC leader but in which workload? When it’s close that matters …

Yeah true. It might be tongue in cheek, tough to tell from text.

dada_dave · Jun 5, 2024

Jimmyjames said:
Yeah true. It might be tongue in cheek, tough to tell from text.

Oh it’s definitely tongue in cheek, or at least rascally. I mean it might even end up being them with the superior P-core by then, no one can predict these things even those in the middle of it, but he’s having fun. That’s how I read it.

Altaic · Jun 5, 2024

dada_dave said:
Gauntlet thrown … though I’ll repeat my previous statement: IPC leader but in which workload? When it’s close that matters …

Simulating Intel chips?

KingOfPain · Jun 5, 2024

ARM's aim is that in 5 years half of the new Windows PCs shall be ARM-based:

https://www.reuters.com/technology/arm-aims-capture-50-pc-market-five-years-ceo-says-2024-06-03/

I could see them gain some traction with laptops (if Microsoft's new translator is fast enough), but gaming machines will most likely still run on x86-64.

Altaic · Jun 5, 2024

KingOfPain said:
ARM's aim is that in 5 years half of the new Windows PCs shall be ARM-based:

https://www.reuters.com/technology/arm-aims-capture-50-pc-market-five-years-ceo-says-2024-06-03/

I could see them gain some traction with laptops (if Microsoft's new translator is fast enough), but gaming machines will most likely still run on x86-64.

Why gaming machines? Most of the heavy lifting for games happens in the GPU, which has its own arch. And Microsoft controls Direct X/3D, so no hoops need be through jumped there, a la Apple’s GPTK/wine.

KingOfPain · Jun 6, 2024

Altaic said:
Why gaming machines? Most of the heavy lifting for games happens in the GPU, which has its own arch. And Microsoft controls Direct X/3D, so no hoops need be through jumped there, a la Apple’s GPTK/wine.

Especially because of the GPU. I knew some gamers, who bought a new graphics card every half year.
Unless ARM/Qualcomm/Microsoft devise a standard efficient way to use external graphics cards, I bet "pro gamers" won't be too happy about an ARM SoC with integrated GPU.

casperes1996 · Jun 6, 2024

KingOfPain said:
Unless ARM/Qualcomm/Microsoft devise a standard efficient way to use external graphics cards, I bet "pro gamers" won't be too happy about an ARM SoC with integrated GPU.

... Why would they need to devise something? PCIe already exists for internal expansion and a fair few laptops have shipped with Thunderbolt eGPUs both pure and wrapped in proprietary connectors.

dada_dave · Jun 6, 2024

KingOfPain said:
Especially because of the GPU. I knew some gamers, who bought a new graphics card every half year.
Unless ARM/Qualcomm/Microsoft devise a standard efficient way to use external graphics cards, I bet "pro gamers" won't be too happy about an ARM SoC with integrated GPU.

casperes1996 said:
... Why would they need to devise something? PCIe already exists for internal expansion and a fair few laptops have shipped with Thunderbolt eGPUs both pure and wrapped in proprietary connectors.

While I think ARM’s CEO is being more than a tad optimistic in predicting 50% in 5 years, I don’t believe that the SOC design excludes gamers. Unlike Apple, Qualcomm has stated that they are interested in laptop design wins with dGPUs and I believe eventually desktops. If the rumors of MediaTek-Nvidia and AMD ARM-based SOCs pan out next year they too could have SOCs + dGPU. And of course it’s true that some gamers have fetishized the distinction between dGPU and iGPU (often ignoring that consoles have iGPUs) but we know that simply being integrated doesn’t make an iGPU less powerful and for laptops, the majority of even the gaming market, dGPUs are hardly ever replaced independently. Many could come around if the SOC designs are good.

casperes1996 · Jun 6, 2024

dada_dave said:
While I think ARM’s CEO is being more than a tad optimistic in predicting 50% in 5 years, I don’t believe that the SOC design excludes gamers. Unlike Apple, Qualcomm has stated that they are interested in laptop design wins with dGPUs and I believe eventually desktops. If the rumors of MediaTek-Nvidia and AMD ARM-based SOCs pan out next year they too could have SOCs + dGPU. And of course it’s true that some gamers have fetishized the distinction between dGPU and iGPU (often ignoring that consoles have iGPUs) but we know that simply being integrated doesn’t make an iGPU less powerful and for laptops, the majority of even the gaming market, dGPUs are hardly ever replaced independently. Many could come around if the SOC designs are good.

Yeah; As long as the GPU driver has an ARM compatible version, should be possible to just slot into PCIe.

KingOfPain · Jun 6, 2024

casperes1996 said:
... Why would they need to devise something? PCIe already exists for internal expansion and a fair few laptops have shipped with Thunderbolt eGPUs both pure and wrapped in proprietary connectors.

Apparently, I'm to much used to embedded SoCs with limited interfaces. Qualcomm lists PCIe 4.0 for the Snapdragon X Elite (although only under "SSD/NVMe Interface").
But as you already said, the drivers still have to be ported, otherwise the best dGPU is just unutilized silicon.

casperes1996 · Jun 6, 2024

KingOfPain said:
Apparently, I'm to much used to embedded SoCs with limited interfaces. Qualcomm lists PCIe 4.0 for the Snapdragon X Elite (although only under "SSD/NVMe Interface").
But as you already said, the drivers still have to be ported, otherwise the best dGPU is just unutilized silicon.

Nvidia already has arm gpu driver to facilitate grace hopper and their other soc projects. And amd had a Radeon in a Samsung arm chip.

Artemis · Jun 6, 2024

Jimmyjames said:
In response to who will have the IPC lead by the end of 2025.
View attachment 29800

Yeah that was awesome. It doesn’t matter if he’s wrong by a bit because of Apple’s own next boost by a bit (like another 5-6% integer) or because they miss by a tad, to me it just says they’re going to have a big IPC jump. It won’t matter for Apple tbc, nor am I trying to rub anything in here, it’s just good for QC and WoA and Android ecosystem, and bad for AMD/Intel.

I also think this was predictable, as I had said that Nuvia really had to go 0 to 1 from 2021 to 2023, basically start over on some stuff (like with fabric implementations for the QC PC part and all) and was also distracted by and held back by lawsuits. And yet their (late!) part is quite cost effective and is still at parity or ballpark similar with Intel/AMD competitors depending on what it is.

Over the same time AMD and Intel still didn’t match their core’s IPC as we now see, which is very impressive.

What I’m getting at though is that given their background and the circumstances of the last 2-3 years — I think it’s believable that Oryon V2 will be a significant performance and power jump. If it isn’t and is meh given the node and all, then that does say I’m wrong about how held back or hectic things were for the team, but this seems very unlikely

Nuvia: don’t hold your breath

Site Champ

Site Champ

Site Champ

Site Champ

Site Champ

Elite Member

Elite Member

Elite Member

Elite Member

Elite Member

Site Champ

Site Champ

Site Champ

Site Champ

Site Champ

Elite Member

Site Champ

Site Champ

Site Champ

Site Champ