Nuvia: don’t hold your breath

Speaking from the perspective of a software engineer, I wholeheartedly disagree with calling a test like this "concurrency" testing. Concurrency and parallelism have very specific meanings, and single-threaded JavaScript can still be concurrent with the use of continuations and futures. A single-core machine can also operate concurrently; that's what timeslicing with pre-emptive multitasking does, after all. In the middle of an operation, the state is saved, the core is yanked over to other "concurrent work", and the original state can be restored later, so both tasks proceed, concurrently, on the same core. That is, concurrent in the sense that both are in progress at the same time, but not necessarily that progress is being made simultaneously. That's what parallelism is: simultaneous progress being made.
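(A minimal TypeScript sketch of the distinction being described, assuming Node's single-threaded event loop; the `task` and `sleep` names are just illustrative. Both tasks are in progress at the same time, yet only one of them is ever executing at any instant:)

```typescript
// Two tasks proceed concurrently on a single thread: each awaits a timer,
// yielding to the event loop so the other can make progress, but no two
// lines of user code ever run simultaneously (concurrency, not parallelism).

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function task(name: string): Promise<void> {
  for (let step = 1; step <= 3; step++) {
    console.log(`${name}: step ${step}`);
    await sleep(10); // suspension point: state is saved, another task may run here
  }
}

// Both tasks are "in progress at the same time", interleaved by the event loop.
Promise.all([task("A"), task("B")]).then(() => console.log("both done"));
```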
But the GB example is explicitly parallel within one program, and one process, yes? There has to be a better way to distinguish what I'm saying, surely. (Also, you're right about concurrency; it's been a while since I thought about this as a developer, but yes, obviously context switching and concurrency are related.)
This is all fair too. The biggest advantage I see to adding it in the overview of GB6 is that a lot of data gets collected for that. But if the data ends up as some microbenchmark somewhere, it might get harder and harder to find the info for a large suite of chips for architectural insights. But that's more of a hypothetical at this point.
 
You should be able to parallelize linking each of the 3 architectures, but maybe I’m missing something?
Right, so yes. Those steps should be possible to parallelise, but Gradle is weird about it. I can't really tell why, but sometimes it does do them in parallel and most of the time it doesn't. Regardless, there's at least a sequential element to linking (per architecture) relative to compiling, which is generally per compilation unit.
 
But the GB example is explicitly parallel within one program, and one process, yes? There has to be a better way to distinguish what I'm saying, surely. (Also, you're right about concurrency; it's been a while since I thought about this as a developer, but yes, obviously context switching and concurrency are related.)
I mean one program in the sense that there's a single host program, sure, but there's no reason the host program can't spawn a million instances of other programs and await their completion for result calculation or something. I wouldn't really say there's a meaningful distinction between many processes and many threads outside of the kernel work required for process isolation anyway.
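(To make the "single host program spawning many instances" idea concrete, here is a hedged Node/TypeScript sketch; the `runWorker` helper and the `node -e` worker command are illustrative stand-ins, not anything from Geekbench. The host is one program, but each worker is a separate OS process the kernel can schedule across cores:)

```typescript
// A single host program spawns several independent child processes and
// awaits their completion before aggregating results.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical worker: "node -e ..." stands in for whatever program the
// host would really launch; each call is its own OS process.
async function runWorker(id: number): Promise<string> {
  const { stdout } = await execFileAsync("node", ["-e", `console.log(${id} * ${id})`]);
  return stdout.trim();
}

async function main(): Promise<void> {
  const ids = [1, 2, 3, 4];
  // The kernel is free to schedule these processes across cores in parallel.
  const results = await Promise.all(ids.map(runWorker));
  console.log("aggregated:", results);
}

main();
```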
 
Regardless, there's at least a sequential element to linking (per architecture)
There are recent developments which change this. A few years ago, one developer who was bothered by the slowness of linking large binaries decided to look into traditional UNIX style ld programs and see what he could do to optimize them. First he worked on LLVM's lld to add some parallelism to it, then he wrote his own new open source linker, mold.


He says it's not just parallelism, he also did a bunch of other optimization work that helps make it much faster than traditional linkers.

If link speed is limiting your productivity, you should evaluate whether you can switch to LLD or mold.
 
No, it's still very real-world; it's just that "massively parallel" is the wrong term for it. More like concurrent, as I kind of tried to explain.

I agree with and am well aware of the synchronization issues and why the current way GB6 runs it is fine, FWIW. I didn't say I wanted to get rid of it. But a PC will have hundreds of processes running at any one time, so it makes sense that people often have multiple cores to take advantage of this for background tasks and such, more than for direct MT workloads. The E-cores barely add anything to GB6 MT, but they are crucial to macOS as we enjoy it now, especially wrt responsiveness.

MT workloads (as they do now) + concurrent scaling (for multitasking) would be the right way to put it; I'd be fine with them doing both.

In truth, the former is going to be more important with regard to how *fast* the CPU is, for sure. But as long as that separation is made clear, a role for the multitasking model here is fine by me.

I agree that the ability of a system to multitask effectively is important too, and that it's not covered by any of the current Geekbench scores. But I think the background processes for a typical user are so lightweight (compared to the full processing power of all cores combined) that any system with a good Geekbench multicore score is not going to have problems with multitasking. For example, the CPU utilization of my computer right now (writing this in Safari, with Xcode + Calendar + Mail + Terminal + Notes + Activity Monitor open) is below 10%.

Edit: @Andropov okay, sure, I see what you mean now wrt cache sharing across concurrent loads. I'm not sure I really agree with that in principle, given that even phones were benefiting from massively concurrent cores in Android with Chrome in, like, 2018. And again, that wasn't anything to do with MT coordination for performance, just multiple tabs pinned to cores and such. In principle you have a point though!

Maybe I’m wrong and there’s really no reason to measure concurrent perf, idk. Wondering what you think!
I've no clue about how Android or web browsers in general work, so I can't really counter-argue this.

This is fair. And if your tasks are that parallel in nature, there also comes a point where GPGPU makes more sense anyway, with some exceptions of course.
Yeah this was also something I was thinking about but ultimately I left it out to avoid making my post too long. If something really is massively parallel in nature, the GPU is likely going to be faster by several orders of magnitude.
 
Is there anything it can’t be configured with?

 
Yeah true. It might be tongue in cheek, tough to tell from text.
Oh, it's definitely tongue in cheek, or at least rascally. I mean, it might even end up being them with the superior P-core by then; no one can predict these things, even those in the middle of it. But he's having fun. That's how I read it.
 
ARM's aim is that in 5 years, half of new Windows PCs will be ARM-based.

I could see them gaining some traction with laptops (if Microsoft's new translator is fast enough), but gaming machines will most likely still run on x86-64.
Why gaming machines? Most of the heavy lifting for games happens on the GPU, which has its own arch. And Microsoft controls DirectX/Direct3D, so no hoops need be jumped through there, a la Apple's GPTK/Wine.
 
Why gaming machines? Most of the heavy lifting for games happens on the GPU, which has its own arch. And Microsoft controls DirectX/Direct3D, so no hoops need be jumped through there, a la Apple's GPTK/Wine.

Especially because of the GPU. I knew some gamers who bought a new graphics card every six months.
Unless ARM/Qualcomm/Microsoft devise a standard, efficient way to use external graphics cards, I bet "pro gamers" won't be too happy about an ARM SoC with an integrated GPU.
 
Unless ARM/Qualcomm/Microsoft devise a standard, efficient way to use external graphics cards, I bet "pro gamers" won't be too happy about an ARM SoC with an integrated GPU.

... Why would they need to devise something? PCIe already exists for internal expansion, and a fair few laptops have shipped with Thunderbolt eGPUs, both as plain Thunderbolt and wrapped in proprietary connectors.
 
Especially because of the GPU. I knew some gamers who bought a new graphics card every six months.
Unless ARM/Qualcomm/Microsoft devise a standard, efficient way to use external graphics cards, I bet "pro gamers" won't be too happy about an ARM SoC with an integrated GPU.

... Why would they need to devise something? PCIe already exists for internal expansion, and a fair few laptops have shipped with Thunderbolt eGPUs, both as plain Thunderbolt and wrapped in proprietary connectors.
While I think ARM's CEO is being more than a tad optimistic in predicting 50% in 5 years, I don't believe that the SoC design excludes gamers. Unlike Apple, Qualcomm has stated that they are interested in laptop design wins with dGPUs, and I believe eventually desktops. If the rumors of MediaTek-Nvidia and AMD ARM-based SoCs pan out next year, they too could pair SoCs with dGPUs. And of course it's true that some gamers have fetishized the distinction between dGPU and iGPU (often ignoring that consoles have iGPUs), but we know that simply being integrated doesn't make an iGPU less powerful, and for laptops, the majority of even the gaming market, dGPUs are hardly ever replaced independently. Many could come around if the SoC designs are good.
 
While I think ARM's CEO is being more than a tad optimistic in predicting 50% in 5 years, I don't believe that the SoC design excludes gamers. Unlike Apple, Qualcomm has stated that they are interested in laptop design wins with dGPUs, and I believe eventually desktops. If the rumors of MediaTek-Nvidia and AMD ARM-based SoCs pan out next year, they too could pair SoCs with dGPUs. And of course it's true that some gamers have fetishized the distinction between dGPU and iGPU (often ignoring that consoles have iGPUs), but we know that simply being integrated doesn't make an iGPU less powerful, and for laptops, the majority of even the gaming market, dGPUs are hardly ever replaced independently. Many could come around if the SoC designs are good.
Yeah; as long as the GPU driver has an ARM-compatible version, it should be possible to just slot the card into PCIe.
 
... Why would they need to devise something? PCIe already exists for internal expansion and a fair few laptops have shipped with Thunderbolt eGPUs both pure and wrapped in proprietary connectors.

Apparently, I'm too used to embedded SoCs with limited interfaces. Qualcomm lists PCIe 4.0 for the Snapdragon X Elite (although only under "SSD/NVMe Interface").
But as you already said, the drivers still have to be ported; otherwise even the best dGPU is just unutilized silicon.
 
Apparently, I'm too used to embedded SoCs with limited interfaces. Qualcomm lists PCIe 4.0 for the Snapdragon X Elite (although only under "SSD/NVMe Interface").
But as you already said, the drivers still have to be ported; otherwise even the best dGPU is just unutilized silicon.
Nvidia already has ARM GPU drivers to facilitate Grace Hopper and their other SoC projects. And AMD had a Radeon GPU in a Samsung ARM chip.
 
In response to who will have the IPC lead by the end of 2025.
Yeah, that was awesome. It doesn't matter if he's wrong by a bit, whether because of Apple's own next boost (like another 5-6% integer) or because they miss by a tad; to me it just says they're going to have a big IPC jump. It won't matter for Apple, tbc, nor am I trying to rub anything in here; it's just good for QC, WoA, and the Android ecosystem, and bad for AMD/Intel.

I also think this was predictable, as I had said that Nuvia really had to go 0 to 1 from 2021 to 2023, basically starting over on some things (like the fabric implementations for the QC PC part and all), and was also distracted and held back by lawsuits. And yet their (late!) part is quite cost-effective and is still at parity with, or in the ballpark of, Intel/AMD competitors depending on what it is.

Over the same time, AMD and Intel still didn't match their core's IPC, as we now see, which is very impressive.

What I'm getting at, though, is that given their background and the circumstances of the last 2-3 years, I think it's believable that Oryon V2 will be a significant performance and power jump. If it isn't, and it's meh given the node and all, then that does say I'm wrong about how held back or hectic things were for the team, but that seems very unlikely.
 