Nuvia: don’t hold your breath

I don't think their current offering is competitive enough for the Windows market to massively switch to ARM. Unless I'm missing something, the Snapdragon 888 (released late 2020) scores just 966 points in Geekbench 5 single core and 3044 in multicore... slower than Apple's A12 or Intel's i3-8200U.

At least they use a lot less power. But it's going to be difficult to pull off a 70% increase in performance in 2-3 years (to match current M1 performance).
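For what it's worth, here's the back-of-the-envelope behind that ~70% figure; a minimal sketch, assuming an M1 Geekbench 5 single-core score of roughly 1700 (my assumption, not a number from this thread):

```python
# Back-of-envelope for the gap above. The 966 figure is the Snapdragon 888
# single-core score quoted in the post; the ~1700 M1 single-core score is an
# assumed ballpark from typical published Geekbench 5 runs.
sd888_sc = 966
m1_sc = 1700  # assumed

uplift_needed = m1_sc / sd888_sc - 1
print(f"Single-core uplift needed to match M1: {uplift_needed:.0%}")  # ~76%

# Spread over roughly three yearly generations, that compounds to:
per_gen = (m1_sc / sd888_sc) ** (1 / 3) - 1
print(f"Required gain per generation over three years: {per_gen:.0%}")  # ~21%
```

And that only catches up to today's M1, not whatever Apple is shipping by then.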
 
Well, I have to assume they aren’t basing this new thing in any way on their existing designs. It’s probably going to have a microarchitecture similar to M1, with wide issue.

Questionable what they do for GPU, of course. And almost certainly not a unified memory architecture.

From their perspective, if you expect to just be supplying one SoC and there is going to be a separate GPU, RAM, etc., you can crank up the clock speeds well past what M1 does and trade off some power consumption to try and make up for any microarchitecture deficiencies.
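A rough sketch of why that tradeoff gets expensive quickly, using the textbook dynamic-power relation (power roughly proportional to C·V²·f) plus the common assumption that voltage has to rise more or less with frequency near the top of the curve; all numbers are illustrative:

```python
# Clock-vs-power illustration: dynamic power ~ C * V^2 * f, and near the top
# of the voltage/frequency curve V tends to rise with f, so power grows roughly
# with the cube of frequency. Purely illustrative numbers.
def relative_dynamic_power(freq_scale: float, voltage_tracks_freq: bool = True) -> float:
    v_scale = freq_scale if voltage_tracks_freq else 1.0
    return (v_scale ** 2) * freq_scale

for clock_bump in (1.1, 1.2, 1.3):
    pwr = relative_dynamic_power(clock_bump)
    print(f"+{clock_bump - 1:.0%} clock -> roughly {pwr:.2f}x dynamic power")
# +10% clock -> ~1.33x, +20% -> ~1.73x, +30% -> ~2.20x
```

So the clock-speed rescue is plausible, but it only works because a desktop part has the power and cooling budget to pay for it.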
 
Advanced Micro Devices has built ARM devices in the past, have they not? What is the possibility that they might try playing both sides of the fence?
 

I know we had an architecture license, but we never used it when I was there. I think maybe after I left there was something?
 
I paid a brief visit to GB. Their scoring methodology seems less than ideal. I see the Snapdragon 8 Gen 1 scoring as high as 1161 SC on Android (about the same as a Xeon W), which is way behind M1, and the MC scores are not worth mentioning. The Gen 2 scores under 800, but that is running Windows. It looks like a non-trivial part of M1's performance lead comes from the OS itself, along with the features Apple has embedded to optimize it for macOS, which is a bit of a problem.

I imagine that Qualcomm is planning wide pipes with a massive ROB – basically just copying what has worked for Apple. Their claims as of now are based on chalkboard estimates. There is almost certainly some unpublished magic Apple is using that others will struggle to divine.
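As a toy version of the sizing argument behind "wide pipes with a massive ROB" (the issue width and miss latency below are assumptions for illustration, not figures for any shipping core):

```python
# Toy bandwidth-delay argument for a huge out-of-order window: to keep a wide
# core busy across a DRAM miss you want roughly issue_width * miss_latency
# instructions in flight. Both figures are illustrative assumptions.
issue_width = 8            # sustained instructions per cycle (assumed)
miss_latency_cycles = 300  # cycles for a load that misses to DRAM (assumed)

in_flight = issue_width * miss_latency_cycles
print(f"Instructions in flight to hide one miss: ~{in_flight}")  # ~2400
# Far larger than any real reorder buffer, which is why the ROB works together
# with deep load/store queues, many outstanding misses, and prefetchers rather
# than covering the latency on its own.
```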
 
@Cmaier has mentioned how his team at AMD "worked closely with" Microsoft to implement x86-64 in Windows, but Apple's integration between macOS and the M-series must be on another level. That's probably a competitive advantage that is impossible to benchmark, not just in terms of raw performance but also the user experience. Since Qualcomm appears to be Microsoft's chosen partner for Windows-on-ARM, they are likely working closely with them on implementing Windows on these new Nuvia chips. However, they did the same with Intel for Alder Lake, and the Windows 11 scheduler has shown little to no improvement when working with Intel's 12th-gen Core series.

Also, there has been a lot of speculation about whether these quick synthetic benchmarks are really taking advantage of everything Apple Silicon has to offer. It takes time to properly benchmark new hardware, particularly if it is outside the Windows hegemony. Craig Hunter has just released an informative review which shows the M1 Ultra putting the Intel Mac Pro to shame using a fluid dynamics benchmark. The Ultra's scaling is practically a straight line upward, while a 28-core Xeon quickly tapers off in efficiency. I particularly like how he describes the Mac Studio Ultra as an 8x8x4" supercomputer, and he points out that we still have the Apple Silicon Mac Pro on the horizon, so this is just the tip of the iceberg.
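That "straight line vs. tapering off" shape is roughly what Amdahl's law predicts once the non-parallel (or bandwidth-limited) fraction starts to bite; a minimal sketch with invented parallel fractions, not values fitted to Hunter's data:

```python
# Amdahl's-law sketch of the two scaling shapes described above. The parallel
# fractions are invented for illustration, not fitted to Hunter's results.
def speedup(cores: int, parallel_fraction: float) -> float:
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for label, p in (("nearly all parallel (p=0.999)", 0.999),
                 ("95% parallel (p=0.95)", 0.95)):
    curve = {n: round(speedup(n, p), 1) for n in (1, 4, 8, 16, 20)}
    print(label, curve)
# p=0.999 stays close to the ideal straight line out to 20 cores, while
# p=0.95 flattens noticeably, much like the tapering Xeon curve.
```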
 
Hunter's graphs are shocking. As he notes, imagine what the Mac Pro chip will do.
 

Wow. Intel hasn’t just been passed. They have been lapped.
 
Whoa! That was a nice read. A much more substantial improvement than in most other benchmarks I've seen, IIRC. I wonder how many real-life tasks would get the same benefits. Just memory-bound ones?
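On the memory-bound question, a crude roofline-style comparison makes the point; the bandwidth figures below are the commonly quoted spec numbers and the arithmetic intensity is a guess, so treat all of it as assumptions:

```python
# Crude roofline-style look at why memory-bound work benefits so much.
# Bandwidth figures are the commonly quoted specs (assumptions here): ~800 GB/s
# for the M1 Ultra's unified memory vs ~140 GB/s for six channels of DDR4-2933
# in the 2019 Mac Pro. In the memory-bound regime, attainable throughput is
# simply bandwidth times arithmetic intensity.
m1_ultra_bw = 800.0  # GB/s (assumed from spec sheets)
mac_pro_bw = 140.0   # GB/s (assumed: 6 channels of DDR4-2933)

def memory_bound_gflops(bandwidth_gb_s: float, flops_per_byte: float) -> float:
    return bandwidth_gb_s * flops_per_byte

intensity = 0.25  # flops per byte, typical of stencil/CFD-style kernels (assumed)
print(f"M1 Ultra ceiling: ~{memory_bound_gflops(m1_ultra_bw, intensity):.0f} GFLOPS")
print(f"Mac Pro ceiling:  ~{memory_bound_gflops(mac_pro_bw, intensity):.0f} GFLOPS")
print(f"Memory-bound advantage: ~{m1_ultra_bw / mac_pro_bw:.1f}x")
# Compute-bound kernels are capped by peak FLOPS instead and would see less of this.
```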
 
@Cmaier, please correct me if I am wrong, but I believe your latest prediction for the Apple Silicon Mac Pro is for an M2 "Extreme" with up to 1.0TB of unified memory? Basically, four M2 Max dies linked using next-generation UltraFusion interconnects? While I'm no CPU architect, that sounds like a reasonable assumption and would make the Mac Pro a powerful, capable, scalable machine. If it follows the same trend as the M1 series, then it should easily be the most efficient workstation in existence, assuming it is released within the next year or so.

Now, that hasn't stopped some people from fantasy-designing their own SoC. Evidently, everyone is a CPU architect these days. This isn't just from MR, but folks who should know better, such as over at the Ars Technica forums. One common solution I have heard, for matching the 1.5TB maximum system memory of the current Intel version, is that Apple will start using external DIMMs, just for the Mac Pro and no other Macs. I've even heard some insist that Apple will resort to implementing HBM2 in such a solution. I've also seen many insisting that, since AMD will be announcing the RX 7000 series with RDNA3 later this year, the Mac Pro will feature the return of discrete graphics. (That would also be in direct opposition to another @Cmaier prediction that the M2 would perhaps feature ray tracing.) The theory is that Apple will use MPX modules to support upgradeability for such features. There's also a persistent rumor of one last Intel version with an Ice Lake Xeon, secretly wandering Cupertino's hallways like a forlorn x86 revenant.

Of course, at the top of the wish list is always the return of Boot Camp support for Windows, assuming Microsoft doesn't renew its ARM exclusivity contract with Qualcomm. This is despite Craig Federighi* specifically stating that Apple won't support direct booting of other operating systems, and that their solution is virtual machines. Sure, Apple has made a few unofficial accommodations for Asahi Linux during the boot process, but those were minor tweaks to make it easier for that project. Apple's Rosetta engineers have helped CodeWeavers support 32-bit programs with CrossOver on Apple Silicon Macs. However, these are implementations that don't involve a shift in strategy or substantial engineering resources. Every indication suggests that VMs and WINE are considered satisfactory solutions, from Apple's standpoint.

(*Craig said in an interview with Gruber that native Windows ain't happening. I timestamped the exact quote because I constantly hear about the inevitable return of Boot Camp. Even then, some folks refuse to believe, despite hearing it straight from Apple's senior vice president of software engineering, who is literally the decision maker for such things.)

None of that matches Apple's strategy thus far; in fact, all public indications appear to be the opposite, but I suppose hope springs eternal. I think the Apple Silicon Mac Pro is the last hope for the return of these features, so a lot of people are projecting their personal desires onto it, which would then theoretically spread to the other models. The Mac Pro is the pinnacle of Apple's Mac line, so it is the ultimate symbol for a personal wish list. I'm not a CPU designer, but @Cmaier is, so I'm wondering whether he sees any logical reason for Apple to drastically alter its designs to accommodate any of these features, which, from my perspective, appear entirely regressive. Perhaps there is something I am missing in this debate, and the Apple Silicon Mac Pro will be more exotic than I am picturing?

I realize that, five years from now, people will still be asking for eGPU support, Boot Camp, easy internal upgrades, and a free pony, but it's best to dispel such notions whenever possible. What these people desire already exists. It's called a PC.
 
I agree with all of this.

It's possible that Apple allows slotted RAM and puts its own GPU on a separate die, sure. But if it does that, it will still be a shared memory architecture. I would say there's a 1 percent chance of slotted RAM. An independent GPU is more likely; the technical issues with that are not very big, but the economics don't make much sense given Apple's strategy of leveraging its silicon across all products. Still, I'd give that a 33 percent chance. And it wouldn't be a plug-in card or anything, just a separate GPU die in the package using something like fusion interconnect. Maybe for iMac Pro, Mac Studio and Mac Pro.
 
Thanks for the answer. In terms of a GPU, back in 2020 there was a report from the China Times about Apple making a GPU codenamed "Lifuka". We really haven't heard anything about it since then. Whether that was referring to the integrated GPU or a discrete design still isn't clear. If Apple does implement an independent GPU, would it be more likely for them to use their own design over a third party, such as AMD? That's assuming the report was accurate to begin with, since the rumor claimed it was being designed for an iMac.
 
Yeah, definitely their own design. I'm quite convinced they like their architecture, and that they have been working on ray tracing. Given how parallelizable GPU stuff is, it's quite possible that they simply put together a die that is just made up of a ton of the same GPU cores they have on their SoCs. You could imagine that, for modular high-end machines, instead of partitioning dies like [CPU cores+GPU cores][CPU cores+GPU cores]…, it may make more economic sense to do [CPU cores][CPU cores]…[GPU cores][GPU cores]…. (Or, even, [CPU cores+GPU cores][CPU cores+GPU cores]…[GPU cores]….)


It may also make more engineering sense, in terms of latencies, power supply, and cooling. Of course, Apple wouldn't do that if it was only for the Mac Pro (probably), because the economies of scale wouldn't work (plus, now, supply chains are fragile). They might do it if it made sense to use this type of partitioning for iMacs, iMac Pros, Studios, Mac Pros, and maybe high-end MacBook Pros, while using the current partitioning for iPads, iPhone Pros (maybe), Mac Minis, MacBook Pros, MacBooks, and maybe low-end iMacs.
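For what it's worth, part of why that kind of split can make economic sense is yield; a toy Poisson defect model shows the shape of the argument (the defect density and die areas are invented numbers):

```python
# Toy yield argument for smaller chiplets: with a simple Poisson defect model,
# die yield = exp(-area * defect_density), so two half-size dies waste less
# silicon than one big die. Defect density and areas are invented numbers.
from math import exp

defect_density = 0.001  # defects per mm^2 (assumed)
big_die_mm2 = 800.0     # one monolithic CPU+GPU die (assumed)
small_die_mm2 = 400.0   # a separate CPU-only or GPU-only die (assumed)

yield_big = exp(-big_die_mm2 * defect_density)
yield_small = exp(-small_die_mm2 * defect_density)
print(f"Monolithic die yield:    {yield_big:.0%}")    # ~45%
print(f"Half-size chiplet yield: {yield_small:.0%}")  # ~67%
# On top of the better yield, the same CPU and GPU dies could be reused across
# several products, which is where the economies of scale would come from.
```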

Not saying they will, but at least I give it a chance. More of a chance than RAM slots or third-party GPUs.
 
Since Apple is designing their GPU/CPU cores to be 'mobile first', I wonder how raytracing is going to fit into that perspective. Sure, there's a lot you can do with 16 cores' worth of raytracing resources (like, say, a 'Max'-level GPU), but would a 4-core GPU (like in an iPhone configuration) be enough to do anything useful with raytracing? I'm under the (possibly wrong) impression that unless you have a 'critical mass' of raytracing performance available, you may as well have zero raytracing capabilities. Especially on mobile, where using a marginal raytracing capability to render some realistic reflections here and there is not going to make much of a difference.

A critical point (for what I use Metal for) would be having enough raytracing power to dump all the shadow mapping and ambient occlusion kernels and just use raytraced lighting instead. I understand it's easier to start small, but I wonder if going any less than all-in on raytracing would just take valuable die area from the GPU cores that could be more effectively dedicated to improving traditional shading capabilities. Then again, I don't know how many additional transistors the hardware-accelerated raytracing would require; maybe starting small doesn't take that much die area.
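To make the shadow-mapping vs. raytraced-lighting comparison concrete, here's a deliberately tiny CPU-side sketch of the per-pixel work each approach does. This is not Metal code; the single sphere occluder stands in for a real acceleration-structure query, and all the names and numbers are mine:

```python
# Simplified sketch of the two shadowing approaches contrasted above. Not Metal
# code: one sphere occluder stands in for an acceleration-structure query, and
# the t-range clamping a real shadow ray needs is omitted for brevity.
import math
from dataclasses import dataclass

@dataclass
class Vec3:
    x: float
    y: float
    z: float
    def sub(self, o): return Vec3(self.x - o.x, self.y - o.y, self.z - o.z)
    def dot(self, o): return self.x * o.x + self.y * o.y + self.z * o.z
    def length(self): return math.sqrt(self.dot(self))

def ray_hits_sphere(origin: Vec3, unit_dir: Vec3, center: Vec3, radius: float) -> bool:
    # Standard quadratic ray/sphere test (assumes unit_dir is normalized).
    oc = origin.sub(center)
    b = 2.0 * oc.dot(unit_dir)
    c = oc.dot(oc) - radius * radius
    return b * b - 4.0 * c >= 0.0

def shadowed_by_ray(point: Vec3, light: Vec3, occ_center: Vec3, occ_radius: float) -> bool:
    # Raytraced shadows: fire a ray from the shaded point toward the light and
    # ask whether anything blocks it. Exact, with no shadow-map resolution issues.
    to_light = light.sub(point)
    d = to_light.length()
    unit_dir = Vec3(to_light.x / d, to_light.y / d, to_light.z / d)
    return ray_hits_sphere(point, unit_dir, occ_center, occ_radius)

def shadowed_by_map(depth_from_light: float, shadow_map_depth: float, bias: float = 0.005) -> bool:
    # Shadow mapping: compare this point's depth (as seen from the light) with
    # the depth stored in a pre-rendered shadow map; needs a bias to avoid acne
    # and inherits the map's resolution and aliasing limits.
    return depth_from_light > shadow_map_depth + bias
```

The per-pixel visibility query is the part dedicated raytracing hardware accelerates (intersection tests during acceleration-structure traversal), which is roughly where those extra transistors would go.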

I guess Apple could also have different GPU cores for M-series and A-series, but that'd go against what they're currently doing.
 

Yep. There are a lot of reasons that it makes sense to have some sort of split between “low end” and “high end.” Where you draw that split is a decision that needs to take into account both economics and physical practicalities. I could imagine a world where anything below a MBP doesn’t get ray tracing and anything above does. But I imagine Apple will put it in iPad. I also imagine Apple is working on making it work even in its VR/AR goggles, so they probably have found a way to get it done without requiring too much in the way of silicon resources.
But does an iPhone need ray tracing? Probably not any time soon. Would they love to include it and harp on how revolutionary it is? Yep. Can you do it in a die that meets the power and thermal requirements of an iPhone? I would wager you can.

To me, really, the wildcard is their VR goggle architecture. If they think you need it for that, and if they do the rendering on the device itself or on a coupled iPhone, then that will drive what they choose to do.
 
One more question @Cmaier, if I may. We've seen Intel follow Apple's lead with Alder Lake implementing big.LITTLE aka heterogeneous computing. The latest rumors claim that AMD is going the same route with Zen 5. What do you think the chances are of Apple taking a page from the x86 guys and implementing SMT? Does it make sense for their design, and if so, do you think we'd see it in both the performance and efficiency cores?
 