Apple M2 Ultra and Extreme Design

But this was all relatively minor compared with everything else presented at WWDC22 related to Metal (MetalFX Upscaling, new workflows to pre-compile shaders, geometry shaders, a C++ API for Metal...), most of it unrelated to ray tracing. Nothing groundbreaking on the ray tracing front that would suggest they're setting the stage for a grand reveal. Maybe at WWDC23?

My impression is that the ray tracing API in Metal is already rather mature. It is fairly similar to what DX12 Ultimate or the Vulkan extensions offer, and I think even more capable in some key areas (I don't think DX12 offers motion blur APIs or in-node storage, plus Metal RT has larger data structure limits more suitable for production ray tracing). So the lack of big upgrades might simply mean that the API is "done".
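For anyone who hasn't looked at it, the shader-side API is already quite compact. Below is a minimal, illustrative sketch in Metal Shading Language; the kernel name, buffer bindings, and hard-coded light direction are my own assumptions rather than anything from Apple's talks, but the ray tracing types themselves (intersector, ray, acceleration structures) are what ships with Metal today and, per the discussion here, should presumably run unchanged on future hardware-accelerated GPUs.

```cpp
// Minimal Metal Shading Language sketch of the current ray tracing API.
// Kernel name, buffer indices and the fixed light direction are illustrative
// assumptions; the raytracing types are part of Metal as shipped today.
#include <metal_stdlib>
using namespace metal;
using namespace raytracing;

kernel void shadow_rays(primitive_acceleration_structure accel    [[buffer(0)]],
                        device const float3              *origins [[buffer(1)]],
                        device float                     *visible [[buffer(2)]],
                        uint                              tid     [[thread_position_in_grid]])
{
    // Cast one ray per thread toward a fixed light direction.
    ray r;
    r.origin       = origins[tid];
    r.direction    = normalize(float3(0.3f, 1.0f, 0.2f));
    r.min_distance = 0.001f;   // small offset to avoid self-intersection
    r.max_distance = INFINITY;

    // The intersector is templated on the features needed (triangle data,
    // instancing, motion, ...); traversal runs in software on current GPUs,
    // but the same call could map onto dedicated intersection hardware.
    intersector<triangle_data> isect;
    auto hit = isect.intersect(r, accel);

    // 1.0 if the point can see the light, 0.0 if something is in the way.
    visible[tid] = (hit.type == intersection_type::none) ? 1.0f : 0.0f;
}
```

Host-side setup (building the acceleration structure and binding it to that buffer index) is the usual Metal descriptor/encoder work and is omitted here.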

Anyhow, while the exact details of the Apple Silicon architecture weren't discussed at WWDC20, they unveiled that it was coming. I guess they could do the same here: announce that ray tracing is coming (and how the new APIs work, or how to improve existing code to tailor it to the new hardware), and release the first commercial products that use it later. But it would be a bit awkward without something like the Developer Transition Kit they announced at WWDC20 for the Apple Silicon transition.

I would be surprised if their RT API was not developed with the upcoming hardware in mind. Most likely, the code will just work on new GPUs with hardware acceleration, with no changes or tweaks required.
 
I would be surprised if their RT API was not developed with the upcoming hardware in mind. Most likely, the code will just work on new GPUs with hardware acceleration, with no changes or tweaks required.

Sounds a little like the rare case of the software guys being ahead of the hardware guys?
 
Sounds a little like the rare case of the software guys being ahead of the hardware guys?
Rare? It is typical. RT has been happening in SW for ages. It is not at all surprising that Metal would have an RT API. Just like the original Mac did floating point in software until the hardware designers decided it was important enough to implement in circuitry. Or image compression, which is currently handled by specialized SoC units. I suspect the M-series processor cores have dedicated hardware to speed up object method dispatch, which has always been done in software and is extremely heavily used (the performance benefit would be too ridiculous to ignore).

Just look back through computer development and you will see endless examples of operations getting folded into hardware to gain better performance. It is the norm, not a rarity.
 
Just look back through computer development and you will see endless examples of operations getting folded into hardware to gain better performance. It is the norm, not a rarity.
That's not what I mean in this case, and I am not that dense in my own field, I would hope. Apple's co-development of hardware and the software meant to leverage it generally leaves a smaller gap between the two. I'm more poking fun at this, as if Apple had launched the pen input APIs but sat on the iPad Pro for 3 years.

Auto Layout had 2 years of runway before the iPhone 6 leveraged it for the new screen sizes, but that's about the longest runway Apple's given folks in recent memory.

That all said, it does look like Metal RT in cases like Blender uses GPU resources to accelerate things, but there's nothing as nice as the dedicated hardware for ray intersections that Nvidia, and now AMD, have.
 
Sounds a little like the rare case of the software guys being ahead of the hardware guys?

They probably wanted to get all the infrastructure ready. The nice thing about having a mature API is that future hardware should be able to accelerate today's software.
 
I would be surprised if their RT API was not developed with the upcoming hardware in mind. Most likely, the code will just work on new GPUs with hardware acceleration, with no changes or tweaks required.
I'm sure they had the upcoming hardware in mind, and I agree that it's most likely that the RT code will just work without changes. However, I do expect talks outlining things like best practices on how to set up the RT code to maximize performance. Maybe even minor additions to the API to make the hardware's job easier/faster (could the in-node storage have been that?). Even with knowledge of what the RT hardware was going to be, it's been 3 years; it'd be impressive if the software team finished an API that long in advance and it didn't require even minor changes. Not because the API was badly designed or anything, just because profiling usually reveals critical (and often unexpected) bottlenecks, and even with pre-silicon access, things might be different on the newest hardware.
 
Resurrecting this thread. On a podcast called “Upgrade” (hosts: Myke Hurley, Jason Snell) there has been recent discussion of the Mac Pro and the Extreme chip. Many are assuming that an Extreme chip is coming with perhaps the M3 or later. According to the hosts, who claim to have a good source (who knows? could be BS), all plans for an “Extreme” have been cancelled. No idea if they are legit in terms of sources. Take with a large pinch of salt.

 
Resurrecting this thread. On a podcast called “Upgrade” (hosts: Myke Hurley, Jason Snell) there has been recent discussion of the Mac Pro and the Extreme chip. Many are assuming that an Extreme chip is coming with perhaps the M3 or later. According to the hosts, who claim to have a good source (who knows? could be BS), all plans for an “Extreme” have been cancelled. No idea if they are legit in terms of sources. Take with a large pinch of salt.


I have reason to believe this is a true rumor.
 
According to the hosts, who claim to have a good source (who knows? could be BS), all plans for an “Extreme” have been cancelled.
For those who don't wish to listen to the podcast, the rumor is:

1. From somebody who works on Apple's GPU team.
2. The quad chip has been canned, which our resident CPU engineer corroborates.
3. Right now, they are working on what will be the M5 chip.
4. Quad chip was only ever specced for the M1 series and removed late in the project.
5. No plans to resurrect the quad design through the M7 generation.
6. Quad was too much effort for too small a market.
7. Multi-die packaging may come with the M8 or later generations, which would allow the CPU and GPU to be fabbed on separate dies. However, no such plans currently exist.

From my perspective, it appears that Apple Silicon is going to become more predictable for the foreseeable future, which was eventually going to happen.
 
That does raise an interesting question: why didn’t the M1 Ultra come to the Mac Pro, if they never had any intention of making an Extreme?
 
That does raise an interesting question: why didn’t the M1 Ultra come to the Mac Pro, if they never had any intention of making an Extreme?
Perhaps they were originally intending to do the Pro with M1 Extreme, but by the time they bailed out on the Extreme it was too late to back off and do an Ultra box.
 
Perhaps they were originally intending to do the Pro with M1 Extreme, but by the time they bailed out on the Extreme it was too late to back off and do an Ultra box.
How do you think it was supposed to have worked with the M1 Max? After all, they only had one connection … a crossbar? I wonder if we’ll ever see a leaked design … I also wonder when the IRQ controller was changed. Finally, it’s odd that Hector had dug up a dev string for a chip with an IRQ controller handling up to 8 dies, but I guess that’s just a leftover. :(
 
How do you think it was supposed to have worked with the M1 Max? After all, they only had one connection … a crossbar? I wonder when the IRQ controller was changed. Also odd that Hector had dug up a dev string for a chip with an IRQ controller handling up to 8 dies, but I guess that’s just a leftover.

I think they made the decision before they taped out the M1 Max. Honestly, I don’t see where the other connection could even have gone (without refloorplanning), so maybe that was the impetus - they would have had to do a completely new die perhaps.
 
2. The quad chip has been canned, which our resident CPU engineer corroborates.
4. Quad chip was only ever specced for the M1 series and removed late in the project.
6. Quad was too much effort for too small a market.
To be fair, if true (and it seems likely, as two more seemingly independent lines of evidence now agree), score one for Gurman. He may have gotten other things wrong, but that’s a big one he got right and I doubted.
 
I think they made the decision before they taped out the M1 Max. Honestly, I don’t see where the other connection could even have gone (without refloorplanning), so maybe that was the impetus - they would have had to do a completely new die perhaps.
Makes sense, so it couldn’t have been tooo late in development.
 
Unfortunately, though, I think that means my fear in the WWDC thread was correct: they priced the Mac Pro to die. It might get an update or so, but it won’t continue for much longer.

Edit: the only way I think this is wrong is if Apple is putting out disinformation to catch leakers. But that feels unlikely with 3 different lines of evidence. So RIP to the good ol’ cheese grater!
 
So when I was skeptical about an Extreme Apple Silicon chip, it was correct to be skeptical?

This also reinforces the point I made in the WWDC thread that the Mac Pro is now positioned specifically to address users who need specialized PCIe expansion. If they don't, then the Studio will suit them well.
 
So when I was skeptical about an Extreme Apple Silicon chip, it was correct to be skeptical?

This also reinforces the point I made in the WWDC thread that the Mac Pro is now positioned specifically to address users who need specialized PCIe expansion. If they don't, then the Studio will suit them well.
No, it’s not specialized for that at all. The Mac Pro has no more PCIe expansion than you can get in a Studio. The only thing it can do that a Studio can’t is host a single device taking up the full x16 lanes of bandwidth at once. That’s it. And that’s not enough to support a full product line.

Other than that, the Studio can do everything the Pro can for cheaper - including general PCIe connectivity. They have the same number of lanes - the Studio can even be put in a rack-mounted enclosure with PCIe slots. You won’t get the full bandwidth on a single device, but then if you do that on the Pro you’ll degrade the connection’s performance the moment you add anything else, so the advantage of a full-bandwidth slot is almost gone.

Basically, the Mac Pro isn’t specialized for anything; Apple is saying the Mac Pro is a dead-end device.
 
Think about it this way: the Mac Pro is the only* Mac in the lineup with a single SoC choice that shares that SoC with its internal competition. *Actually that’s not true; the only other machine like that is the 13” Pro, which is also very clearly a dead end in terms of devices.

MB Air - Mx
*13” Pro - Mx
MB Pro - Mx Pro/Max

Mini - Mx/Mx Pro
Studio - Mx Max/Ultra
*Mac Pro - Mx Ultra

iMac - Mx, no other Macs with a built in screen

*These two are dead-end devices. And at least the 13” Pro has the Touch Bar to differentiate it … yikes, that’s what we’re down to. The Mac Pro has … ? Internal physical PCIe slots that don’t actually carry any more bandwidth, so a Mac Studio in a chassis can match its functionality almost exactly for cheaper?
 
It's kind of interesting, since a depiction of a quad configuration is featured in a recent Apple patent:


Of course, it doesn't need to mean anything, and it could be just an artefact from the early Extreme era, but I do find it notable that all the previous patents only ever depicted two SoCs (with a direct connection) while this one depicts four SoCs with a routing network between them.

It does make sense that the Extreme is too much effort, though. I wonder whether they have any other plans for scaling to the needs of the high-end desktop or whether they are just going to drop it.

I also wonder where this leaves this type of multi-die product, especially since @Colstan writes that no multi-die packaging is planned any time soon:


I thought it would be a great way to deal with the increased cost of the 3nm process while increasing both the compute density and the caches in an SoC. I am a bit worried about Apple's ability to continue pushing the performance boundaries if they stay with a monolithic die.
 