No “Extreme” chip coming to Mac Pro?

theorist9 · Dec 20, 2022

Colstan said:
As talented as Apple is, from its leadership to its engineering team, I still get the sense that something is off about the company's progress with Apple Silicon. I don't think it can be entirely blamed on global events, because their competitors continue to execute, more or less on schedule.

It could be that there's nothing wrong with their approach (not that you were suggesting there was), but rather that aspects of the transition turned out to be harder problems than they expected (and hard problems period), and will thus take some additional time to solve.

That's not to say they should be let off the hook for everything

, like the USB-A ports on the M1 being slower than those on their Intel predecessors, or the camera issues on the ASD.

Cmaier · Dec 20, 2022

Colstan said:
Some of this may be the "irrational exuberance" of the moment. I too was excited by the release of Apple Silicon, but I can't say I ever put it in such glowing terms. Gurman, for what it is worth, has stated that he thinks that the M2 is a stopgap, in his usual vague manner. It certainly feels like one, but maybe we were expecting a bit much after the sizable jump from Intel. As talented as Apple is, from its leadership to its engineering team, I still get the sense that something is off about the company's progress with Apple Silicon. I don't think it can be entirely blamed on global events, because their competitors continue to execute, more or less on schedule. Now, it looks like we'll be waiting until March for new Macs, when Apple historically makes such an announcement. It may be June at WWDC when the next Mac Pro is finally released, which will be year three of Tim Cook's two year timeline.

I blame global events. TSMC slipped, packaging supply chains slipped, everything slipped.

As for Mac Pro, the Apple Way™ is to go for the endgame and not the intermediate steps. I’m sure they mostly think the Right™ solution for Mac Pro is something like a Studio, where they offer chip modules with all the RAM in the package, but up to huge amounts of cores and RAM, and that sockets are a kludge. I think they are willing to concede that slots for M.2 and a special purpose PCI-style card or two are fine, but they feel like the CPU/GPU/RAM should be designed and built as a single entity to maximize performance and efficiency.

So a bona fide apple silicon mac pro is not a system they every really wanted to make, they project they won’t sell many of them anyway, and making it right requires a great deal of engineering effort and that a pretty novel packaging technology works out perfectly.

I still think they’ll do something, but I have no idea when. I think that making it worthwhile would mean the 3nm node, but, again, who knows.

Colstan · Dec 20, 2022

Cmaier said:
I still think they’ll do something, but I have no idea when. I think that making it worthwhile would mean the 3nm node, but, again, who knows.

Feel free to dodge this question, but other than your obvious credentials, I believe you still have some measure of contact with your previous colleagues that now work within Apple. When I queried on such issues in the past, you stated that you "got no sense that they are panicking", while also stating that you have to "read between the lines" when speaking with them, since they won't give you direct answers. (However, you did mention that you are certain they are working on ray tracing.) I assume that the previous post is your personal opinion, but is some of that informed by contact behind the scenes, or entirely an educated guess?

Cmaier · Dec 20, 2022

Colstan said:
Feel free to dodge this question, but other than your obvious credentials, I believe you still have some measure of contact with your previous colleagues that now work within Apple. When I queried on such issues in the past, you stated that you "got no sense that they are panicking", while also stating that you have to "read between the lines" when speaking with them, since they won't give you direct answers. (However, you did mention that you are certain they are working on ray tracing.) I assume that the above is you personal opinion, but is some of that informed by contact behind the scenes, or entirely an educated guess?

I’d say it’s mostly an educated guess. I mean, let’s say I knew that people on a CPU design team were designing a chip with certain not-very-specific, but still interesting, characteristics. That doesn’t mean that they have any idea what product will use that CPU. It doesn’t even mean any product will, if, say, apple changes its plans.

Add to it that nobody at apple would ever admit to anything and that all i can do is guess based on who is on what team and based on what flavor of silence or avoidance I get in response to questions, and it’s a bit like reading a crystal ball.

Colstan · Dec 20, 2022

Cmaier said:
I’d say it’s mostly an educated guess.

Well, then here is my entirely uneducated response. While there are certainly issues with Nvidia, AMD, and Intel, they have been more or less on schedule with Lovelace, Zen 4, RDNA3, and Alder Lake. The major exception is Arc, and Intel is doing this all new from scratch, so I won't fault them too much on that.

While Apple is technically on its second rodeo with Apple Silicon, they've been doing this for some time with the A-series. Beyond just the Mac Pro, it seems like there have been more delays with Apple than its competitors, not to mention the lackluster results from the M2 thus far, in my opinion. Assuming the leaked Geekbench results are correct, the high-end M2 Macs aren't going to be that impressive, in terms of raw performance, compared to the competition. As someone who only owns desktop computers, I find this disconcerting. It's great that their laptops get better battery life, but that does me no good, because I am a stationary sod who uses Macs that are plugged into the wall at all times.

Again, just my outlook at the moment, it's way too soon to make any judgement calls on this. I just see Apple delaying releases more than the competition in the general computing space. Myself and others harp on this because we want Apple to take on the PC crowd from top to bottom, and thus far, I'm not seeing it.

B01L · Dec 20, 2022

I still believe Apple thought they would be on 3nm by now...

I also believe Apple should leave the SoCs to the mobile/tablet/laptop/low-end desktop markets; and for the high-end desktop/workstation markets a chiplet-based (with substantially higher clock/power rates) approach might be better...?

Colstan · Dec 20, 2022

B01L said:
I still believe Apple thought they would be on 3nm by now...

Mayhap the reason for Gurman calling the M2 a "stopgap"? (Assuming we still believe this guy, but for this debate, I'll go with it.)

B01L said:
I also believe Apple should leave the SoCs to the mobile/tablet/laptop/low-end desktop markets; and for the high-end desktop/workstation markets a chiplet-based (with substantially higher clock/power rates) approach might be better...?

I wish I still had access to the statistics, this was a few years ago, but I believe around 85% of Macs shipped are laptops. Even if that number has changed, it can't be far off. I don't think it makes economic sense to split their chip offerings along those lines. A lot of the debate over the Mac Pro is how much effort Apple is going to put into a niche product, among their desktops, which are a minority category. Plus, as @Cmaier said, Apple have set a design philosophy and they're unlikely to deviate from that anytime soon.

B01L · Dec 20, 2022

Well, if Apple insists on sticking with the Mn Max laptop SoC as their largest building block, then they would need to find a way to add GPU cores...

Maybe they add a GPU-specific SoC to the lineup, so an Extreme build could have two "regular" Mn Max SoCs coupled with two GPU-specific SoCs...?

This GPU-specific SoC could also be combined (2, 4, 8, 16...?) on an add-in card for usage as a GPGPU, providing extra compute/rendering resources...?

quarkysg · Dec 20, 2022

B01L said:
Well, if Apple insists on sticking with the Mn Max laptop SoC as their largest building block, then they would need to find a way to add GPU cores...

Maybe they add a GPU-specific SoC to the lineup, so an Extreme build could have two "regular" Mn Max SoCs coupled with two GPU-specific SoCs...?

This GPU-specific SoC could also be combined (2, 4, 8, 16...?) on an add-in card for usage as a GPGPU, providing extra compute/rendering resources...?

Unlikely to be GPU cards. More likely with GPU specific dies connected via UltraFusion like derivatives where each of the GPU die comes equipped with their own memory controller.

So base SoC variants comes with some CPU and GPU cores, and it can be extended via additional GPU specific dies.

Colstan · Dec 20, 2022

quarkysg said:
Unlikely to be GPU cards. More likely with GPU specific dies connected via UltraFusion like derivatives where each of the GPU die comes equipped with their own memory controller.

So base SoC variants comes with some CPU and GPU cores, and it can be extended via additional GPU specific dies.

I think something like this is more reasonable than going all in on chiplets. It also tracks better than using third-party graphics cards, which appear to be dead, despite those pining for them. For the Mac Pro alone, then I don't see this happening, but it would make sense if Apple could use such a thing inside the MacBook Pro, resurrected iMac Pro, and Mac Studio, along with the next Mac Pro.

Cmaier · Dec 20, 2022

quarkysg said:
Unlikely to be GPU cards. More likely with GPU specific dies connected via UltraFusion like derivatives where each of the GPU die comes equipped with their own memory controller.

So base SoC variants comes with some CPU and GPU cores, and it can be extended via additional GPU specific dies.

All of which is to say that Apple is faced with the fundamental problem that if they really want to do a true Mac Pro, they have to do custom silicon just for that. Namely:

- some way of adding GPU cores (though I suppose they could just add yet more SoC’s and disable the CPU cores on them)
- some way of treating package memory as a cache for DIMM’d memory
- maybe other I/O solutions

If they *did* do something like a GPU-only die (or, more likely, a GPU & neural engine die), it’s possible they could pair it with high-end chips other than the super-dual-ultra, for iMacs or Mac minis or macbook pros, I suppose.

B01L · Dec 20, 2022

Adding more SoCs means more UltraFusion connections...
Mn Ultra is UltraFusion between two Mn Max dies...
Mn Extreme would be UltraFusion between four Mn Max dies...
Anything larger (more than four dies) would exceed the size (currently) possible for the UltraFusion connections themselves...?

If trying to increase GPU core count by increasing die count (which we cannot do beyond four dies because of the aforementioned maximum size for the whole UltraFusion dealio), there would be more than just CPU cores to disable (Neural Engines/Media Engines/TB/etc.), so one would think an Ultra that was a regular Mn Max die & a GPU-specific die would be the way to go...?

Then strap two of those together for a GPU-heavy Mn Extreme offering...?

And boost the clocks by a goodly amount...?

quarkysg · Dec 20, 2022

Colstan said:
For the Mac Pro alone, then I don't see this happening, but it would make sense if Apple could use such a thing inside the MacBook Pro, resurrected iMac Pro, and Mac Studio, along with the next Mac Pro.

I do think Apple may go with GPU only dies to increase AS Mac Pro's GPU power tho. Them fabricating massive SoC just for the Mac Pros will not be cost effective for them. I would think using this approach will be scalable from the Mac Mini all the way to the Mac Pro, with the iMac 27/32 Pro inclusive. Using this approach tho. will likely result in memory access between dies to be non-uniform, and result in die cache sync latency issue tho, which may force macOS to handle NUMA.

Cmaier said:
- some way of treating package memory as a cache for DIMM’d memory

It will be interesting to see how Apple handles the AS Mac Pro's memory issue. Using package memory as cache will introduce NUMA issues for the entire SoC I would think. macOS and all drivers will have to be re-worked to handle this.

Cmaier · Dec 20, 2022

B01L said:
Adding more SoCs means more UltraFusion connections...
Mn Ultra is UltraFusion between two Mn Max dies...
Mn Extreme would be UltraFusion between four Mn Max dies...
Anything larger (more than four dies) would exceed the size (currently) possible for the UltraFusion connections themselves...?

If trying to increase GPU core count by increasing die count (which we cannot do beyond four dies because of the aforementioned maximum size for the whole UltraFusion dealio), there would be more than just CPU cores to disable (Neural Engines/Media Engines/TB/etc.), so one would think an Ultra that was a regular Mn Max die & a GPU-specific die would be the way to go...?

Then strap two of those together for a GPU-heavy Mn Extreme offering...?

And boost the clocks by a goodly amount...?

You can get more ultra fusion connections if you build a crossbar, which is essentially a chiplet that has ultra fusion on all four sides, and buffers and routes from one side to another. You would likely have to know ahead of time that you were going to do this, though, and build support for it into all the chiplets, because the stuff crossing through ultrafusion, i believe, is point-to-point and not a bus-style network (at least some of it).

Cmaier · Dec 20, 2022

quarkysg said:
It will be interesting to see how Apple handles the AS Mac Pro's memory issue. Using package memory as cache will introduce NUMA issues for the entire SoC I would think. macOS and all drivers will have to be re-worked to handle this.

You would presumably design it to be transparent to the software. You assume that everything is in package memory, but, if not, you pay a latency cost. Presumably, too, memory shared with at least GPUs would always be kept in package memory, since it’s pretty small compared to the total address space. I guess. So the latency issue would probably only come up in multi-processing scenarios where your working set is huge (bigger than package memory) and the access pattern defeats the cache replacement algorithm (say lots of random accesses to scattered addresses)

quarkysg · Dec 20, 2022

Cmaier said:
You would presumably design it to be transparent to the software. You assume that everything is in package memory, but, if not, you pay a latency cost. Presumably, too, memory shared with at least GPUs would always be kept in package memory, since it’s pretty small compared to the total address space. I guess. So the latency issue would probably only come up in multi-processing scenarios where your working set is huge (bigger than package memory) and the access pattern defeats the cache replacement algorithm (say lots of random accesses to scattered addresses)

I'm struggling to see how this would work from a software perspective tho. Say when a driver/kernel code loads from memory, it will take different clock cycles for that data to arrive to the CPU core/cache. Does the load instruction stall until memory arrives? When a data is written back into memory, does the driver/kernel need to flush it to other cores/cache.

Hmm ... probably need to take more OS lessons.

Cmaier · Dec 20, 2022

quarkysg said:
I'm struggling to see how this would work from a software perspective tho. Say when a driver/kernel code loads from memory, it will take different clock cycles for that data to arrive to the CPU core/cache. Does the load instruction stall until memory arrives? When a data is written back into memory, does the driver/kernel need to flush it to other cores/cache.

Hmm ... probably need to take more OS lessons.

When you do a load from memory you NEVER know ahead of time how long it will take. It could come from the L1 cache. Or the L2 cache. Or the system cache. Or it could be paged in from a disk (virtual memory).

So the load will cause a pipeline stall once there are no other instructions that don’t have dependencies on the load target that can be issued while you are waiting. That happens even if the load can use the L1 cache, since the L1 cache takes more than a cycle to read.,

In other words, if you are writing hardware-level code like drivers, you need to treat memory reads as asynchronous. They finish when they finish.

leman · Dec 20, 2022

theorist9 said:
I don't know how this works—obviously!—but I was imagining they might pre-render certain costly (costly to render) objects using powerful machines, and insert them as modules into the AR/VR code so that they don't have to be rendered from scratch by the headset.

I.e., I'm thinking of a 3D sprite or a 3D cutscene, by analogy to what they use for 2D games:

Why prerendering is here to stay in game dev

In this article, we'll be digging into how and why prerendered game visuals are used and why they aren't likely to leave games anytime in the near future.

garagefarm.net

And if you did need powerful machines to prerender 3D sprites or cutscenes for an Apple AR/VR game, would that need to be done in MacOS, or could it be done on any machine and then dropped into the AR/VR code?

Pre-rendering usually makes sense if there is limited to no visual interactivity with an object (backgrounds, remote landscapes that are far enough to ignore camera angles) or if the object can effectively blend with the surrounding (e.g. tree billboards used in older titles). I have no intuition how much for this will apply to AR/VR. Anyway, there are probably more straightforward and easy to justify applications of production renderers

theorist9 said:
Separately, could another possible market for the Mac Pro be those doing development work for ARM-based supercomputers, particularly since it will probably be the only commercially-available ARM-based workstation for some time?

Or would they need to do the dev work on the same microarchitecture used in the supercomputer? If the latter, I'd imagine a typical ARM supercomputer package would come with several custom development workstations made using the same processors as those in the supercomputer itself.

Something like a MacBook Pro will already be sufficient for prototyping etc. You don't need a huge workstation to write supercomputer code.

leman · Dec 20, 2022

Regarding the main topic, it's quite interesting to look at Apple's recent patents as a potential direction on where they are going.

For example, in terms of the general system architecture and memory hierarchy there are these recently filed patents:

- https://patents.google.com/patent/US20220334997A1/en (seems to describe a more flexible arrangement of various processors in a fast on-chip network)
- https://patents.google.com/patent/US20220342805A1/en (seems to describe a power-efficient way of working with many distributed memory controllers)
- https://patents.google.com/patent/WO2022056207A1/en (seems to describe energy-efficient forwarding of interrupts in large cluster architectures)

Regarding architecture of larger system, I was not able to find anything newer(*) than this 2019 patent, which is also fairly generic:

- https://patents.google.com/patent/US10742217B2/en

*There is also this (https://patents.google.com/patent/US20220284163A1/en) which seems to describe UltraFusion.

For a possible modular Mac Pro that has both fast on-package RAM and (maybe expandable, although there is not direct mention of expandability) high-density RAM, there is this:

- https://patents.google.com/patent/US10916290B2/en

On the graphics side, there is a bunch of things related to ray tracing, as well as these newer patents that seem to focus on more efficient utilisation of resources both within the GPU and across multiple GPU clusters:

- https://patents.google.com/patent/US11204774B1/en
- https://patents.google.com/patent/US20220237028A1/en
- https://patents.google.com/patent/US11422822B2/en

These are just things I could find while sipping my morning tea, so I might have missed something important. There are also a bunch of patents on ML accelerators and matrix processors. Overall, it seems that Apple is working on the GPU and memory architectures and looking to make their SoC's more energy efficient and at the same time more capable. I didn't see too much stuff directly related to a potential modular Mac Pro however...

Andropov · Dec 21, 2022

Colstan said:
He doesn't say anything about external DIMMs being available inside the new Mac Pro, just talking about "hallmark features"

The handles!

Cmaier said:
The problem with DIMMs is that you then have some memory accesses which take a thousand times longer than others, depending on whether the address is in the DIMM or in the package. So if you are going to do this, the engineering solution is to create a complicated memory controller that essentially turns the package memory into a cache. Either a real cache (where each entry in package memory has a corresponding entry in the DIMM memory, so your total memory is really the size of the DIMM memory) or, with much more difficulty, you shuffle things back and forth based on use, so your total memory is DIMM memory plus package memory. No way they do the latter.

The former is not conceptually difficult, but does require engineering a fairly complex bunch of logic with some interesting timing implications. I am pretty sure they’re not doing that either, because that would be something new on the chip, and they really just want to tile existing chips together and not make special silicon for the Pro.

Who knows.

A 4-die M2 Ultra would already have up to 256GB of on-package RAM. Not only would they be creating a sort-of cache for the package memory for Mac Pro users only (which is already a small customer base), it would only benefit Mac Pro users who need more than 256GB of RAM installed. Which I imagine an even smaller customer base.

Btw, couldn't this be done in software instead? Use in-package memory as the "traditional" RAM, then use DIMM slots as a first level super-fast storage for virtual memory (instead of going down to SSD storage when in-package RAM is not enough). An additional software layer would be needed to address page faults in DIMM slots (when both the in-package and DIMM RAMs are full) and fall back to disk then, but I guess that's cheaper to implement than designing hardware to turn the in-package RAM as a cache.
And you'd be sort-of combining the capacities of both RAMs too.

leman said:
Pre-rendering usually makes sense if there is limited to no visual interactivity with an object (backgrounds, remote landscapes that are far enough to ignore camera angles) or if the object can effectively blend with the surrounding (e.g. tree billboards used in older titles). I have no intuition how much for this will apply to AR/VR. Anyway, there are probably more straightforward and easy to justify applications of production renderers

Baked shadows are another good example of pre-computed rendering properties.

No “Extreme” chip coming to Mac Pro?

Site Champ

Site Master

Site Champ

Site Master

Site Champ

SlackMaster

Site Champ

SlackMaster

Power User

Site Champ

Site Master

SlackMaster

Power User

Site Master

Site Master

Power User

Site Master

Elite Member

Elite Member

Site Champ

Similar threads