No “Extreme” chip coming to Mac Pro?

exoticspice1

Site Champ
Posts
305
Reaction score
107
Moreover, most of the remaining types of cards that WOULD work would work just as well as thunderbolt devices or in a thunderbolt PCI enclosure. The number of cards that require the full speed of a slot solution and which aren’t GPUs is pretty small
It would work, but as that pic showed, pros like towers with PCIe slots as it's less messy.
 

exoticspice1

Site Champ
Posts
305
Reaction score
107
The ultra extreme will exist, or they’ll run it at a silly high clock rate, or it will be socketed so that you can upgrade by replacing the SOC
They could have done this with the Studio as well, especially making the SSDs user-accessible.
 

B01L

SlackMaster
Posts
179
Reaction score
137
Location
Diagonally parked in a parallel universe...
I said “mostly” useless. Certainly the vast majority of cards plugged into Mac Pros are graphics cards. And if those won’t work, then, for most people, those slots aren’t very useful.

Moreover, most of the remaining types of cards that WOULD work would work just as well as thunderbolt devices or in a thunderbolt PCI enclosure. The number of cards that require the full speed of a slot solution and which aren’t GPUs is pretty small.

A good number of (non-GPU) PCIe cards want more bandwidth than provided by Thunderbolt; M.2-based RAID cards & 8K video I/O cards come to mind...
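Rough numbers behind that, as a quick sketch (all figures below are my own assumed round numbers, not vendor specs): Thunderbolt tunnels only a few GB/s of PCIe data, while a multi-drive M.2 card in a full-length slot can want several times that.

```swift
import Foundation

// Assumed round figures in GB/s, for illustration only.
let thunderboltUsable = 3.0   // TB3/TB4: 40 Gb/s link, ~32 Gb/s tunneled PCIe, ~3 GB/s real-world
let pcie4x16Slot      = 32.0  // PCIe 4.0 x16 slot, ~31.5 GB/s theoretical
let nvmeGen4Drive     = 7.0   // one fast PCIe 4.0 x4 NVMe SSD

// A four-drive M.2 RAID card can stream far more than Thunderbolt can carry,
// but fits comfortably within a full x16 slot.
let fourDriveRaid = 4 * nvmeGen4Drive   // 28 GB/s
print("RAID card: ~\(fourDriveRaid) GB/s, Thunderbolt: ~\(thunderboltUsable) GB/s, x16 slot: ~\(pcie4x16Slot) GB/s")
```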

Still holding hope Apple gives us ASi GPGPUs for scheduling assorted compute/render jobs; let the iGPU in the SoC handle display outputs...

I would assume sending a render job to a GPGPU would be fine with onboard RAM, no need to be tied into the UMA RAM; just send the job to an ASi GPGPU with a silly amount of RAM, job loads into RAM on the card easy peasy, completed job dumps to storage...?
 

quarkysg

Power User
Posts
75
Reaction score
52
Still holding hope Apple gives us ASi GPGPUs for scheduling assorted compute/render jobs; let the iGPU in the SoC handle display outputs...

I would assume sending a render job to a GPGPU would be fine with onboard RAM, no need to be tied into the UMA RAM; just send the job to an ASi GPGPU with a silly amount of RAM, job loads into RAM on the card easy peasy, completed job dumps to storage...?
The benefit of UMA is allowing a massive amount of memory for the GPU to chew through. Going to the traditional GPU on a PCIe card will thus negate this benefit and break the AS macOS programming model. I don't think this will happen.

I'm still in the camp that Apple will provide a huge amount of memory, maybe via DDR5 slots, and take the hit in latency and bandwidth, probably increasing the data bus width from the existing 1024 bits to something more to compensate.
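For a sense of scale (back-of-the-envelope, with assumed figures): bandwidth is roughly bus width in bytes times transfer rate, so matching a 1024-bit LPDDR5 package with socketed DDR5 takes a lot of DIMM channels.

```swift
import Foundation

// bandwidth (GB/s) ~= bus width (bytes) x transfer rate (GT/s); assumed figures.
let lpddr5Package = (1024.0 / 8.0) * 6.4  // 1024-bit LPDDR5-6400 -> ~819 GB/s (M1 Ultra class)
let ddr5Channel   = (64.0 / 8.0) * 5.6    // one 64-bit DDR5-5600 channel -> ~45 GB/s

let channelsToMatch = lpddr5Package / ddr5Channel  // ~18 channels of socketed DDR5
print(String(format: "%.0f GB/s vs %.0f GB/s per channel -> ~%.0f DDR5 channels to match",
             lpddr5Package, ddr5Channel, channelsToMatch))
```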

GPU cores that can chew through hundreds of GB of data without getting interrupted do have their benefits.
 

theorist9

Site Champ
Posts
633
Reaction score
594
The benefit of UMA is allowing a massive amount of memory for the GPU to chew through. Going to the traditional GPU on a PCIe card will thus negate this benefit and break the AS macOS programming model. I don't think this will happen.

I'm still in the camp that Apple will provide a huge amount of memory, maybe via DDR5 slots, and take the hit in latency and bandwidth, probably increasing the data bus width from the existing 1024 bits to something more to compensate.

GPU cores that can chew through hundreds of GB of data without getting interrupted do have their benefits.
Just as a point of interest, I believe UMA's primary performance benefit is that it obviates the need to copy data back and forth between two separate pools of memory (one for the CPU and one for the GPU); that it also gives the GPU access to an unusually large amount of memory is an additional, but secondary, benefit (except in unusual cases that require more memory than GPUs normally offer).
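To put the no-copy point in code, here's a minimal Metal sketch (the `double_values` kernel and the sizes are made up for the example): with `.storageModeShared` the CPU and GPU touch the same allocation, whereas a dGPU would need an upload over PCIe and a readback afterwards.

```swift
import Metal

// A trivial "double every float" kernel, compiled from source so the sketch
// is self-contained.
let src = """
#include <metal_stdlib>
using namespace metal;
kernel void double_values(device float *data [[buffer(0)]],
                          uint id [[thread_position_in_grid]]) {
    data[id] *= 2.0f;
}
"""

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let library = try! device.makeLibrary(source: src, options: nil)
let pipeline = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "double_values")!)

let count = 1_000_000
// On Apple Silicon this buffer lives in unified memory: the CPU writes it and
// the GPU reads it with no staging copy in either direction.
let buffer = device.makeBuffer(length: count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!
let ptr = buffer.contents().bindMemory(to: Float.self, capacity: count)
for i in 0..<count { ptr[i] = Float(i) }   // CPU fills the buffer in place

let cmd = queue.makeCommandBuffer()!
let enc = cmd.makeComputeCommandEncoder()!
enc.setComputePipelineState(pipeline)
enc.setBuffer(buffer, offset: 0, index: 0)
enc.dispatchThreads(MTLSize(width: count, height: 1, depth: 1),
                    threadsPerThreadgroup: MTLSize(width: 256, height: 1, depth: 1))
enc.endEncoding()
cmd.commit()
cmd.waitUntilCompleted()

// CPU reads the results in place; with a dGPU this round trip would mean two
// PCIe copies (host -> VRAM, then VRAM -> host).
print(ptr[0], ptr[1], ptr[2])   // 0.0 2.0 4.0
```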
 
Last edited:

theorist9

Site Champ
Posts
633
Reaction score
594
I'm going to go out on a limb and predict that, if the Mac Pro has soldered RAM, then, to further differentiate it from the Studio, they'll offer it with LPDDR5X (LPDDR5T probably won't be available in time). With the Mac Pro, they certainly don't have to worry about it being available in sufficient volume.
 

quarkysg

Power User
Posts
75
Reaction score
52
Just as a point of interest, I believe UMA's primary performance benefit is that it obviates the need to copy data back and forth between two separate pools of memory (one for the CPU and one for the GPU); that it also gives the GPU access to an unusually large amount of memory is an additional, but secondary, benefit (except in unusual cases that require more memory than GPUs normally offer).
Well, if the bandwidth between system memory and GPU memory were fast enough, the UMA GPU vs. traditional GPU model would not be too much of an issue, I would think. The problem is that dGPUs are currently bottlenecked by the PCIe bus, which offers an order of magnitude or more less bandwidth than system memory on AS Macs.

GPUs having access to a large swath of memory would benefit rendering scenes with huge assets that cannot fit into one dGPU's memory. I think that is the reason why production rendering mainly uses CPUs instead of GPUs. Another benefit would be training AI models where the dataset is larger than what the dGPU's RAM can fit.
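As a rough illustration of that order-of-magnitude gap (assumed figures: ~32 GB/s for a PCIe 4.0 x16 link vs ~800 GB/s for M1 Ultra-class unified memory):

```swift
import Foundation

// Assumed peak figures in GB/s; ignores latency and protocol overhead.
let pcie4x16   = 32.0
let unifiedMem = 800.0
let workingSet = 200.0   // a "hundreds of GB" scene or dataset, in GB

let secondsOverPCIe = workingSet / pcie4x16    // ~6.3 s just to move it once
let secondsOverUMA  = workingSet / unifiedMem  // ~0.25 s for the same pass
print(String(format: "PCIe: %.1f s, UMA: %.2f s per full pass over the data",
             secondsOverPCIe, secondsOverUMA))
```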
 

Yoused

up
Posts
5,684
Reaction score
9,074
Location
knee deep in the road apples of the 4 horsemen
Perhaps one advantage of an isolated dGPU is freedom of sloppiness. The CPU sends the code (pretty compact), description files (still pretty small) and textures (generally less small) and the GPU chomps on it to produce the result. If the output is to the screen, the GPU just handles it and the processor goes on doing what it was doing, unaffected (assuming the card includes the display driver).

Ultimately, the data transfer is usually a lot smaller than the work frame, and if the GPU has enough room, it can generate intermediate content with abandon (such as hidden surfaces that do not show up in the final output) – sloppiness. The final result (image) will still be smaller to pass back, if necessary, than the GPU's overall workset. I believe the same is somewhat true for heavy math jobs, though somewhat less so.

So a UMA iGPU has to be a little more elegant in how it runs, making more judicious use of limited memory space and bus bandwidth. The code has to be more efficient, which means that an iGPU will almost always draw less power for the same job, because it will do it differently. It would seem that the dGPU is the sledgehammer approach.
 

dada_dave

Elite Member
Posts
2,255
Reaction score
2,269
Perhaps one advantage of an isolated dGPU is freedom of sloppiness. The CPU sends the code (pretty compact), description files (still pretty small) and textures (generally less small) and the GPU chomps on it to produce the result. If the output is to the screen, the GPU just handles it and the processor goes on doing what it was doing, unaffected (assuming the card includes the display driver).

Ultimately, the data transfer is usually a lot smaller than the work frame, and if the GPU has enough room, it can generate intermediate content with abandon (such as hidden surfaces that do not show up in the final output) – sloppiness. The final result (image) will still be smaller to pass back, if necessary, than the GPU's overall workset. I believe the same is somewhat true for heavy math jobs, though somewhat less so.

So a UMA iGPU has to be a little more elegant in how it runs, making more judicious use of limited memory space and bus bandwidth. The code has to be more efficient, which means that an iGPU will almost always draw less power for the same job, because it will do it differently. It would seem that the dGPU is the sledgehammer approach.
That depends on how the iGPU works and the nature of the project. Older iGPUs certainly suffered as you described. But theoretically Apple's iGPU should be the best of both worlds: it has the bandwidth of a dGPU and a potentially far larger memory pool than a typical dGPU (especially when comparing dGPUs of equivalent performance). So theoretically you can get away with even more sloppiness on the iGPU. The only advantage the dGPU might have in this regard is allowing excess power draw with its own bespoke cooling solution, whereas the iGPU, sitting on the same die/package, might be afforded less thermal headroom because it has to dissipate heat right next to a CPU that can also be working hard simultaneously. However, Apple hasn't yet released a GPU big enough to know how much that hurts (unless that's the M1 Ultra's problem).
 

B01L

SlackMaster
Posts
179
Reaction score
137
Location
Diagonally parked in a parallel universe...
The benefit of UMA is allowing a massive amount of memory for the GPU to chew through. Going to the traditional GPU on a PCIe card will thus negate this benefit and break the AS macOS programming model. I don't think this will happen.

GPU cores that can chew through hundreds of GB of data without getting interrupted do have their benefits.

Think of the ASi GPGPU(s) as a personal compute/render farm, with their own ridiculous amount of ultra high-speed RAM...

I'm going to go out on a limb and predict that, if the Mac Pro has soldered RAM, then, to further differentiate it from the Studio, they'll offer it with LPDDR5X (LPDDR5T probably won't be available in time). With the Mac Pro, they certainly don't have to worry about it being available in sufficient volume.

I have been saying this (LPDDR5X SDRAM in the ASi Mac Pro) for quite a while now...
 

dada_dave

Elite Member
Posts
2,255
Reaction score
2,269
Think of the ASi GPGPU(s) as a personal compute/render farm, with their own ridiculous amount of ultra high-speed RAM...

At that point they might actually sell entire daughter boards with CPU and GPU. I believe that was suggested on here and on MR by a couple of posters, maybe even by you?
 

dada_dave

Elite Member
Posts
2,255
Reaction score
2,269
No, they are not bottlenecked. Actually, the RTX 4090 barely saturates PCIe 4, and PCIe 5 has already been released.
That depends on the workload. For games? No, they're not bottlenecked. For high-end rendering + compute, they can be if the working set is big enough.
 

B01L

SlackMaster
Posts
179
Reaction score
137
Location
Diagonally parked in a parallel universe...
At that point the might actually sell entire daughter boards with CPU and GPU. I believe that was suggested on here and MR by a couple of posters, maybe even you?

Way way way back, before I even came to this forum, I speculated about a Mac Pro Cube with a backplane for daughtercards; main system on one card, more slots for optional cards...

Neural Engine cards, GPU cards, M.2 NVMe RAID cards, audio I/O cards, video I/O & DSP cards, etc. ...

But once Johny Srouji told me why Apple was not going with a multi-socket approach, I realized that multiple system daughtercards would be a no-go...

I have theorized about a multi-slot chassis that had the actual system on a daughtercard for ease of upgrading, and another multi-slot chassis that could host as many system daughtercards as it had slots; but that would be a compute/render/server setup, not an integrated system where all the cards are seen as one. Too much of a NUMA headache...?
 
Last edited:

dada_dave

Elite Member
Posts
2,255
Reaction score
2,269
Way way way back, before I even came to this forum, I speculated about a Mac Pro Cube with a backplane for daughtercards; main system on one card, more slots for optional cards...

Neural Engine cards, GPU cards, M.2 NVMe RAID cards, audio I/O cards, video I/O & DSP cards, etc. ...

But once Johny Srouji told me why Apple was not going with a multi-socket approach, I realized that multiple system daughtercards would be a no-go...

I have theorized about a multi-slot chassis that had the actual system on a daughtercard for ease of upgrading, and another multi-slot chassis that could host as many system daughtercards as it had slots; but that would be a compute/render/server setup, not an integrated system where all the cards are seen as one. Too much of a NUMA headache...?
Yeah, I was thinking not of an integrated setup with the main board, but more like a mini node cluster with everything on the daughtercard SoC. I don't think Apple would actually do that, since it would have to set up a task system, etc… but in theory you could even run such a thing over Thunderbolt and sell that as well. It's a fun idea.
 

Yoused

up
Posts
5,684
Reaction score
9,074
Location
knee deep in the road apples of the 4 horsemen
theoretically Apple's iGPU should be the best of both worlds: it has the bandwidth of a dGPU and a potentially far larger memory pool than a typical dGPU (especially when comparing dGPUs of equivalent performance). So theoretically you can get away with even more sloppiness on the iGPU. The only advantage the dGPU might have in this regard is allowing excess power draw with its own bespoke cooling solution

The advantage I was thinking of for the dGPU is that it completely owns its own bus, which has basically no bandwidth cost to the CPU while it is chewing on its data. The iGPU takes bus bandwidth from the rest of the system, or has to wait its turn on the bus. Granted, if the work you are doing is highly GPU-focused for the render/math interval (the CPU is mostly just puttering around, waiting, maybe checking mail), the difference is probably trivial.

But mainly I was thinking the iGPU must be less sloppy because it has to share its bandwidth. And quite frankly, I find the more carefully crafted, elegant approach more appealing, in large part because it uses less juice.
 

exoticspice1

Site Champ
Posts
305
Reaction score
107
That depends on the workload. For games? No, they're not bottlenecked. For high-end rendering + compute, they can be if the working set is big enough.
That's the VRAM; for games, yes, 10 GB - 16 GB is enough, but you're right that for compute and rendering 24 GB (the highest in the 4090) is not enough.

However, I am talking about the PCIe bus standard. Current GPUs will never be bottlenecked by the PCIe 4 bus.
 

theorist9

Site Champ
Posts
633
Reaction score
594
Each optional wheel on the AS Mac Pro will incorporate an A13 Bionic processor, which will enable "innovative wheel features". As expected, these will cost more than the "dumb wheels" on the current model.
 
Last edited:

dada_dave

Elite Member
Posts
2,255
Reaction score
2,269
That's the VRAM; for games, yes, 10 GB - 16 GB is enough, but you're right that for compute and rendering 24 GB (the highest in the 4090) is not enough.

However, I am talking about the PCIe bus standard. Current GPUs will never be bottlenecked by the PCIe 4 bus.
That's the same thing ;) For compute/rendering, if the PCIe bus weren't a bottleneck, then it wouldn't matter that the problem couldn't fit into VRAM: you could hide the latency, as it were, while fetching the data from RAM. And for some even very large data sets that is possible, depending on the workload! The problem is that accessing data you need right now, and can't prefetch, is painfully slow over PCIe due to latency. Bandwidth over PCIe can be an issue as well if the data set gets big enough, like hundreds of GB needing to be swapped back and forth between GPU and CPU.
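One way to put numbers on "hiding it" (all figures assumed, just a sketch): to keep a big dGPU busy while streaming the working set over PCIe, a kernel needs a huge amount of arithmetic per byte fetched; over unified memory the bar is far lower.

```swift
import Foundation

// Assumed round figures, for illustration only.
let gpuFLOPs        = 80e12   // ~80 TFLOPS FP32, a big current dGPU
let pcie4x16Bytes   = 32e9    // ~32 GB/s over PCIe 4.0 x16
let unifiedMemBytes = 800e9   // ~800 GB/s of unified memory bandwidth

// FLOPs the kernel must perform per byte streamed in for the transfer to hide
// completely behind compute (otherwise the GPU stalls waiting on data).
let flopsPerByteOverPCIe = gpuFLOPs / pcie4x16Bytes    // ~2,500 FLOPs/byte
let flopsPerByteOverUMA  = gpuFLOPs / unifiedMemBytes  // ~100 FLOPs/byte
print(flopsPerByteOverPCIe, flopsPerByteOverUMA)
```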
 