What does Apple need to do to catch Nvidia?

This just came into my feed today, a day after my post yesterday.
I’m going to double down on my comment that Apple doesn’t need to focus on beating Nvidia in terms of GPGPU compute or Ampere-beating LLM processing. I think Apple will go with custom on-chip acceleration, a dedicated inference engine of their own, for a major update to Siri if Siri starts running LLM inference.
Fascinating reading; I appreciate that it’s targeting cloud builders, but it’s still relevant to the discussion here.

This looks to me very similar to in-memory processing, something that pretty much everyone has been aggressively researching for a while. I personally know a guy who's been recently hired by AMD to develop this kind of device, and Nvidia has been talking about this stuff for years. And if I am not mistaken, Apple even has a patent for something similar.

Of course, it will take a while until this kind of technology can come to a personal device, as it's still going to be large and expensive. This is why Apple is researching ways to run LLMs more efficiently. I am sure we will see many breakthroughs in the next few years. Transformers have been a big thing, but they also approach the problem in the dumbest possible way — multiplying everything with everything and effectively discarding the vast majority of the resulting values. Recent research suggests that similar results are possible with much more efficient algorithms that utilise caching and avoid computing the discarded values in the first place.
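To make the “caching” idea concrete, here is a minimal sketch of the key/value-cache trick used in autoregressive transformer inference. It is illustrative only (plain arrays, a single head, invented names), not any specific paper’s algorithm: earlier tokens’ keys and values never change, so storing them means each new token costs work proportional to the sequence length instead of recomputing everything.

```swift
import Foundation

// Per-sequence cache: one key vector and one value vector per generated token.
struct KVCache {
    var keys: [[Double]] = []
    var values: [[Double]] = []
}

func dot(_ a: [Double], _ b: [Double]) -> Double {
    var s = 0.0
    for i in 0..<min(a.count, b.count) { s += a[i] * b[i] }
    return s
}

func softmax(_ x: [Double]) -> [Double] {
    let m = x.max() ?? 0
    let exps = x.map { exp($0 - m) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

// One decoding step: the new token's query attends over all cached keys.
// With the cache this is O(t * d) per step instead of recomputing O(t^2 * d).
func attendWithCache(query: [Double],
                     newKey: [Double],
                     newValue: [Double],
                     cache: inout KVCache) -> [Double] {
    cache.keys.append(newKey)
    cache.values.append(newValue)

    let scale = 1.0 / sqrt(Double(query.count))
    let weights = softmax(cache.keys.map { dot(query, $0) * scale })

    // Weighted sum of the cached values.
    var out = [Double](repeating: 0, count: newValue.count)
    for (w, v) in zip(weights, cache.values) {
        for i in out.indices { out[i] += w * v[i] }
    }
    return out
}
```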
 
I have to agree that, while UMA has a lot of other advantages, unless base RAM increases dramatically game developers would struggle to take advantage of the really high effective VRAM available in some models - at least for users. But I’m not an expert in game development, so maybe there’s something I’m missing. I mean, the ability to store lots of huge assets all in memory is an obvious advantage, but if you bank your performance on your ability to do that then you’re right, lower models will suffer. We’ve seen that play out in the PC and console space lots of times. So 🤷‍♂️. Maybe posters like @leman, @Andropov, and @Nycturne might have ideas, as I know they’re more familiar with app development/graphics than I am.

To be honest, my graphics knowledge is shaky. I dabbled back when the new hotness was S-buffering, shortly before 3dfx, ATi and Nvidia rendered all the effort put into better CPU rasterizers mostly moot for realtime work. I have an interest, but I mostly know just enough to be dangerous and to put my foot in my mouth on the subject more often than not. Partly why I lurk these threads more than participate.

That said, I don’t see how you get around RAM limitations for your target users. If your target users all have 8/16GB, then you gotta make stuff fit. Streaming of assets and geometry isn’t new, so obviously this is about the working set size. Ultimately, I don’t see Apple Silicon delivering a decisive advantage here, as the 2060, 3060 and 4060 come with around 8GB of VRAM. The difference in total memory usage when you have a single memory pool vs split memory pools is not something I’m terribly familiar with; different usage profiles than the sort of stuff I deal with regularly. And to top it off, with development generally being a “common denominator” thing, you aren’t going to see a lot of development for outliers unless it’s someone’s pet project. So stuff is going to be built assuming the capabilities of what most people have (PC space) or what the current generation of consoles can do.

Honestly, I think the bigger deals are features in Metal that close the gap between it and what DirectX et al. can do when it comes to games: fast resource loading, MetalFX upscaling (especially important with the high-DPI displays Apple uses), etc.
 
I have to agree that, while UMA has a lot of other advantages, unless base RAM increases dramatically game developers would struggle to take advantage of the really high effective VRAM available in some models - at least for users.

That said, I don’t see how you get around RAM limitations for your target users. If your target users all have 8/16GB, then you gotta make stuff fit. Streaming of assets and geometry isn’t new, so obviously this is about the working set size. Ultimately, I don’t see Apple Silicon delivering a decisive advantage here, as the 2060, 3060 and 4060 come with around 8GB of VRAM. The difference in total memory usage when you have a single memory pool vs split memory pools is not something I’m terribly familiar with; different usage profiles than the sort of stuff I deal with regularly. And to top it off, with development generally being a “common denominator” thing, you aren’t going to see a lot of development for outliers unless it’s someone’s pet project. So stuff is going to be built assuming the capabilities of what most people have (PC space) or what the current generation of consoles can do.

I think Tim was pointing out two things: first, being able to work with assets larger than conventional GDDR capacities allow is unheard of. This means you can work more efficiently while creating your project — be it an app or a game or a video or whatever. It helps in developing games.
Second, it would allow you to do some cool stuff for users playing the game. Personally, I don’t understand this point about Unified Memory not providing an advantage, or I suppose a “decisive” advantage. I don’t really want to nitpick over words, but anyway. I’ve played some games, and they usually come with specs that are required and specs that are recommended: really, to have a playable experience you ought to have the recommended specs, but they aren’t required. There are also some games that plainly state you just can’t run them without X or Y graphics card, CPU, and amount of RAM. It’s in those types of games that I believe unified memory can and does offer something NVIDIA cannot. If this conversation is centered around what Apple can do to catch up to NVIDIA on X or Y things, then I think this is one thing NVIDIA will always be behind in.

Apple is stating to developers: if you develop a game on macOS, you’ll be able to create games that aren’t possible on other platforms. The situation is now inverted: before it was “you can’t do it on a Mac”, now it’s “you can’t do it on a PC”. Sure, practically speaking, since the Mac has a smaller user base than Windows at the moment, is a game developer going to develop a Mac version first? Maybe, maybe not. But Tim is trying to explain the benefits of what unified memory offers on both the development side and the user side. Like, there are plenty of games that aren’t built with the lowest-end NVIDIA card in mind. There are plenty of games that state you need X card or they won’t let you run the game at all. The same situation can exist on the Mac.

The thing is, as with bringing a console-level game to iOS recently, you’re probably able to do a bit more even with the “lowest common denominator” specs. Keep in mind that spec can and will change at any point. And we’re talking about a base spec that replaced a previous one which couldn’t even launch certain games at all, whereas now it can not only launch them but play them. The higher-end specs obviously let people play at higher settings. But the MacBook Air can play certain games it literally never could before, so I’m not really sure the “lowest common denominator” thing holds much weight for me or the regular user, either.
 
I think Tim was pointing out two things: first, being able to work with assets larger than conventional GDDR capacities allow is unheard of. This means you can work more efficiently while creating your project — be it an app or a game or a video or whatever. It helps in developing games.
We did all specify that we were confused as to whether he meant the advantage of extra VRAM for the users playing games - e.g. my addendum in the part you quoted, “at least for users”. It’s indeed possible that he meant just during the development process, where that could be of use even on your personal development machine, especially for smaller/solo projects.

Second, it would allow you to do some cool stuff for users playing the game. Personally, I don’t understand this point about Unified Memory not providing an advantage, or I suppose a “decisive” advantage. I don’t really want to nitpick over words, but anyway. I’ve played some games, and they usually come with specs that are required and specs that are recommended: really, to have a playable experience you ought to have the recommended specs, but they aren’t required. There are also some games that plainly state you just can’t run them without X or Y graphics card, CPU, and amount of RAM. It’s in those types of games that I believe unified memory can and does offer something NVIDIA cannot. If this conversation is centered around what Apple can do to catch up to NVIDIA on X or Y things, then I think this is one thing NVIDIA will always be behind in.

Apple is stating to developers: if you develop a game on macOS, you’ll be able to create games that aren’t possible on other platforms. The situation is now inverted: before it was “you can’t do it on a Mac”, now it’s “you can’t do it on a PC”. Sure, practically speaking, since the Mac has a smaller user base than Windows at the moment, is a game developer going to develop a Mac version first? Maybe, maybe not. But Tim is trying to explain the benefits of what unified memory offers on both the development side and the user side. Like, there are plenty of games that aren’t built with the lowest-end NVIDIA card in mind. There are plenty of games that state you need X card or they won’t let you run the game at all. The same situation can exist on the Mac.

The thing is, as with bringing a console-level game to iOS recently, you’re probably able to do a bit more even with the “lowest common denominator” specs. Keep in mind that spec can and will change at any point. And we’re talking about a base spec that replaced a previous one which couldn’t even launch certain games at all, whereas now it can not only launch them but play them. The higher-end specs obviously let people play at higher settings. But the MacBook Air can play certain games it literally never could before, so I’m not really sure the “lowest common denominator” thing holds much weight for me or the regular user, either.

Even for minimum and recommended specs there is a reasonable range of specs that can actually be accommodated. For instance 96GB of RAM is unlikely to be a viable recommended spec if your install base consists largely of 8GB users. The minimum is simply too far from the recommended - the performance on those minimum machines would be unpleasant if said performance is based on having a large pool of memory.

Finally, if your minimum is going to be 24GB or higher, then your user base is too small to sell to. Assets are incredibly expensive; if you’re developing a game where 96 GB of VRAM can actually be taken advantage of, that’s almost certainly a AAA game - in fact one the likes of which we haven’t seen. Those are incredibly expensive to make. They need a huge user pool to sell to. Selling to a fraction of a fraction of a market is not a viable business strategy for AAA gaming. Even though the Mac user base is growing, it is still small, and not every Mac user is a prospective gamer or interested in your game in particular. Now if you do as you suggest in the latter half, you’re adding further restrictions by eliminating the most popular Mac models by a substantial margin. Suddenly that’s not a lot of people to sell to. Even if you port the game to Windows with 4090s as minimum specs, that’s a tiny fraction of people.

Now in 10 years, will the situation be the same? Hopefully not. But even in game development terms, predicting hardware capabilities and your own engine’s scalability can be incredibly hard. We’ve seen veteran studios get that very, very wrong, with disastrous consequences at launch. Cyberpunk 2077 is a good example where they got the scalability of their engine completely wrong and couldn’t deliver on the minimum specs (amongst other problems). Three years on they’ve made good (and of course hardware has improved), but they lost substantial amounts of money, time, and most importantly customer goodwill. They’re hardly the only ones.

In conclusion, it is hard to think of how end users benefit, for gaming, from huge VRAM pools. We’re not big enough to influence development for the rest of the market, and engine scalability only goes so far, as even amongst Mac users those with such large memory pools are a minority.

Make no mistake: I believe UMA is a massive advantage for professional workloads and maybe one day for gaming. But that day looks fairly far out without some dramatic shifts in the market. We’ll have to see how successful the PCs aping Apple are, such as Nuvia and the forthcoming ARM SoCs from Nvidia and AMD. And hopefully Apple increases base RAM in subsequent generations so that the target minimum can be substantially higher. Again, though, that’s a multi-year if not decades-long process.

Edit: another important point brought up in the other thread about Mac gaming is that one of the main draws of developing games for Apple Silicon is access to the absolutely massive iOS market. If a game can be played with touch controls, and that can be a big if of course, the developers are going to be even more interested in how their game plays on the lowest end Apple hardware than the highest.
 
I think @dada_dave already did the heavy lifting here.

I think Tim was pointing out two things: first, being able to work with assets larger than conventional GDDR capacities allow is unheard of. This means you can work more efficiently while creating your project — be it an app or a game or a video or whatever. It helps in developing games.

Unless I’m building an asset that is being prepped “for the future”, making a giant asset that doesn’t fit on target machines kinda misses the point. It would need to be sized for where it gets used, not simply for where I develop it.

There is a decisive advantage in Blender-style scenarios because I can fully take advantage of whatever hardware my team is willing to spend budget on. So I can create scenes for GPU rendering bounded only by system memory. Same with GPGPU compute, where I can use data sets bounded only by system memory on whatever hardware I choose for the task.

Second, it would allow you to do some cool stuff for users playing the game. Personally, I don’t understand this point about Unified Memory not providing an advantage, or I suppose a “decisive” advantage. I don’t really want to nitpick over words, but anyway.

Examples?

Ultimately, the GPU having direct access to system memory gives me two specific advantages:

1) I am bounded by system memory, not a separate VRAM pool, for assets the GPU needs access to.
2) I can load assets into GPU-accessible memory without pushing the data across the PCIe bus as well.

In the case of 1, if my target hardware profile is 8-16GB of system memory, then I’m using some amount of that for the CPU side of the game, plus overhead for the OS. So it’s not clear to me that an 8GB low end Nvidia card is utterly trounced in that particular setup. I would very likely be targeting similar asset complexity in each case.

And it’s not like developers have had a problem filling the unified memory of the PS5, making me wonder what you think Apple’s doing differently to Sony?

In the case of 2, as a developer, I’m going to take advantage of this using Metal’s fast resource loading, much like I would use DirectStorage on Windows / Xbox (and Sony’s equivalent on PS5). But that low end Nvidia GPU has more bandwidth to VRAM than all but the full-fat M3 Max.
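To put point 2 in code, here is a minimal sketch (not a full asset pipeline, and the sizes and data are placeholders): on Apple Silicon a buffer created with shared storage sits in the one memory pool, so the CPU can fill it and the GPU can read it with no staging upload across a bus.

```swift
import Metal

// Minimal sketch: a .storageModeShared buffer lives in the single unified
// memory pool, so CPU writes land directly in GPU-visible memory with no
// staging copy across a PCIe bus. Data and sizes here are placeholders.
guard let device = MTLCreateSystemDefaultDevice() else { fatalError("no Metal device") }

// Pretend this is asset data decoded on the CPU.
var assetData = [Float](repeating: 0, count: 1_000_000)

let shared = device.makeBuffer(bytes: &assetData,
                               length: assetData.count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

// `shared` can now be bound to a render or compute encoder as-is; on a
// discrete GPU the equivalent path typically needs an explicit copy into VRAM.
```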

Like, there are plenty of games that aren’t built with the lowest-end NVIDIA card in mind. There are plenty of games that state you need X card or they won’t let you run the game at all.

It seems like you are missing the point I was making here. I was pointing out that the low end cards of the last 3 generations include 8GB of VRAM. So it’s not like an asset that fits on an 8GB Air won’t fit on the most popular discrete GPUs available today. And because that VRAM is dedicated, the GPU has a bit more headroom than an 8GB Mac.

But I’ll bite: examples? Note that games like Baldur’s Gate 3 and Cyberpunk 2077, which are at the top of the Steam top-seller list, both recommend a 2060.

The thing is, as with bringing a console-level game to iOS recently, you’re probably able to do a bit more even with the “lowest common denominator” specs. Keep in mind that spec can and will change at any point. And we’re talking about a base spec that replaced a previous one which couldn’t even launch certain games at all, whereas now it can not only launch them but play them. The higher-end specs obviously let people play at higher settings. But the MacBook Air can play certain games it literally never could before, so I’m not really sure the “lowest common denominator” thing holds much weight for me or the regular user, either.

In a world where large-budget games are coming out on as many platforms as the budget allows, something only one platform can do isn’t going to get a lot of attention; that’s what I meant by a common denominator (note I did not say “lowest”). Not without someone in a leadership position on the project pushing for it, or something you simply cannot do elsewhere where that leadership person wants the feather in their cap for having done it first.

Smaller developers might bite, again if someone in a leadership position wants it.
 
I’m going to reply to both of you in one post, because you’re both largely trying to say the same thing. I’ll address the first reply, and respond to the second only on the specific points that aren’t covered by what I was trying to say earlier or am saying now.

Let’s also keep in mind that the original quote from Tim that Theorist9 was using to make his point was prefaced by this in the article:

“Millet also is unconvinced that the game dev universe has adapted to the unique architecture of the M-series chips quite yet, especially the unified memory pool.”

I think it’s weird to focus on that quote without this part. Anyways.



We did all specify that we were confused as to whether he meant the advantage of extra VRAM for the users playing games - e.g. my addendum in the part you quoted, “at least for users”. It’s indeed possible that he meant just during the development process, where that could be of use even on your personal development machine, especially for smaller/solo projects.
Tim is talking about development. Apple has repeatedly talked about that, so I can’t imagine it’s any different now. It also offers benefits to users of even the lowest cost Macs, so it works both ways, even if he did not specifically outline that in a particular interview.

Even for minimum and recommended specs there is a reasonable range of specs that can actually be accommodated. For instance 96GB of RAM is unlikely to be a viable recommended spec if your install base consists largely of 8GB users. The minimum is simply too far from the recommended - the performance on those minimum machines would be unpleasant if said performance is based on having a large pool of memory.
96 GB is cool, but that’s the extreme high end: possible, yes, but not what I was referring to. Additionally, the game doesn’t need to be designed to take advantage of a large amount of memory for users to benefit. It can just load more of the game/scene at higher res, etc. into unified memory, and users benefit.

If a game developer wants to go beyond that and do something more, that’s also now possible. As I stated, that’s the difference between required vs. recommended. Developers can accommodate minimum specs for basic gameplay whilst unlocking cooler features for higher-end machines, so I’m still unsure what the point here is. Game developers can, and often do, do both.

Finally, if your minimum is going to be 24GB or higher, then your user base is too small to sell to. Assets are incredibly expensive; if you’re developing a game where 96 GB of VRAM can actually be taken advantage of, that’s almost certainly a AAA game - in fact one the likes of which we haven’t seen. Those are incredibly expensive to make. They need a huge user pool to sell to. Selling to a fraction of a fraction of a market is not a viable business strategy for AAA gaming. Even though the Mac user base is growing, it is still small, and not every Mac user is a prospective gamer or interested in your game in particular. Now if you do as you suggest in the latter half, you’re adding further restrictions by eliminating the most popular Mac models by a substantial margin. Suddenly that’s not a lot of people to sell to. Even if you port the game to Windows with 4090s as minimum specs, that’s a tiny fraction of people.
The thing here, again, is about minimum vs recommended. You can enable gameplay for a large number of people while enabling extra features for others who have better specs. Apple’s architecture inherently enables better performance for some things by letting developers load more into unified memory, and that happens by default: the system just works like that. Sometimes you can alter the game settings to specifically take advantage of X amount of memory in the system by specifying the amount of RAM you want it to use.
But typically the game is just going to use unified memory and its benefits without additional work, so Mac users benefit in a way they did not on the previous architecture, which is an upgrade for users.

The main point:

You need to keep in mind that the most a MacBook Pro ever had before was 8 GB of graphics memory. Tim explains this is no longer a weakness but now a strength, where you can literally get up to 128 GB of graphics memory in a Mac. And let’s keep in mind this is all just the beginning. He’s trying to explain why the Mac is a good platform to develop on and for. It already was, and now it’s better because it addressed a weakness imposed by AMD graphics chips.

Again, the previous 8 GB maximum is now a 128 GB maximum on a MacBook. You keep talking about how many Macs could realistically use X amount of memory even if it could be used. The 8 GB I just referenced required the AMD 5600M graphics card on the top-end 16-inch MacBook Pro, an $800 upgrade on a machine that came to $3,200. Entry-level MacBooks couldn’t even get more than 1.5 GB of graphics memory, because of Intel’s really crappy graphics chips.

So you’ve gone from something like the early 2020 MacBook Air, which couldn’t even launch certain games, to the late 2020 MacBook Air with M1 and Apple silicon that can actually play them. The entry-level Air went from the most basic, bare-minimum graphics power to the level of the dedicated 5300M available on the 2019 16-inch MBP, and went from zero dedicated graphics memory to 8 GB, which also beats the 5300M’s entry-level 4 GB. With a 2020 MacBook Air that cost $999, you beat the 2019 16-inch MBP on CPU, matched its entry dedicated graphics, which cost $2,399, and beat its dedicated memory amount, which cost $3,200.

To even get 24 GB on NVIDIA’s top-of-the-line GPU, you need to spend over $1,600. That doesn’t include, you know, the rest of the god damn computer lol. Even the Mac mini offers 24 GB of graphics memory at a price of $999. That includes everything, too.

So the point you’re missing is that unified memory now lets even the most basic, “lowest common denominator” specs compete at the level of the top-of-the-line specs of the previous MBP.
That’s the baseline going forward. And Apple has obviously not stood still. On top of that, for future development, Apple is offering unique benefits to developers with the architecture.
That’s the point Tim and Apple at large is making.
So when I read stuff like this:

In conclusion, it is hard to think of how end users benefit, for gaming, from huge VRAM pools. We’re not big enough to influence development for the rest of the market, and engine scalability only goes so far, as even amongst Mac users those with such large memory pools are a minority.

I just get plain confused. Unified memory, in tandem with Apple’s chip designs, just catapulted every base Mac from being good at basic stuff to being good at advanced stuff. The transition to Apple silicon has made it work like this: the top of the line comes to the lower end. The MacBook Air offers better performance than the old MBP. The MBP offers better-than-old-Mac Pro performance. The Mac Pro offers unheard-of performance on a Mac and pushes the boundaries of personal computing.

Unless I’m building an asset that is being prepped “for the future”, making a giant asset that doesn’t fit on target machines kinda misses the point. It would need to be sized for where it gets used, not simply for where I develop it.

Huh?
The ability to work with large assets straight out of memory is the stuff of pipe dreams for creators of anything, let alone games. No idea what you’re trying to push here, but it makes zero sense. Developers have already talked about how it benefits them. And even if you ignore game developers, a theoretical example is being able to work on a CGI model for a sci-fi TV show entirely in memory. You’re able to create and edit details with that, and you won’t run out of memory the way you would on a traditional GPU that offers at most 24GB.

Examples?

The most basic being that the MacBook Air has 8 GB of memory standard and offers 5300M performance standard? And that is with the 7-core M1.

In the case of 1, if my target hardware profile is 8-16GB of system memory, then I’m using some amount of that for the CPU side of the game, plus overhead for the OS. So it’s not clear to me that an 8GB low end Nvidia card is utterly trounced in that particular setup. I would very likely be targeting similar asset complexity in each case.

I’m sorry, isn’t the point of this thread “What does Apple need to do to catch up to NVIDIA?” MacBooks have caught up to NVIDIA in a lot of areas, though there is some stuff they will continue to improve on. Unified memory now lets MacBooks have more than 8 GB of graphics memory. Base Macs now match the level of memory offered by most dedicated GPUs, with the advantage of being able to go far beyond the 24 GB offered by the 4090.
I’m really confused here. It feels like I’m explaining Apple silicon to people who simultaneously know about it but have never heard of it. I’m so confused. The paradigm has changed here for the Mac and PC in general. It’s now Windows/Intel/AMD/NVIDIA who is behind, technically and philosophically. And considering I’m reading stuff like this:


“Apple's release of Arm-based system-on-chips for its desktops and laptops three years ago demonstrated that such processors could offer competitive performance and power consumption. On the Windows-based PC front, only Qualcomm has offered Arm-based SoCs for notebooks in recent years, but it looks like it will soon be joined by AMD and NVIDIA, two notable players in the PC space…

I’m going to presume it’s not just my opinion on that.


And it’s not like developers have had a problem filling the unified memory of the PS5, making me wonder what you think Apple’s doing differently to Sony?
What?

It seems like you are missing the point I was making here. I was pointing out that the low end cards of the last 3 generations include 8GB of VRAM. So it’s not like an asset that fits on an 8GB Air won’t fit on the most popular discrete GPUs available today. And because that VRAM is dedicated, the GPU has a bit more headroom than an 8GB Mac.

And it seems like you’re missing the point Apple is making. Even the MacBook Air now has 8 GB to work with. It didn’t before.
The 13 inch MBP only offered 1.5 GB.
The most ever offered on a MacBook was 8 GB of memory, and that required a $3200 machine.
Now you get this level of graphics memory standard on every Mac, and you get the advantage of working with up to 128 GB, for both development and users’ use, on a MacBook.

NVIDIA offers stuff Apple silicon still doesn’t.
Apple silicon offers stuff NVIDIA likely never will.
Apple is trying to explain their platform to developers of any kind, like any business would. I’m so genuinely confused why you guys are confused by this situation.

I’ve enjoyed this thread, and I don’t comment generally. I didn’t rush in and say “zomg Apple zilikon is betters!!! Fuk Nvidia.”

I enjoyed reading what people wrote, and I’ve been reading since the thread started. I only chimed in to explain unified memory’s benefits, and I only did that because I was confused about why you guys are confused.

If you guys are trying to claim that unified memory doesn’t offer advantages over the current paradigm used by NVIDIA, I’m just going to end it here on my part. If NVIDIA offered this kind of stuff, I don’t feel like the replies would be the same.
 
The thing here, again, is about minimum vs recommended. You can enable gameplay for a large number of people while enabling extra features for others who have better specs. Apple’s architecture inherently enables better performance for some things by letting developers load more into unified memory, and that happens by default: the system just works like that. Sometimes you can alter the game settings to specifically take advantage of X amount of memory in the system by specifying the amount of RAM you want it to use.

What features though?

But typically the game is just going to use unified memory and its benefits without additional work, so Mac users benefit in a way they did not on the previous architecture, which is an upgrade for users.

Leman is going to have to jump in on this, but I recall him pointing out to me at one point that there are still costs if Metal resources aren't marked as shareable with the GPU, which can mean making copies in those cases. So I'm skeptical this claim holds.
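For what it's worth, here is my reading of the copy cost being described, as a sketch rather than a definitive account of Metal's internals: a resource created with .storageModePrivate is GPU-only even on unified-memory hardware, so getting CPU data into it still takes an explicit blit from a CPU-visible staging buffer.

```swift
import Metal

// Sketch of the extra copy being discussed: a .storageModePrivate resource is
// GPU-only, so CPU data still has to be blitted into it from a shared staging
// buffer, even though the hardware has a single memory pool. Illustrative only.
guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue() else { fatalError("no Metal device") }

let length = 4 * 1024 * 1024
let staging = device.makeBuffer(length: length, options: .storageModeShared)!
let gpuOnly = device.makeBuffer(length: length, options: .storageModePrivate)!

// ... CPU fills staging.contents() here ...

let commands = queue.makeCommandBuffer()!
let blit = commands.makeBlitCommandEncoder()!
blit.copy(from: staging, sourceOffset: 0,
          to: gpuOnly, destinationOffset: 0, size: length)
blit.endEncoding()
commands.commit()
commands.waitUntilCompleted()
```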

It already was, and now it’s better because it addressed a weakness imposed by AMD graphics chips.

Well, and Apple's stubborn behavior regarding Nvidia.

I just get plain confused. Unified memory, in tandem with Apple’s chip designs, just catapulted every base Mac from being good at basic stuff to being good at advanced stuff. The transition to Apple silicon has made it work like this: the top of the line comes to the lower end. The MacBook Air offers better performance than the old MBP. The MBP offers better-than-old-Mac Pro performance. The Mac Pro offers unheard-of performance on a Mac and pushes the boundaries of personal computing.

Unified memory didn't do that though. Switching to a CPU architecture that doesn't rely on TurboBoost gulping down the watts is huge. Going from what's always been considered a joke of an iGPU to a GPU design that is competitive with dedicated graphics cards and can ramp down in power when it's not needed is huge.

Unified memory made the VRAM pool competitive when it had been behind (at Apple's asking prices) for years. Hell, I'm just glad I don't have to go through the hell that I did back in 2019 trying to set up a Mac Mini with an eGPU to get decent performance. But that hell was of Apple's own making.

Huh?
The ability to work with large assets straight out of memory is the stuff of pipe dreams for creators of anything, let alone games. No idea what you’re trying to push here, but it makes zero sense. Developers have already talked about how it benefits them. And even if you ignore game developers, a theoretical example is being able to work on a CGI model for a sci-fi TV show entirely in memory. You’re able to create and edit details with that, and you won’t run out of memory the way you would on a traditional GPU that offers at most 24GB.

So you clip out the very next paragraph where I talk about creators to tell me about creators? Cool debate trick. It might help if you point out the specific developers and scenarios rather than saying "it helps". It helps if I have something to put into the extra memory. But if my assets are already sized for smaller memory pools, I just get more room for overhead in terms of tooling I can run in parallel, but that in many cases doesn't mean more memory is being used for graphics memory. And if I'm working with assets that won't be used in the final builds, it's more about experimentation and R&D for the future than delivering a product today.

Keep in mind there are developers in this thread.

The most basic being that the MacBook Air has 8 GB of memory standard and offers 5300M performance standard? And that is with the 7-core M1.

So it means the Air has caught up with low end discrete GPUs in terms of VRAM. Catching up isn't an advantage.

I’m sorry, isn’t the point of this thread “What does Apple need to do to catch up to NVIDIA?” MacBooks have caught up to NVIDIA in a lot of areas, though there is some stuff they will continue to improve on. Unified memory now lets MacBooks have more than 8 GB of graphics memory. Base Macs now match the level of memory offered by most dedicated GPUs, with the advantage of being able to go far beyond the 24 GB offered by the 4090.

That is the topic of the thread yes, but you made specific claims that we challenged.

And keep in mind that the 4090 is slotted as much a creator card as it is a gamer card. It replaced the Titan which was the "consumer grade" creator card in the lineup.

I’m really confused here. It feels like I’m explaining Apple silicon to people who simultaneously know about it but have never heard of it. I’m so confused. The paradigm has changed here for the Mac and PC in general. It’s now Windows/Intel/AMD/NVIDIA who is behind, technically and philosophically. And considering I’m reading stuff like this:


“Apple's release of Arm-based system-on-chips for its desktops and laptops three years ago demonstrated that such processors could offer competitive performance and power consumption. On the Windows-based PC front, only Qualcomm has offered Arm-based SoCs for notebooks in recent years, but it looks like it will soon be joined by AMD and NVIDIA, two notable players in the PC space…

I’m going to presume it’s not just my opinion on that.

1) If you are the one left confused in a thread, maybe take a step back.

2) Some see Apple's move as a breakthrough, and my Apple Silicon Mac is the best dev machine I've ever used. However, I don't think this one thing provides as strong an advantage as you seem to think. Not enough to put everyone else on the back foot per se. But it is enough to put them on watch that Apple is serious and they cannot stagnate now of all times, as Apple has made some strong moves to catch up and won't be standing still.

3) The quote is commenting that ARM designs have caught up, and it is as much about the CPU as the GPU. Nvidia being interested doesn't surprise me, as they've tried the SoC space before and are still heavily focused on server farms using ARM and their GPU compute. But it's clear that the opportunity for Nvidia is in a higher-end Tegra-like SoC that could run Windows on a laptop, where they can claw in more of the revenue from the laptop than they could before, shutting out Intel and/or AMD. If they are still working with Nintendo, keep an eye on what they do there in the coming year or two.


You are aware that the PS5 has 16GB of unified system memory, right? Intel chips use unified system memory for graphics and have the ability to move pages to the GPU without copies. The main issue with Nvidia is that they don't produce an SoC to share memory with in the first place.

Also, on the topic of the 1.5GB limit, keep in mind that Apple had control over that. The Intel chips could go higher, but Apple needed to implement the management for it in the OS. Another thing that Leman can probably comment on with more authority than I can. My memory isn't perfect here, so I don't remember if Apple ever implemented it or not. But Windows these days can use the whole system memory for VRAM through dynamic allocation on modern Intel iGPUs. It's just that Intel's GPU design isn't competitive so it's not a huge benefit to have a lot of VRAM if it's dog slow.

Now you get this level of graphics memory standard on every Mac, and you get the advantage of working with up to 128 GB, for both development and users’ use, on a MacBook.

It's available, yes. Nobody is denying that part. It's more that we're trying to point out that the use cases where someone's running around with 128GB aren't exactly everyday ones. For certain use cases it will be great once tools take advantage of it. But those aren't likely to be games.

If you guys are trying to claim that unified memory doesn’t offer advantages over the current paradigm used by NVIDIA, I’m just going to end it here on my part. If NVIDIA offered this kind of stuff, I don’t feel like the replies would be the same.

There are certain advantages, yes. What I'm saying is that it's not quite as groundbreaking in the realm of games as you seem to be claiming.
 
Change their priorities.

I don't think Apple is playing Nvidia's game - Nvidia don't really make anything competitive in the same thermal/power envelope of CPU+GPU.

I don't for one second believe that Apple are "not capable" of building something competitive with Nvidia at the high end; it just isn't really their market.

Maybe that will change if we see a proper Mac Pro released. I'm entirely unconvinced by the current Mac Pro - it very much looks (to me) like it was supposed to have a second Ultra SoC on that platform; maybe something went sideways during development and it didn't work as expected. You can even see the empty space on the board for it if you squint enough.

Side note: in terms of the M-series unified memory being a win, an Nvidia engineer put out a video a year or three back which essentially stated that (in terms of energy) "memory bandwidth is expensive, compute is essentially free". It's not "free" compute obviously, but the big energy cost these days is moving data around. So anything you can do to stop doing that will be a win. Not copying from DRAM to VRAM across a PCIe bus is obviously going to help with that, both in terms of energy cost and in terms of latency when the data is not in the correct memory pool when required.
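For a rough sense of scale, here is a worked ratio using ballpark per-operation energy figures that are widely quoted (e.g. from Mark Horowitz's ISSCC 2014 talk, measured on an older process node, so treat the ratio rather than the exact values as the point):

```latex
E_{\text{DRAM read, 32-bit}} \approx 640\,\text{pJ}, \qquad
E_{\text{FP32 multiply}} \approx 4\,\text{pJ}
\quad\Rightarrow\quad
\frac{E_{\text{DRAM read}}}{E_{\text{FP32 multiply}}} \approx 160
```

In other words, one off-chip fetch costs on the order of a hundred-plus arithmetic operations' worth of energy, which is the sense in which "compute is essentially free".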
 
I never understood this, precisely for the reasons you mention. Why would he talk about 96GB VRAM for gaming if the majority of hardware they sell still has only 8GB? I can imagine how having a lot of RAM could be great for building huge open worlds with unique textures and other nice things, but that's not relevant to the majority of Macs out there.
I can only think of two ways this could make sense, but I don't know enough about games to know if either is possible:

1) Millet was referring to the value of large VRAM for game development specifically, as opposed to game play. Is more VRAM needed (or useful) for development than play?
2) Games can be coded to scale in their VRAM demand, such that they can play fine with low VRAM, but better with high VRAM.

Interestingly, VRAM size has become a hot-button topic among PC gamers. The articles linked below report that, at the higher-quality settings, some modern games require more than 8 GB—thus you can run into stuttering with cards like the 3070, which is relatively strong computationally, but has only 8 GB VRAM.

Thus it seems possible (if not plausible) that you could have a game that plays fine at lower quality settings on an 8 GB Mac, but at the same time can leverage unusually high UM (32 GB+) to implement more detailed textures, shading, and other features not possible on PC video cards limited to <=16 GB VRAM (as most consumer cards are)—unless such settings require more computational power than an M3 Pro or M3 Max can provide, thus preventing the Mac's GPU from getting to the point where it could use such high amounts of UM.



And here's a 43-page thread about this on guru3d.com. While I can't assess who's right and wrong, this does indicate it's a topic of some interest to the PC gaming community.
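On option 2 specifically, the scaling has to be something the game opts into, but the hook for it does exist: on macOS an MTLDevice reports a recommended working-set size, and a game could, hypothetically, pick its asset-quality tier from that instead of from a hard-coded VRAM assumption. A minimal sketch, with invented tier names and thresholds:

```swift
import Metal

// Hypothetical sketch: derive an asset-quality tier from the GPU's recommended
// working-set size rather than a fixed VRAM assumption.
// recommendedMaxWorkingSetSize is a real MTLDevice property on macOS; the
// tiers and thresholds below are made up for illustration.
enum TextureTier { case low, high, ultra }

func pickTier(for device: MTLDevice) -> TextureTier {
    let budgetGB = Double(device.recommendedMaxWorkingSetSize) / 1_073_741_824.0
    if budgetGB < 8 { return .low }    // fits an 8 GB baseline machine
    if budgetGB < 24 { return .high }
    return .ultra                      // only on large unified-memory configs
}

// Usage (assuming a Metal device is available):
// let tier = pickTier(for: MTLCreateSystemDefaultDevice()!)
```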
 
You know I was half joking in the other thread that Apple needs to improve communications to catch up to Nvidia:

Yeah, I should’ve included that as a reason it’s even more true with Apple Silicon and Apple products overall: Apple’s penchant for general secrecy and vagueness about its products. You could almost put “communications” in your list of things Apple can do to catch Nvidia. That’s griping I’ve read from professionals using Apple products for years and years, basically back to the old days.

But upon reading @leman’s post on Metal shaders, and on further reflection, it’s actually true. Nvidia expends a lot of resources not just on making their APIs easy to use but on teaching people how to use them, with lots of tutorials, well-written documentation, and classes. Someone once quipped that Nvidia, especially by head count, is really a software company masquerading as a hardware company (I believe that was Ryan Smith at AnandTech). New-developer experience is crucial, and given the apparent sorry state of Apple’s developer resources, improving those could pay huge dividends.
 
Change their priorities.

I don't think Apple is playing Nvidia's game - Nvidia don't really make anything competitive in the same thermal/power envelope of CPU+GPU.

I don't for one second believe that Apple are "not capable" of building something competitive with Nvidia at the high end; it just isn't really their market.

Maybe that will change if we see a proper Mac Pro released. I'm entirely unconvinced by the current Mac Pro - it very much looks (to me) like it was supposed to have a second Ultra SoC on that platform; maybe something went sideways during development and it didn't work as expected. You can even see the empty space on the board for it if you squint enough.
So there are a couple of different ways to interpret the meaning of “catch up”. I think @Jimmyjames was primarily interested in what capabilities Nvidia GPUs possess that Apple would do well to … borrow. And going through the posts you can see there are still a number of areas where Apple, just like with ray tracing prior to the M3, could improve their offerings.

For raw performance, especially at the high end, yes, Apple could add an extreme option which would close the gap. The problem of course is that such an option would be quite expensive (though, as mentioned, the high RAM availability would make such a GPU competitive with Nvidia’s workstation GPU offerings, which are even more expensive).

One of the issues Apple faces, however, is that while their offerings are indeed competitive in the laptop market given the thermal constraints of a mobile device, their desktop performance lags behind except in those aforementioned professional workloads where large VRAM pools can really shine. Now, arguably the laptop market is the more crucial one to get right; it is by far the larger market. But even so, Apple’s desktop offerings tend to be more expensive for less performance than what you could get from an Nvidia or even an AMD GPU. This is obviously different from the CPU side of things, where their processors are not only more efficient at low power envelopes than their x86 counterparts but just as performant, if not more so, at the high end as well.

For the GPU, though, while Apple arguably has some advantages in terms of their TBDR design for raster performance, unified memory, and often being on a more advanced process node, it’s nowhere near as dominant. And part of that is simply intrinsic to GPUs: wide computational bandwidth simply requires lots of resources, be it silicon devoted to more cores, faster clocks, memory bandwidth, etc. Basically, yes, Nvidia GPUs especially require a lot of power, but that power translates into performance gains in a way that x86 CPU designers struggle with, because those designs are already so far out on the power-to-performance curves. My guess is that, details aside, in broad strokes Apple could probably run higher clocks throughout their GPU design and get linear performance increases to match. Now, those details may stop that from becoming a reality. I know @Cmaier and @mr_roboto have opined on why transistors and circuits may not always be amenable to just pumping more power into them, especially if Apple has prioritized their designs for low power envelopes. And we don’t know the behavior of Apple’s fairly unique extra-wide LPDDR memory controller at different clock speeds. But one suspects that overall Apple has a lot of potential thermal headroom here for desktop computers.

And to reiterate a point I made earlier: one issue with their unified SoC design is the linkage between high CPU performance and high GPU performance, which is fine for a lot of people, but some people are forced to get both even if they want a computer that specializes in one or the other. That raises the apparent cost to said buyer, and that can exacerbate the cost issue for the desktop GPU, where Apple already struggles on performance per dollar.
Side note: in terms of the M-series unified memory being a win, an Nvidia engineer put out a video a year or three back which essentially stated that (in terms of energy) "memory bandwidth is expensive, compute is essentially free". It's not "free" compute obviously, but the big energy cost these days is moving data around. So anything you can do to stop doing that will be a win. Not copying from DRAM to VRAM across a PCIe bus is obviously going to help with that, both in terms of energy cost and in terms of latency when the data is not in the correct memory pool when required.
Definitely true, and unified memory is a win here. Though in terms of GPUs and data movement, the engineer was almost certainly referring to more than just that: it’s all about getting the data to the cores, and getting the data to the GPU is just the first step of that. That’s kind of what @tomO2013’s earlier Next Platform link about LLMs and cache design, and @leman’s reply, are really about. The massive energy savings of an ASIC design relative to a GPU come from having all the data and weights present so close to the cores. I should note this is another area where Apple’s new Dynamic Caching is really important. It’s not a “you can forgo RAM”-level savings, but that kind of large, flexible cache structure can be really nice. Of course we don’t really know its full properties yet, @leman has been poking at it, and again Apple could maybe do with some extra documentation …
 
I can only think of two ways this could make sense, but I don't know enough about games to know if either is possible:

1) Millet was referring to the value of large VRAM for game development specifically, as opposed to game play. Is more VRAM needed (or useful) for development than play?
2) Games can be coded to scale in their VRAM demand, such that they can play fine with low VRAM, but better with high VRAM.

Interestingly, VRAM size has become a hot-button topic among PC gamers. The articles linked below report that, at the higher-quality settings, some modern games require more than 8 GB—thus you can run into stuttering with cards like the 3070, which is relatively strong computationally, but has only 8 GB VRAM.

Thus it seems possible (if not plausible) that you could have a game that plays fine at lower quality settings on an 8 GB Mac, but at the same time can leverage unusually high UM (32 GB+) to implement more detailed textures, shading, and other features not possible on PC video cards limited to <=16 GB VRAM (as most consumer cards are)—unless such settings require more computational power than an M3 Pro or M3 Max can provide, thus preventing the Mac's GPU from getting to the point where it could use such high amounts of UM.



And here's a 43-page thread about this on guru3d.com. While I can't assess who's right and wrong, this does indicate it's a topic of some interest to the PC gaming community.
Much like with Apple’s base M3 MacBook Pro coming with 8GB (which, full disclosure, my wife and I bought to replace her aging MacBook, though that was a point of pain), with the 4000 series Nvidia took a lot of flak for their low base VRAM and slow VRAM increases across product tiers (as well as low memory bandwidth on some of those tiers and overall price increases). So this is very definitely an issue in the PC space.

The problem with 1) is that the implications for developers during development are fairly obvious: if you don’t have access to workstation GPUs then a personal dev machine with huge VRAM can be nice for a lot of different tasks, especially rendering. No need to really scratch your head on that. :) The problem with 2) is that the quote wasn’t 8 to 36GB but 8 to 96GB. And your question about whether the Max is big enough computationally to take advantage (for gaming) of being able to store all these great assets at 36GB is a really good one. It probably depends. But it’s another reason why Tim’s original statement is somewhat questionable.
 
My guess is that, details aside, in broad strokes Apple could probably run higher clocks throughout their GPU design and get linear performance increases to match. Now, those details may stop that from becoming a reality. I know @Cmaier and @mr_roboto have opined on why transistors and circuits may not always be amenable to just pumping more power into them, especially if Apple has prioritized their designs for low power envelopes. But one suspects that overall Apple has a lot of potential thermal headroom here for desktop computers.

M3 Max GPU power consumption is weird... when I run my sorting code that can saturate the GPU bandwidth the reported power consumption for the 30-core model is 10-12 watts... for a max throughput back-to-back FP32 or INT FMA, 10 watts. But if I use FP16 instead of FP32, it jumps to 45 watts. Doesn't make any sense to me. I guess there is a bug in reporting the power consumption. P.S. The performance is as expected.
 
But upon reading @leman’s post on Metal shaders, and on further reflection, it’s actually true. Nvidia expends a lot of resources not just on making their APIs easy to use but on teaching people how to use them, with lots of tutorials, well-written documentation, and classes. Someone once quipped that Nvidia, especially by head count, is really a software company masquerading as a hardware company (I believe that was Ryan Smith at AnandTech). New-developer experience is crucial, and given the apparent sorry state of Apple’s developer resources, improving those could pay huge dividends.

While Ballmer was mocked for his whole "developers, developers, developers, developers" bit, he wasn't wrong. If you are building a platform, the developers are everything. They are your VIP users.
 
M3 Max GPU power consumption is weird... when I run my sorting code that can saturate the GPU bandwidth the reported power consumption for the 30-core model is 10-12 watts... for a max throughput back-to-back FP32 or INT FMA, 10 watts. But if I use FP16 instead of FP32, it jumps to 45 watts. Doesn't make any sense to me. I guess there is a bug in reporting the power consumption. P.S. The performance is as expected.
Can you check that with a wall meter, or does the battery interfere too much? Specifically, if you charged it to 100%, and the charge stayed at 100% during testing, could you assume you'd sufficiently taken the battery out of the picture to check if there was a 35-watt increase? [Especially if you could make the test go on for several minutes.]
 
Can you check that with a wall meter, or does the battery interfere too much? Specifically, if you charged it to 100%, and the charge stayed at 100% during testing, could you assume you'd sufficiently taken the battery out of the picture to check if there was a 35-watt increase? [Especially if you could make the test go on for several minutes.]
You can also use ioreg to get the system power usage. I have some command line tool that lets you get the battery power.

You can see the tools here: m1battery stats. You also need to install bitwise which is my hacked version of a command line calculator.
 
You can also use ioreg to get the system power usage. I have some command line tool that lets you get the battery power.

You can see the tools here: m1battery stats. You also need to install bitwise which is my hacked version of a command line calculator.

I'm getting an error "bitwise: invalid option -- o", any ideas? Installed with homebrew
 
Though in terms of GPUs and data movement, the engineer was almost certainly referring to more than just that: it’s all about getting the data to the cores, and getting the data to the GPU is just the first step of that.

Confirming that yes: he was discussing that as well - bandwidth within the core itself and how to cut down consumption of such. Different pools of memory are exponentially worse than that though :D

I mean, direct-attached discrete memory is great for performance - so long as your data set actually fits inside that memory pool. You're still dealing with the latency of getting data in and out of it, though, and the larger each pool of memory becomes, the more waste there is in terms of power, bandwidth, space, heat, and basically paying for way more RAM than you need due to storing various data in both pools for access by both the CPU and GPU (and keeping it synchronised).
 
I'm getting an error "bitwise: invalid option -- o", any ideas? Installed with homebrew
I’d have to look into it. It’s been a while.

Edit: Yeah you need to use my hacked version. Bitwise is nice code but they don’t follow normal UNIX conventions. Mine does.
 