M4 Mac Announcements

It’s a few days after release, so I thought it would be good to look at some of the scores for Cyberpunk 2077 to see how things compare with Windows and also GPTK. Except….

There seems to be a bug/misconfiguration in the Mac version of the game. I first read about it on Reddit. It seems that whatever preset is chosen (except the “For this Mac” ones), the setting for Screen Space Reflections (SSR) is one step (+1) higher on the Mac. So the Ultra preset has an SSR setting of Ultra on Windows, but “Psycho” on macOS. Whoops! Unfortunately Ultra -> Psycho is a massive drop in performance. I tried a variety of resolutions at Ultra, both with and without the correct setting for SSR.
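To illustrate the kind of off-by-one that could produce this, here is a purely hypothetical sketch; none of these types or functions exist in the game, they only mirror the behaviour described above:

```swift
// Hypothetical sketch of an off-by-one preset mapping; these types are
// invented for illustration and are not the game's actual settings code.
enum SSRQuality: Int { case low, medium, high, ultra, psycho }
enum Preset: Int { case low, medium, high, ultra }

// Intended: preset N uses SSR quality N.
func intendedSSR(for preset: Preset) -> SSRQuality {
    SSRQuality(rawValue: preset.rawValue)!
}

// Buggy: preset N uses SSR quality N + 1, so Ultra silently becomes Psycho.
func buggySSR(for preset: Preset) -> SSRQuality {
    SSRQuality(rawValue: preset.rawValue + 1)!
}

print(intendedSSR(for: .ultra), buggySSR(for: .ultra)) // ultra psycho
```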


Setting     | SSR = “Psycho” (fps) | SSR = Ultra (fps) | Percentage change
4K Ultra    | 9.15                 | 14.04             | +53%
1440p Ultra | 21.07                | 30.87             | +47%
1080p Ultra | 37.03                | 49.94             | +35%
720p Ultra  | 70.21                | 85.31             | +22%
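For clarity, the “Percentage change” column is just the relative fps uplift from using the correct SSR setting; a quick check of the table's numbers:

```swift
// Percentage change = (fps with SSR = Ultra) / (fps with SSR = "Psycho") - 1,
// using the values from the table above.
let runs: [(setting: String, psycho: Double, ultra: Double)] = [
    ("4K Ultra",    9.15, 14.04),
    ("1440p Ultra", 21.07, 30.87),
    ("1080p Ultra", 37.03, 49.94),
    ("720p Ultra",  70.21, 85.31),
]
for run in runs {
    let uplift = (run.ultra / run.psycho - 1) * 100
    print("\(run.setting): +\(Int(uplift.rounded()))%")  // +53%, +47%, +35%, +22%
}
```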

These figures are from an M4 Pro (20-core GPU). I think we can see that, if not for this bug, early performance reviews would have been much better. I don’t have an M4 Max, but I would imagine it would perform like a 4070m rather than around a 4060m as it currently does. Not bad, I suppose, but not great either.

Ray tracing is another matter. The performance is bad. The above bug exists in the RT presets as well; in fact, all RT presets have SSR set to “Psycho”. My machine is too slow for me to notice a difference with or without the incorrect SSR setting.

I am very curious why the M3/M4 machines do pretty well in Blender/Cinebench R24 but so poorly in Assassin’s Creed Shadows and Cyberpunk 2077 in terms of RT. Can it just be optimization? Apple helped quite a bit with this port, so that seems unlikely.
Hitting a 4070m’s performance isn’t bad, but ideally the M4 Max should perform around a 4080m (and in Blender the full M4 Max is better than the 4080m). Pity ray tracing performance is even worse.

As for why games perform worse than Blender … I dunno. The following is pure conjecture: Maybe it has to do with the fact many AAA games tend to break graphics APIs for pure performance and game specific drivers are needed to enable those performance hacks. If Apple doesn’t do that or enable those kinds of hacks, then performance suffers.

Hardware isn’t everything; drivers are often the secret sauce of performance. It has to be noted that Blender is an outlier amongst renderers in the other direction. But that’s probably software optimization, the other key ingredient, as you pointed out.
 
It’s a few days after release, so I thought it would be good to look at some of the scores for Cyberpunk 2077 to see how things compare with Windows and also GPTK. Except….

There seems to be a bug/misconfiguration in the Mac version of the game. I first read about it on Reddit. It seems that whatever preset is chosen (except the “For this Mac” ones), the setting for Screen Space Reflections (SSR) is one step (+1) higher on the Mac. So the Ultra preset has an SSR setting of Ultra on Windows, but “Psycho” on macOS. Whoops! Unfortunately Ultra -> Psycho is a massive drop in performance. I tried a variety of resolutions at Ultra, both with and without the correct setting for SSR.


Setting     | SSR = “Psycho” (fps) | SSR = Ultra (fps) | Percentage change
4K Ultra    | 9.15                 | 14.04             | +53%
1440p Ultra | 21.07                | 30.87             | +47%
1080p Ultra | 37.03                | 49.94             | +35%
720p Ultra  | 70.21                | 85.31             | +22%

These figures are from an M4 Pro (20-core GPU). I think we can see that, if not for this bug, early performance reviews would have been much better. I don’t have an M4 Max, but I would imagine it would perform like a 4070m rather than around a 4060m as it currently does. Not bad, I suppose, but not great either.

Ray tracing is another matter. The performance is bad. The above bug exists in the RT presets as well; in fact, all RT presets have SSR set to “Psycho”. My machine is too slow for me to notice a difference with or without the incorrect SSR setting.

I am very curious why the M3/M4 machines do pretty well in Blender/Cinebench R24 but so poorly in Assassin’s Creed Shadows and Cyberpunk 2077 in terms of RT. Can it just be optimization? Apple helped quite a bit with this port, so that seems unlikely.
Is the SSR setting actually used when you turn RT on?
 
Hitting a 4070m’s performance isn’t bad, but ideally the M4 Max should perform around a 4080m (and in Blender the full M4 Max is better than the 4080m). Pity ray tracing performance is even worse.

As for why games perform worse than Blender … I dunno. The following is pure conjecture: Maybe it has to do with the fact many AAA games tend to break graphics APIs for pure performance and game specific drivers are needed to enable those performance hacks. If Apple doesn’t do that or enable those kinds of hacks, then performance suffers.
I think you’ve hit the nail on the head tbh. I don’t think Apple will ship game specific driver “optimizations”, and I don’t think I’d want them to. The price for that may be lower performance.
Hardware isn’t everything; drivers are often the secret sauce of performance. It has to be noted that Blender is an outlier amongst renderers in the other direction. But that’s probably software optimization, the other key ingredient, as you pointed out.
Yes.
 
Hitting a 4070m’s performance isn’t bad, but ideally the M4 Max should perform around a 4080m (and in Blender the full M4 Max is better than the 4080m). Pity ray tracing performance is even worse.

As for why games perform worse than Blender … I dunno. The following is pure conjecture: Maybe it has to do with the fact many AAA games tend to break graphics APIs for pure performance and game specific drivers are needed to enable those performance hacks. If Apple doesn’t do that or enable those kinds of hacks, then performance suffers.

Hardware isn’t everything; drivers are often the secret sauce of performance. It has to be noted that Blender is an outlier amongst renderers in the other direction. But that’s probably software optimization, the other key ingredient, as you pointed out.
The big break from the past in all the modern 3D APIs like Vulkan, Metal, and DX12 is that it's not really possible for drivers to be the secret sauce they used to be. There's far less impedance mismatch between the abstract model of the hardware exposed by the API and the hardware itself; the drivers become a fairly thin layer that adds very little overhead.

This puts much of the responsibility for performance tuning directly on the game developer. When you target Metal and Apple GPUs you may need to do some things somewhat differently in your engine to get the most out of TBDR rendering hardware - mostly, arranging your render pipeline and draw calls to minimize the number of tile RAM loads and stores. (nb: I might have worded all this awkwardly, this stuff isn't my area of expertise)
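As a concrete (if simplified) sketch of what minimizing tile RAM loads and stores can look like in Metal (an illustration only, not anything from the CP2077 port; `device` and `drawableTexture` are placeholders for objects a real engine would already have):

```swift
import Metal

// Sketch only: configure a render pass so intermediate data stays in tile
// memory on a TBDR GPU instead of round-tripping through DRAM.
func makeRenderPass(device: MTLDevice,
                    drawableTexture: MTLTexture,
                    width: Int, height: Int) -> MTLRenderPassDescriptor {
    let pass = MTLRenderPassDescriptor()

    // Color: every pixel is overwritten, so don't load the previous contents
    // into tile memory (clear on-tile instead) and store only the final image.
    pass.colorAttachments[0].texture = drawableTexture
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].storeAction = .store

    // Depth: only needed while rasterizing this pass, so make it memoryless -
    // it lives entirely in tile memory and is never written back to DRAM.
    let depthDesc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .depth32Float,
                                                             width: width,
                                                             height: height,
                                                             mipmapped: false)
    depthDesc.usage = .renderTarget
    depthDesc.storageMode = .memoryless
    pass.depthAttachment.texture = device.makeTexture(descriptor: depthDesc)
    pass.depthAttachment.loadAction = .clear
    pass.depthAttachment.storeAction = .dontCare

    return pass
}
```

Structuring passes so intermediates can stay memoryless like this, rather than being written out and read back between passes, matters far more on a TBDR GPU than on a desktop immediate-mode GPU.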

A lot hinges on how much of a free hand the people who did the CP2077 port were given with respect to restructuring the game's rendering engine to better match TBDR. This kind of thing likely also affects RT, though I'm not sure by how much.
 
This puts much of the responsibility for performance tuning directly on the game developer. When you target Metal and Apple GPUs you may need to do some things somewhat differently in your engine to get the most out of TBDR rendering hardware - mostly, arranging your render pipeline and draw calls to minimize the number of tile RAM loads and stores. (nb: I might have worded all this awkwardly, this stuff isn't my area of expertise)

I don't think this is worded awkwardly at all.

The only thing I'd add is that because there's also been consolidation of game engines, sometimes tuning isn't even something a developer can achieve to the level needed/desired unless they have the engine developer's ear. In the case of CP2077, since it uses Unreal (according to a web search), there are limits to what the folks working on the port can accomplish if Unreal itself isn't set up to enable the optimizations needed.
 
The big break from the past in all the modern 3D APIs like Vulkan, Metal, and DX12 is that it's not really possible for drivers to be the secret sauce they used to be. There's far less impedance mismatch between the abstract model of the hardware exposed by the API and the hardware itself; the drivers become a fairly thin layer that adds very little overhead.

This puts much of the responsibility for performance tuning directly on the game developer. When you target Metal and Apple GPUs you may need to do some things somewhat differently in your engine to get the most out of TBDR rendering hardware - mostly, arranging your render pipeline and draw calls to minimize the number of tile RAM loads and stores. (nb: I might have worded all this awkwardly, this stuff isn't my area of expertise)

A lot hinges on how much of a free hand the people who did the CP2077 port were given with respect to restructuring the game's rendering engine to better match TBDR. This kind of thing likely also affects RT, though I'm not sure by how much.
Not an expert at all. Nat Brown (who was a manager in the gametech team until last year) commented that the port had significant involvement from Apple directly and that quite a bit of effort had gone into optimising not only for Apple Silicon, but for each generation of M-series GPU.

Obviously it’s possible that there will be significant improvements when Metal 4 arrives. It might also not improve things much at all!
 
I don't think this is worded awkwardly at all.

The only thing I'd add is that because there's also been consolidation of game engines, sometimes tuning isn't even something a developer can achieve to the level needed/desired unless they have the engine developer's ear. In the case of CP2077, since it uses Unreal (according to a web search), there are limits to what the folks working on the port can accomplish if Unreal itself isn't set up to enable the optimizations needed.
I think Cyberpunk uses REDengine. They are switching to Unreal for future games.
 
The big break from the past in all the modern 3D APIs like Vulkan, Metal, and DX12 is that it's not really possible for drivers to be the secret sauce they used to be. There's far less impedance mismatch between the abstract model of the hardware exposed by the API and the hardware itself; the drivers become a fairly thin layer that adds very little overhead.

This puts much of the responsibility for performance tuning directly on the game developer. When you target Metal and Apple GPUs you may need to do some things somewhat differently in your engine to get the most out of TBDR rendering hardware - mostly, arranging your render pipeline and draw calls to minimize the number of tile RAM loads and stores. (nb: I might have worded all this awkwardly, this stuff isn't my area of expertise)

A lot hinges on how much of a free hand the people who did the CP2077 port were given with respect to restructuring the game's rendering engine to better match TBDR. This kind of thing likely also affects RT, though I'm not sure by how much.
I still see day-0 game driver patches, and drivers are still often blamed for poor game performance/stability, with later driver patches improving things significantly. But maybe lots of (even new) games are still DX11? Or is there still a little wiggle room even in the newer APIs?
 
I still see day-0 game driver patches, and drivers are still often blamed for poor game performance/stability, with later driver patches improving things significantly. But maybe lots of (even new) games are still DX11? Or is there still a little wiggle room even in the newer APIs?
It is a thing even for DX12 games. Metal has the advantage that the API is written by the hardware owner and thus doesn't need this intermediate step to improve performance or fix bugs (looking at you, broken photo mode in CP2077 while using path tracing).
 
The big break from the past in all the modern 3D APIs like Vulkan, Metal, and DX12 is that it's not really possible for drivers to be the secret sauce they used to be. There's far less impedance mismatch between the abstract model of the hardware exposed by the API and the hardware itself; the drivers become a fairly thin layer that adds very little overhead.

This puts much of the responsibility for performance tuning directly on the game developer. When you target Metal and Apple GPUs you may need to do some things somewhat differently in your engine to get the most out of TBDR rendering hardware - mostly, arranging your render pipeline and draw calls to minimize the number of tile RAM loads and stores. (nb: I might have worded all this awkwardly, this stuff isn't my area of expertise)

A lot hinges on how much of a free hand the people who did the CP2077 port were given with respect to restructuring the game's rendering engine to better match TBDR. This kind of thing likely also affects RT, though I'm not sure by how much.
While true, there can still be massive gains from drivers. Hardware Unboxed did a great video on the performance wins driver updates gave the RX 7xxx series of AMD cards. Off the top of my head it was 8% on average, but in some cases it was more like 22%. And then of course there are Intel drivers, where some games went from around 11 FPS to 50 FPS with a driver update. Granted, that mostly happened in DX11 titles, but it also affected a few DX12 titles.

And while others are correct that Apple can tailor Metal specifically to their architectural needs, unlike Vulkan and DX12 that need to generalise things a bit (extensions notwithstanding), Metal still does need to generalise across generations and abstract for future opportunities.
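As a small illustration of that generalisation (a sketch only, not taken from the port): Metal exposes hardware differences through capability and family queries, so the generational abstraction lives in the API rather than in per-game driver profiles.

```swift
import Metal

// Sketch: a renderer branches on what the device reports it supports,
// rather than relying on a per-game driver profile.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

// Capability query, stable across GPU generations.
if device.supportsRaytracing {
    print("Using the ray tracing path")
} else {
    print("Falling back to screen-space techniques")
}

// Family query groups generations into capability tiers.
if device.supportsFamily(.apple8) {
    print("Apple8-or-newer GPU family features available")
}
```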
 