3D games working on Apple GPU on Linux at 4K!

In the interest of balance with my earlier, negative posts about the unexpected costs of Linux development, here are a couple of posts from Hector about how Linux can be an improvement over macOS on the same hardware:



In short: with many more people working on kernel improvements and no requirement for a stable kernel API, Linux can optimize faster and shed legacy cruft sooner. Thus Linux can be faster for the same tasks on the same hardware, even though macOS is optimized for that hardware. Hector does caveat that, of course, macOS still has more features enabled than Asahi does currently, but they are catching up and, again, for certain tasks Linux will likely be faster/smoother.
I'd quibble a bit with Hector about why macOS is slower than Linux when you ask each to do tons of small file I/Os per second. (And btw, this is not at all a new thing with Apple Silicon; people used to do similar macOS vs Linux measurements on x86 Macs, with very similar results. A sketch of that kind of measurement follows the list below.) IMO, it's due to three factors:

1. Linus Torvalds cares. A lot! He uses his computers to do exactly the things Hector talked about: git SCM operations on Linux kernel source trees, unpacking kernel source tarballs, compiling kernels. He's a happy man when anyone sends him a patch improving Linux VFS performance, because he will see its benefits in a very direct way.

2. There are plenty of people feeding Torvalds such patches, as many of the Linux kernel's corporate patrons care too. Some might surprise you: Facebook, pre-Elon Twitter. It turns out that if you're a social media giant, hiring top kernel devs so you can encourage them to work on performance in key areas can save millions of dollars in operating costs. Small-file I/O is important to many server workloads...

3. Most Mac customers who care about file IO performance only care about large streaming file IO (video editing and the like). Apple doesn't sell servers either. And most devs who work on Macs aren't interacting with SCM databases as large and complicated as the Linux git tree. So... Apple just doesn't put much effort into small-file random IO. Would they benefit from Linux-like performance here anyways? Possibly, but I think it's clear that management either doesn't understand that or has consciously chosen to rank it as such a low priority that it never gets done.
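
For a concrete idea of what "tons of small file I/Os per second" looks like, here is a rough sketch of that kind of microbenchmark. The file count, payload, directory name, and per-file fsync are my own illustrative choices, not taken from any of the measurements above; the comparison being described is basically building something like this on macOS and on Linux on the same machine and comparing the rates.

```c
/* Rough sketch of a small-file I/O microbenchmark (illustrative values only).
 * Creates a directory of tiny files, fsync()ing each one, and reports the rate. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const int nfiles = 10000;                 /* arbitrary; big enough to measure */
    const char payload[] = "tiny payload\n";
    char path[64];

    mkdir("bench", 0755);                     /* ignore error if it already exists */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof(path), "bench/f%05d", i);
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, payload, sizeof(payload) - 1) < 0) { perror("write"); return 1; }
        fsync(fd);                            /* force it to disk; drop this to measure cache-only */
        close(fd);
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d small files in %.2f s (%.0f files/s)\n", nfiles, secs, nfiles / secs);
    return 0;
}
```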
 

There is another, much more prosaic reason. macOS does extensive tracking and recording of filesystem events (in addition to dozens of other services). Bare-bones Linux, on which these tests are usually performed, has only minimal services configured. A few years ago I did a little experiment, and running a filesystem event watcher on Linux brought the I/O performance into the same ballpark as APFS.
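
For anyone who wants to poke at that themselves, below is a minimal sketch of such a watcher using the plain inotify API. It's only an illustration (not the exact setup from that old experiment), and it watches just the single "bench" directory from the benchmark sketch earlier in the thread; run it in one terminal while the small-file test runs in another and compare the timings with and without it.

```c
/* Minimal filesystem-event watcher using inotify (Linux only), watching the
 * single "bench" directory from the benchmark above. A real indexing service
 * would do work per event; this just counts them. */
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
    int fd = inotify_init1(0);
    if (fd < 0) { perror("inotify_init1"); return 1; }

    if (inotify_add_watch(fd, "bench", IN_CREATE | IN_MODIFY | IN_CLOSE_WRITE) < 0) {
        perror("inotify_add_watch");
        return 1;
    }

    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    long events = 0;

    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));   /* blocks until events arrive */
        if (len <= 0) break;
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            events++;                               /* a real service would stat/record here */
            p += sizeof(struct inotify_event) + ev->len;
        }
        fprintf(stderr, "\rseen %ld events", events);
    }
    return 0;
}
```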
 
Another reason macOS is slower than Linux: it does a heap more in the background.

iCloud-related stuff like sync, Continuity, AirDrop, shared copy/paste, etc. doesn't come for free. Linux doesn't do a lot of the things macOS does in the background.

Edit: exactly as per @leman above.
 
Tangentially related to Asahi Linux, here's another gaming development, which coincidentally happened right before WWDC. CodeWeavers has announced that CrossOver is now compatible with DirectX 12.

While we are elated with this breakthrough, we acknowledge that our journey has just begun. Our team’s investigations concluded that there was no single magic key that unlocked DirectX 12 support on macOS. To get just Diablo II Resurrected running, we had to fix a multitude of bugs involving MoltenVK and SPIRV-Cross. We anticipate that this will be the case for other DirectX 12 games: we will need to add support on a per-title basis, and each game will likely involve multiple bugs.

Here is Andrew Tsai's video covering the announcement:



It's clearly early days for this endeavor, but the notion that they were tilting at windmills has been dashed.
 

Interesting, I wonder what the nature of these bugs was. They say it's with MoltenVK, which is definitely likely, but I also wonder how much is the games themselves. I remember a couple of Nvidia engineers talking about why their drivers weren't open source and basically admitting that a huge percentage of driver code, specifically the parts added when a big new AAA game has just launched and the driver "improves game-specific performance", is actually just getting the game to work at all, because these major titles ship fundamentally breaking the graphics API spec in some way in order to push performance. I'm wondering if some of these game-specific bugs in MoltenVK might be due to those kinds of shenanigans rather than actual problems in MoltenVK. That said, translating graphics code from DirectX 12 to Vulkan to Metal is not easy, so I have no doubt there are many bugs to be found there too.
 

Psst - if you want accessible endpoint ML, we're getting pretty close to releasing compute support on our Asahi GPU drivers thanks to @lina and @alyssa's work, and @eiln's work-in-progress Apple Neural Engine driver is already running popular ML models [github.com] on Asahi Linux.

Exciting!
 
So in addition to likely being wrong about the Mx Extreme and its (non-)development, I may have also been spreading wrong information about the nature of the problem with d/eGPUs and Apple Silicon. What I've been saying for a while is that Apple likely software-locks e/dGPUs and macOS could theoretically handle them okay on AS. Linux, however, requires normal mapping of memory across the PCIe BAR, which AS doesn't support; it only supports device mapping. Thus d/eGPUs, even under Asahi Linux, would only work if Apple added support for normal mapping to its PCIe controller.

However, after talking with Hector on Mastodon, the above may not be true! His contention is that normal mapping is in fact required for most games to work on GPUs regardless of operating system. Longhorn, from whose blog I originally gleaned that information, disagrees, saying there's nothing in Vulkan/OpenGL that requires normal mapping; device mapping should be fine. Hector's phrasing, however, makes it seem like almost all games use normal-mapped memory as a performance hack, so I'm not sure if there's a way to combine both statements to make sense. I don't know if @leman or @Andropov have a comment here. This wasn't really resolved, but I'll link to the relevant posts so people can make up their own minds.
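
For context on what memory is even at issue here: as I understand it, from a game's point of view this is Vulkan's "device-local + host-visible" memory, i.e. VRAM the CPU maps and writes to directly over the PCIe BAR. Here's a small sketch, entirely my own and not from Hector or Longhorn, that just checks whether a GPU exposes such a memory type; it doesn't settle the normal-vs-device-mapping question, it only shows the thing games would be mapping.

```c
/* Sketch: does the first Vulkan GPU expose a memory type that is both
 * DEVICE_LOCAL and HOST_VISIBLE, i.e. VRAM the CPU can map and write to
 * through the PCIe BAR? Build with -lvulkan. Purely illustrative; it says
 * nothing about how that mapping is attributed on the CPU side. */
#include <stdio.h>
#include <vulkan/vulkan.h>

int main(void)
{
    VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    VkInstance inst;
    if (vkCreateInstance(&ici, NULL, &inst) != VK_SUCCESS) {
        fprintf(stderr, "couldn't create a Vulkan instance\n");
        return 1;
    }

    uint32_t count = 1;
    VkPhysicalDevice dev;
    vkEnumeratePhysicalDevices(inst, &count, &dev);   /* just look at the first GPU */
    if (count == 0) { fprintf(stderr, "no Vulkan device found\n"); return 1; }

    VkPhysicalDeviceMemoryProperties mp;
    vkGetPhysicalDeviceMemoryProperties(dev, &mp);

    for (uint32_t i = 0; i < mp.memoryTypeCount; i++) {
        VkMemoryPropertyFlags f = mp.memoryTypes[i].propertyFlags;
        if ((f & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) &&
            (f & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT))
            printf("memory type %u: device-local + host-visible (CPU-mappable VRAM)\n", i);
    }

    vkDestroyInstance(inst, NULL);
    return 0;
}
```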



Interestingly, from my perspective, both agree that in principle CUDA should work, though Hector has no particular interest in testing this out; too much effort on yet another proprietary firmware stack if there's a problem that needs fixing/support. I'm not sure if I would want to test it out myself when I upgrade to an AS Mac. I doubt I'd have the ability to fix something if it goes wrong. But we'll see. Especially if I can do it over Thunderbolt rather than having to buy a Mac Pro.
 
Amazing how talented these people are. Now get ROCm running plz.
Ha! That would be cool, but I doubt it for the near future. Even AMD can’t seem to fully support their own product stack for their own open source API - though here I have absolutely no doubt Asahi could do better! But, as far as I know, the Asahi folks aren’t working on compute beyond what’s in the OpenGL/Vulkan graphics APIs.
 
Yeah, I'm aware and agree.
Sad thing is: even if they had the capacity to work on GPU compute, the question remains: which API/framework should be implemented? OpenCL is dead. CUDA/ROCm are proprietary APIs, not to mention not a standard. So is it SYCL? Even Metal? Something else?
Tells you a lot about what a clusterfuck GPU compute is in general right now.
 
I think ROCm is open source? CUDA/Metal are definitely out, and oneAPI I don't know about. They'll probably get to OpenCL if someone on the project hasn't started already; after all, they did OpenGL. Plus there's a certain symmetry to porting Apple's own original compute API back to Apple. :)

But I do believe ROCm/HIP is officially open source, though according to Wikipedia the AMD firmware that supports it is closed. So I'm not quite sure how that works … I guess no different from AMD's proprietary drivers for Vulkan/OpenGL? So still possible?
 
Good question...
Regarding ROCm: I guess you are right, it's open source. Yet it seems no one is using it, and several people I know (the net seems to agree) claim it's a bit of a fuss (to put it mildly). A boilerplate nightmare. Difficult to work with. Or so I hear; quite possibly it's better these days, it's been a while since I last checked.
 
Yeah, I don't know much about ROCm beyond that part of it, HIP, which is supposed to allow for the (easy) porting of CUDA code to AMD. I don't know how true the (easy) part is, as I haven't looked into it myself beyond reading high-level overview summaries.
 
Something I think is kind of interesting is how Asahi Linux works around the 16K vs 4K page-size mismatch when emulating x86 apps, in particular games. They actually use a microVM (in addition to Wine/Proton) to run the software with near-bare-metal performance:



I can't remember how this is solved on macOS for Wine. Presumably the same issue of mismatched page-size expectations exists, but maybe macOS is more flexible than Linux? Apparently on Linux the kernel page size is fixed when the kernel is built and, until Asahi, a lot of software's assumptions about page size were effectively hardcoded as well. I'm pretty sure the Asahi team, probably Hector, have said how this is dealt with on macOS, but I've forgotten. Anyone know?
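
One way to see the mismatch directly: mmap() only accepts offsets that are a multiple of the kernel's page size, so an offset of 4096 is fine on a 4K-page kernel but rejected on a 16K-page one. Here's a tiny sketch of my own (not from the Asahi docs) that prints the page size and tries exactly that; prebuilt x86 software that bakes in 4096 trips over this sort of thing, which, as I understand it, is why running it inside a VM whose guest kernel uses 4K pages works.

```c
/* Prints the kernel's page size, then tries an mmap() at a 4096-byte offset.
 * mmap() offsets must be a multiple of the page size, so this works on a
 * 4K-page kernel and fails with EINVAL on a 16K-page one. The file used
 * here is arbitrary; anything larger than a few KB will do. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    printf("kernel page size: %ld bytes\n", sysconf(_SC_PAGESIZE));

    int fd = open("/bin/sh", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Plenty of prebuilt x86 software bakes in 4096 as "the" page size;
     * this mimics that assumption. */
    void *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 4096);
    if (p == MAP_FAILED)
        printf("mmap at offset 4096 failed: %s (what a 4K-assuming app would hit)\n",
               strerror(errno));
    else
        printf("mmap at offset 4096 succeeded (4K-page kernel)\n");

    close(fd);
    return 0;
}
```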
 