3D games working on Apple GPU on Linux at 4K!

mr_roboto · May 21, 2023

dada_dave said:
In the interest of balancing my earlier, negative posts about the unexpected costs of Linux development, here are a couple of posts from Hector about how Linux can be an improvement over macOS on the same hardware:

Hector Martin (@marcan@treehouse.systems)

Yet another person on Reddit surprised that Asahi Linux compiles stuff way faster than macOS. "But macOS is so optimized for the hardware!" they all say... except Linux is already way more optimized *in general* than macOS is, for many workloads! `$ time tar xf linux-6.3.3.tar` macOS on APFS...

social.treehouse.systems

Hector Martin (@marcan@treehouse.systems)

This is very often overlooked: one reason Linux can keep pushing the performance and design envelope much better than macOS or Windows is that there is no stable in-kernel API. That means the kernel is free to refactor and improve internal interfaces continuously, and quickly eliminate legacy...

social.treehouse.systems

In short: with many more people working on kernel improvements and no requirement for a stable in-kernel API, Linux can optimize faster and remove legacy cruft sooner. Thus Linux can be faster for the same tasks on the same hardware even though macOS is optimized for that hardware. Hector does add the caveat that macOS still has more features enabled than Asahi currently does, but they are catching up and, again, for certain tasks Linux will likely be faster/smoother.

I'd quibble a bit with Hector about why macOS is slower than Linux when you ask each to do tons of small file I/Os per second. (and btw, this is not at all a new thing with Apple Silicon, people used to do similar macOS vs Linux measurements on x86 Macs, with very similar results.) IMO, it's due to three factors:

1. Linus Torvalds cares. A lot! He uses his computers to do exactly the things Hector talked about: git SCM operations on Linux kernel source trees, unpacking kernel source tarballs, compiling kernels. He's a happy man when anyone sends him a patch improving Linux VFS performance, because he will see its benefits in a very direct way.

2. There's plenty of people feeding Torvalds such patches, as many of the Linux kernel's corporate patrons care too. Some might surprise you - Facebook, pre-Elon Twitter. Turns out that if you're a social media giant, hiring top kernel devs so you can encourage them to work on performance in key areas can save millions of dollars in operating costs. Small-file IO is important to many server workloads...

3. Most Mac customers who care about file IO performance only care about large streaming file IO (video editing and the like). Apple doesn't sell servers either. And most devs who work on Macs aren't interacting with SCM databases as large and complicated as the Linux git tree. So... Apple just doesn't put much effort into small-file random IO. Would they benefit from Linux-like performance here anyways? Possibly, but I think it's clear that management either doesn't understand that or has consciously chosen to rank it as such a low priority that it never gets done.
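
For anyone who wants to poke at the small-file case themselves, here is a minimal sketch in Python. It is not Hector's test (he timed `tar xf` on a kernel tarball); the function name, file count and file size are placeholders I made up. It creates a pile of tiny files, reads them back and deletes them, which leans on the same metadata-heavy paths. The read phase will mostly hit the page cache, so treat the numbers as a rough comparison between systems, not a disk benchmark.

```python
import os
import tempfile
import time


def small_file_benchmark(num_files=2000, size_bytes=512):
    """Create, read back, and delete many tiny files; return timings in seconds.

    num_files and size_bytes are arbitrary illustrative defaults.
    """
    payload = b"x" * size_bytes
    timings = {}
    with tempfile.TemporaryDirectory() as root:
        paths = [os.path.join(root, f"f{i:06d}.bin") for i in range(num_files)]

        # Phase 1: create and write each file.
        start = time.perf_counter()
        for path in paths:
            with open(path, "wb") as fh:
                fh.write(payload)
        timings["create"] = time.perf_counter() - start

        # Phase 2: open and read each file back.
        start = time.perf_counter()
        total = 0
        for path in paths:
            with open(path, "rb") as fh:
                total += len(fh.read())
        timings["read"] = time.perf_counter() - start

        # Phase 3: unlink each file.
        start = time.perf_counter()
        for path in paths:
            os.remove(path)
        timings["delete"] = time.perf_counter() - start

    timings["bytes_read"] = total
    return timings


if __name__ == "__main__":
    result = small_file_benchmark()
    for phase in ("create", "read", "delete"):
        print(f"{phase:>6}: {result[phase] * 1000:.1f} ms")
```

Running the same script on macOS and on Linux on the same machine gives a like-for-like comparison of the three phases.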

leman · May 21, 2023

mr_roboto said:
I'd quibble a bit with Hector about why macOS is slower than Linux when you ask each to do tons of small file I/Os per second. (and btw, this is not at all a new thing with Apple Silicon, people used to do similar macOS vs Linux measurements on x86 Macs, with very similar results.) IMO, it's due to three factors:

1. Linus Torvalds cares. A lot! He uses his computers to do exactly the things Hector talked about: git SCM operations on Linux kernel source trees, unpacking kernel source tarballs, compiling kernels. He's a happy man when anyone sends him a patch improving Linux VFS performance, because he will see its benefits in a very direct way.

2. There's plenty of people feeding Torvalds such patches, as many of the Linux kernel's corporate patrons care too. Some might surprise you - Facebook, pre-Elon Twitter. Turns out that if you're a social media giant, hiring top kernel devs so you can encourage them to work on performance in key areas can save millions of dollars in operating costs. Small-file IO is important to many server workloads...

3. Most Mac customers who care about file IO performance only care about large streaming file IO (video editing and the like). Apple doesn't sell servers either. And most devs who work on Macs aren't interacting with SCM databases as large and complicated as the Linux git tree. So... Apple just doesn't put much effort into small-file random IO. Would they benefit from Linux-like performance here anyways? Possibly, but I think it's clear that management either doesn't understand that or has consciously chosen to rank it as such a low priority that it never gets done.

There is another, much more prosaic reason. macOS does extensive tracking and recording of filesystem events (in addition to dozens of other services). Bare-bones Linux, on which these tests are usually performed, has only minimal services configured. A few years ago I did a little experiment, and running a filesystem event watcher on Linux brought the I/O performance into the same ballpark as APFS.

throAU · May 21, 2023

Another reason macOS is slower than Linux: it does a heap more in the background.

iCloud-related stuff like sync, Continuity, AirDrop, shared copy/paste, etc. doesn’t come for free. Linux doesn’t do a lot of the things macOS does in the background.

Edit: exactly as per @leman above.

Colstan · Jun 3, 2023

Tangentially related to Asahi Linux, here's another gaming development, which coincidentally happened right before WWDC. CodeWeavers has announced that CrossOver is now compatible with DirectX 12.

While we are elated with this breakthrough, we acknowledge that our journey has just begun. Our team’s investigations concluded that there was no single magic key that unlocked DirectX 12 support on macOS. To get just Diablo II Resurrected running, we had to fix a multitude of bugs involving MoltenVK and SPIRV-Cross. We anticipate that this will be the case for other DirectX 12 games: we will need to add support on a per-title basis, and each game will likely involve multiple bugs.

Here is Andrew Tsai's video covering the announcement:

It's clearly early days for this endeavor, but the notion that they were tilting at windmills has been dashed.

dada_dave · Jun 3, 2023

Colstan said:
Tangentially related to Asahi Linux, here's another gaming development, which coincidentally happened right before WWDC. CodeWeavers has announced that CrossOver is now compatible with DirectX 12.

Here is Andrew Tsai's video covering the announcement:

It's clearly early days for this endeavor, but the notion that they were tilting at windmills has been dashed.

Interesting, I wonder what the nature of these bugs was. They say it’s with MoltenVK, which is definitely likely, but I also wonder how much is the games themselves. I remember a couple of Nvidia engineers talking about why their drivers weren’t open source and basically admitting that a huge percentage of driver code, specifically when big new AAA games have just launched and the driver “improved game specific performance”, is actually just getting the games to work at all, because all these major game titles ship fundamentally breaking the graphics API spec in some way in order to push performance. I’m wondering if some of these game-specific bugs in MoltenVK might be due to these kinds of shenanigans rather than an actual problem with MoltenVK. That said, translating graphics code from DirectX 12 to Vulkan to Metal is not easy, so I have no doubt there are many bugs to be found there too.

dada_dave · Jun 5, 2023

Hector Martin (@marcan@treehouse.systems)

Lol, it took 2 weeks for geohot to go from grandiosely announcing he's going to make ML good on AMD hardware with his own code to giving up after running into some bugs. https://geohot.github.io/blog/jekyll/update/2023/05/24/the-tiny-corp-raised-5M.html...

social.treehouse.systems

Psst - if you want accessible endpoint ML, we're getting pretty close to releasing compute support on our Asahi GPU drivers thanks to @lina and @alyssa's work, and @eiln's work-in-progress Apple Neural Engine driver is already running popular ML models [github.com] on Asahi Linux.

Exciting!

dada_dave · Jun 6, 2023

OpenGL 3.1 working on Asahi Linux

OpenGL 3.1 on Asahi Linux

rosenzweig.io

Oh and as discussed here:

Mac - New Game Porting Toolkit is Wine

Introduced at the latest WWDC, it’s basically Apple Proton: Hopefully they support and develop it well!

techboards.net

Apple’s new Game Porting Toolkit is a custom Wine, so basically Apple Proton.

dada_dave · Jun 7, 2023

OpenGL 3.3 is almost here (though the hardest part, geometry shaders, is still left):

Treehouse Mastodon

social.treehouse.systems

dada_dave · Jun 11, 2023

Adding OpenCL to Asahi Linux:

karolherbst 🐧 🦀 (@karolherbst@chaos.social)

Content warning: secret work

chaos.social

dada_dave · Jul 18, 2023

So in addition to likely being wrong about the Mx Extreme and its (non-)development, I may have also been spreading wrong information about the nature of the problem with d/eGPUs and Apple Silicon. What I’ve been saying for a while is that Apple likely software-locks d/eGPUs and that macOS could theoretically handle them okay on AS. Linux, however, requires normal mapping of memory across the PCIe BAR, which AS doesn’t support - it only supports device mapping. Thus d/eGPUs, even under Asahi Linux, would only work if Apple added support for normal mapping to its PCIe controller.

However, after talking with Hector on Mastodon, the above may not be true! His contention is that normal mapping is in fact required for most games to work on GPUs regardless of operating system. Longhorn, whose blog I gleaned the original information from, disagrees, saying there’s nothing in Vulkan/OpenGL that requires normal mapping and that device mapping should be fine. Hector’s phrasing, however, makes it seem like almost all games use normal-mapped memory as a performance hack, so I’m not sure if there’s a way to combine both statements so that they make sense. I don’t know if @leman or @Andropov have a comment here. This wasn’t really resolved, but I’ll link to the relevant posts so people can make up their own minds.

Hector Martin (@marcan@treehouse.systems)

@dlawrie42@ecoevo.social @edinbruh@mstdn.social I think that blog is a bit confused. PCIe BARs are normally MMIO and therefore mapped as Device memory. We didn't change any of that for AS. Linux drivers will do ioremap() for typical PCIe BARs which maps to Device-nGnRE. This is what is expected...

social.treehouse.systems

Longhorn (@never_released@mastodon.social)

@marcan@treehouse.systems @dlawrie42@ecoevo.social @mischa@oc.is > there's probably some page size brokenness, surprisingly, didn't see issues there. And with mapping as dev mem, things work fine as far as I can see even for OGL/VK

mastodon.social

Interestingly, from my perspective, both agree that in principle CUDA should work regardless, though Hector has no particular interest in testing this out - too much effort on yet another proprietary firmware stack if there’s a problem that needs fixing/support. I’m not sure if I would want to test it out myself when I upgrade to an AS Mac. I doubt I’d have the ability to fix something if it goes wrong. But we’ll see. Especially if I can do it over Thunderbolt rather than having to buy a Mac Pro.

dada_dave · Aug 3, 2023

Asahi moving to Fedora as its flagship distro with an official release scheduled for the end of August:

Our new flagship distro: Fedora Asahi Remix - Asahi Linux

asahilinux.org

dada_dave · Aug 22, 2023

Asahi is now fully OpenGL ES 3.1 conformant (yes @Jimmyjames, Alyssa takes potshots at Apple for not being fully conformant)

The first conformant M1 GPU driver

rosenzweig.io

Jimmyjames · Aug 23, 2023

dada_dave said:
Asahi is now fully OpenGL ES 3.1 conformant (yes @Jimmyjames, Alyssa takes potshots at Apple for not being fully conformant)

The first conformant M1 GPU driver

rosenzweig.io

Lol I’m devastated Apple doesn’t invest more time in an OpenGL driver.

Agent47 · Aug 23, 2023

Amazing how talented these people are. Now get ROCm running plz.

dada_dave · Aug 23, 2023

Agent47 said:
Amazing how talented these people are. Now get ROCm running plz.

Ha! That would be cool, but I doubt it for the near future. Even AMD can’t seem to fully support their own product stack for their own open source API - though here I have absolutely no doubt Asahi could do better! But, as far as I know, the Asahi folks aren’t working on compute beyond what’s in the OpenGL/Vulkan graphics APIs.

Agent47 · Aug 23, 2023

Yeah, I'm aware and agree.
Sad thing is: even if they had the capacity to work on GPU compute, the question remains: which API/framework should be implemented? OpenCL is dead. CUDA/ROCm are proprietary APIs, not to mention not a standard. So is it SYCL? Even Metal? Something else?
Tells you a lot about what a clusterfuck GPU compute is in general right now.

dada_dave · Aug 23, 2023

Agent47 said:
Yeah, I'm aware and agree.
Sad thing is: even if they had the capacity to work on GPU compute, the question remains: which API/framework should be implemented? OpenCL is dead. CUDA/ROCm are proprietary APIs, not to mention not a standard. So is it SYCL? Even Metal? Something else?
Tells you a lot about what a clusterfuck GPU compute is in general right now.

I think ROCm is open source? CUDA/Metal are definitely out, and SYCL is the one API I don’t know about. They’ll probably get to OpenCL, if someone on the project hasn’t started already; after all, they did OpenGL. Plus there’s a certain symmetry to porting Apple’s own original compute API back to Apple.

But I do believe ROCm/HIP is officially open source, though according to Wikipedia AMD’s own firmware to support it is closed. So I’m not quite sure how that works … I guess it’s no different from AMD’s proprietary drivers for Vulkan/OpenGL? So still possible?

Agent47 · Aug 23, 2023

Good question...
Regarding ROCm: I guess you are right, it's open source. Yet it seems no one is using it, and several people I know (the net seems to agree) claim it's a bit of a fuss (to put it mildly). Boilerplate nightmare. Difficult to work with. Or so I hear - quite possibly it's better these days, it has been a while since I last checked.

dada_dave · Aug 23, 2023

Agent47 said:
Good question...
Regarding ROCm: I guess you are right, it's open source. Yet it seems no one is using it, and several people I know (the net seems to agree) claim it's a bit of a fuss (to put it mildly). Boilerplate nightmare. Difficult to work with. Or so I hear - quite possibly it's better these days, it has been a while since I last checked.

Yeah, I don’t know much about ROCm beyond the fact that part of it, HIP, is supposed to allow for the (easy) porting of CUDA code to AMD. I don’t know how true the (easy) part is, as I haven’t looked into it myself beyond reading high-level overview summaries.

dada_dave · May 28, 2024

Something I think is kind of interesting is how Asahi Linux works around the 16K vs 4K page size mismatch when emulating x86 apps, in particular games. They actually use a microVM (in addition to Wine/Proton) to run the software with near bare-metal performance:

Using microVMs for Gaming on Fedora Asahi

It’s been almost a year since I transitioned from the Virtualization to the Automotive team at Red Hat with the goal of ensuring RHIVOS ships with a powerful Virtualization stack. While there’s a large overlap between a Virtualization stack for Servers and the one for Automotive platforms, the...

sinrega.org

ecoevo.social

I can't remember how this is solved on macOS for Wine. Presumably the same issue of mismatched page size expectations exists, but maybe macOS is more flexible than Linux? Apparently in Linux the kernel page size is hardcoded and, until Asahi, a lot of software's expectations about page size were as well. I'm pretty sure the Asahi team, probably Hector, have said how this is dealt with on macOS, but I've forgotten. Anyone know?
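
To make the mismatch concrete, here is a small Python sketch of the arithmetic involved. It is only an illustration of why 4K assumptions break on a 16K kernel, not how the microVM or Wine actually handles it; the helper names and the sample offset and length are mine. `mmap.PAGESIZE` is the real page size of whatever kernel you run it on.

```python
import mmap


def is_page_aligned(address, page_size):
    """Return True if address is a multiple of page_size."""
    return address % page_size == 0


def round_up_to_page(length, page_size):
    """Round length up to the next multiple of page_size."""
    return (length + page_size - 1) // page_size * page_size


if __name__ == "__main__":
    # Page size of the running kernel, in bytes.
    print(f"host page size: {mmap.PAGESIZE} bytes")

    # Hypothetical offset that a program written for 4K pages might map at.
    guest_offset = 0x1000
    for page in (4096, 16384):
        aligned = is_page_aligned(guest_offset, page)
        padded = round_up_to_page(5000, page)
        print(f"page={page}: offset aligned={aligned}, 5000 bytes occupy {padded}")
```

An offset of 0x1000 is a valid page boundary with 4K pages but falls in the middle of a page with 16K pages, so software that hardcodes 4K alignment for its mappings can't get what it asks for from a 16K kernel. Running the guest under a 4K kernel inside a VM sidesteps that.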
