Metal 3.2 new features.

Jimmyjames · Jul 17, 2024

I saw this tweet concerning a couple of additions to Metal 3.2.

https://Twitter or X not allowed/reeselevine/status/1810467175783899221

They quote section 6.15.1 of this document: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf

Can anyone more knowledgeable comment on whether these are significant additions?

casperes1996 · Jul 17, 2024

Depends on how you define significance, I guess. I don’t know which synchronization primitives Metal had before this was added, but it has the potential to make some algorithms viable that would have been ludicrously inefficient before. It also depends on the memory ordering model of their GPUs and on future plans. On a sequentially consistent platform this would boil down to a no-op; otherwise it may compile to one or a few fence operations.
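To make the fence point concrete, here is a minimal CPU-side sketch in standard C++ rather than Metal Shading Language. It only illustrates the release/acquire fence pattern under the C++ memory model; the function name `message_passing_demo` is made up for illustration, and nothing here is taken from the Metal spec.

```cpp
#include <atomic>
#include <thread>

// CPU-side analogy only (assumption: std::atomic stands in for device memory).
// The function name is hypothetical; this is not Metal Shading Language.
int message_passing_demo() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag used to publish the data
    int observed = -1;

    std::thread producer([&] {
        payload = 42;
        // Release fence: the payload write cannot be reordered past the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        // Spin until the flag becomes visible.
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the payload read cannot be reordered before the flag load.
        std::atomic_thread_fence(std::memory_order_acquire);
        observed = payload;
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Under the C++ memory model the two fences guarantee the consumer reads 42. Without them, a relaxed model would allow the flag to become visible before the payload.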

I’ve done very little GPU compute, but I did a university project on relaxed memory models where I made a model checker that could enumerate all possible states of different programs under a relaxed, ARM-like memory model. It was based on a paper proving the most efficient way of enumerating the states, which we extended with update semantics like xchg, faa, etc.
Would love to hear other opinions from folks with more GPU programming experience. In my opinion, though, it’s a valuable addition but not that huge. As I understand it, you could just chain shader programs together instead of doing in-shader thread synchronisation to get the syncing done, albeit at the cost of a choke point for some threads.

leman · Jul 17, 2024

This essentially allows shaders running on different GPU cores to synchronize their work, something that was not possible before. It is obviously a big jump in capability; as to practical utility, I am unsure. Some state-of-the-art algorithms I know require a forward progress guarantee (that is, a guarantee that a shader program partition will continue running to completion once started) in addition to global synchronization. Having these two capabilities allows you to implement efficient work queues on the GPU. As @casperes1996 says, the same effect could already be achieved on previous Metal versions by launching small kernels consecutively; it’s just that the performance would be very bad.
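As a rough illustration of the work-queue idea, here is a CPU-side sketch in standard C++, not Metal. Long-lived workers pull item indices from one shared atomic counter instead of being relaunched per batch. The helper `sum_with_work_queue` and its parameters are hypothetical. CPU threads get forward progress from the OS scheduler, which is exactly the guarantee that cannot be taken for granted on a GPU.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (CPU analogy, not Metal): sums `items` using `workers`
// persistent threads that claim indices from a shared atomic counter.
long long sum_with_work_queue(const std::vector<int>& items, unsigned workers) {
    std::atomic<std::size_t> next{0};  // index of the next unclaimed item
    std::atomic<long long> total{0};   // combined result
    std::vector<std::thread> pool;

    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            long long local = 0;
            for (;;) {
                // Atomically claim one item; each index is handed out exactly once.
                std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
                if (i >= items.size()) break;  // queue drained
                local += items[i];
            }
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }

    // join() synchronizes with each worker, so the final load sees every update.
    for (auto& t : pool) t.join();
    return total.load();
}
```

Calling `sum_with_work_queue` on a vector holding 1 through 100 with four workers returns 5050, however the items end up split between the workers.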

See also my question on the Apple dev forums, with commentary by someone from the GPU team: https://developer.apple.com/forums/thread/756663
