Metal 3.2 new features.

Jimmyjames

I saw this tweet concerning a couple of additions to Metal 3.2.

They quote this document: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf

They quote section 6.15.1
[Attached image: 1721261024641.png]


Can anyone more knowledgeable comment on whether these are significant additions?
 
Depends on how you think of significance, I guess. I don’t know which synchronization primitives Metal had before this was added, but it has the potential to make viable some algorithms that would have been ludicrously inefficient before. It also depends on the memory ordering model of their GPUs and on future plans: on a sequentially consistent platform this would boil down to a no-op, while on a more relaxed one it may compile to one or a few fence operations.
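
To make the fence point concrete, here is a minimal sketch of the usual message-passing pattern, written in plain C++ for the CPU rather than in Metal. It assumes the Metal addition behaves roughly like a C++-style fence, which nothing in this thread confirms, and the names `payload`, `ready`, `producer`, `consumer` and `run_once` are made up for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names; a CPU analogy, not Metal Shading Language code.
int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag used to publish the data

void producer() {
    payload = 42;                                         // write the data
    std::atomic_thread_fence(std::memory_order_release);  // earlier writes stay above this point
    ready.store(true, std::memory_order_relaxed);         // publish the flag
}

int consumer() {
    // Spin until the flag is observed; yield so the producer can run.
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // later reads stay below this point
    return payload;                                       // sees the producer's write
}

// Runs one producer and one consumer thread and returns what the consumer read.
int run_once() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

On a strongly ordered machine the two fences mostly just restrain the compiler; on a weakly ordered one they typically turn into barrier instructions. Without them, the consumer could in principle see the flag set and still read a stale `payload`.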

I’ve done very little GPU compute, but I did a university project on relaxed memory models where I made a model checker that could enumerate all possible states of different programs under a relaxed, ARM-like memory model. It was based on a paper proving the most efficient way of enumerating the states, which we extended with update semantics like xchg, faa, etc.
I would love to hear other opinions from folks with more GPU programming experience. In my opinion, though, it’s a valuable addition but not that huge. As I understand it, you could already chain together shader programs instead of doing in-shader thread synchronisation to get syncing done, albeit with a choke point for some threads.
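
A rough CPU-side sketch of the "chain shader programs" approach, using a prefix sum as the example workload (my choice of example, not something from the spec). Each call to the hypothetical helper `run_pass` stands in for one kernel launch, and the `join` at its end plays the role of the choke point: nothing from the next pass starts until every thread of the previous pass has finished. `multipass_scan` and `chunk` are likewise illustrative names, and `chunk` is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Runs fn(p) for every partition p on its own thread and waits for all of them.
// The join stands in for the boundary between two chained kernels.
template <class Fn>
void run_pass(std::size_t parts, Fn fn) {
    std::vector<std::thread> pool;
    for (std::size_t p = 0; p < parts; ++p) pool.emplace_back(fn, p);
    for (auto& t : pool) t.join();
}

// Inclusive prefix sum done as two passes with a small serial step in between.
std::vector<long> multipass_scan(const std::vector<int>& in, std::size_t chunk) {
    const std::size_t n = in.size();
    const std::size_t parts = (n + chunk - 1) / chunk;
    std::vector<long> out(n);
    std::vector<long> total(parts, 0);

    // Pass 1: every partition scans its own slice and records its total.
    run_pass(parts, [&](std::size_t p) {
        const std::size_t lo = p * chunk;
        const std::size_t hi = std::min(n, lo + chunk);
        long sum = 0;
        for (std::size_t i = lo; i < hi; ++i) { sum += in[i]; out[i] = sum; }
        total[p] = sum;
    });

    // Between passes: turn the per-partition totals into starting offsets.
    std::vector<long> base(parts, 0);
    for (std::size_t p = 1; p < parts; ++p) base[p] = base[p - 1] + total[p - 1];

    // Pass 2: every partition adds the offset contributed by its predecessors.
    run_pass(parts, [&](std::size_t p) {
        const std::size_t lo = p * chunk;
        const std::size_t hi = std::min(n, lo + chunk);
        for (std::size_t i = lo; i < hi; ++i) out[i] += base[p];
    });
    return out;
}
```

The cost of this style is visible in the structure: the data is touched twice, and the fastest partition always waits for the slowest one at each pass boundary.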
 
This essentially allows shaders running on different GPU cores to synchronize their work, something that was not possible before. Obviously a big jump in capability; as to practical utility, I am unsure. Some state-of-the-art algorithms I know require a forward progress guarantee (that is, a guarantee that a shader program partition will continue running to completion once started) in addition to global synchronization. Having these two capabilities allows you to implement efficient work queues on the GPU. As @casperes1996 says, this Metal capability could already be achieved on previous Metal versions by launching small kernels consecutively; it’s just that the performance would be very bad.
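
For illustration, a single-pass prefix sum in the style of a chained scan shows why both properties matter. This is a CPU analogy in plain C++ with hypothetical names (`chained_scan`, `carry`, `done`, `next`), not Metal code and not a claim about how any particular GPU library does it; `chunk` is assumed to be greater than zero. Workers claim partitions from a shared atomic counter (the work queue), and each partition spins until its predecessor publishes a running total. That spin only terminates if the predecessor keeps running, which is the forward progress requirement, and the published total only arrives intact because of the release/acquire pair, which is the cross-partition synchronization.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Inclusive prefix sum in one pass over the data. Hypothetical sketch.
std::vector<long> chained_scan(const std::vector<int>& in, std::size_t chunk,
                               unsigned workers) {
    const std::size_t n = in.size();
    const std::size_t parts = (n + chunk - 1) / chunk;
    std::vector<long> out(n);
    std::vector<long> carry(parts, 0);           // running total at the end of each partition
    std::vector<std::atomic<bool>> done(parts);  // set once carry[p] is published
    for (auto& d : done) d.store(false, std::memory_order_relaxed);
    std::atomic<std::size_t> next{0};            // work queue: next unclaimed partition

    auto worker = [&] {
        for (;;) {
            // Partitions are claimed in increasing order, so the predecessor of
            // any claimed partition has already been picked up by some worker.
            const std::size_t p = next.fetch_add(1, std::memory_order_relaxed);
            if (p >= parts) return;
            const std::size_t lo = p * chunk;
            const std::size_t hi = std::min(n, lo + chunk);

            long sum = 0;
            for (std::size_t i = lo; i < hi; ++i) { sum += in[i]; out[i] = sum; }

            long base = 0;
            if (p > 0) {
                // Wait for the predecessor's total. This loop ends only if the
                // worker holding partition p - 1 continues to make progress.
                while (!done[p - 1].load(std::memory_order_acquire)) {
                    std::this_thread::yield();
                }
                base = carry[p - 1];
                for (std::size_t i = lo; i < hi; ++i) out[i] += base;
            }
            carry[p] = base + sum;
            done[p].store(true, std::memory_order_release);  // publish carry[p]
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return out;
}
```

On a CPU the operating system scheduler provides the forward progress for free. On a GPU without that guarantee, a partition could spin forever on a predecessor that was started but never gets scheduled again, which is why synchronization alone is not enough for this kind of algorithm.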

See also my question on the Apple dev forums, with commentary by someone from the GPU team: https://developer.apple.com/forums/thread/756663
 