The recent macOS beta allows multiple Macs to be combined into a unified compute cluster using Thunderbolt. This feature allows direct access to RAM over Thunderbolt (RDMA — remote direct memory access) with low latency. This is exposed to the software using standard Infiniband APIs — a connectivity interface used in supercomputing. While I was not able to find any official mention or documentation of this feature, there are new patches to MLX (Apple's ML framework) enabling clustered compute over Thunderbolt. And there are users on Twitter who report real-world latencies between 5 and 9 microseconds — not too bad for a cable connection given the fact that Apple's RAM has latency of ~ 0.1 microseconds. All Macs with Thunderbolt 5 appear to support this (which currently means M4 Pro/Max and M3 Ultra).
github.com
Thunderbolt RDMA communications backend by angeloskath · Pull Request #2808 · ml-explore/mlx
Starting the PR. Will update with more info as docs and launcher are written and this goes out of draft.
Last edited: