New ML framework from Apple called MLX

Another interesting feature request for complex64 metal gemm...
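For context on why that request matters: without a native complex64 Metal GEMM, a complex matrix multiply has to be assembled from real ones. A rough sketch of that workaround (my own illustration, not code from the MLX tracker; it only assumes float32 mx.matmul):

```python
# Workaround sketch: complex matmul built from four real GEMMs,
# using (A + iB)(C + iD) = (AC - BD) + i(AD + BC).
# A native complex64 Metal GEMM would make this unnecessary.
import mlx.core as mx

def complex_matmul(a_re, a_im, b_re, b_im):
    out_re = mx.matmul(a_re, b_re) - mx.matmul(a_im, b_im)
    out_im = mx.matmul(a_re, b_im) + mx.matmul(a_im, b_re)
    return out_re, out_im

a_re, a_im = mx.random.normal((256, 256)), mx.random.normal((256, 256))
b_re, b_im = mx.random.normal((256, 256)), mx.random.normal((256, 256))
c_re, c_im = complex_matmul(a_re, a_im, b_re, b_im)
mx.eval(c_re, c_im)  # MLX is lazy; this forces the computation
```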


A different MPS backend isn't a huge tradeoff (and probably preferable, as it will be more stable today in general - I wouldn't really look to MLX for anything mission-critical right now - more something I'd like to tinker with). The performance differential between MLX and MPS isn't as big on M3, owing to the new chip improvements, as it is on M1/M2 silicon, so if you're using MPS anyway, you're not really missing out.
 
I wouldn't really look to MLX for anything mission-critical right now

Emphasis added.

I can’t imagine Apple releasing a framework like MLX for M1 and M2 improvements alone. I know it seems that way right now, but that’s where the puck has been. No, they have bigger plans. Obviously, MLX is in its infancy. We’ve yet to see the M3 Ultra, and the M4 will undoubtedly debut in 2024. I expect MLX to see some major advances in the near future to complement the Apple Silicon releases. Remember… “Where the puck will be. Not where it’s been.” Just a hunch.
 
Someone has compiled a list of benchmark scores for MLX vs CPU vs MPS.
https://github.com/TristanBilot/mlx-benchmark

M1 Pro

Operation   mlx_gpu   mlx_cpu     mps      cpu   mps/mlx_gpu speedup   mlx_cpu/mlx_gpu speedup
MatMul        15.46     36.69   23.11   593.40                  0.49                      1.37
Softmax        3.99     42.10    6.76    32.29                  0.69                      9.55
Linear        13.31     35.31   32.76    98.05                  1.46                      1.65
Conv2d        63.72   2505.39   10.21   125.39                 -0.84                     38.32
BCE            4.29     28.79    8.50     8.49                  0.98                      5.71
Concat         6.31     90.50    6.43    43.79                  0.02                     13.35

Some nice improvements, except for Conv2d and Concat.
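For anyone who wants to sanity-check numbers like these on their own machine, a minimal timing sketch in the same spirit (not the linked benchmark suite's code; the matrix size and iteration count are arbitrary) could look like:

```python
# Rough matmul timing on the MLX GPU vs CPU devices (times in ms).
import time
import mlx.core as mx

def bench(device, n=2048, iters=20):
    mx.set_default_device(device)
    a = mx.random.normal((n, n))
    b = mx.random.normal((n, n))
    mx.eval(a, b)                 # materialize inputs (MLX is lazy)
    mx.eval(mx.matmul(a, b))      # warm-up run
    start = time.perf_counter()
    for _ in range(iters):
        mx.eval(mx.matmul(a, b))  # eval forces the lazy graph to execute
    return (time.perf_counter() - start) / iters * 1000

print(f"gpu: {bench(mx.gpu):.2f} ms   cpu: {bench(mx.cpu):.2f} ms")
```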
 
MLX seems to iterate quite rapidly. Some nice GEMM improvements in the latest release.
Some benchmarks here: https://github.com/ml-explore/mlx/pull/424#issuecomment-1898815724
 
Another interesting feature request for complex64 metal gemm...


A different MPS backend isn't a huge tradeoff (and probably preferable, as it will be more stable today in general - I wouldn't really look to MLX for anything mission-critical right now - more something I'd like to tinker with). The performance differential between MLX and MPS isn't as big on M3, owing to the new chip improvements, as it is on M1/M2 silicon, so if you're using MPS anyway, you're not really missing out.
It’s alpha (breaking changes will occur for the good of humanity), but entirely usable and quite performant. There are also still some performance cliffs, so I’d expect perf to improve significantly.

Also, they’re looking to hire kernel engineers 🙂

https://twitter.com/awnihannun/status/1749924118248743360?s=46&t=ZX5n4fpf03Mlx6yhaEC8ZA

(Awni is the project lead, with three other Apple engineers and quite a few active contributors; they’re one of the most pleasant groups of open source developers I’ve come across)
 
It’s alpha (breaking changes will occur for the good of humanity), but entirely usable and quite performant. There are also still some performance cliffs, so I’d expect perf to improve significantly.

Also, they’re looking to hire kernel engineers 🙂

https://twitter.com/awnihannun/status/1749924118248743360?s=46&t=ZX5n4fpf03Mlx6yhaEC8ZA

(Awni is the project lead, with three other Apple engineers and quite a few active contributors; they’re one of the most pleasant groups of open source developers I’ve come across)
Agreed - they are awesome. Super friendly and no attitude. That’s a huge contributing factor to making this a success.
 
CUDA backend for MLX, sponsored by Apple. It seems this allows code developed on Apple Silicon to be deployed on CUDA devices.


The project lead posted this link, titled “Cuda in MLX”, on Twitter and then deleted it.
 
CUDA backend for MLX, sponsored by Apple. It seems this allows code developed on Apple Silicon to be deployed on CUDA devices.


The project lead posted this link, titled “Cuda in MLX”, on Twitter and then deleted it.

They’ve been working on it for a few months and it’s getting close. The idea is that you write your model code, train in the cloud on nvidia hardware (or ASi if that becomes a thing), and deploy on ASi or nvidia/cloud depending on the application.
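If it lands as described, the nice part is that MLX code is already device-agnostic: you write against the abstract mx.gpu/mx.cpu devices rather than Metal or CUDA directly, so the same training script should in principle run unchanged on either backend. A sketch of what I mean, using today's standard MLX APIs (the only assumption is that the CUDA backend keeps the same mx.gpu abstraction):

```python
# Device-agnostic MLX training step: nothing below names Metal or CUDA.
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

mx.set_default_device(mx.gpu)  # Metal today; presumably CUDA on Linux with the new backend

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(784, 256)
        self.l2 = nn.Linear(256, 10)

    def __call__(self, x):
        return self.l2(nn.relu(self.l1(x)))

def loss_fn(model, x, y):
    return nn.losses.cross_entropy(model(x), y).mean()

model = MLP()
optimizer = optim.SGD(learning_rate=1e-2)

# One dummy training step on random data.
x = mx.random.normal((32, 784))
y = mx.random.randint(0, 10, (32,))
loss, grads = nn.value_and_grad(model, loss_fn)(model, x, y)
optimizer.update(model, grads)
mx.eval(model.parameters(), optimizer.state)  # force the lazy update
print(loss.item())
```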

MLX is a pleasure to use compared to other ML frameworks, so this should really open the floodgates for adoption. It should also light a fire under Modular Mojo since it’s similar to what they set out to do, and MLX is far more openly developed. I have the utmost respect for Chris Lattner, but I think they went about developing Mojo in a rather weird closed/hype-promise way.
 
They’ve been working on it for a few months and it’s getting close. The idea is that you write your model code, train in the cloud on nvidia hardware (or ASi if that becomes a thing), and deploy on ASi or nvidia/cloud depending on the application.

MLX is a pleasure to use compared to other ML frameworks, so this should really open the floodgates for adoption. It should also light a fire under Modular Mojo since it’s similar to what they set out to do, and MLX is far more openly developed. I have the utmost respect for Chris Lattner, but I think they went about developing Mojo in a rather weird closed/hype-promise way.
Thanks. I assume this is not a “run Cuda code on macOS” thing?

Also, how is Mojo doing? I see Chris post a lot on Twitter but I have no idea if Mojo is successful.
 
Thanks. I assume this is not a “run Cuda code on macOS” thing?

Also, how is Mojo doing? I see Chris post a lot on Twitter but I have no idea if Mojo is successful.

Correct, you can target Linux for CUDA (and CPU). Edit: I thought that CUDA on Windows was supported as well, but that doesn't seem to be the case from a cursory look at the build scripts.

I'm not really sure what's up with Mojo. I looked back into it a couple weeks ago and parts had been recently opened up (IIRC compute kernels), and other parts were closed/API-only. Somewhere in their FAQ it said they're planning to go fully open source in 2026 or something like that. No idea how the project is progressing in general, though.
 