New ML framework from Apple called MLX

Another interesting feature request for complex64 metal gemm...


A different MPS backend isn't a huge tradeoff (and probably preferable, as it will generally be more stable today; I wouldn't really look to MLX for anything mission-critical right now, it's more something I'd like to tinker with). The performance differential between MLX and MPS isn't as large on M3, owing to the new chip improvements, as it is on M1/M2 silicon, so if you're using MPS anyway, you're not really missing out.
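On the complex64 Metal GEMM request: complex matrix multiplication reduces to four real-valued GEMMs, which is one way a backend could support it on top of existing real kernels. A pure-Python sketch of that textbook decomposition (illustrative only, not MLX's actual implementation):

```python
# Sketch: complex64 GEMM emulated with four real GEMMs.
# (A_r + i*A_i)(B_r + i*B_i) = (A_r B_r - A_i B_i) + i*(A_r B_i + A_i B_r)

def real_gemm(a, b):
    """Naive real-valued matrix multiply over lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def complex_gemm(ar, ai, br, bi):
    """Complex multiply expressed as four real GEMMs plus add/sub.

    ar/ai are the real/imaginary parts of A, br/bi those of B.
    Returns (real part, imaginary part) of A @ B.
    """
    rr = real_gemm(ar, br)   # A_r B_r
    ii = real_gemm(ai, bi)   # A_i B_i
    ri = real_gemm(ar, bi)   # A_r B_i
    ir = real_gemm(ai, br)   # A_i B_r
    real = [[rr[i][j] - ii[i][j] for j in range(len(rr[0]))]
            for i in range(len(rr))]
    imag = [[ri[i][j] + ir[i][j] for j in range(len(ri[0]))]
            for i in range(len(ri))]
    return real, imag
```

A real Metal kernel would fuse these rather than launch four GEMMs, but the arithmetic identity is the same.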
 
I wouldn't really look to MLX for anything mission-critical right now

Emphasis added.

I can’t imagine Apple releasing a framework like MLX for M1 and M2 improvements alone. I know it seems that way right now, but that’s where the puck has been. No, they have bigger plans. Obviously, MLX is in its infancy. We’ve yet to see the M3 Ultra, and the M4 will undoubtedly debut in 2024. I expect MLX to see some major advances in the near future to complement the silicon releases. Remember: “Where the puck will be, not where it’s been.” Just a hunch.
 
Someone has compiled a list of benchmark scores for MLX vs CPU vs MPS.
https://github.com/TristanBilot/mlx-benchmark

M1 Pro

| Operation | mlx_gpu | mlx_cpu | mps | cpu | mps/mlx_gpu speedup | mlx_cpu/mlx_gpu speedup |
|-----------|---------|---------|-----|-----|---------------------|-------------------------|
| MatMul    | 15.46   | 36.69   | 23.11 | 593.40 | 0.49 | 1.37 |
| Softmax   | 3.99    | 42.10   | 6.76  | 32.29  | 0.69 | 9.55 |
| Linear    | 13.31   | 35.31   | 32.76 | 98.05  | 1.46 | 1.65 |
| Conv2d    | 63.72   | 2505.39 | 10.21 | 125.39 | -0.84 | 38.32 |
| BCE       | 4.29    | 28.79   | 8.50  | 8.49   | 0.98 | 5.71 |
| Concat    | 6.31    | 90.50   | 6.43  | 43.79  | 0.02 | 13.35 |

Some nice improvements, except for Conv2d and Concat.
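A note on reading the table: the speedup columns appear to follow the convention speedup = (other backend time / mlx_gpu time) - 1, so 0 means parity and a negative value means mlx_gpu was actually slower. A quick check against the MatMul and Conv2d rows:

```python
# Sanity-check the apparent speedup convention in the benchmark table:
# speedup = other_time / mlx_gpu_time - 1.
def rel_speedup(other_ms, mlx_gpu_ms):
    return other_ms / mlx_gpu_ms - 1.0

# MatMul row: mps 23.11 vs mlx_gpu 15.46, mlx_cpu 36.69 vs 15.46
print(round(rel_speedup(23.11, 15.46), 2))   # 0.49, matches the table
print(round(rel_speedup(36.69, 15.46), 2))   # 1.37, matches the table
# Conv2d row: mps 10.21 vs mlx_gpu 63.72 (mlx_gpu loses here)
print(round(rel_speedup(10.21, 63.72), 2))   # -0.84, matches the table
```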
 
MLX seems to iterate quite rapidly. Some nice GEMM improvements in the latest release.
Some benchmarks here: https://github.com/ml-explore/mlx/pull/424#issuecomment-1898815724
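For anyone reproducing benchmarks like those linked above: GPU timings need a warmup phase (one-off kernel compilation and cache effects) and a robust aggregate. A minimal harness sketch, with the caveat that MLX is lazy, so a real MLX benchmark must force evaluation (e.g. mx.eval) inside the timed function or you measure only graph construction:

```python
import time
from statistics import median

def bench(fn, warmup=3, repeats=10):
    """Time fn(): discard warmup calls, return the median of the rest.

    The median damps scheduler/thermal outliers better than the mean.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return median(times)
```

Usage: `bench(lambda: sum(x * x for x in range(10_000)))` returns the median seconds per call.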
 
It’s alpha (breaking changes will occur for the good of humanity), but entirely usable and quite performant. There are also still some performance cliffs, so I’d expect perf to improve significantly.

Also, they’re looking to hire kernel engineers 🙂


(Awni is the project lead, with three other Apple engineers and quite a few active contributors; they’re one of the most pleasant group of open source developers I’ve come across)
 
Agreed - they are awesome. Super friendly and no attitude. That’s a huge contributing factor to making this a success.
 
CUDA backend for MLX, sponsored by Apple. It seems this allows code developed on Apple silicon to be deployed on CUDA devices.


The project lead posted this link, titled “Cuda in MLX”, on Twitter, then deleted it.
 

They’ve been working on it for a few months and it’s getting close. The idea is that you write your model code, train in the cloud on nvidia hardware (or ASi if that becomes a thing), and deploy on ASi or nvidia/cloud depending on the application.

MLX is a pleasure to use compared to other ML frameworks, so this should really open the floodgates for adoption. It should also light a fire under Modular Mojo since it’s similar to what they set out to do, and MLX is far more openly developed. I have the utmost respect for Chris Lattner, but I think they went about developing Mojo in a rather weird closed/hype-promise way.
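The train-in-the-cloud, deploy-anywhere workflow described above mostly amounts to keeping device selection out of model code and resolving it once at startup. A hedged sketch of that pattern; the availability flags are hypothetical stand-ins for whatever probing a real deployment would do, not MLX API calls:

```python
# Resolve the compute backend once; model code never names a device.
# The boolean probes are hypothetical placeholders.
def pick_backend(cuda_available: bool, metal_available: bool) -> str:
    if cuda_available:      # cloud / Nvidia training box
        return "cuda"
    if metal_available:     # Apple silicon deployment target
        return "metal"
    return "cpu"            # portable fallback
```

The same model source then runs unchanged wherever it lands, which is the point of the shared CUDA/Metal backend.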
 
Thanks. I assume this is not a “run Cuda code on macOS” thing?

Also, how is Mojo doing? I see Chris post a lot on Twitter but I have no idea if Mojo is successful.
 

Correct, you can target Linux for CUDA (and CPU). Edit: I thought CUDA on Windows was supported as well, but that doesn't seem to be the case from a cursory look at the build scripts.

I'm not really sure what's up with Mojo. I looked back into it a couple weeks ago and parts had been recently opened up (IIRC compute kernels), and other parts were closed/API-only. Somewhere in their FAQ it said they're planning to go fully open source in 2026 or something like that. No idea how the project is progressing in general, though.
 
Awni Hannun, one of the main people who started MLX (if not the main one), has left Apple. Obviously one person doesn’t make an entire project, but it’s a shame and feels a little concerning.

 
I like him, it's sad to see him go, I like what he helped build, and I was surprised; but I disagree that it's concerning, and he would agree. You also need to view it in context, and no, I don't mean the context of Gurman's 24/7 bullshit. I'm not going to elaborate.

It also needs to be said that, before this, he publicly introduced Ronan Collobert as another co-creator of MLX. You can look his résumé up.

MLX has developed from a nascent, prescient project (it started before ChatGPT was released, according to Awni Hannun) into a technological behemoth that is literally going up against $500K servers, with a rich community around it. This isn't a project struggling to get off the ground. The people working on it are co-creators and highly intelligent people, supported by a huge community.

People here keep doubting, for whatever reason, that anyone cares about local inference on the Mac. Well, there's a massive community for it. Aside from NVIDIA and CUDA, Apple silicon with MLX stands alone right up there. They've done amazing work, and it's only just the start.
 
There's no reason to be concerned. Concerned is what you'd feel if Apple had only one person working on the MLX project, which is not the case, or if it hadn't been updated in two months, etc. There are multiple people still working on MLX, and the community will only get bigger with the release of the M5 Pro and M5 Max.
 