This is a nice comparison between the M1 Pro, the M3 Max and a desktop with a 16-core AMD CPU + RTX 4090, running Llama 2 at 13 billion and 70 billion parameters.
TL;DR: it highlights the huge advantage that Apple silicon's unified memory architecture has for large, locally run inference models compared with 4090s and similar cards. It also shows that on smaller models (e.g. 7 billion parameters) the M3 Max is not far behind a 4090, at a fraction of the power consumption.
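For anyone who wants the rough arithmetic behind the memory point, here is a back-of-the-envelope sketch. It is my own illustration, not something taken from the comparison: it counts weights only (no KV cache, activations or runtime overhead), and the function names and the 16-bit / 4-bit precisions are assumptions picked for the example. The 24 GB figure is the RTX 4090's VRAM.

```python
# Rough estimate of the memory needed for model weights alone.
# Assumptions (not from the benchmark): weights only, 1 GB = 1e9 bytes,
# and 16-bit or 4-bit weights as example precisions.

RTX_4090_VRAM_GB = 24  # VRAM on a stock RTX 4090


def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the weights in GB."""
    # (params_billion * 1e9 params) * (bits / 8 bytes) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: int,
                 vram_gb: float = RTX_4090_VRAM_GB) -> bool:
    """True if the weights alone fit in the given amount of VRAM."""
    return weight_gb(params_billion, bits_per_weight) <= vram_gb


for size in (7, 13, 70):
    for bits in (16, 4):
        gb = weight_gb(size, bits)
        verdict = "fits" if fits_in_vram(size, bits) else "does not fit"
        print(f"{size}B @ {bits}-bit: ~{gb:.1f} GB, {verdict} in {RTX_4090_VRAM_GB} GB")
```

Under these assumptions a 70-billion-parameter model needs about 35 GB for its weights even at 4-bit, which is more than a single 4090 can hold, while a Mac with enough unified memory can keep the whole model in memory the GPU can address directly.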
Not really anything new for the folks I see posting here regularly, who will know this stuff already. Still, it's nice to see tangible, real-world, side-by-side benchmarks!
Enjoy.