As I explained in my comment, if you can find me a Windows laptop that runs at full speed on battery doing this stuff with models larger than 24 GB, then I'd be glad to delete my account on here. My point is that I've seen mobile Nvidia GPUs turn to crap when unplugged from the wall. I'm not saying they'll be useless, but I don't buy that the GPU will offer performance anywhere near that 100-seconds-vs-8-seconds example.
You know, discussions like these are why I mostly keep off MacRumors these days. I really don't want these boards to adopt a similar culture.
In my opinion, it is very important to understand the purpose behind the questions. For instance, why are you posing this specific question and not a differently phrased one? What is it that you care about: the fact that the laptop does not throttle on battery, or the fact that a laptop is useful on battery? It is always possible to manipulate the question so that only one answer is possible. Are we achieving anything constructive with it? Hardly...
And don't take this personally, but I've watched you comment on Apple silicon for three years and I've noticed a sharp turn in your commentary. Of course you're allowed to change your mind, but there have been a couple of times where I don't think the comparisons you drew were entirely fair.
Just because I speak out against maximalism doesn't mean there is a "sharp turn in my commentary". Apple did some impressive work on the GPU front with the M3, and we can confidently claim that they overtook Nvidia in some key areas. But this doesn't change the fact that their GPUs lack raw GEMM power and that the memory bandwidth could be better.
It kind of seems like you think I'm saying Nvidia is bad or that Apple is better. I'm not saying either. I'm talking from a practical standpoint, about real-world usability. I know for a fact that once you work with models larger than 24 GB, Nvidia's hardware slows to a crawl. I've seen it happen. Apple's architecture lets you do things other GPUs can't with large amounts of memory. Whether it's ML workloads or assets larger than 24 GB, Apple's unified memory lets the GPU work with an enormous amount of memory.
Absolutely, you won't get any argument from me here. A large amount of RAM with uniform performance characteristics is undoubtedly a unique advantage of Apple Silicon and will be an important asset going forward. And I also fully agree that the platform is a good foundation for future improvements. They just need more bandwidth and more GEMM compute.
This is the type of video that makes me throw my hands up in frustration. It's not information, it's content. One has to sit through ten minutes of a person talking to get something that could be summarised in a few sentences. And having skimmed through the video, I still don't know what these results mean. There is a standard way to report LLM performance: tokens/second. Yet here he uses some arbitrary time measure on some arbitrary task. This is simply not helpful. There is no analysis. Is the M3 Max limited by the bandwidth? Is it limited by compute? Why not load the software into Xcode Instruments and look at the GPU profiler stats?
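To make the "limited by bandwidth" question concrete, here's a rough back-of-envelope sketch. Single-batch LLM decoding is usually memory-bound: every generated token has to stream (nearly) all the weights from RAM, so tokens/second is capped by bandwidth divided by model size. The 400 GB/s figure for the top M3 Max configuration and the ~39 GB size for a 4-bit 70B model are my own assumptions, not numbers from the video:

```python
# Rough ceiling for single-batch, bandwidth-bound LLM decoding.
# Numbers below are my own assumptions, not measurements.

def decode_ceiling_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/second if decoding is purely bandwidth-bound."""
    return bandwidth_gb_s / model_gb

M3_MAX_BW = 400.0    # GB/s, assumed top M3 Max configuration
MODEL_70B_Q4 = 39.0  # GB, assumed ~70B params at roughly 4.5 bits/weight

print(f"Ceiling: {decode_ceiling_tps(M3_MAX_BW, MODEL_70B_Q4):.1f} tok/s")
# If a measured tokens/s figure sits near this ceiling, the run is
# bandwidth-limited; if it sits far below, compute (GEMM throughput)
# or software overhead is the bottleneck.
```

That is the kind of analysis a tokens/second figure would enable, and the arbitrary time-on-task measure does not.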
The only thing we learn from the video is that the Max with 128GB of accessible RAM takes less of a performance penalty than a GPU with a smaller RAM pool (duh!). We still have no idea what this means in practical terms, though. Is the performance of the 70B model sufficient to do relevant work? How does it compare to CPU inference speed? Would it be cheaper or more productive to build a desktop workstation, or to rent a cloud machine? I can imagine a bunch of real-world scenarios where an MBP is the best tool for the job. I can imagine even more scenarios where it is not.
These things are not as simple as getting the larger number. One needs to look at them in the context of an actual problem. Otherwise we are lost in pointless microbenchmarking and e-peen measuring.
There is still a ways to go, but judging from the tests and benchmarks people have been posting in this thread, Apple is doing remarkable things for portability, for notebooks, and for what you can do with them. I am excited about what's here today, and even more excited about what's next.
I think this is something we can all agree on.
Also, I'd like to say I'm sorry to hear you had SARS-CoV-2 recently. I genuinely hope you are doing okay. Don't push yourself in the weeks ahead, physically or mentally, for the sake of your long-term health. I'd urge you to take time off work and stay away from the forums and the benchmarking: just stay in bed and actively rest, even while awake. And if you don't feel well, don't take no for an answer at the doctor's. You deserve to be and feel healthy. Wishing you well in everything.
Thanks, this is very kind of you! It wasn't all too bad to be honest, it's just taking a while to get back to 100%. The ghastly weather isn't helping either. Can't wait to get back to the gym.
I’d be really interested in any examples or benchmarks you have regarding this.
I don't, because quality information is very hard to find. But we know GEMM performance on Apple GPUs and on Nvidia GPUs, and it's not even close (for now at least). To be clear, I think a lot of the Tensor Core rhetoric is mindless flag-waving, since the GPU is limited by memory bandwidth for a large class of problems. Which is why model quantization and intelligent data selection are going to become increasingly important for practical ML going forward. I don't think Apple needs to chase Nvidia's Tensor Core performance levels, but support for lower-precision data types, quantization, and sparsity will be very important. I'm sure they are cooking up something like that.
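To illustrate why quantization matters so much for a bandwidth-bound GPU, here's a minimal sketch of symmetric per-tensor int8 quantization (a toy version; real schemes use per-channel or group-wise scales, 4-bit packing, sparsity, and so on). Storing weights as int8 plus a scale cuts the bytes streamed per token to a quarter of float32, which directly raises the bandwidth-bound ceiling:

```python
import numpy as np

# Toy symmetric per-tensor int8 quantization. The point is the memory
# footprint: 4x fewer bytes moved per weight than float32, at the cost
# of a small, bounded rounding error.

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0                     # map max magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)     # one weight matrix
q, scale = quantize_int8(w)

print(f"fp32: {w.nbytes / 2**20:.0f} MiB, int8: {q.nbytes / 2**20:.0f} MiB")
print(f"max abs error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```

Hardware support for formats like this (and sparser ones) is exactly the kind of thing I'd expect Apple to invest in, rather than raw Tensor Core throughput.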
Tangentially related: let's not forget this is probably the ultimate example of what it means to "skate to where the puck is going to be, not where it has been."
Apple being Apple, they will just go about doing what they have planned, and maybe pivot a little occasionally if the original plan needs tweaking.
Precisely. What makes Apple so formidable is their ability to plan and execute. Looking back at the timeline of their advances, it becomes clear that some things are planned many years ahead, and that makes it easier to see what is likely to come in the future. I believe I've spent enough time looking at Apple GPUs to have an idea of what's coming. There is a logical path there. Or maybe I'm just seeing ghosts; that's also possible.