Thanks. I have been wondering what Apple could do to improve this situation.


We all know that the AMX units within the CPU have good matmul performance, or at least that's my understanding. Obviously that doesn't help GPU performance, and in any case their access is limited to the Accelerate framework, in addition to there being too few of them to compete with Nvidia's Tensor Cores.


Do Apple's GPUs have anything dedicated to GEMM or matmul (I'm not sure whether there is a difference)? I think I heard something about a new SIMD instruction on recent GPUs that can help with this. Is that correct? Also, do you think the potential for dual-issue FP32/FP16/INT within the ALUs of future GPUs could help the situation?


I recently saw this paper concerning Apple Silicon in scientific computing. It's not a perfect match for ML, but I would have thought there is significant overlap. They seem quite positive about the state of scientific computing on ASi. It would be great if you, or anyone knowledgeable on here, could say whether the points raised are accurate.

[URL unfurl="true"]https://arxiv.org/pdf/2211.00720.pdf[/URL]

