Nvidia and Apple: Frenemies are putting grudges aside to supercharge AI performance
Apple and Nvidia have announced a surprise collaboration targeted at improving LLM inference performance, combining Apple's open-source ReDrafter with Nvidia's raw silicon grunt. The result? A 2.7x improvement in performance over auto-regressive decoding.
www.notebookcheck.net
This is interesting: Apple is actually teaming up with Nvidia to improve Apple’s LLM performance on Nvidia’s GPUs. As the article states, collaboration between the two has been very rare over the last 16 years.
(EDIT) Apple’s post:
Accelerating LLM Inference on NVIDIA GPUs with ReDrafter
Accelerating LLM inference is an important ML research problem, as auto-regressive token generation is computationally expensive and…
machinelearning.apple.com
Here’s Nvidia’s more technical blog post:
NVIDIA TensorRT-LLM Now Supports Recurrent Drafting for Optimizing LLM Inference | NVIDIA Technical Blog
Recurrent drafting (referred to as ReDrafter) is a novel speculative decoding technique developed and open-sourced by Apple for large language model (LLM) inference, now available with NVIDIA TensorRT-LLM.
developer.nvidia.com