macOS 26.2 adds InfiniBand over Thunderbolt support

Which brings me to my point: Siri is being set up to fly for users. Apple can pay whoever it thinks can produce a great model to build one to its specific requirements, then load it on servers that sip a fraction of the energy (500 watts for this 4-Mac supercomputer vs. 2.4 kilowatts for 4 H100s). That means Apple doesn't need to resort to funding nuclear energy companies, can run Siri at low cost, and can therefore keep offering Siri for free.

That Apple can offer a personalized pocket assistant for free, with a promised feature set of performing actions on request and recalling personal information about you and the friends/family you want it to remember, is a significant, actual competitive advantage.

Too bad that doesn't play into people's made-up narrative of Apple being behind in "AI."

I don't know, but I would be surprised if Apple didn't request (if it chose to go third party) that the model be encoded with ASTC in mind, which I believe would be an industry first for a widely deployed consumer transformer model. Encoding weights with ASTC lets them shrink the bits down, and it's only possible because Apple's GPUs were designed with ASTC in mind. So while I don't know for sure if they will, Apple has the ability to fit a far larger, more capable model into memory without needing more actual memory. That is significant given the looming memory price increases across the industry, and for energy costs. It means Apple will be able to provide Siri to all users with Apple Intelligence-enabled devices for free.

Apple's PCC model already does this with ASTC, and from recent reports on social media, the PCC model seems to have been upgraded significantly in 26.2, producing far faster, more accurate responses.

Apple is behind in AI my ass lol.
 
Another interesting highlight, from a YouTube comment:


Using Apple's RDMA over Thunderbolt, Apple is able to achieve 28 tokens per second on a 1 trillion parameter model. The setup drew an average of 500 watts during inference. At an average rate of $0.15 per kWh, that is about $0.000021 per second of electricity.

That's pretty incredible given you need to pay $21 per million input tokens and $168 per million output tokens for other models in the cloud.

Simplistically, for a 2-million-token group of interactions (1 million each of input and output), you're spending $189 in the cloud.
At that same level, you're paying about $1.43 in electricity costs for the Mac. If you factor in the setup cost, it's around $10 per interaction with 24/7 usage over 3 years of ownership.
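For anyone who wants to check the arithmetic, here is a quick sketch. All inputs are the post's own figures (500 W draw, $0.15/kWh, 28 tokens/second generation, $21/$168 per million input/output tokens); the exact local figure shifts slightly depending on the assumed prompt-processing speed.

```python
# Back-of-envelope check of the figures above (all assumptions from the post).
POWER_KW = 0.5           # average draw during inference
RATE_USD_PER_KWH = 0.15  # assumed average electricity rate

# Electricity cost per second of inference
cost_per_second = POWER_KW * RATE_USD_PER_KWH / 3600
print(f"${cost_per_second:.6f}/s")  # ≈ $0.000021/s

# Cloud price for 1M input + 1M output tokens at $21/M and $168/M
cloud_cost = 21 + 168
print(f"cloud: ${cloud_cost}")  # $189

# Local electricity for the same 2M tokens, generated at 28 tokens/s
hours = 2_000_000 / 28 / 3600
local_cost = hours * POWER_KW * RATE_USD_PER_KWH
print(f"local: ${local_cost:.2f}")  # ≈ $1.49
```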

Yes, it takes far longer (about 10 hours for a 2-million-token interaction... actually, apparently not: 5.2 Pro runs at 25 tokens per second, which ironically makes it the slower one), but the model is completely in your control, you're not locked into a model provider's specific rates, and you get privacy and security.

5.2 Pro is slightly ahead in benchmarks, but not nearly enough to justify costing that much more.


Even against Opus, which costs $30 for the same scenario, it isn't enough better in benchmarks to justify the price.

I'm pretty sure with macOS Tahoe 26.2 Apple just annihilated literally everyone lol. If the run times of 25 tokens per second for 5.2 Pro and 42 tokens per second for Opus 4.5 are true, then Apple is literally providing a much better value with Macs vs cloud computing.

You can walk into an Apple Store and buy 4 Macs (or order them). You can't even buy NVIDIA consumer GPUs at this point, let alone H200s as a consumer, LMFAO. To boot, you can finance those 4 Macs on an Apple Card for 12 months.

 
I just wish we could see more uses of this outside of language models. For example, can we use this for better rendering speeds? How much might it help?
I wouldn’t be surprised if RDMA finds its way into other use cases. Apple is usually quite determined to squeeze every last drop of goodness from engineering efforts like this.

I know Jeff Geerling mentioned SMB Direct in his video. I wonder if that kind of use would be beneficial for Apple. Afaik it’s useful in the server room more than workstations/consumer uses. Maybe Final Cut Pro?
 
The data below is sourced from OpenRouter and various other sources, such as Apple.com, chooseenergy.com, and various YouTubers.

Apple's RDMA over Thunderbolt allows you to combine 4 Macs with M3 chips for a total combined memory capacity of 2 TB and a combined memory bandwidth of 3.2 TB/s.

The average input-to-output token ratio is 100:7.5 across Anthropic's Opus 4.5 and another company's 5.2 Pro model (released in response to Gemini 3 Pro).

This means 1,000,000:75,000 tokens, but since both are priced per million, and since we want to approximate longer-term costs, I'll scale it further to 10,000,000,000:750,000,000 tokens. Again, that's input against output.

5.2 Pro costs $21 per million input tokens and $168 per million output tokens.

Opus 4.5 costs $5 per million input tokens and $25 per million output tokens.

Using this long-term usage:

5.2 Pro costs $210,000 plus $126,000, totaling $336,000 for total usage.

Opus 4.5 costs $50,000 plus $18,750, totaling $68,750 for total usage.

To calculate the Mac's usage cost, we need to approximate electricity costs plus the total cost of ownership.

We assume the Mac setup has an average prompt-processing speed of 250 tokens per second across all models, and an average generation speed of 28 tokens per second for a 1 trillion parameter model.

The average cost of electricity is $0.18 per kWh

To calculate the cost of 10 billion input tokens plus 750 million output tokens:

10 billion divided by 250 tokens per second is about 463 days of 24/7 processing.

750 million divided by 28 tokens per second is 310 days of 24/7 calculation.

The Mac draws an average of 500 watts during inference.

773 days of 24/7 calculation is 18,551 hours of usage.

18,551 hours at 0.5 kW is 9,275 kWh used.

Total electricity cost for this amount of processing is about $1,669.

The Mac setup costs roughly $40,000.

This means that for this amount of work, the Mac setup costs about $41,669.
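The whole calculation above can be reproduced in a few lines, using only the assumptions already stated (10 billion input and 750 million output tokens, the listed per-million prices, 250/28 tokens per second, 500 W, $0.18/kWh, and a ~$40,000 setup):

```python
# Reproduce the long-run comparison with the post's stated inputs.
INPUT_TOKENS = 10_000_000_000   # 10B input
OUTPUT_TOKENS = 750_000_000     # 750M output (100:7.5 ratio)

def cloud_cost(in_per_million, out_per_million):
    """API bill at the given per-million-token prices."""
    return (INPUT_TOKENS / 1e6) * in_per_million + (OUTPUT_TOKENS / 1e6) * out_per_million

pro_cost = cloud_cost(21, 168)  # "5.2 Pro" pricing
opus_cost = cloud_cost(5, 25)   # Opus 4.5 pricing

# Local run: wall-clock time at 250 tok/s prompt processing + 28 tok/s
# generation, then electricity at 0.5 kW and $0.18/kWh, plus the hardware.
seconds = INPUT_TOKENS / 250 + OUTPUT_TOKENS / 28
kwh = (seconds / 3600) * 0.5
mac_cost = 40_000 + kwh * 0.18

print(pro_cost, opus_cost, round(mac_cost))  # 336000.0 68750.0 41670
# (a dollar off the post's $41,669 only because of rounding in the kWh step)
```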

Comparison:

If you chose cloud computing, you'd pay anywhere between $68,750 and $336,000 for this processing.

Apple offers anyone the ability to have their own local set up for around $42,000.

It gets even better, because MLX and EXO 1.0 allow you to swap models on the fly and load multiple hundreds-of-billions-parameter models at the same time. A common use case is using one model for "planning," another for "coding," another for "debugging," and another for "developer docs."

If you chose to split your entire load between 5.2 Pro and Opus 4.5, you'd still have to pay over $200,000.
With the Mac, no matter how many different models you choose, it's the same price: $42,000.

And multiple people report that even under this workload, the Macs stay quiet and cool.

All of this, and you get the benefit of future models being able to fit into 2 TB of memory at 3.2 TB per second, privacy, and security. It is fully in your control, and it is so tiny it fits on your desk.

Finally, Apple offers 12-month financing, which means people can pay about $3,333 per month instead of $40,000 up front. NVIDIA has so many supply shortages that you couldn't even source enough 5090s to match this, and even if you could, the cost would be exorbitant. NVIDIA doesn't offer H200s to consumers. You cannot match this setup with any other company for this performance, efficiency, and amazingly small design that stays cool and near silent.

Choosing between $200,000 worth of tokens that are spent and can't be used again, or $42,000 worth of hardware that can give you as many tokens as you want, and that gives you FOUR Macs with an entire ecosystem, OS, and set of apps to develop and use whatever you needed those tokens for...

The choice is clear.

This is a revolution. It is literally game changing. There is nothing else like this in the world, at all.

Apple is not behind in AI. It is literally leading it.
 
Are there a lot of customers for this? Are lots of businesses rolling their own LLMs? If so, can we make them knock that off?
Apple is all about democratizing access to technology, both in terms of actual access and ease of use. This fulfills that in classic Apple fashion, so anyone can do it. It also means people with smaller setups can link together the various Macs they might already have. So it's not that there's a literal market of X number of people; it gives the common person access to what used to be restricted to server and enterprise companies, all in a pretty easy GUI. What's more Apple than that?

The biggest use case is probably Siri with Private Cloud Compute, connecting 4 or more chips in a server. Apple is rumored to be using a 1+ trillion parameter model for Siri. So if Apple can already achieve 28 tokens per second on a 1 trillion parameter model with M3, despite the latency incurred by the Thunderbolt cables, then I think this bodes well for M5, since M5 increases prompt processing alone by 5X, let alone further gains from more and better cores and higher bandwidth.
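As a rough illustration of what that could mean, here is a purely hypothetical projection. It assumes the earlier figures (250 tok/s prompt processing, 28 tok/s generation on M3) and takes the rumored 5X prompt-processing gain at face value; none of these are measured M5 numbers.

```python
# Hypothetical projection, not a benchmark: scale the post's M3 figures
# (250 tok/s prompt processing, 28 tok/s generation) by the rumored 5x
# prompt-processing gain for M5, leaving generation speed unchanged.
m3_prompt, m3_gen = 250.0, 28.0
m5_prompt = m3_prompt * 5  # 1,250 tok/s if the 5x rumor holds

# Time to ingest a 100,000-token context (e.g. a long Siri/PCC request)
m3_wait = 100_000 / m3_prompt  # 400 s
m5_wait = 100_000 / m5_prompt  # 80 s
print(m3_wait, m5_wait)
```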

Apple also doesn't need to limit itself on cost for its servers. It could use the highest-end LPDDR, or even HBM (though that would increase energy usage, so I don't know).

Also, Mike Rockwell apparently said he is putting one of Apple's spatial-OS performance leads in charge of Siri speed and performance, because of his experience with real-time, lag-free systems.

Given the reports of 26.2 Apple Intelligence being faster and more accurate, something seems to be improving, but no one here has commented yet on whether the PCC model is faster and smarter on 26.2.

And I guess my point was not so much about market need as about addressing this false notion that Apple is behind in "AI." Even if we take "AI" head on, which, as I explained in my holographic video calls post, is just transformer models, it is very clear that Apple isn't merely not behind: Apple is an AI leader, not for doing "something" esoteric that they're "ahead" in, but for doing something that they're ahead with.

They just gave consumers access to what previously required $500,000 worth of hardware, with a much easier-to-use and better UX, a far more attractive design, and energy efficiency that beats everything, all for 1/10th the price. I think Steve Jobs would've been very proud and happy with this leap.
 
I've read some hysteria that Apple's contracts for RAM are expiring. There is no evidence of that, but let's say it were true. People are pretending that Apple's pricing advantage would evaporate.

Let's take a look:

It costs $5,500 to upgrade both the chip and the RAM to 512 GB. Let's say Apple increased the price of this upgrade by 3X (the upper end of the consumer RAM price increases). That takes a previously $9,500 Mac to roughly $20,000. At 4 Macs, it's $80,000.

This is still only about $12,000 more than going with Anthropic's API usage (which comes with no guarantees about its own future prices, which will eventually increase because it doesn't earn a profit), and $250,000 cheaper than the other company's 5.2 Pro offering. It is also over $400,000 cheaper than comparable server hardware.

They would need to increase the upgrade price from $5,500 to around $80,000 to match the cost of 5.2 Pro, and even then, you'd still get the privacy/security/design/model/OS/software advantages.
Apple is never going to charge $80,000 for the 512 GB upgrade lol.
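A quick sketch of that scenario. The $4,000 base Mac price is my assumption to make the $9,500 figure work out, and the 3X multiplier is the hypothetical above, not a real price change:

```python
# Hypothetical RAM-price scenario: what if the $5,500 chip+RAM upgrade tripled?
base, upgrade = 4_000, 5_500           # assumed base Mac price + today's upgrade
today = 4 * (base + upgrade)           # four maxed-out Macs at current prices
worst_case = 4 * (base + 3 * upgrade)  # same four Macs if the upgrade cost 3x

print(today, worst_case)  # 38000 82000
# Even the worst case stays roughly $250,000 under the $336,000 "5.2 Pro" bill,
# and only about $13,000 over Opus 4.5's $68,750.
```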

No need to panic one way or another.
 
With the RDMA over Thunderbolt software stack done and tested, what is the possibility of Apple creating a PCIe card with Mx SoCs and built-in memory for the Mac Pro?
 
Along with the Thunderbolt RDMA that came with macOS 26.2, there are also some other new and interesting libraries, such as libmlx5. Not to be confused with Apple's ML project MLX, mlx5 is an RDMA-over-Ethernet driver which, AFAICT, is specific to NVIDIA/Mellanox NICs: https://doc.dpdk.org/guides/nics/mlx5.html

For those who'd like to see for themselves, the (text/YAML) library stubs are in the macOS 26.2 SDK, at `/Library/Developer/CommandLineTools/SDKs/MacOSX26.2.sdk/usr/lib/librdma.tbd`.

I'm not a networking person, but I'd guess that they're using it in their datacenter machines, though I'm surprised that would make it into the main macOS distro. Who knows, maybe they're planning to bring high speed Ethernet RDMA to their Mac products.
 
I wouldn’t be surprised if RDMA finds its way into other use cases. Apple is usually quite determined to squeeze every last drop of goodness from engineering efforts like this.

I know Jeff Geerling mentioned SMB Direct in his video. I wonder if that kind of use would be beneficial for Apple. Afaik it’s useful in the server room more than workstations/consumer uses. Maybe Final Cut Pro?

RDMA is already used in server land (over Ethernet) for high-speed storage. It will be interesting to see the use cases they have for an army of Apple silicon machines linked up like this. Mac Studios are almost the perfect Lego brick, hardware-wise, for a hyper-converged server stack.

Obviously if you were to do that internally, you wouldn’t have them in an enclosure like the retail version, you’d rack them in something more appropriate.

It almost sounds like we're due for a new high-speed interface on the Mac Studio specifically designed (or at least intended) for this, like InfiniBand, so you can hook them all up to an industry-standard switch. Or at least a faster version of Thunderbolt to handle higher-speed InfiniBand.
 
Are there a lot of customers for this? Are lots of businesses rolling their own LLMs? If so, can we make them knock that off?
I have heard (industry contact) that Rio Tinto spent approximately $700M on AI development in the past 12-18 months. They're a mining company.

They’re apparently all in on AI instead of user generated reporting.

At that sort of spend you’d be stupid to not be considering this stuff to be hosted locally.


Some of the stuff they’re doing



 