Apple may use Google’s servers for AI

This is a news item I don't find outlandish. While Apple's private cloud is an amazing initiative, they simply don't have the hardware to run large models at scale. The gap between it and modern Nvidia inference boxes is simply too large.

Contrast this with other rumors today that Apple is using only 10% of its AI server capacity.

This is a news item I don't find outlandish. While Apple's private cloud is an amazing initiative, they simply don't have the hardware to run large models at scale. The gap between it and modern Nvidia inference boxes is simply too large.
How, after reading my multiple well-sourced posts on this matter, do you find it "not outlandish"?

Sorry for being so direct, but I have put considerable effort into dispelling this entire notion, and while that doesn't mean I'm definitively right, this news should be met with the utmost skepticism.

It is the definition of outlandish.

First, and it must be stated again: Tim Cook directly said on an earnings call that Siri and Apple Intelligence will continue to use PCC servers. This was in response to questions about the Google partnership, specifically about Google's cloud tech.

Second, how can it be both underutilized and underpowered? Something smells there. I expand on this in the third point below!

Third, it's not clear what exactly the report is saying, because I've read multiple websites and they each tell a different version. Let's assume for the moment that The Verge is accurate in its paraphrase of the article:

Private Cloud Compute isn't Apple silicon with a privacy-policy promise. It's a verifiable, hardware-enforced, precisely engineered system that uses Apple silicon to attest, from the Secure Enclave up, that a PCC server is authentic, is running the same software Apple publicly gives to researchers, and has not been tampered with. In fact, there are a *bunch* of privacy claims being made, and the system has to verify all of them. It only works because it's one integrated solution: Apple silicon, software, frameworks; every level is built for this specific thing.
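Conceptually, the verification chain works like the toy sketch below. To be clear, this is not Apple's protocol or API; real PCC attestation uses hardware-rooted keys, a transparency log, and far more checks. Every name and value here is invented purely to illustrate the shape of the idea: the client only releases a request to a node whose signed software measurement matches a publicly published build.

```python
import hashlib
import hmac

# Invented values for illustration only: in real PCC, measurements are
# signed by hardware-rooted keys and checked against a transparency log.
TRUSTED_RELEASES = {
    # digest of a software image the operator has published for researchers
    hashlib.sha256(b"pcc-release-1.0").hexdigest(),
}
ATTESTATION_KEY = b"stand-in-for-a-hardware-key"  # hypothetical shared key

def make_attestation(image: bytes) -> tuple[str, str]:
    """Server side: measure the running image and sign the measurement."""
    measurement = hashlib.sha256(image).hexdigest()
    signature = hmac.new(ATTESTATION_KEY, measurement.encode(), "sha256").hexdigest()
    return measurement, signature

def client_accepts(measurement: str, signature: str) -> bool:
    """Client side: only talk to a node whose signed measurement is on the public list."""
    expected = hmac.new(ATTESTATION_KEY, measurement.encode(), "sha256").hexdigest()
    return hmac.compare_digest(expected, signature) and measurement in TRUSTED_RELEASES

m, s = make_attestation(b"pcc-release-1.0")
print(client_accepts(m, s))    # True: genuine, published build
m2, s2 = make_attestation(b"tampered-build")
print(client_accepts(m2, s2))  # False: validly signed, but the build is not public
```

The point of the design is the second case: even a "correctly" signed server is rejected if its software isn't the software everyone can inspect.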

I find it utterly perplexing and frustrating that despite the very cool nature of PCC, and the very intelligent and capable users on this site, no one has bothered to even look at it and analyze it. There is so much to analyze and learn from. When else does Apple literally give you this many resources to study something? Images, open-source code, detailed documentation, etc. You can look it all up right now.

In the 26.2 "InfiniBand" (RDMA over Thunderbolt) post, I gave a detailed analysis of what the current M3 system is capable of with Thunderbolt 5 cables. Since then, there have been performance improvements in the MLX and EXO layers, so it could be even faster. But it showed the setup is fully capable of running a 1-trillion-parameter model at decent speed. It's on the third page of that post.
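As a sanity check on that claim, here is the kind of roofline estimate such an analysis rests on. Every constant below is an assumption I'm plugging in for illustration (quantization level, usable link speed, hidden size, bandwidth), not a measurement:

```python
# Roofline-style decode estimate for a large MoE sharded across 4 Macs.
# All constants are illustrative assumptions, not benchmarks.

ACTIVE_PARAMS = 32e9      # 1T-total / 32B-active MoE
BYTES_PER_PARAM = 0.5     # assumes ~4-bit weight quantization
MEM_BW_PER_MAC = 800e9    # assumed M3 Ultra-class memory bandwidth, bytes/s
NUM_MACS = 4
TB5_LINK = 10e9           # assumed usable Thunderbolt 5 bandwidth, bytes/s
ACT_BYTES_PER_TOKEN = 2 * 8192  # assumed fp16 activations for an 8192-wide model

# Decoding one token streams all active weights from memory once; sharding
# splits that traffic across machines, so their bandwidth adds up.
compute_bound = (MEM_BW_PER_MAC * NUM_MACS) / (ACTIVE_PARAMS * BYTES_PER_PARAM)

# The links only carry small per-token activations between shards.
link_bound = TB5_LINK / (ACT_BYTES_PER_TOKEN * NUM_MACS)

print(f"compute-bound ceiling: {compute_bound:.0f} tok/s")
print(f"link-bound ceiling:    {link_bound:.0f} tok/s")
print(f"upper bound: {min(compute_bound, link_bound):.0f} tok/s")
```

Under these assumptions the interconnect is nowhere near the bottleneck at batch size 1, and the memory-bandwidth ceiling comes out well above the measured decode rate, which is what you'd expect once real kernel efficiency, routing, and synchronization are counted. So the measured figure is at least physically plausible.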

If Apple marshals its efforts, and it is, then no doubt it can go even faster.

Even reporting from The Information literally discusses Baltra, a custom solution being engineered by Apple and Broadcom for PCC servers. The Verge makes zero mention of this. Did they just drop it or something?

Technical imprecision from these sites is bad enough, but the claim doesn't even make sense. How could a hardware-enforced privacy server be compatible with something like Google's infrastructure, when Google isn't going to hand over its TPU servers' trade secrets for analysis?

I'll bring it back to Tim Cook. He directly said on an earnings call that Siri and Apple Intelligence will run on PCC servers, in response to questions about the partnership. Apple even said it again outside of an earnings call. He didn't entertain the alternative at all. No hedging. No "well, Google has some powerful servers, so we're looking at that." No "we're doing the new Siri on TPU technology." None of that.

Analyst:
Hey, Tim. First question is on Google partnership again. I wanted to understand how you came to that decision with regard to the AI and Siri in particular and if there’s an opportunity for you guys to share in revenue too with that partnership like you do in search.

Tim Cook:
Yeah, we basically determined that Google’s AI technology would provide the most capable foundation for AFM (Apple Foundation Models), and we believe that we can unlock a lot of experiences and innovate in a key way due to the collaboration. We’ll continue to run on the device and run in Private Cloud Compute and maintain our industry-leading privacy standards in doing so. In terms of the arrangement with Google, we’re not releasing the details of that.

Outside the earnings call:
"We're not changing our privacy rules," Cook's on-air comment read. "We still have the same architecture that we announced before, which is on device plus Private Cloud Compute."

It's outlandish to me for every one of these reasons. Impossible? No. Outlandish? Absolutely yes.

Contrast with other rumors today that Apple is only using 10% of its AI server capacity.
That's the same "report," by the way, which only adds to the nonsense.

Also, reading more articles paraphrasing this, apparently they're citing "former Apple employees." Lmfao.

No, seriously, what is this? Even AppleInsider's article didn't analyze it the way I just did.

It appears no one has a clue what PCC precisely is, and journalists prey on that.
 
Contrast this with other rumors today that Apple is using only 10% of its AI server capacity.

Second, how can it be both underutilized and underpowered? Something smells there. I expand on this in the third point below!

Does there have to be a contradiction? To analyze the problem we need to understand: a) the current and projected compute capacity of the system, b) what the system is currently used for, and c) what the need will be once the new Siri and features launch.

We do know quite a lot about the PCC architecture, because Apple has published detailed documentation. What do we know about a) and b)? Not much, to be honest. We know that some Apple LLM requests are routed to PCC (Xcode/text processing), that the cloud model used is not very large by modern standards, and that these features are not used very actively. It is also possible that Siri currently runs on PCC (is that confirmed?), and we know that the current Siri needs less compute than a large LLM.

We can at least estimate something about c): large LLMs require a lot of compute and memory bandwidth, and the M2/M3 Ultras (the alleged backbone of PCC) offer neither in abundance. For example, a GH200 offers roughly a 4x increase in bandwidth and a 10x-40x increase in matrix compute over an M3 Ultra. So unless Apple uses modified SoCs that include matrix accelerators (and even if they do), the compute density of PCC is going to be considerably lower than that of an Nvidia solution. Add to this the relatively low production capacity (Apple needs a lot of time to stockpile chips and build servers due to manufacturing constraints and costs), and it's no wonder they started the PCC project well before they needed the compute.

Adding all these factors together, I can totally see how the system could be designed for future need (hence the "10% current usage") and yet still fail to reach the needed capacity at scale, especially now that Apple is pivoting to Google's foundation models. After all, when the PCC project was initiated, they may have been working with different projections, and these are not things one can change overnight.
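To put rough numbers on that GH200 comparison (spec-sheet values from memory, rounded and possibly off for specific configurations):

```python
# Rounded public spec figures; treat these as approximations, not authoritative.
m3_ultra = {"mem_bw_gbs": 819, "fp16_tflops": 57}   # GPU FP16, no matrix units
gh200 = {"mem_bw_gbs": 4000, "fp16_tflops": 990}    # H100-class tensor-core FP16

bw_ratio = gh200["mem_bw_gbs"] / m3_ultra["mem_bw_gbs"]
mm_ratio = gh200["fp16_tflops"] / m3_ultra["fp16_tflops"]
print(f"bandwidth advantage: ~{bw_ratio:.1f}x, matrix-compute advantage: ~{mm_ratio:.0f}x")
```

So the roughly 4x bandwidth figure checks out, and the matrix-compute gap lands inside the 10x-40x range, before even considering Nvidia's lower-precision FP8/FP4 paths.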
 
1) Nvidia is cool, but it's not being considered for PCC, so its performance, good or bad, is irrelevant.

2) The M2 and now the base M5 are the only chips being run in PCC; that's not speculation (look up the docs).

3) An M3 setup can prefill a 480B/35A MoE at 600 tokens/s at 32K context using 4 Macs over Thunderbolt RDMA.

4) An M3 setup can decode a 1T/32A MoE at 28 tokens/s with 4 Macs over Thunderbolt RDMA.

5) Gemini 3 Flash and Pro are rumored to be 1.2T/15A and 3T/30A respectively.

6) Given points 3 and 4, I don't see why a cluster of 8 highest-end M5 Macs over Thunderbolt 5 couldn't run either model, since the M3 can, and the M5 is likely 5x faster at prefill too.

7) Google and Nvidia cannot match the privacy of PCC. It's impossible without them handing their source code and IP to Apple.

8) ChatGPT 5.2 Pro runs at a similar speed to the M3 on a 1T/32A MoE. That speed is acceptable to consumers for most things, though maybe not up to Apple's standards. It's possible, though.

9) MoE models perform like their active parameter count, not their total parameters, as long as the full model fits in memory; that's not an issue for PCC.

10) Given all of those points, plus earlier ones like Tim Cook literally saying in interviews and on earnings calls that they're going to use PCC, I don't see why anyone should even think about Gurman's BS.

11) Also, Xcode doesn't use any Apple-developed foundation model except for code prediction, and that's on-device. All the others are third-party chatbot companies.
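Point 9 is the crux, and it's easy to sanity-check: at batch size 1, decode speed is bounded by memory bandwidth divided by the bytes of weights streamed per token, and a MoE only streams its *active* experts. A sketch with assumed numbers (the bandwidth and quantization values are illustrative, not specs):

```python
def decode_ceiling(mem_bw: float, active_params: float, bytes_per_param: float) -> float:
    """Roofline upper bound: one full pass over active weights per decoded token."""
    return mem_bw / (active_params * bytes_per_param)

BW = 800e9  # assumed M3 Ultra-class memory bandwidth, bytes/s

dense_1t = decode_ceiling(BW, 1e12, 0.5)     # hypothetical dense 1T model, 4-bit
moe_1t_32a = decode_ceiling(BW, 32e9, 0.5)   # 1T/32A MoE: only 32B params active

print(f"dense 1T ceiling:   {dense_1t:.1f} tok/s")
print(f"MoE 1T/32A ceiling: {moe_1t_32a:.1f} tok/s")
```

Under these assumptions a dense 1T model would be unusable, while the 1T/32A MoE decodes like a 32B model, which is exactly why total parameter count only matters for fitting in memory, not for speed.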
 