Apple may use Google’s servers for AI

This is a news item I don't find outlandish. While Apple's private cloud is an amazing initiative, they simply don't have the hardware to run large models at scale. The gap between it and modern Nvidia inference boxes is simply too large.

Contrast this with other rumors today that Apple is using only 10% of its AI server capacity.

This is a news item I don't find outlandish. While Apple's private cloud is an amazing initiative, they simply don't have the hardware to run large models at scale. The gap between it and modern Nvidia inference boxes is simply too large.
How, after reading my multiple well-sourced posts on this matter, do you find it "not outlandish"?

Sorry for being so direct, but I have put considerable effort into dispelling this entire notion, and while that doesn't mean I'm definitively right, this news should be met with the utmost skepticism.

It is the definition of outlandish.

First, and it must be stated again: Tim Cook directly said on an earnings call that Siri and Apple Intelligence will continue to use PCC servers. This was in response to questions about the Google partnership, specifically about Google's cloud tech.

Second, how can it be both underutilized and underpowered? Something smells there. I expand on this in the third point below!

Third, it's not clear what exactly the report is saying, because I've read multiple websites and they each tell a different version. Let's assume for the moment that The Verge is accurate in its paraphrase of the article:

Private Cloud Compute isn't Apple silicon with a privacy-policy promise. It's a verifiable, hardware-enforced, precisely engineered system that uses Apple silicon to attest, from the Secure Enclave up, that a PCC server is authentic, is running the same software Apple publicly gives to researchers, and has not been tampered with. In fact, there are a *bunch* of privacy claims being made, and the system has to verify all of them. It only works because it's one integrated solution: Apple silicon, software, frameworks; every level is built for this specific thing.
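Conceptually, the verification chain works like the toy sketch below. To be clear, this is not Apple's protocol or API; real PCC attestation uses hardware-rooted keys, a transparency log, and far more checks. Every name and value here is invented purely to illustrate the shape of the idea: the client only releases a request to a node whose signed software measurement matches a publicly published build.

```python
import hashlib
import hmac

# Invented values for illustration only: in real PCC, measurements are
# signed by hardware-rooted keys and checked against a transparency log.
TRUSTED_RELEASES = {
    # digest of a software image the operator has published for researchers
    hashlib.sha256(b"pcc-release-1.0").hexdigest(),
}
ATTESTATION_KEY = b"stand-in-for-a-hardware-key"  # hypothetical shared key

def make_attestation(image: bytes) -> tuple[str, str]:
    """Server side: measure the running image and sign the measurement."""
    measurement = hashlib.sha256(image).hexdigest()
    signature = hmac.new(ATTESTATION_KEY, measurement.encode(), "sha256").hexdigest()
    return measurement, signature

def client_accepts(measurement: str, signature: str) -> bool:
    """Client side: only talk to a node whose signed measurement is on the public list."""
    expected = hmac.new(ATTESTATION_KEY, measurement.encode(), "sha256").hexdigest()
    return hmac.compare_digest(expected, signature) and measurement in TRUSTED_RELEASES

m, s = make_attestation(b"pcc-release-1.0")
print(client_accepts(m, s))    # True: genuine, published build
m2, s2 = make_attestation(b"tampered-build")
print(client_accepts(m2, s2))  # False: validly signed, but the build is not public
```

The point of the design is the second case: even a "correctly" signed server is rejected if its software isn't the software everyone can inspect.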

I find it utterly perplexing and frustrating that despite the very cool nature of PCC, and the very intelligent and capable users on this site, no one has bothered to even look at it and analyze it. There is so much to analyze and learn from. When else does Apple literally give you this many resources to study something? Images, open-source code, detailed documentation, etc. You can look it all up right now.

In the 26.2 "InfiniBand" (RDMA over Thunderbolt) post, I gave a detailed analysis of what the current M3 system is capable of with Thunderbolt 5 cables. Since then, there have been performance improvements in the MLX and EXO layers, so it could be even faster. But it showed the setup is fully capable of running a 1-trillion-parameter model at decent speed. It's on the third page of that post.
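As a sanity check on that claim, here is the kind of roofline estimate such an analysis rests on. Every constant below is an assumption I'm plugging in for illustration (quantization level, usable link speed, hidden size, bandwidth), not a measurement:

```python
# Roofline-style decode estimate for a large MoE sharded across 4 Macs.
# All constants are illustrative assumptions, not benchmarks.

ACTIVE_PARAMS = 32e9      # 1T-total / 32B-active MoE
BYTES_PER_PARAM = 0.5     # assumes ~4-bit weight quantization
MEM_BW_PER_MAC = 800e9    # assumed M3 Ultra-class memory bandwidth, bytes/s
NUM_MACS = 4
TB5_LINK = 10e9           # assumed usable Thunderbolt 5 bandwidth, bytes/s
ACT_BYTES_PER_TOKEN = 2 * 8192  # assumed fp16 activations for an 8192-wide model

# Decoding one token streams all active weights from memory once; sharding
# splits that traffic across machines, so their bandwidth adds up.
compute_bound = (MEM_BW_PER_MAC * NUM_MACS) / (ACTIVE_PARAMS * BYTES_PER_PARAM)

# The links only carry small per-token activations between shards.
link_bound = TB5_LINK / (ACT_BYTES_PER_TOKEN * NUM_MACS)

print(f"compute-bound ceiling: {compute_bound:.0f} tok/s")
print(f"link-bound ceiling:    {link_bound:.0f} tok/s")
print(f"upper bound: {min(compute_bound, link_bound):.0f} tok/s")
```

Under these assumptions the interconnect is nowhere near the bottleneck at batch size 1, and the memory-bandwidth ceiling comes out well above the measured decode rate, which is what you'd expect once real kernel efficiency, routing, and synchronization are counted. So the measured figure is at least physically plausible.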

If Apple marshals its efforts, and it is, then no doubt it can go even faster.

Even reporting from The Information literally discusses Baltra, a custom solution being engineered by Apple and Broadcom for PCC servers. The Verge makes zero mention of this. Did they just drop it or something?

Technical imprecision from these sites is bad enough, but the claim doesn't even make sense. How could a hardware-enforced privacy server be compatible with something like Google's infrastructure, when Google isn't going to hand over its TPU servers' trade secrets for analysis?

I'll bring it back to Tim Cook. He directly said on an earnings call that Siri and Apple Intelligence will run on PCC servers, in response to questions about the partnership. Apple even said it again outside of an earnings call. He didn't entertain the alternative at all. No hedging. No "well, Google has some powerful servers, so we're looking at that." No "we're doing the new Siri on TPU technology." None of that.

Analyst:
Hey, Tim. First question is on Google partnership again. I wanted to understand how you came to that decision with regard to the AI and Siri in particular and if there’s an opportunity for you guys to share in revenue too with that partnership like you do in search.

Tim Cook:
Yeah, we basically determined that Google’s AI technology would provide the most capable foundation for AFM (Apple Foundation Models), and we believe that we can unlock a lot of experiences and innovate in a key way due to the collaboration. We’ll continue to run on the device and run in Private Cloud Compute and maintain our industry-leading privacy standards in doing so. In terms of the arrangement with Google, we’re not releasing the details of that.

Outside the earnings call:
"We're not changing our privacy rules," Cook's on-air comment read. "We still have the same architecture that we announced before, which is on device plus Private Cloud Compute."

It's outlandish to me for every one of these reasons. Impossible? No. Outlandish? Absolutely yes.

Contrast with other rumors today that Apple is only using 10% of its AI server capacity.
That's the same "report," by the way, which only adds to the nonsense.

Also, reading more articles paraphrasing this, apparently they're citing "former Apple employees." Lmfao.

No, seriously, what is this? Even AppleInsider's article didn't analyze it the way I just did.

It appears no one has a clue what PCC precisely is, and journalists prey on that.
 
Contrast this with other rumors today that Apple is using only 10% of its AI server capacity.

Second, how can it be both underutilized and underpowered? Something smells there. I expand on this in the third point below!

Does there have to be a contradiction? To analyze the problem we need to understand: a) the current and projected compute capacity of the system, b) what the system is currently used for, and c) what the need will be once the new Siri and features launch.

We do know quite a lot about the PCC architecture, because Apple has published detailed documentation. What do we know about a) and b)? Not much, to be honest. We know that some Apple LLM requests are routed to PCC (Xcode/text processing), that the cloud model used is not very large by modern standards, and that these features are not used very actively. It is also possible that Siri currently runs on PCC (is that confirmed?), and we know that the current Siri needs less compute than a large LLM.

We can at least estimate something about c): large LLMs require a lot of compute and memory bandwidth, and the M2/M3 Ultras (the alleged backbone of PCC) offer neither in abundance. For example, a GH200 offers roughly a 4x increase in bandwidth and a 10x-40x increase in matrix compute over an M3 Ultra. So unless Apple uses modified SoCs that include matrix accelerators (and even if they do), the compute density of PCC is going to be considerably lower than that of an Nvidia solution. Add to this the relatively low production capacity (Apple needs a lot of time to stockpile chips and build servers due to manufacturing constraints and costs), and it's no wonder they started the PCC project well before they needed the compute.

Adding all these factors together, I can totally see how the system could be designed for future need (hence the "10% current usage") and yet still fail to reach the needed capacity at scale, especially now that Apple is pivoting to Google's foundation models. After all, when the PCC project was initiated, they may have been working with different projections, and these are not things one can change overnight.
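To put rough numbers on that GH200 comparison (spec-sheet values from memory, rounded and possibly off for specific configurations):

```python
# Rounded public spec figures; treat these as approximations, not authoritative.
m3_ultra = {"mem_bw_gbs": 819, "fp16_tflops": 57}   # GPU FP16, no matrix units
gh200 = {"mem_bw_gbs": 4000, "fp16_tflops": 990}    # H100-class tensor-core FP16

bw_ratio = gh200["mem_bw_gbs"] / m3_ultra["mem_bw_gbs"]
mm_ratio = gh200["fp16_tflops"] / m3_ultra["fp16_tflops"]
print(f"bandwidth advantage: ~{bw_ratio:.1f}x, matrix-compute advantage: ~{mm_ratio:.0f}x")
```

So the roughly 4x bandwidth figure checks out, and the matrix-compute gap lands inside the 10x-40x range, before even considering Nvidia's lower-precision FP8/FP4 paths.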
 
1) Nvidia is cool, but it's not being considered for PCC, so its performance, good or bad, is irrelevant.

2) The M2 and now the base M5 are the only chips being run in PCC; that's not speculation (look up the docs).

3) An M3 setup can prefill a 480B/35A MoE at 600 tokens/s at 32K context using 4 Macs over Thunderbolt RDMA.

4) An M3 setup can decode a 1T/32A MoE at 28 tokens/s with 4 Macs over Thunderbolt RDMA.

5) Gemini 3 Flash and Pro are rumored to be 1.2T/15A and 3T/30A respectively.

6) Given points 3 and 4, I don't see why a cluster of 8 highest-end M5 Macs over Thunderbolt 5 couldn't run either model, since the M3 can, and the M5 is likely 5x faster at prefill too.

7) Google and Nvidia cannot match the privacy of PCC. It's impossible without them handing their source code and IP to Apple.

8) ChatGPT 5.2 Pro runs at a similar speed to the M3 on a 1T/32A MoE. That speed is acceptable to consumers for most things, though maybe not up to Apple's standards. It's possible, though.

9) MoE models perform like their active parameter count, not their total parameters, as long as the full model fits in memory; that's not an issue for PCC.

10) Given all of those points, plus earlier ones like Tim Cook literally saying in interviews and on earnings calls that they're going to use PCC, I don't see why anyone should even think about Gurman's BS.

11) Also, Xcode doesn't use any Apple-developed foundation model except for code prediction, and that's on-device. All the others are third-party chatbot companies.
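Point 9 is the crux, and it's easy to sanity-check: at batch size 1, decode speed is bounded by memory bandwidth divided by the bytes of weights streamed per token, and a MoE only streams its *active* experts. A sketch with assumed numbers (the bandwidth and quantization values are illustrative, not specs):

```python
def decode_ceiling(mem_bw: float, active_params: float, bytes_per_param: float) -> float:
    """Roofline upper bound: one full pass over active weights per decoded token."""
    return mem_bw / (active_params * bytes_per_param)

BW = 800e9  # assumed M3 Ultra-class memory bandwidth, bytes/s

dense_1t = decode_ceiling(BW, 1e12, 0.5)     # hypothetical dense 1T model, 4-bit
moe_1t_32a = decode_ceiling(BW, 32e9, 0.5)   # 1T/32A MoE: only 32B params active

print(f"dense 1T ceiling:   {dense_1t:.1f} tok/s")
print(f"MoE 1T/32A ceiling: {moe_1t_32a:.1f} tok/s")
```

Under these assumptions a dense 1T model would be unusable, while the 1T/32A MoE decodes like a 32B model, which is exactly why total parameter count only matters for fitting in memory, not for speed.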
 