Apple M6 rumors/discussion

NotEntirelyConfused
Power User · Joined May 15, 2024 · Posts: 167
Shockingly, we don't seem to have a thread for this yet, so let me start off with this.

[Note: Edited later because I was dumb and referred to channels when I was actually counting 64-bit paths, a width that's relevant neither to LPDDR5 nor to recent Apple Silicon chips, which all use LPDDR5. The sad thing is I knew that already and was just being lazy.]

As I understand it, WMCM (as rumored to be used by future Apple chips like the A20 and M6) connects dies top-to-bottom, which means that shoreline (aka beachfront) is no longer a limiting factor for off-die IO, though lots of TSVs will presumably put some pressure on yield.

Therefore, using WMCM, Apple can use wider memory buses than previously practical. However, this is limited at the low end by the number of RAM dies they want to use: they have to have a sufficiently low memory config for their baseline product, and nobody thinks they're going to go to, say, 24GB minimum during the RAM supply crunch. So if Apple is consistent with past lineups, we're likely to see 192-bit-wide memory on the base M6 (18GB minimum), 384 on the Pro (24 or 36GB minimum, depending on whether they use 4 or 6GB chips), 768 on the Max, and 1536 (!!) on the Ultra, if they ship that chip.
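A quick back-of-the-envelope sketch of that math. Assumptions (mine, not from any leak): each memory package presents a 64-bit interface, as on current M-series, and the smallest die options are 4GB and 6GB as discussed above; the tier widths are the rumored values, and the function name is just illustrative:

```python
# Minimum RAM per tier if every 64-bit memory package must be populated.
# Bus widths are the rumored values from this thread; 4 GB / 6 GB package
# capacities and the 64-bit package width are assumptions, not confirmed.
PACKAGE_WIDTH_BITS = 64

def min_config_gb(bus_width_bits, package_gb):
    packages = bus_width_bits // PACKAGE_WIDTH_BITS
    return packages * package_gb

tiers = {"M6": 192, "M6 Pro": 384, "M6 Max": 768, "M6 Ultra": 1536}
for name, width in tiers.items():
    lo, hi = min_config_gb(width, 4), min_config_gb(width, 6)
    print(f"{name:9s} {width:4d}-bit  minimum {lo}-{hi} GB")
```

With those assumptions you get 18GB minimum on a 192-bit base M6 (3 packages of 6GB), 24-36GB on a 384-bit Pro, and 48-72GB on a 768-bit Max, matching the figures in this thread.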

Of course it's possible they could go to 256 bits wide for the base chip, and only populate 192 of them for the base memory config, but that seems somewhat unlikely. Though perhaps not totally - the memory controllers are not wedded to individual CPU complexes, as they are with certain other architectures, so you wouldn't get totally unbalanced performance from some CPUs or GPUs because of the idle controller.

Bus width will obviously still be limited by SoC layout (you need to put the memory near the memory controllers on the SoC), but 768 bits wide on a Max die should easily be practical, as memory dies do not have to sit entirely within the footprint of the SoC die.

Is this correct or am I missing something?
 
Last edited:
Shockingly, we don't seem to have a thread for this yet, so let me start off with this.

As I understand it, WMCM (as rumored to be used by future Apple chips like the A20 and M6), connects dies top-to-bottom, which means that shoreline (aka beachfront) is no longer a limiting factor for off-die IO, though lots of TSVs will presumably put some pressure on yield.

Therefore, using WMCM, Apple can use wider memory busses than previously practical. However this is limited at the low end by the number of RAM dies they want to use- they have to have a sufficiently low memory config for their baseline product, and nobody thinks they're going to go to, say, 24GB minimum during the RAM supply crunch. So if Apple is consistent with past lineups, we're likely to see 3 channels on the base M6 (18GB minimum), 6 on the Pro (24 or 36GB minimum, depending on whether they use 4 or 6GB chips), 12 on the Max, and 24 (!!) on the Ultra, if they ship that chip.
Kind of a side point, but it seems like you're defining a memory channel as 64 bits wide? FYI, LPDDR5 channels are only 16 bits wide. If M6 maintains the traditional 128-bit path to DRAM, it's 8-channel; if it jumps up to 192, it's 12-channel. M4 Max is already 32-channel.

LPDDRn uses narrow channels to increase the number of independent memory controllers for a given memory interface width. This arose from the needs of high-performance cellphone SoCs, where there are enough independent subsystems generating memory accesses that it's a big win to have more channel controllers and therefore more memory commands in flight.

This demand for more command parallelism applies to today's many-core desktop CPUs, too. Desktop DDR5 has narrowed channel width to 32 bits. DDR5 DIMMs are 64 bits wide, but that's two 32-bit channels packed into one module.
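The channel bookkeeping above is trivial to make concrete. The 16-bit LPDDR5 and 32-bit DDR5 channel widths are from the post itself; the helper name is just for illustration:

```python
# LPDDR5 channels are 16 bits wide; desktop DDR5 channels are 32 bits
# (a 64-bit DDR5 DIMM carries two independent 32-bit channels).
def channels(bus_width_bits, channel_width_bits):
    return bus_width_bits // channel_width_bits

print(channels(128, 16))  # traditional base-M 128-bit path: 8 LPDDR5 channels
print(channels(192, 16))  # rumored 192-bit path: 12 channels
print(channels(512, 16))  # M4 Max's 512-bit path: 32 channels
print(channels(64, 32))   # one DDR5 DIMM: 2 channels
```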
 
Interesting. I remember the earlier rumors were that anything below the Pro was going to stay monolithic at least through the M6. Did I miss a rumor that Apple was going to make the A-series and base-M chips multi-die as well?

Edit: ah I see the link @exoticspice1 posted. So all of them are getting an IO die?
 
Kind of a side point, but it seems like you're defining a memory channel as 64 bits wide? FYI, LPDDR5 channels are only 16 bits wide. If M6 maintains the traditional 128-bit path to DRAM, it's 8-channel; if it jumps up to 192, it's 12-channel. M4 Max is already 32-channel.

LPDDRn uses narrow channels to increase the number of independent memory controllers for a given memory interface width. This arose from the needs of high-performance cellphone SoCs, where there are enough independent subsystems generating memory accesses that it's a big win to have more channel controllers and therefore more memory commands in flight.

This demand for more command parallelism applies to today's many-core desktop CPUs, too. Desktop DDR5 has narrowed channel width to 32 bits. DDR5 DIMMs are 64 bits wide, but that's two 32-bit channels packed into one module.
Aaaaargh. I knew that and was just too damn lazy, using channels to represent 64 bits. I should have known better. :-( I edited the OP to be less stupid.

So, restated in bits, does what I wrote seem correct?

Obviously, we can't know what Apple will do, but this does seem like a reasonable path forward if you're looking to improve performance (especially for Graphics & AI) while also improving power and space consumption. At 768 bits wide for the Max, you're at roughly 1TB/s (more or less depending on RAM clock rate), which is pushing you into territory previously reserved for HBM. That's substantially higher than current EPYC or Xeon servers can manage, and in the ballpark of the Nvidia 5090.

I'll bet Apple could push the Max to 1024 bits wide, but that seems excessive (and they could build an Ultra and get 1536 wide, if they want that much bandwidth). Leave that to the M7 or M8. :-)

BTW, you'd likely see low-memory Max configurations with substantially less memory bandwidth, if memory I/O gets to 768 bits wide. You can't get chips small enough to fill all the channels with less than 48GB. They might reasonably drop channels for anything less than 72GB, even.
 
Last edited:
Interesting. I remember the earlier rumors were that anything below the Pro was going to stay monolithic at least through the M6. Did I miss a rumor that Apple was going to make the A-series and base-M chips multi-die as well?

Edit: ah I see the link @exoticspice1 posted. So all of them are getting an IO die?
I'm making no assumptions about what Apple will do in that respect. Regardless, if they're doing 3D advanced packaging (WMCM) they can mount RAM directly on the SoC, using TSVs to connect them. That means no restrictions on bus width due to shoreline, which is really what I was interested in at the moment - although, as I said, yield issues may limit the number of TSVs they want to use. I don't really know about that.
 
I'm making no assumptions about what Apple will do in that respect. Regardless, if they're doing 3D advanced packaging (WMCM) they can mount RAM directly on the SoC, using TSVs to connect them. That means no restrictions on bus width due to shoreline, which is really what I was interested in at the moment - although, as I said, yield issues may limit the number of TSVs they want to use. I don't really know about that.
Sounds similar to the custom RAM job Apple did for the R1 chip in the Vision Pro. Would WMCM allow them to do the same and increase pin count? If so, then you wouldn’t even need to increase channel count to increase bandwidth. They were able to double the effective bandwidth of the modules by quadrupling the pin count and halving speed, if I remember right. Probably saving a good deal of power too. That would solve the minimum RAM problem too. On the other hand, for something that also has to serve the CPU, it’s possible that would not be ideal (latency might take a hit), but that could be mitigated with larger cache sizes for a lot of workloads.
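The quadruple-the-pins, halve-the-rate tradeoff described here is easy to sanity-check: aggregate bandwidth scales with pin count times per-pin transfer rate, so 4x the pins at half the rate is 2x the bandwidth. The specific numbers below are purely illustrative, not the R1's actual figures:

```python
# Aggregate bandwidth in GB/s: (pins / 8) bytes per transfer * GT/s.
# Baseline width and rate are made-up illustrative values.
def bandwidth_gbps(pins, rate_gtps):
    return pins / 8 * rate_gtps

base = bandwidth_gbps(256, 8.0)          # hypothetical baseline interface
wide = bandwidth_gbps(256 * 4, 8.0 / 2)  # 4x the pins, half the rate
print(base, wide, wide / base)           # the ratio comes out to 2.0
```

Since dynamic power grows superlinearly with signaling rate, trading rate for width like this tends to save power per bit moved, which fits the "probably saving a good deal of power" guess.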
 
Keep an eye on the A20 Pro in the iPhone 18 Pro; the iPhone 18 should have at least 115.2GB/s memory bandwidth if they move to 6-channel, as per the Weibo leak.

Current A Pro series - 4-channel, 64-bit

A20 Pro - 6-channel, 96-bit

If the same applies to the M-series, then we have:
M6 - 12-channel, 192-bit
M6 Pro - 24-channel, 384-bit
M6 Max - 48-channel, 768-bit


I hope this rumour is actually true; this would provide greater bandwidth improvements than just moving to 10700MT/s LPDDR5X with the same memory channels we have now.
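Plugging the rumored widths into the usual bandwidth formula (bytes per transfer x transfer rate) reproduces the 115.2GB/s figure from the leak; the other rates below are just for comparison, not from any rumor, and the function name is mine:

```python
# Bandwidth in GB/s = (bus width in bytes) * (MT/s) / 1000.
def bw_gb_s(bus_bits, mt_s):
    return bus_bits / 8 * mt_s / 1000

print(bw_gb_s(96, 9600))    # A20 Pro rumor (96-bit @ 9600 MT/s): 115.2 GB/s
print(bw_gb_s(192, 9600))   # base M6 at the same rate: 230.4 GB/s
print(bw_gb_s(768, 10700))  # M6 Max at LPDDR5X-10700: ~1027 GB/s
```

So the rumored 115.2GB/s implies LPDDR5X-9600 on a 96-bit bus, and a 768-bit Max at 10700MT/s would land around 1TB/s.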
 
I am trying to understand what exactly WMCM is and which advantages it brings. So far, I have been unable to find any concrete information. Most mentions contrast WMCM with InFO packaging, but as far as I understand, InFO is just the technology used to mount the DRAM "on top" of the A-series chips to reduce footprint. If anyone has a clearer explanation of what the actual deal with WMCM is, I'd be very interested.
 
I am trying to understand what exactly WMCM is and which advantages it brings. So far, I have been unable to find any concrete information. Most mentions contrast WMCM with InFO packaging, but as far as I understand, InFO is just the technology used to mount the DRAM "on top" of the A-series chips to reduce footprint. If anyone has a clearer explanation of what the actual deal with WMCM is, I'd be very interested.
I'm going to avoid much detail because I don't know that much about this and I can't quickly identify any reliable sources for what I think I've heard.

My understanding is that this is a next-gen 3D packaging tech. You can mount dies directly on other dies (FSVO "directly") and connect them with TSVs. You can connect dies side-by-side without interposers (I'm assuming, only on the bottom layer?). And physical package structure is better because of the way they fill gaps. Most of all, this is practical for small and relatively cheap chips, not just giant overpriced reticle-sized AI monsters.

One thing I have no idea about - can you make multiple-reticle-sized things like Mx Ultras with this tech?
 
My understanding is that this is a next-gen 3D packaging tech. You can mount dies directly on other dies (FSVO "directly") and connect them with TSVs. You can connect dies side-by-side without interposers (I'm assuming, only on the bottom layer?). And physical package structure is better because of the way they fill gaps. Most of all, this is practical for small and relatively cheap chips, not just giant overpriced reticle-sized AI monsters.

At the same time, this post: https://www.linkedin.com/posts/ragh...wmcm-apple-activity-7419598374955479041-Z62l/ seems to suggest that WMCM mounts dies side by side, and not vertically. In fact, most of the information I have seen so far points in the direction of WMCM being a technology for combining a heterogeneous 2D die arrangement directly on a wafer and dicing the wafer afterwards (as opposed to dicing the individual interposers and mounting the dies on top). So to me this really sounds like a way to decrease costs more than anything else (and maybe also achieve denser wiring, not sure about this). Which is why I am confused about how this technology is supposed to replace the vertical stacking of the SoC and the DRAM for the iPhone. Wouldn't the total package area be larger this way?

Given the information I have seen so far, the only way all of this would make sense to me is if both the DRAM and the SoC became smaller instead, e.g. by becoming 3D dies. Then these "thicker" dies could be mounted side-by-side using WMCM. I could see advantages to that.
 
Aaaaargh. I knew that and was just too damn lazy, using channels to represent 64 bits. I should have known better. :-( I edited the OP to be less stupid.

So, restated in bits, does what I wrote seem correct?

Obviously, we can't know what Apple will do, but this does seem like a reasonable path forward if you're looking to improve performance (especially for Graphics & AI) while also improving power and space consumption.
I don't know what to make of the story, there's not enough substance in it to really say much.

Re: power and space consumption, this could hurt both, actually. Especially if it's delivered alongside a frequency increase. More memory bus width/speed is always more power, if all else is equal. If these reports are true, perhaps they're making up for it with gains elsewhere. Power is a huge concern for iOS devices, thanks to Apple's preference for trying to hit high battery life numbers while keeping the phones as slim and light as possible.
 
At the same time, this post: https://www.linkedin.com/posts/ragh...wmcm-apple-activity-7419598374955479041-Z62l/ seems to suggest that WMCM mounts dies side by side, and not vertically. In fact, most of the information I have seen so far points in the direction of WMCM being a technology for combining a heterogeneous 2D die arrangement directly on a wafer and dicing the wafer afterwards (as opposed to dicing the individual interposers and mounting the dies on top). So to me this really sounds like a way to decrease costs more than anything else (and maybe also achieve denser wiring, not sure about this). Which is why I am confused about how this technology is supposed to replace the vertical stacking of the SoC and the DRAM for the iPhone. Wouldn't the total package area be larger this way?

Given the information I have seen so far, the only way all of this would make sense to me is if both the DRAM and the SoC became smaller instead, e.g. by becoming 3D dies. Then these "thicker" dies could be mounted side-by-side using WMCM. I could see advantages to that.
Yeah, right now the M’s are side-by-side and the A’s are stacked. The problem with stacking is heat dissipation. I also doubt the A-series goes to WMCM, at least for the RAM. I suppose they could use it to integrate dies, then stack the WMCM with RAM using the same technology they currently use. The horizontal growth from using an MCM vs integrating everything on a single die would be pretty small. But the memories can’t be laterally connected in an iPhone, as the total area would be huge.

I’m not finding a lot of information about how WMCM differs from what they are already doing (which I know very well).
 
Yeah, right now the M’s are side-by-side and the A’s are stacked. The problem with stacking is heat dissipation. I also doubt the A-series goes to WMCM, at least for the RAM. I suppose they could use it to integrate dies, then stack the WMCM with RAM using the same technology they currently use. The horizontal growth from using an MCM vs integrating everything on a single die would be pretty small. But the memories can’t be laterally connected in an iPhone, as the total area would be huge.

I’m not finding a lot of information about how WMCM differs from what they are already doing (which I know very well).

That’s also why I am confused - the rumor mill so far has been adamant that WMCM will be used for A-series chips, and I just don’t understand how it would work out.
 
That’s also why I am confused - the rumor mill so far has been adamant that WMCM will be used for A-series chips, and I just don’t understand how it would work out.
Only thing I can think of is that maybe they save on some heat exchanger real estate by doing it, so it ends up being a win? Or the rumors are just wrong.
 
Only thing I can think of is that maybe they save on some heat exchanger real estate by doing it, so it ends up being a win? Or the rumors are just wrong.
They probably saved some real estate by integrating cellular capabilities into the A series SoCs, freeing it up for RAM?
 
Isn’t that on a separate chip? C1 or whatever?
Yeah, so far N1 and C1/C1x are separate.

Also, like most cellular modems C1 is composed of two die, a TSMC 4nm baseband processor (the digital domain chip) and a TSMC 7nm RF chip (the one where there's lots of analog and ADC+DAC). Even if Apple decides to eventually integrate the baseband processor into the main SoC, I'd expect them to retain a separate RF chip.
 
At the same time this post: https://www.linkedin.com/posts/ragh...wmcm-apple-activity-7419598374955479041-Z62l/ seems to suggest that WMCM mounts dies side by side, and not vertically. In fact, most of the information I have seen so far goes in the direction of WMCM being a technology of combining a heterogeneous 2D die arrangement directly onto a wafer and dicing the wafer afterwards
That... absolutely makes sense. And we know most rumormongers have very little clue and tend to garble tech details a lot.

Re: power and space consumption, this could hurt both, actually. Especially if it's delivered alongside a frequency increase. More memory bus width/speed is always more power, if all else is equal. If these reports are true, perhaps they're making up for it with gains elsewhere.
Or perhaps they're building a not-so-poor-man's LP version of HBM. With more channels, you can run each channel slower and get the same or better bandwidth. Maybe the real point of what they're doing is to run the memory slower to improve power? I don't know what the power/perf curve looks like for LPDDR5X RAM, so I don't know if that's even remotely plausible.

Even if Apple decides to eventually integrate the baseband processor into the main SoC, I'd expect them to retain a separate RF chip.
I thought integrating the baseband is the obvious endgame. And why would they retain a separate RF chip? Is there a benefit (less RFI maybe) to keeping the RF chip far away from the processors? Because otherwise, maybe having it as a chiplet integrated in the SoC saves enough power to be interesting.
 