M4 Mac Announcements

Speaking of the M4 Ultra, this is behind a paywall ( https://asia.nikkei.com/Business/Te...xconn-to-produce-servers-in-Taiwan-in-AI-push ) but, according to MR's summary ( https://forums.macrumors.com/thread...s-next-year-after-m2-ultra-this-year.2442148/ ), Apple will be replacing the M2 Ultra with M4 chips (which I assume eventually means the M4 Ultra) in its AI servers.

If true, that means their internal AI server development work will continue; the alternative would be Apple giving up on having its own AI servers, and farming this out to someone like Google.

I'm wondering what the relative volume of M2 Ultra chips going into the AI servers vs. the Macs has been thus far, and how that will change going forward.

I've read reports that, while Apple is trying to develop an LLM that will enable most requests to be processed on-device (this was probably a key part of their decision to increase the base RAM to 16 GB), cloud connectivity will still be required for more demanding requests, hence the need for the AI servers.

...which leads to another interesting question: Will the decision whether to process requests locally or remotely sometimes depend on device capability? E.g., might some requests that would be sent to the cloud from a base M4 be processed locally on an M4 Ultra?
 
I assume the differentiator would be the amount of memory to hold the model, and not the processing capabilities of the chip?
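Just to make that concrete, here's a toy sketch of what a purely memory-based local-vs-cloud decision could look like. It's entirely hypothetical (the function, the 1.2× headroom factor, and the free-RAM numbers are all made up, and it's certainly not how Apple actually routes requests):

```python
# Toy sketch only: a made-up, memory-based routing rule, not Apple's actual logic.
# Model size, headroom factor, and free-RAM figures below are all assumptions.

def route_request(model_size_gb: float, free_ram_gb: float,
                  needs_large_model: bool) -> str:
    """Decide whether a single request runs on-device or goes to the cloud."""
    if needs_large_model:
        # Some requests may simply need a model too big for any consumer device.
        return "cloud"
    if free_ram_gb >= model_size_gb * 1.2:  # headroom for KV cache / activations
        return "on-device"
    return "cloud"

# Example: a ~2 GB on-device model on machines with different amounts of free RAM.
for device, free_ram_gb in [("base M4 (16 GB)", 5.0),
                            ("hypothetical M4 Ultra (192 GB)", 150.0)]:
    print(device, "->", route_request(2.0, free_ram_gb, needs_large_model=False))
```

Under a rule like that, the extra RAM on an Ultra only changes the outcome if Apple actually ships a bigger model for it, which is really the open question.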
 
A lot of what Apple is doing is with 3B SLMs, which should keep things manageable for the on-device scenarios. While I don’t have much insight on how much RAM these use during inference, I would not be surprised if it is close to 2GB. That depends on how much they can shrink the model using adapters for the different tasks. You don’t really want a feature that needs 25% of your RAM every time you want to summarize a notification or re-tone an email (maybe on iOS you can get away with this), and you likely want the model resident in memory to handle requests on the fly whenever there’s enough memory. That’s more where the RAM bump comes from I think.
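Putting rough numbers on that (my own back-of-envelope; the 3B parameter count is from above, and the precisions are just guesses at how aggressively the weights might be quantized):

```python
# Back-of-envelope RAM needed just to hold the weights of a ~3B-parameter model
# at different precisions. Ignores the KV cache, activations, and any adapters,
# so actual inference needs somewhat more than these figures.

PARAMS = 3e9  # ~3 billion parameters (assumed)

for label, bits_per_weight in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    gigabytes = PARAMS * bits_per_weight / 8 / 1e9
    print(f"{label}: ~{gigabytes:.1f} GB of weights")

# fp16:  ~6.0 GB
# int8:  ~3.0 GB
# 4-bit: ~1.5 GB  -> a ~2 GB resident footprint is plausible once overhead is added
```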

So no, I don’t think the M4 Ultra will handle more requests locally. That would complicate the engineering in ways that seem very unlikely to be worth it.
 
No need for AGP these days, the GPU is now integrated into the ASi chip...
Hey, the upgradability was nice. I had one upgraded from 400 MHz PowerPC G4 7400 + Nvidia GeForce 2MX to a 1.8GHz G4 7447A + Nvidia GeForce 6200.

I would think an all-new Mac Pro Cube would basically be a taller variant of the Mac Studio, to allow for the larger cooling subsystem an Mn Extreme chip would require...
I don't think we'll ever get anything as close to the Cube spiritually as the trashcan Mac Pro unfortunately.
 
I assume the differentiator would be the amount of memory to hold the model, and not the processing capabilities of the chip?
Yeah, I suspected that as well when I was thinking about the base M4 vs. the Ultra.
A lot of what Apple is doing is with 3B SLMs, which should keep things manageable for the on-device scenarios. While I don’t have much insight on how much RAM these use during inference, I would not be surprised if it is close to 2GB. That depends on how much they can shrink the model using adapters for the different tasks. You don’t really want a feature that needs 25% of your RAM every time you want to summarize a notification or re-tone an email (maybe on iOS you can get away with this), and you likely want the model resident in memory to handle requests on the fly whenever there’s enough memory. That’s more where the RAM bump comes from I think.
In Dec 2023, Apple engineers published a paper proposing a more efficient way to split LLM parameter storage between DRAM and SSD, so that larger LLMs (e.g., 14 GB of parameters) could run on devices with limited RAM [1]. So while Apple's production on-device LLMs may be smaller, 2 GB could be an underestimate. I.e., Apple's way to avoid using too much RAM may be SSD caching rather than simply limiting the model size. Of course, Apple publishes a lot of stuff they don't implement, so it's possible they will not do this.

But if they do, then the difference in LLM operation between a large-RAM and a small-RAM device may not be on-device processing vs. sending to the cloud, but rather being able to keep the model resident in RAM vs. having to split it between the RAM and SSD.

"Currently, the standard approach is to load the entire model into DRAM (Dynamic Random Access Memory) for inference (Rajbhandari et al., 2021; Aminabadi et al., 2022). However, this severely limits the maximum model size that can be run. For example, a 7 billion parameter model requires over 14GB of memory just to load the parameters in half-precision floating point format, exceeding the capabilities of most personal devices such as smartphones."

[1] Alizadeh K, Mirzadeh I, Belenko D, Khatamifard K, Cho M, Del Mundo CC, Rastegari M, Farajtabar M. LLM in a flash: Efficient large language model inference with limited memory. arXiv preprint arXiv:2312.11514. 2023 Dec 12.

Link: https://arxiv.org/pdf/2312.11514
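To show the basic idea the paper builds on, here's a heavily simplified toy sketch: keep the full weight matrix in a memory-mapped file on SSD and page in only the rows a given token actually activates. The sizes and the FIFO eviction are made up, and the real paper adds sparsity prediction, windowing, and row/column bundling, so this is just the general shape, not Apple's method:

```python
# Toy version of the DRAM/flash split: weights live in a file on SSD (memory-mapped),
# and only the rows needed for the current token are pulled into a small DRAM cache.
# Sizes are made up; the real paper adds sparsity prediction, windowing, and bundling.

import numpy as np

ROWS, COLS = 50_000, 1_024  # one large FFN weight matrix (~100 MB at fp16, illustrative)
weights_on_ssd = np.memmap("ffn_weights.bin", dtype=np.float16,
                           mode="w+", shape=(ROWS, COLS))

dram_cache: dict[int, np.ndarray] = {}  # row index -> row currently resident in DRAM
DRAM_BUDGET_ROWS = 5_000                # cap on how many rows we keep in memory

def fetch_rows(active_rows):
    """Return the requested rows, reading misses from flash and evicting naively."""
    for r in active_rows:
        if r not in dram_cache:
            dram_cache[r] = np.array(weights_on_ssd[r])  # read that row from SSD
            if len(dram_cache) > DRAM_BUDGET_ROWS:
                dram_cache.pop(next(iter(dram_cache)))   # crude FIFO eviction
    return np.stack([dram_cache[r] for r in active_rows])

# A sparse activation pattern might touch only a few percent of the rows per token,
# so DRAM only ever holds a small fraction of the full matrix.
active = np.random.default_rng(0).choice(ROWS, size=1_000, replace=False)
print(fetch_rows(active).shape)  # (1000, 1024)
```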
 
Even if we don't see this at the user level, it could be useful in the datacenter, and that may be where Apple is thinking of deploying it (if they haven't already). Think of an Mn Ultra with a handful of these secure VMs running on it. Minimizing RAM usage there means you can host more in parallel on a node and reduce costs.

That said, I'd probably need to see a use case where it makes sense to spend that much RAM and disk space on an on-device LLM for the end user. The SLMs in iOS 18/macOS 15 are already larger in parameter count than many of the GPT-3 models (outside of the larger curie and davinci models), and the adapters on top should make them more capable than their parameter count alone suggests.

I'm actually trying to get some folks on my end to look at SLMs for cost reasons in their thinking about features. Right now a lot of it is "natural language? Throw an LLM at it," which makes certain features a lot more expensive than they need to be. I'm also working with folks to see if we can reduce how much work the model actually has to do before we pass things off to a more classical algorithm, to improve accuracy in some scenarios.
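A stripped-down sketch of that kind of routing, with everything hypothetical (the regex, the intents, and handle_with_slm(), which is just a stand-in for whatever model call you'd actually make):

```python
# Hypothetical "cheapest thing first" routing: cover structured requests with a
# classical parse and only fall back to a small model when that fails.

import re
from typing import Optional

REMINDER = re.compile(r"remind me to (?P<task>.+) (?:at|on) (?P<when>.+)", re.IGNORECASE)

def classical_parse(text: str) -> Optional[dict]:
    """Handle the easy, highly structured requests with a regex/grammar - no model."""
    m = REMINDER.match(text.strip())
    if m:
        return {"intent": "reminder", "task": m["task"], "when": m["when"]}
    return None

def handle_with_slm(text: str) -> dict:
    # Placeholder for a small-model call (on-device or hosted); the expensive path.
    return {"intent": "unknown", "raw": text}

def handle(text: str) -> dict:
    return classical_parse(text) or handle_with_slm(text)

print(handle("Remind me to call the vet at 4pm"))              # classical path, no model
print(handle("What should I get my niece for her birthday?"))  # falls through to the model
```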
 
On a different topic... nano-texture or no nano-texture? Seems like people are liking it on the laptops and it has me second-guessing my pick.
 
Everyone seems to like it, and they all downplay the contrast/sharpness downsides, but I think it’s the kind of thing you need to see in person. I’m hoping they stock some on display at Apple Stores so I can see what it looks like.
 
Same. What's interesting is that the spec I picked happens to be on the "secret menu" of in-stock CTO builds. So I could very likely get the nano texture version when I go to pick mine up tomorrow.
 
On a different topic... nano-texture or no nano-texture? Seems like people are liking it on the laptops and it has me second-guessing my pick.
I've done a direct A-B comparison of the glossy vs. nano-textured ASD at my local Apple store. I don't know if the nano-texture on the laptops is the same, but here are the ASD nano-texture's pros and cons as I found them:

Pros: It's a very strong AR treatment, and is excellent at reducing reflections. It enabled photos and videos to look great in the brightly-lit Apple store.

Cons: It's a very strong AR treatment, and thus noticeably reduces text sharpness and creates a strong 'sparkling snowfield' effect on white backgrounds. Though what's "noticeable" for me may be less so for someone else, since I'm particularly sensitive to text sharpness. OTOH, since the MBPs have a higher pixel density than the ASD (254 ppi vs. 218 ppi), the sharpness reduction may be even more noticeable on the MBPs (unless Apple has adjusted the treatment on them to compensate).
 
I’m pretty sure that for me the way to go would be the glossy, and then buy a matte, paper-like magnetic protector that can be put on and taken off the laptop (similar to what I do with my iPad). Best of both worlds.
I do applaud them for offering both options!
 
Same. What's interesting is that the spec I picked happens to be on the "secret menu" of in-stock CTO builds. So I could very likely get the nano texture version when I go to pick mine up tomorrow.
Just curious, which spec? When I played around with it I wasn't able to find a CTO M4 Max 16cpu/40gpu config that didn't have weeks of delay. Don't think I was exhaustive though.
 
I know y'all are just busting my chops, but I did say all-new...

No need for AGP these days, the GPU is now integrated into the ASi chip...

I would think an all-new Mac Pro Cube would basically be a taller variant of the Mac Studio, to allow for the larger cooling subsystem an Mn Extreme chip would require...
To tide you over until the cube comes out 🙃

 
Just curious, which spec? When I played around with it I wasn't able to find a CTO M4 Max 16cpu/40gpu config that didn't have weeks of delay. Don't think I was exhaustive though.

It’s a couple pages back in the thread, but I’m not getting the Max, which may be part of it:

Welp, I went ahead and ordered my upgrade for the next few years. Interestingly, it says I should be able to grab it day one from the Apple Store near my office. Usually have to wait a bit for these CTO builds.

14" MBP with M4 Pro, 48GB, 2TB
 
Looks like the SSD in the MBP M4 Pro is ~6% faster than the previous generation. I'd hoped they'd have gone for the faster NAND (twice as fast) that's currently on the market, but alas it doesn't look like it.


Also, I'll be receiving a base model Mac mini M4 Pro tomorrow, which I intend to take apart and nondestructively analyze. LMK if there's anything in particular you'd like me to examine (and photograph).
 
Looking forward to your assessment. Over the next month I'm thinking of purchasing two M4 Minis: one to handle my security video cameras and the home automation software that runs 24/7, and another as a dedicated X-Plane flight simulator computer that drives three 4K displays. That task is currently shared with my Mac desktop computer, which I use for general stuff and photo editing/processing, and that arrangement is kind of a pain.

I'm still assessing M4 vs. M4 Pro, CPU/GPU cores, RAM, and storage needs, which will be different for the above two uses. For both I'm thinking of going with base storage and using a fast external Samsung SSD.
 
The Mac mini’s SSD uses a connector!

What a beauty this thing is! I’m so eager to get to my own teardown. I’m particularly interested in the board-to-board interconnects and how the layout is split between the two mainboards.


[Attached image: Mac mini teardown photo showing the SSD module]
 
BTW, while I haven’t confirmed it with the person who owns this (perhaps gnattu at MR or kianweelim at iFixit), it appears to be the base M4 model. The NAND chip is marked “128G”, so I’d imagine the other side features the same, for a total of 256GB; the minimum configuration of the M4 Pro is 512 GB, and likely has a different layout. It’s possible that the M4 Pro would have two such SSD cards, which would imply double the throughput (as has been the case with the Mac Studios).

Edit: the source of the images:
 