Sept 2025 Apple iPhone Event

I was hoping for a 4-way FP16 dot product with 32-wide SIMD, which would match the performance of the RTX 4070 on the 40-core Max variant. But they mention 4x, which sounds more like a 2-way dot product. My math could be off completely of course, it’s late and I’m tired from spending my day at the beach 😅

I had a look at this again, and actually 4x is consistent with a 4-way hardware dot product, just as described in the patent.

If that is FP16 with FP32 accumulate, then we are looking at around 60 TFLOPS for the M5 Max GPU, the same as the RTX 4070/Nvidia Spark. Of course, the latter can deliver much better performance using smaller-precision data. Still, it wouldn’t be too shabby for first-gen hardware from Apple.
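
Back-of-envelope, assuming 128 FP32 lanes per core and a ~1.45 GHz shader clock (both my guesses, not published specs):

```swift
// Rough FP16 matmul throughput estimate for a hypothetical 40-core M5 Max GPU.
// Assumptions (not confirmed by Apple): 128 FP32 lanes per core, ~1.45 GHz
// shader clock, and a 4-way FP16 dot product with FP32 accumulate per lane.
let cores = 40.0
let lanesPerCore = 128.0  // assumed, in line with recent Apple GPU cores
let clockGHz = 1.45       // assumed shader clock
let flopsPerFMA = 2.0     // a fused multiply-add counts as 2 FLOPs
let dotWidth = 4.0        // 4-way dot product per lane per cycle

let tflops = cores * lanesPerCore * clockGHz * flopsPerFMA * dotWidth / 1000.0
print("≈ \(tflops) TFLOPS FP16")  // ≈ 59.4, i.e. roughly the 60 TFLOPS above
```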

The big question, however, is whether it’s indeed FP16.

One would imagine that the GB AI GPU benchmark should show it?

Probably, depending on how Apple routes these things internally. Will a CoreML model use both the ANE and the GPU? Who knows.

So FLOPS per watt is still important for GPUs even on the high end ... that could be interesting ...

You are always limited by power consumption, so it’s the only figure that matters (in the absence of other constraints, of course). I do disagree with that Twitter post that architecture is uninteresting, as architecture is what determines scalability. And frankly, it’s just fun to talk about.
 
You are always limited by power consumption, so it’s the only figure that matters (in the absence of other constraints, of course).

Aye, but what I find interesting about the post is that power ratings for the Nvidia GPUs don’t include the matrix cores running full tilt, so they have to throttle their clocks when doing matrix multiplication. Now, the typical response of a lot of people is that desktop/workstation power efficiency doesn’t matter at all, because the desktop/workstation user will always just throw more power at the problem. But @Altaic’s link is indicative of the limitations of that approach even for desktops (never mind that people who run these models constantly do eventually care about their power bills).

I do disagree with that Twitter post that architecture is uninteresting, as architecture is what determines scalability. And frankly, it’s just fun to talk about.
Absolutely. And to reinforce your point, architecture helps determine perf per watt.
 
Talking about the phones themselves, I am kind of tempted to go for a base 17 model instead of the Pro. The reasons I got the Pro in the past were the better screen and battery, but the differences this year appear minimal. The big ones are the telephoto lens and ProRes recording (which I really don’t need) and the extra GPU core (which again I don’t need in a phone). All in all, this seems the most compelling base iPhone in many years. Am I missing something?
 
Talking about the phones themselves, I am kind of tempted to go for a base 17 model instead of the Pro. The reasons I got the Pro in the past were the better screen and battery, but the differences this year appear minimal. The big ones are the telephoto lens and ProRes recording (which I really don’t need) and the extra GPU core (which again I don’t need in a phone). All in all, this seems the most compelling base iPhone in many years. Am I missing something?
Yeah I don’t blame you. The base is very good value indeed. They are gonna sell so many.
 
Talking about the phones themselves, I am kind of tempted to go for a base 17 model instead of the Pro. The reasons I got the Pro in the past were the better screen and battery, but the differences this year appear minimal. The big ones are the telephoto lens and ProRes recording (which I really don’t need) and the extra GPU core (which again I don’t need in a phone). All in all, this seems the most compelling base iPhone in many years. Am I missing something?
I think the Air is a good compromise, unless you specifically want the smaller screen. I'm stuck on which color Pro I want to get.
 
I’m quite surprised the 17 got ProMotion, in addition to the significant battery life and base storage improvements over the 16. Got me thinking about trading my 16 in, but that would be a crazy move for me. I always keep a phone for at least 4 years.

Right? The base model is pretty stout. I was thinking maybe a little bump in models (this cycle) for the daughter, but for $799 with 256GB and a lot of previously "Pro" features now included, that's a killer value. Apple is showing $240 for a trade of her base iPhone 14, so $559 (will do credited-at-purchase by way of 0% payments on the Apple Card and get that 3% back too).

While I do like having the best possible camera in the family (which would be a new 17 Pro/Max), the wife is really digging on the Air.

Heck, with a $520 trade on my 15PM, I see a possible Cosmic Orange Pro Max in my future :D
 
Air is pretty, but I can’t justify the price. If I’m spending that much on a phone, it would be a Pro.

Air is pretty. Full stop. :giggle:

Seriously though, I think it's going to be very popular since it's $100 less than the (smaller) Pro, while offering a decently larger display AND not having the additional bulk of the Pro Max.

My wife likes the larger display, but is often on the go (running, yoga, theme parks) and would prefer the smallest, lightest packaging without giving up too much on the display side. The camera? We love having the best camera on family trips, but even the Air is still so good, it's "good enough" (I'll be curious how it looks vs. a 2- or 3-generation-old Pro model like her 14 or my 15).
 
The big question, however, is whether it’s indeed FP16.
I came across a post saying FP16 may only be 2x, according to something Tim M. said, but I haven’t had time to check it out. Also, FP8 wasn’t talked about at all, so that would be pretty disappointing. I’ll give it a listen in a few minutes.

Probably, depending on how Apple routes these things internally. Will a CoreML model use both the ANE and the GPU? Who knows.
Maybe I’m just cynical, but I doubt GB has optimized their AI benchmark to fully utilize the ANE and GPU together. I wish they’d open source (perhaps with a strict license) the benchmark code for review. It’s sort of bizarre to me that so many people trust benchmarks implicitly.
 
I came across a post saying FP16 may only be 2x, according to something Tim M. said, but I haven’t had time to check it out. Also, FP8 wasn’t talked about at all, so that would be pretty disappointing. I’ll give it a listen in a few minutes.

I think the 2x FP16 refers to general-purpose shader performance (thanks to the extra math pipe). The matmul performance is a separate figure.

Maybe I’m just cynical, but I doubt GB has optimized their AI benchmark to fully utilize the ANE and GPU together.

They don’t have to! The very idea of CoreML is that you ship your model and the system determines the best way to run it. So I’d say for a user-focused benchmark like GB that’s the way to go.

What I’m not sure about is whether Apple will also run these models on the GPU (after all, it’s going to cost much more power than the ANE).
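
For reference, the knob CoreML exposes here is MLModelConfiguration.computeUnits - it’s a hint, not a guarantee. A minimal sketch (MyModel is a hypothetical Xcode-generated model class):

```swift
import CoreML

// Sketch: with CoreML you ship the model and only *hint* at placement;
// the framework's internal logic still decides where each layer runs.
let config = MLModelConfiguration()
config.computeUnits = .all                   // default: CPU, GPU, and ANE
// config.computeUnits = .cpuAndGPU          // steer work away from the ANE
// config.computeUnits = .cpuAndNeuralEngine // steer work away from the GPU

// "MyModel" is a hypothetical generated model class, used for illustration.
let model = try? MyModel(configuration: config)
```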
 
I still have a few iPhones of various models (I had a big collection when I was actively doing iOS dev work), mostly ones that were not worth trading, got too old, and/or busted. I bet I have something that's a pretty close size comparison to the new Air - I know I have a 6S+, so I'll see if the wife wants to do a test fit :LOL:
 
They don’t have to! The very idea of CoreML is that you ship your model and the system determines the best way to run it. So I’d say for a user-focused benchmark like GB that’s the way to go.

What I’m not sure about is whether Apple will also run these models on the GPU (after all, it’s going to cost much more power than the ANE).
I’m very tired and very confused: GB already has an AI-focused benchmark using CoreML that targets the GPU - are we talking about something else?

 
I’m very tired and very confused: GB already has an AI-focused benchmark using CoreML that targets the GPU - are we talking about something else?


My knowledge of CoreML is very basic - is it possible to force the model to run on the GPU reliably, and does GB offer this kind of option? Sorry, can’t check myself - very poor internet here.
 
My knowledge of CoreML is very basic - is it possible to force the model to run on the GPU reliably, and does GB offer this kind of option? Sorry, can’t check myself - very poor internet here.
To my knowledge, you can only set a preferred Metal device - it will still delegate between the ANE, CPU, and said Metal device based on its internal logic, but if you have many GPUs, you can pick one.

What you can do, however, is create models that are incompatible with current ANE iterations, and that will force them off the ANE - but that could change between hardware iterations.
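
Roughly, the configuration side looks like this (a sketch; preferredMetalDevice only matters when several GPUs are present):

```swift
import CoreML
import Metal

// Sketch: you can nominate which GPU counts as "the" Metal device, but
// CoreML still arbitrates between CPU, ANE, and that device internally.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU          // discourages (doesn't forbid) the ANE
if let gpu = MTLCopyAllDevices().first {  // macOS: enumerate available GPUs
    config.preferredMetalDevice = gpu
}
```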
 
My knowledge of CoreML is very basic - is it possible to force the model to run on the GPU reliably, and does GB offer this kind of option? Sorry, can’t check myself - very poor internet here.

To my knowledge, you can only set a preferred Metal device - it will still delegate between the ANE, CPU, and said Metal device based on its internal logic, but if you have many GPUs, you can pick one.

What you can do, however, is create models that are incompatible with current ANE iterations, and that will force them off the ANE - but that could change between hardware iterations.
Yeah, somehow GB runs its CoreML benchmark on each device. Unfortunately, they don’t have a whitepaper for it (I don’t think), but it’s possible requests for a device are just generally honored by the internal logic, except in extreme cases.
 
Yeah, somehow GB runs its CoreML benchmark on each device. Unfortunately, they don’t have a whitepaper for it (I don’t think), but it’s possible requests for a device are just generally honored by the internal logic, except in extreme cases.
But to my knowledge, you cannot request a device. You can only choose which of several Metal devices counts as the Metal device in the group of three options: CPU, ANE, and the Metal device.
 
Looking better.
[attached screenshot]
 