M4 Mac Announcements

However, I would disagree that we should only focus on those for relative GPU performance.
I never said we should be looking only at rendering benchmarks, but rather that our focus shouldn't be primarily on GB6 Compute. I.e., it should be more balanced across both compute and rendering benchmarks. From what I recall of the M3 performance thread, the GPU performance discussion was mostly focused on GB6 compute.

I read that the portion of 3D Mark that can run natively on AS is synthetic rather than a real-world task. Is that not correct?

And I don't think Blender or Cinebench qualify according to my criteria, since each of those tests just a single app. I'm looking for something that tests a representative set of real-world tasks, i.e., that does for GPU rendering what GB6 CPU does for the CPU.
 
You misread my post. I didn't say we should be looking only at rendering benchmarks, but rather that our focus shouldn't be primarily on GB6 Compute. I.e., it should be more balanced across both compute and rendering benchmarks.
I dunno, I guess for me it’s the opposite: there are so many rendering benchmarks that people use regularly, but only one GPU compute benchmark. Generally people focus almost exclusively on rendering for GPU performance, with Geekbench Compute and occasionally PugetBench being the exceptions.
I read that the portion of 3D Mark that can run natively on AS is synthetic rather than a real-world task. Is that not correct?
To be honest, a lot of times people use “synthetic” as a pejorative to mean a benchmark they don’t like. A good example of truly synthetic benchmarks is some of the Passmark subtests; I wrote about that earlier in a different thread. Others are pretty explicitly described as synthetic, designed to do nothing but unreasonably hammer the CPU/GPU or measure TFLOPs under load, that sort of thing. And even some of those could in principle yield relevant data. Sometimes a benchmark can also just be old and outdated, where modern processors handle it so easily that it simply isn’t relevant anymore.

I did misstate things in my earlier post about Steel Nomad and 3D Mark: by the latter I meant the older 3D Mark benches, since Steel Nomad is one of their newer ones (basically I made it sound like Steel Nomad was not a 3D Mark bench). I believe the Light version is the one that is AS native, but even the Light version is pretty punishing, and I would not describe it as synthetic. Some of the other, older benches are AS native, but some are not.
 
I dunno, I guess for me it’s the opposite: there are so many rendering benchmarks that people use regularly, but only one GPU compute benchmark. Generally people focus almost exclusively on rendering for GPU performance, with Geekbench Compute and occasionally PugetBench being the exceptions.
It sounds like you're saying you can go to any other site and see lots of discussions of GPU rendering benchmarks, so it's nice that you can come here where the focus is on GPU compute. I understand that.

But for me, I find this site has the highest-quality/lowest-noise technical discussions, so this is where I like to go to see things put in their proper context. Thus I'd like to see the site whose technical discussions I prefer discuss both GPU rendering and GPU compute benchmarks with equal emphasis, so that I can see both put into their proper context, if that makes sense.
To be honest, a lot of times people use “synthetic” as a pejorative to mean a benchmark they don’t like.
Perhaps the portion of 3D Mark that can run on AS (Wildlife Extreme) is not synthetic, and does replicate real-world rendering tasks. But I looked into this further and Wildlife Extreme is written for iOS, not MacOS. I don't know what the implications of that are for the values generated by Wildlife Extreme when run on MacOS.
 
It sounds like you're saying you can go to any other site and see lots of discussions of GPU rendering benchmarks, so it's nice that you can come here where the focus is on GPU compute. I understand that.

But for me, I find this site has the highest-quality/lowest-noise technical discussions, so this is where I like to go to see things put in their proper context. Thus I'd like to see the site whose technical discussions I prefer discuss both GPU rendering and GPU compute benchmarks with equal emphasis, so that I can see both put into their proper context, if that makes sense.

Perhaps the portion of 3D Mark that can run on AS (Wildlife Extreme) is not synthetic, and does replicate real-world rendering tasks. But I looked into this further and Wildlife Extreme is written for iOS, not MacOS. I don't know what the implications of that are for the values generated by Wildlife Extreme when run on MacOS.
As far as I know, wrt Wildlife Extreme on iOS vs macOS, there are few to no implications, but it will probably be slowly replaced by Steel Nomad Light as the go-to of 3D Mark’s benches for mobile and integrated GPUs.
 
Does DNE stand for "Does Not Exist"? Why do you think there might be a hole there in the numbering scheme? Are the numbers incrementally allocated based on when they are put into testing or a bit randomly? Do you think based on this we will see Mac Studio before MacBook Air?

What do you base the 30" iMac guess for M5 on?
All good questions. I had rushed through my previous posts so I apologize for the lack of clarity.

Mac16,4 was absent from a strings leak in the 15.1 beta (while all other Mac16 identifiers, Mac17,1, and Mac17,2 were present). I should have been more precise about that, and I inferred (perhaps incorrectly) that Mac16,4 would not be released. As far as I can determine, those identifiers are allocated somewhat randomly, changing from generation to generation.

CPIDs are laid out according to a pretty clear pattern; the interesting thing, however, is when they become active. Currently the M5 and M5 Pro are active, but not the M4 Ultra, M5 Max, or M5 Ultra. This indicates that upcoming products will include the M5 and M5 Pro before the M5 Max or M5 Ultra (and that the M4 Ultra may not exist). The Mac Studio M4 Max and two MacBook Air M4 models are likely the next releases; no idea in which order, but I'd speculate that it'll be December or January. I think the MBA M4 would be an Xmas hit, so maybe there'll be a December surprise.

My guess as to Mac17,1 and Mac17,2 is based on the M5 and M5 Pro CPIDs being active but not other M5 CPIDs. Given that the entire Mac lineup was just updated, I find it unlikely that any of the M4 Pro products will be updated to M5 Pro so soon. The only product that makes sense is a larger iMac, and given the incredible performance of the M4 Pro, I'd expect the M5 Pro to be worthy of the iMac Pro moniker (though who knows if they'll market it that way). Of course, it's also possible that Apple developed an entirely new Mac product that is not a larger iMac, but I struggle to imagine what that could be.
 
Just to follow up on that last bit, I'll spitball some possible novel Mac products. Mac16,4 could be an M4-series blade server that runs a server-specific OS rather than macOS (and thus would have been omitted from the macOS beta). Mac17,1/2 could be an M5 or M5 Pro MacProjector, or maybe the mythical TouchBook, or perhaps an iMacSpatial with a stereoscopic display. It could be a lot of things, but I went with a 30" iMac because it's less unorthodox and easier to defend 🙂
 
3DMark Steel Nomad: Ergebnisse im Überblick - ComputerBase

Looking at this, a 4090 laptop scores around 23,000.
The M3 Max scores around 12,000.
The M4 Max should score around 15,200. That's not bad at all for a laptop, but it's very expensive to get the 40-core GPU. The 4070 laptop scores around 13,000.
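Putting those figures side by side (a quick sketch; all the scores are the rough estimates quoted above, not official results), the relative standings work out to roughly:

```swift
// Relative Steel Nomad standings, using only the approximate scores quoted above.
let scores: [(name: String, score: Double)] = [
    ("RTX 4090 laptop", 23_000),
    ("M4 Max (estimate)", 15_200),
    ("RTX 4070 laptop", 13_000),
    ("M3 Max", 12_000),
]

let baseline = scores[0].score  // normalize against the 4090 laptop
for (name, score) in scores {
    let percent = (score / baseline * 100).rounded()
    print("\(name): \(Int(score)) points, ~\(Int(percent))% of the 4090 laptop")
}
```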

For gaming, Nvidia is very much ahead, and the RTX 50 series is coming around CES. It also remains to be seen how much AMD will improve with RDNA 4.


Now for desktop: the M4 Max 40-core will be around the same performance as a desktop 4070, and the M4 Max will be used in a $1999 desktop. This is just sad. I don't know why Apple cannot make a GPU that's perf/$ friendly. They got the efficiency part right.
Cyberpunk 2077 will show us Apple's RT capabilities, and I hope the 2x ray tracing improvement in the M4 is evident there; otherwise it won't paint a good picture when the benchmarks come out.
 
Now for desktop: the M4 Max 40-core will be around the same performance as a desktop 4070, and the M4 Max will be used in a $1999 desktop. This is just sad.
Here's an idea. What if the Hidra SoC is a multi-chop (like the single-chop M1/2 Pro/Max), but the M4 Max up to Ultra?

The M4 Max (CPID 0x6041) is already some sort of chop (14/32 384b & 16/40 512b) that uses the same firmware, so perhaps that scales up to three chops (14/32 384b, 16/40 512b, 20/60 768b, 24/80 1024b), the first two configs being for laptops, and the second two for desktops. It fits the Hydra mythos, what with the many heads to chop.

So, even though the SoCs would have the same CPID, maybe the Studio M4 "Max/Ultra" will actually be a lot better and not sad :)
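Purely as a back-of-the-envelope sketch of that chop idea: the 20/60 and 24/80 configurations are speculation from the post above, and the bandwidth figures assume every config keeps the M4 Max's LPDDR5X-8533 memory.

```swift
// Hypothetical Hidra chop configurations (speculative), with memory bandwidth
// assuming LPDDR5X-8533 across the board.
struct ChopConfig {
    let cpuCores: Int
    let gpuCores: Int
    let busWidthBits: Int
}

let configs = [
    ChopConfig(cpuCores: 14, gpuCores: 32, busWidthBits: 384),   // laptop
    ChopConfig(cpuCores: 16, gpuCores: 40, busWidthBits: 512),   // laptop
    ChopConfig(cpuCores: 20, gpuCores: 60, busWidthBits: 768),   // desktop (speculative)
    ChopConfig(cpuCores: 24, gpuCores: 80, busWidthBits: 1024),  // desktop (speculative)
]

let transfersPerSecond = 8_533_000_000.0  // LPDDR5X-8533, assumed for all configs

for c in configs {
    // bandwidth = bus width in bytes x transfer rate
    let gbPerSec = Double(c.busWidthBits) / 8 * transfersPerSecond / 1e9
    print("\(c.cpuCores) CPU / \(c.gpuCores) GPU @ \(c.busWidthBits)-bit: ~\(Int(gbPerSec)) GB/s")
}
```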
 
Makes sense. But then it seems that when we discuss comparative GPU performance, we should also be giving as much attention to GPU rendering benchmarks as to GB6 GPU Compute.

Rendering is an odd one because it requires very specific capabilities that most other software does not use. We are talking ray tracing, very complex compute shaders, and non-trivial memory access patterns. I do like Blender as a compute benchmark, but one has to keep in mind that the results won’t translate to games, ML, or the HPC compute GPUs are often used for.

Overall, you raise some excellent points. GPU performance means different things to different people. Gaming is obviously a big one, but software availability makes it difficult to compare Apple with the rest. Rendering is another topic that interests a lot of people, but there isn’t much out there except Blender that can be used for benchmarking; other applications are few and far between. Yes, content creation apps use the GPU, but we know little about how exactly they use it, and there is no good way to compare GPU performance specifically. With those apps it might make more sense to look at performance holistically; PugetBench does a good job of that, for example.

I don’t think that Geekbench 6 is a particularly useful GPU benchmark. They use some common simple shaders, but not much is known about their implementation or real-world applicability. As you said, the fact that different backends show vastly different performance is worrisome. It’s useful for seeing the progress between generations, though.

I don't know why Apple cannot make a GPU that's perf/$ friendly. They got the efficiency part right.

They are not interested in that part of the market; never have been. Production is limited and R&D is expensive. Why would they spend resources to sell cheap GPUs if their customers are happy to pay $4000 for a Max-sized die? And of course, companies like Nvidia have a huge advantage: they can make a Max-sized die packed full of GPU compute, while Apple needs to build a full system in the same footprint. For a similar die size, Nvidia will always be ahead, unless Apple uses its $$$ to invest in some new tech like die stacking.

Besides, GPU performance is extremely overrated. Nvidia and AMD have normalized oversized behemoths that draw more power than a water boiler. You don’t need 60 TFLOPs to play games. Sadly, because of this (and the toxic gaming industry) we have lost the precious art of optimization. The base M4 is more than sufficient for running pretty much any game; it’s the code that sucks. Just look at Blender: the M3 Max already outperforms the 7900 XTX, a GPU that has more than 3x the compute capacity! That’s what smart use of technology, compute, and software optimizations can bring.
 
Here's an idea. What if the Hidra SoC is a multi-chop (like the single-chop M1/2 Pro/Max), but the M4 Max up to Ultra?

The M4 Max (CPID 0x6041) is already some sort of chop (14/32 384b & 16/40 512b) that uses the same firmware, so perhaps that scales up to three chops (14/32 384b, 16/40 512b, 20/60 768b, 24/80 1024b), the first two configs being for laptops, and the second two for desktops. It fits the Hydra mythos, what with the many heads to chop.

So, even though the SoCs would have the same CPID, maybe the Studio M4 "Max/Ultra" will actually be a lot better and not sad :)

The M4 Max will already be very close to the reticle limit. I doubt that there is an even larger die. Besides, the only thing such a die would bring is more GPU cores, which are not particularly useful or marketable.
 
You've misread my post. My post said that while any app that has a GUI (which is nearly all consumer apps) will need to use the GPU to render that GUI (examples include Word and Excel), only a tiny percentage of consumer apps use the GPU for GPU compute. And it's only these apps that matter when it comes to GPU compute. Which is the same thing you were saying. Take another look:

I.e., Word and Excel were given as examples of typical consumer apps, which do use the GPU, but don't use GPU compute, and thus do not matter in this discussion.

It's possible I did, but your phrasing left it ambiguous enough that it can be read as implying the scenario was ‘compute light’ rather than ‘compute non-existent’. There’s no explicit statement that these apps don't use compute at all, just the mentions of compute. I read it differently than you apparently intended.

While GPU compute can be used to do calculations that support rendering, GPU compute and GPU rendering are generally considered to be qualitatively different tasks. Further, the consumer/prosumer uses that most commonly stress GPUs aren't GPU compute tasks, they are GPU rendering tasks* (processing photos and videos, and playing video games). Given that GB6 is supposed to reflect the workloads of people that buy Macs and PCs, shouldn't its GPU benchmark be primarily a GPU rendering benchmark that contains some GPU compute tasks, rather than what it appears to be, which is a GPU compute benchmark that contains some rendering-related tasks?

I see it more like a Venn diagram where different tasks land in different spaces, and some rely on both. Games and 3D rendering rely on both pipelines quite heavily.

But keep in mind, GPGPU came out of the realization that the shader cores manipulating texture data and pixel data could be used for more general processing of data. So when you process image data on a GPU, that’s compute. Shaders used in games are compute. It is very relevant for these tasks, and tends to be the bottleneck more than the raster pipeline.

The raster pipeline right now is really only taxed when faced with complex 3D geometry, and even then the shaders can become the bottleneck instead. There's just less interest in it. GPUs don’t list their pixel fill rate or polygons/sec on the spec sheets anymore, for much the same reason.
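To make the "processing image data on a GPU is compute" point concrete, here's a minimal Metal sketch in Swift: a tiny kernel, compiled from source at runtime, that scales every value in a buffer and is dispatched entirely through the compute pipeline, with no raster work involved. The kernel name, data, and gain value are purely illustrative.

```swift
import CoreGraphics  // so MTLCreateSystemDefaultDevice() works in a command-line tool
import Metal

// A trivial "brighten" kernel: pure GPU compute, no raster pipeline involved.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void brighten(device float *pixels [[buffer(0)]],
                     constant float &gain [[buffer(1)]],
                     uint id [[thread_position_in_grid]]) {
    pixels[id] = pixels[id] * gain;
}
"""

let device   = MTLCreateSystemDefaultDevice()!
let library  = try! device.makeLibrary(source: kernelSource, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "brighten")!)
let queue    = device.makeCommandQueue()!

// Some fake "pixel" data to process on the GPU.
let pixels: [Float] = (0..<1024).map { Float($0) / 1024 }
var gain: Float = 1.5
let buffer = device.makeBuffer(bytes: pixels, length: pixels.count * MemoryLayout<Float>.stride, options: [])!

let commandBuffer = queue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(buffer, offset: 0, index: 0)
encoder.setBytes(&gain, length: MemoryLayout<Float>.stride, index: 1)
encoder.dispatchThreads(MTLSize(width: pixels.count, height: 1, depth: 1),
                        threadsPerThreadgroup: MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1))
encoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

// Read the results back from unified memory.
let result = buffer.contents().bindMemory(to: Float.self, capacity: pixels.count)
print("pixel[100]:", pixels[100], "->", result[100])
```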
 
The M4 Max will already be very close to the reticle limit. I doubt that there is an even larger die. Besides, the only thing such a die would bring is more GPU cores, which are not particularly useful or marketable.
Hmm, do you know what the N3E reticle size is these days? For some reason I thought TSMC expanded it in the last few months, but I couldn’t find anything useful from a cursory search.

I suppose they could pattern the Hidra “body” and “heads” adjacently, and connect them at the back end with maskless lithography. They could fly-test the wafer and then pattern a final metal layer to connect the working regions. Maskless lithography should easily achieve 500nm wire pitch (I personally have patterned <1um photonic structures with a DMD I repurposed from a cheap 3D printer), which is a great deal better than UltraFusion’s 25um.
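Just to put rough numbers on that pitch comparison (a quick sketch, taking the 500 nm and 25 µm figures above at face value):

```swift
// How many wires fit per millimeter of die edge at each pitch?
let ultraFusionPitchMicrons = 25.0   // UltraFusion pitch quoted above
let masklessPitchMicrons    = 0.5    // 500 nm maskless-litho pitch quoted above

let ultraFusionWiresPerMM = 1_000 / ultraFusionPitchMicrons   // 40 wires per mm of edge
let masklessWiresPerMM    = 1_000 / masklessPitchMicrons      // 2,000 wires per mm of edge

print("UltraFusion: \(Int(ultraFusionWiresPerMM)) wires/mm")
print("Maskless:    \(Int(masklessWiresPerMM)) wires/mm, i.e. \(Int(masklessWiresPerMM / ultraFusionWiresPerMM))x denser")
```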
 
They are not interested in that part of the market. Never have been. Production is limited and R&D is expensive. Why would they spend resources to sell cheap GPUs if their customers are happy to pay $4000 for a Max-sized die? And of course, companies like Nvidia have a huge advantage. They can make Max-sized die packed full of GPU compute. Apple needs to build a full system in the same footprint. For a similar die size, Nvidia will always be ahead, unless Apple uses its $$$ to invest into some new tech like die stacking.
One way Apple may win is through tight control of the software and abstraction of the hardware into the libraries they more or less force developers to use.

That, and the cross-compatibility (just at a larger scale) from their phone GPUs through to the Mac Pros. Like Microsoft has a monopoly (though a decreasing one) on desktop operating systems, Apple probably has the largest pool of GPUs for a developer to target with a single consistent API (across mobile, tablet, and laptop/desktop) on the planet. It will be really interesting if they can kick-start the game industry into writing for the Mac mini as a baseline; right now that's an awesome baseline platform for not a lot of money that will actually game acceptably at the given spec!
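As a small illustration of that single-consistent-API point, the same few lines of Swift query the GPU and its feature support identically on an iPhone, an iPad, or any Mac; the specific family checks here are just examples.

```swift
import Metal

// The same capability query runs unchanged on iOS, iPadOS, and macOS.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

print("GPU: \(device.name)")
print("Apple7 family (A14/M1 class or newer): \(device.supportsFamily(.apple7))")
print("Apple9 family (A17 Pro/M3/M4 class):   \(device.supportsFamily(.apple9))")
print("Ray tracing API supported:             \(device.supportsRaytracing)")
```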

The PC industry has to cater to all manner of different hardware which somewhat limits the ability to support new hardware features or run optimally on any single platform.

I'm waiting for my M4 Max to arrive so I can start testing things on it. I'm very interested to see how Cyberpunk 2077 runs on it when released, as Apple's announcement specifically mentioned RT and path tracing, and not even my 6900 XT on the PC can do path tracing.
 
One way Apple may win is through tight control of the software and abstraction of the hardware into the libraries they more or less force developers to use.

Yes, and it already works for certain types of software. Take photo/video editors: these generally work amazingly well on Apple Silicon. The GPU processing they use is fairly simple, and Apple's focus on these applications works well.

However, it will never work the same way for games. Software is too complex. And with the gamedev market and Apple's share of it being what it is, nobody is going to spend extensive time optimizing for Apple platforms.
 
Besides, the only thing such a die would bring is more GPU cores, which are not particularly useful or marketable.
Why do you think that only GPU cores could be part of such an augmentation? CPU complexes require substantially less bandwidth, and the M1/2 Ultras had no problem as far as CPU synchronization goes.
 
Here's an idea. What if the Hidra SoC is a multi-chop (like the single-chop M1/2 Pro/Max), but the M4 Max up to Ultra?

The M4 Max (CPID 0x6041) is already some sort of chop (14/32 384b & 16/40 512b) that uses the same firmware, so perhaps that scales up to three chops (14/32 384b, 16/40 512b, 20/60 768b, 24/80 1024b), the first two configs being for laptops, and the second two for desktops. It fits the Hydra mythos, what with the many heads to chop.

So, even though the SoCs would have the same CPID, maybe the Studio M4 "Max/Ultra" will actually be a lot better and not sad :)
I doubt the M4 Max 14/32 384b is a chop of the 16/40 512b. It should be the same SoC with some cores fused off and some memory channels unpopulated. These are enormous dies, so Apple needs some way to enhance yields, and selling parts with defective cores as reduced-core-count chips is the classic way to do it.

It's also challenging to come up with a layout that lets you simultaneously scale CPU, GPU, and memory controller counts with each 'chop', especially when you account for CPU clustering. That's why the M1 and M2 both had identical CPU core counts in Pro and Max.
 
I doubt the M4 Max 14/32 384b is a chop of the 16/40 512b. It should be the same SoC with some cores fused off and some memory channels unpopulated. These are enormous dies, so Apple needs some way to enhance yields, and selling parts with defective cores as reduced-core-count chips is the classic way to do it.

It's also challenging to come up with a layout that lets you simultaneously scale CPU, GPU, and memory controller counts with each 'chop', especially when you account for CPU clustering. That's why the M1 and M2 both had identical CPU core counts in Pro and Max.
Fair enough, though I was under the impression that N3E had higher yields than N3B, so I don’t think that’s a compelling design constraint.

Also:
I suppose they could pattern the Hidra “body” and “heads” adjacently, and connect them at the back end with maskless lithography. They could fly-test the wafer and then pattern a final metal layer to connect the working regions. Maskless lithography should easily achieve 500nm wire pitch (I personally have patterned <1um photonic structures with a DMD I repurposed from a cheap 3D printer), which is a great deal better than UltraFusion’s 25um.
 
I'm waiting for my M4 Max to arrive so I can start testing things on it. I'm very interested to see how Cyberpunk 2077 runs on it when released, as Apple's announcement specifically mentioned RT and path tracing, and not even my 6900 XT on the PC can do path tracing.
The only other GPUs that can do path tracing at acceptable frame rates are the RTX 4070 Ti and higher cards from Nvidia, with DLSS. Even RDNA 3 is not enough; supposedly AMD has a big RT uplift coming with RDNA 4 in Q1 2025. It will be interesting to see how all the GPUs compare.

I think the press also mentioned frame gen; this is likely FSR 3.1.
 