M4 Mac Announcements

NotEntirelyConfused · Dec 21, 2024

dada_dave said:
In general the M4 chips seem to use more power across the board though. And I’ve seen several reviewers note concordant increases in fan noise as a result in some models, but the same or actually less fan noise in others, but more throttling. It seems to depend on what Apple decided for that model.

Right, but other reviewers, or posters in the MR thread who seem to have their heads on straight (plenty of participants did not), did not note corresponding levels of fan noise. There seems to be a large difference in the experiences people are reporting. This is what I was talking about - some people are not reporting significant fan noise even when the fan is running at higher speeds.

dada_dave said:
There are also those 3rd party tools to control fan speed. That’s another solution for those bothered by the fan noise (as long as it isn’t coil whine). TG Pro? I think there are others too.

Right, and some people in that MR thread have used those tools successfully. Others just... whined.

theorist9 said:
When it comes to noise, case size is critical. With a larger case and attendantly larger heatsinks you can dissipate thermal energy less noisily because you can get higher airflow at lower air velocity. And you can use bigger, slower-moving fans.

Case size is only critical if it's relevant - that is, if it's actually a factor, because it's being used as a heat sink and radiator, and because a larger fan is being used, taking advantage of that larger case.

It's not clear to me that any of those are true in the case of the old Mini. That is, the fan in the old Mini was not as large as the entire case, and in fact may not be any larger than the fan in the new Mini; that's something I don't know, and nobody reported exact sizes in that MR thread. Does anyone here know? I also don't have any useful info on how effective the old Mini was at using the case as a heat sink and radiator.

theorist9 said:
Also, FWIW, the standard return window is 2 weeks (with the holidays they have until Jan 8

Right, that's what I was talking about - all the pissing and moaning was from people who, at that time, would have had two months or more to return the new Mini since holiday return rules were already in effect.

Jimmyjames · Dec 21, 2024

To add more noise to the data, I purchased an M4 Pro Mini. 14c cpu, 20c gpu 64GB and I can say my experience is that it is silent most of the time. If I use Cinebench to test the cpu, then after a minute or two the fans become audible. I measured using my Apple Watch from right next to it and it was about 48/49db. From a foot or two away it was around 43db. Only the cpu running at 50% or higher will incur any kind of audible fan noise. The gpu doesn’t seem to cause it to spin up at all. Playing games or doing gpu benchmarks hasn’t yielded noise at all.

PS. It’s awesome. I am coming from a 2018 Mini and the difference is staggering.

Citysnaps · Dec 21, 2024

Jimmyjames said:
To add more noise to the data, I purchased an M4 Pro Mini. 14c cpu, 20c gpu 64GB and I can say my experience is that it is silent most of the time. If I use Cinebench to test the cpu, then after a minute or two the fans become audible. I measured using my Apple Watch from right next to it and it was about 48/49db. From a foot or two away it was around 43db. Only the cpu running at 50% or higher will incur any kind of audible fan noise. The gpu doesn’t seem to cause it to spin up at all. Playing games or doing gpu benchmarks hasn’t yielded noise at all.

PS. It’s awesome. I am coming from a 2018 Mini and the difference is staggering.

Thanks for the heads up - appreciate it. I'm considering an M4 Mini for my 8 outdoor security video cameras and home automation that has been running on a 2019 Intel i7 Mini 24/7 for the last five years, and produces a lot of heat. Will be nice going with an M4 Mini knocking that (and electric bill) down, and also processing the video streams at higher frame rates.

I'm also considering an M4 Pro Mini for a dedicated X-Plane simulator computer that drives three large displays.

mr_roboto · Dec 21, 2024

NotEntirelyConfused said:
It's not clear to me that any of those are true in the case of the old Mini. That is, the fan in the old Mini was not as large as the entire case, and in fact may not be any larger than the fan in the new Mini; that's something I don't know, and nobody reported exact sizes in that MR thread. Does anyone here know?

I got curious, so I decided to find out. Apple posts images that can be used for this as part of their repair manuals. I used images showing each mini in its mostly assembled state with the fan visible. This let me use the overall case width (a posted spec) to calibrate my pixels per inch scale for each image.

I compared the plain-M4 mini to the M2 Pro mini. Plain M2 mini has a smaller fan than M2 Pro. Maybe should've used the M4 Pro mini, but I don't think the fan size differs between M4 and M4 Pro, only difference is whether the heatsink uses copper fins and a bigger heatpipe.

Results (M4, then M2 Pro):
Fan body: 3.3", 3.1"
Fan air intake diameter: 1.6", 1.58"
Fan hub/motor diameter: 1", 0.85"

So, about the same size. If anything, the M4 fan is slightly bigger.

On the other hand, the airflow path in the M2 mini is a bit simpler.

NotEntirelyConfused said:
I also don't have any useful info on how effective the old Mini was at using the case as a heat sink and radiator.

No reason to think either is effective at this at all, because in both cases the aluminum shell is airgapped from the SoC and SoC heatsink.

NotEntirelyConfused · Dec 21, 2024

Thanks, that's very useful!!

mr_roboto said:
No reason to think either is effective at this at all, because in both cases the aluminum shell is airgapped from the SoC and SoC heatsink.

Right, but that means the case will sink and radiate heat based on what transfers from ambient air. That's not nothing. But is it significant? Will the larger case make a meaningful difference? I don't have good instincts for this, but my guess is no.

mr_roboto · Dec 21, 2024

NotEntirelyConfused said:
Right, but that means the case will sink and radiate heat based on what transfers from ambient air. That's not nothing. But is it significant? Will the larger case make a meaningful difference? I don't have good instincts for this, but my guess is no.

It shouldn't be a significant effect.

Heat flow can be modelled with equations that are essentially the same as simple Ohm's law calculations, just with slightly different units. The driving force that creates flow is a temperature difference rather than a voltage, and the unit of flow is watts (joules per second) instead of amps (coulombs per second).

Every material (or interface between materials) has a characteristic thermal resistance (units: kelvin per watt). This means you can draw up thermal circuits modeling all the paths that heat can flow in. Just like electric current flow through resistors, heat flow through a path is delta-T divided by resistance (delta-T being the temperature difference across the ends of the path being analyzed). Lower resistance implies higher flow.

SoC -> heatsink -> forced airflow -> vented to atmosphere is a path that's engineered to be very low resistance. SoC -> (a bunch of paths) -> mostly motionless internal air -> aluminum shell -> atmosphere is a much higher resistance path, with most of that resistance being the airgap (motionless air is a good insulator, this is why double paned windows are so much more energy efficient than single paned). Most heat generated by the SoC will be rejected to atmosphere through the low resistance path.
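To put rough numbers on the thermal-circuit picture, here is a minimal sketch of two parallel paths sharing the same temperature difference. The resistance values and the temperature difference are made-up placeholders chosen only to illustrate a low-resistance forced-air path next to a high-resistance airgap path; they are not measurements of any Mac mini.

```python
def heat_split(delta_t, resistances):
    """Heat flow in watts through each parallel path: Q = delta_T / R."""
    return {name: delta_t / r for name, r in resistances.items()}

# Hypothetical thermal resistances in K/W (illustrative only, not measured).
# Within one path, resistances in series simply add up to a single value.
paths = {
    "heatsink + forced air": 0.5,
    "still air + aluminum shell": 10.0,
}
delta_t = 50.0  # assumed SoC-to-ambient temperature difference in kelvin

flows = heat_split(delta_t, paths)
total = sum(flows.values())
for name, q in flows.items():
    print(f"{name}: {q:.1f} W ({q / total:.0%} of total)")
```

With any pair of resistances this far apart, nearly all of the heat takes the forced-air path, which is the point being made about the shell.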

KingOfPain · Dec 21, 2024

Jimmyjames said:
To add more noise to the data, I purchased an M4 Pro Mini. 14c cpu, 20c gpu 64GB and I can say my experience is that it is silent most of the time.
(…)
PS. It’s awesome. I am coming from a 2018 Mini and the difference is staggering.

It seems we got the same model (48 GB RAM probably would have been more than enough for me, but I also went for 64 GB; so far I believe I only managed to fill half the RAM and I didn’t even see any compression yet). I almost posted here when I ordered it (I received it on Friday 13th), but the thread had strayed so far from this topic, I thought it wouldn’t be interesting anymore.
I switched from a MacBook Air M1 and the difference is noticeable during normal use. For you it must be a much bigger step up.

I can also confirm that it is practically silent.
I believe I heard something when I pushed the M4 Pro to 100°C while testing Xbox emulation, but that could also have been my heating. It definitely is not the sound of laptop fans spinning at full speed that some reviews described.

The company I work for is paying a native English teacher to improve our English once per week.
This week I helped said teacher to reactivate his Apple ID (he thought he didn’t have one) to get delivery updates after he ordered the basic Mac mini (16 GB, 256 GB SSD). Since he also switched from older Intel models, he’s also impressed.

dada_dave · Dec 26, 2024

dada_dave said:
And a second silly mistake of the exact same type as the Edit above because once again I can’t count cores correctly (I must be an LLM), the HX370 has 12 cores which makes the upcoming 16 core Strix Halo 33% bigger not 50%. Still should be able to hit M4 Pro levels in CB R24, but unless it follows the Desktop chip design on N4X rather than the mobile design on N4P (in which case it’ll use a lot of power), I doubt it’ll hit anywhere close to the Max. The TDP is rumored to be 120W (which means more than that in practice and I’m presuming that’s all usable for the CPU, ie the considerable GPU is separate) which is a fair bit more than the official 54W TDP of Strix Point (HX 370, which actually uses more than ~~80W~~ 70W at full power*).

*Notebookcheck used a version of the HX 370 that was allowed to be overclocked past the recommended TDP range by AMD to "65W" and "80W", which actually used close to 90W (rightmost AMD HX 370 point in quoted graph) and over 100W (not pictured) respectively for little to no gain in performance. At "54W" TDP (74W) it gets 1166, at "65W" (88W) it gets about 1200, at "80W" (109W) it still only gets 1216. Basically going from AMD's recommended TDP setting of "54W" (74W) to "80W" (109W), the chip burns ~50% more power for only 4% more performance, and almost all of that is gained before it hits "65W" (88W). You can see why AMD lists the Strix Point HX 370's max TDP as "54W".
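The footnote's arithmetic can be checked with a few lines. This is only a sketch using the three (label, measured watts, score) points quoted above; the "about 1200" middle score is approximate in the post, so its percentages are approximate too.

```python
# (nominal TDP label, measured power in W, benchmark score) as quoted in the post
points = [("54W", 74, 1166), ("65W", 88, 1200), ("80W", 109, 1216)]

def pct_change(new, old):
    """Percent change going from old to new."""
    return (new / old - 1) * 100

base_label, base_watts, base_score = points[0]
for label, watts, score in points[1:]:
    print(f'"{base_label}" -> "{label}": '
          f"{pct_change(watts, base_watts):+.0f}% power, "
          f"{pct_change(score, base_score):+.1f}% score")
```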

The Strix Halo with 2 CCDs and no 5c cores definitely sounds more like the desktop version than the Strix Point. No word on which node it’s manufactured on yet. But if the CPU is closer to the desktop Zen 5 than AMD’s previous mobile processor, then indeed it is likely to hit the M4 Max’s performance level, though again at significantly higher power levels.

Various reports put the GPU at a 4060/4060Ti level which, depending on the application, could be similar to or weaker than the Max’s GPU.

dada_dave · Jan 3, 2025

Massive post of mine at MR in reply to a guy who was repeating a lot of the usual tropes about Apple advancements slowing down, maybe it was all just node advantage, etc ... but I hope my reply is much more interesting beyond that.

M4+ Chip Generation - Speculation Megathread [MERGED]

Obviously the foundry node and ARM architecture are important - no one would disagree (well maybe Qualcomm's lawyers for the latter part but that's a separate issue). Heck even Apple's choice of 16Kb page sizes for its OS is helpful because it helps enable larger L1 caches while maintaining a...

forums.macrumors.com

Since I worked really hard on it, I'd like to say that it is still worth your time, as I have a huge amount of analysis with some new and updated charts (as well as an older one) with a focus on comparing Zen 5 and Zen 3 versus the M1 and M4 (and a little Lunar Lake/Qualcomm in there too). One key takeaway is that Zen 5 is still struggling to catch up to M1's IPC in most GB subtests, which is an incredible result. I also demonstrate that Apple's ratio from the M4 to the M1 may look worse than AMD's ratio from Zen 5 to Zen 3, but a smaller ratio of a bigger number is still competitive in absolute performance per clock with a larger ratio of a smaller number (basically in units of absolute performance, even normalized per clock, Apple is improving as much as AMD - there is little evidence of a slowdown). And finally, Apple has, almost entirely, improved its raw ST performance relative to the fastest AMD desktop chip of a comparable generation - i.e. the ratios of the M4/Zen 5 subtest scores are almost always as good as if not better than M1/Zen 3 (only one exception).

thenewperson · Jan 4, 2025

dada_dave said:
Massive post of mine at MR in reply to a guy who was repeating a lot of the usual tropes about Apple advancements slowing down, maybe it was all just node advantage, etc ... but I hope my reply is much more interesting beyond that.

M4+ Chip Generation - Speculation Megathread [MERGED]

Obviously the foundry node and ARM architecture are important - no one would disagree (well maybe Qualcomm's lawyers for the latter part but that's a separate issue). Heck even Apple's choice of 16Kb page sizes for its OS is helpful because it helps enable larger L1 caches while maintaining a...

forums.macrumors.com

Since I worked really hard on it, I'd like to say that it is still worth your time, as I have a huge amount of analysis with some new and updated charts (as well as an older one) with a focus on comparing Zen 5 and Zen 3 versus the M1 and M4 (and a little Lunar Lake/Qualcomm in there too). One key takeaway is that Zen 5 is still struggling to catch up to M1's IPC in most GB subtests, which is an incredible result. I also demonstrate that Apple's ratio from the M4 to the M1 may look worse than AMD's ratio from Zen 5 to Zen 3, but a smaller ratio of a bigger number is still competitive in absolute performance per clock with a larger ratio of a smaller number (basically in units of absolute performance, even normalized per clock, Apple is improving as much as AMD - there is little evidence of a slowdown). And finally, Apple has, almost entirely, improved its raw ST performance relative to the fastest AMD desktop chip of a comparable generation - i.e. the ratios of the M4/Zen 5 subtest scores are almost always as good as if not better than M1/Zen 3 (only one exception).

Very signature AT forums move to reduce everything to IPC gains and make the improvements seen just about node improvements. Bonus points to that guy for the 'Apple users think Apple is doing magic on their chips', always fun to see that trotted out.

B01L · Jan 4, 2025

Apple needs to use the NnX (N3X, N2X) processes for high-end desktop chips, SoCs, chiplets, tiling, whatever; pump up the power, pump up those clock speeds, pump up the performance...! ;^p

leman · Jan 4, 2025

thenewperson said:
Very signature AT forums move to reduce everything to IPC gains and make the improvements seen just about node improvements. Bonus points to that guy for the 'Apple users think Apple is doing magic on their chips', always fun to see that trotted out.

It seems quite indicative of an x86-focused view of the world. For many years, major performance improvements in x86 came from clock increases, so IPC became the measure of true progress. And this makes sense, of course, only these individuals often neglect to mention the diminishing returns in IPC. It is "easy" to improve your IPC if it was already in the gutter, not so easy if you are leading the market by at least 20-30% in IPC.

This does of course raise the question of what is next for Apple. We did see an increase in clock of 40% since M1 (AMD for example only increased the clocks by 20%, although the absolute delta is similar at ~1GHz). Frankly, if Apple manages to keep the low power consumption while increasing the clock, good for them. Still seems like we are quickly approaching a wall.

MacPoulet · Jan 4, 2025

dada_dave said:
Massive post of mine at MR in reply to a guy who was repeating a lot of the usual tropes about Apple advancements slowing down, maybe it was all just node advantage, etc ... but I hope my reply is much more interesting beyond that.

M4+ Chip Generation - Speculation Megathread [MERGED]

Obviously the foundry node and ARM architecture are important - no one would disagree (well maybe Qualcomm's lawyers for the latter part but that's a separate issue). Heck even Apple's choice of 16Kb page sizes for its OS is helpful because it helps enable larger L1 caches while maintaining a...

forums.macrumors.com

Since I worked really hard on it, I'd like to say that it is still worth your time, as I have a huge amount of analysis with some new and updated charts (as well as an older one) with a focus on comparing Zen 5 and Zen 3 versus the M1 and M4 (and a little Lunar Lake/Qualcomm in there too). One key takeaway is that Zen 5 is still struggling to catch up to M1's IPC in most GB subtests, which is an incredible result. I also demonstrate that Apple's ratio from the M4 to the M1 may look worse than AMD's ratio from Zen 5 to Zen 3, but a smaller ratio of a bigger number is still competitive in absolute performance per clock with a larger ratio of a smaller number (basically in units of absolute performance, even normalized per clock, Apple is improving as much as AMD - there is little evidence of a slowdown). And finally, Apple has, almost entirely, improved its raw ST performance relative to the fastest AMD desktop chip of a comparable generation - i.e. the ratios of the M4/Zen 5 subtest scores are almost always as good as if not better than M1/Zen 3 (only one exception).

That was you?! Ah, of course I never made the connection between crazy-dave and dada-dave…

It was quite the read this morning!

B01L said:
Apple needs to use the NnX (N3X, N2X) processes for high-end desktop chips, SoCs, chiplets, tiling, whatever; pump up the power, pump up those clock speeds, pump up the performance...! ;^p

Pump up the jam.

dada_dave · Jan 4, 2025

MacPoulet said:
That was you?! Ah, of course I never made the connection between crazy-dave and dada-dave…

It was quite the read this morning!

Pump up the jam.

Thanks! Unfortunately MacRumors doesn’t make it easy to change names (or at least that used to be their policy I haven’t checked recently).

Cmaier · Jan 4, 2025

dada_dave said:
Thanks! Unfortunately MacRumors doesn’t make it easy to change names (or at least that used to be their policy I haven’t checked recently).

change your name to cmaier. I dare you.

B01L · Jan 4, 2025

MacPoulet said:
Pump up the jam.

Hard Harry says "Pump Up The Volume..."

dada_dave · Jan 28, 2025

Inspired by conversations with @leman and OptimusGrime at the other place I took a deeper look into Blender benchmarks for Apple Silicon:

	Monster	Junkshop	Classroom	Total Score	Bandwidth GB/s	FP32 TFLOPS
M4 Max (40 Core)	2462.07638375756	1322.10820108297	1302.27569870296	5086.46028354349	546	15.5
M4 Max (32 Core)	2069.45050595834	1207.0921655042	1067.13432988062	4343.67700134316	410	12.44
M4 Pro (20 Core)	1212.1372188498	622.664482836412	655.990101194567	2490.79180288078	273	7.78
M4 Pro (16 Core)	1110.11827782035	655.284051736463	579.166581942269	2344.56891149908	273	6.22
M4 (10 Core)	524.36837536322	236.747660119091	296.818109862276	1057.93414534459	120	3.89
M3 Max (40 Core)	2006.5650469189	1048.99339927041	1064.5654517731	4120.12389796241	409.6	14.1
M3 Max (30 Core)	1609.33848662039	951.054299518103	829.451646599097	3389.84443273759	307.2	10.6
M3 Pro (18 Core)	873.526014019736	422.836002956528	438.089215474811	1734.45123245108	153.6	6.4
M3 Pro (14 Core)	781.586407609112	399.948252134939	413.590422200261	1595.12508194431	153.6	4.98
M3 (10 Core)	443.621508494821	212.386703520502	241.873586551229	897.881798566552	102.4	3.5

These are relatively close to Blender's median values. Data here: https://opendata.blender.org/download/. I also took a look at Nvidia cards - however, that data is not shown because after playing with the data set and similar user-generated data sets, I'm fairly convinced that exact numerical analysis using stock Bandwidth/TFLOPS would be erroneous as so many of the Nvidia cards in the data will be overclocked variants. A lot of people who submit benchmarks are going to be people who built their own systems or bought premium ones, and AIBs for desktop dGPUs and OEMs for laptop GPUs love to overclock their offerings to differentiate themselves both from each other and Nvidia's FE models as well as to give a reason why they charge more than MSRP. Since Blender doesn't record GPU core/memory clocks, it is impossible to weed those out and the median likely reflects their presence. Side note: this means Apple likely does even better than one might think in Blender benchmarks versus Nvidia when comparing their relative stock numbers against their observed performance.

Let's take a look at score per TFLOPS (click to expand):

We can see visually that Classroom shows the most stable performance behavior across Apple Silicon (Monster is almost as stable, but all the numbers are bigger, making it look more variable), it and Junkshop vie for the most demanding while Monster is clearly the least demanding. Junkshop would appear to be the more sensitive to bandwidth (backed up by the Nvidia data but see above for caveats on that), but there are a lot of oddities with Junkshop. Especially with performance actually going down, not just normalized performance, for Junkshop moving between M4 Pro 16 cores and 20 cores. I checked this against multiple data entries and this was relatively consistent - the full data might tell a different story but at the very least it's no better. Overall the biggest jump in performance across all scenarios is from the base to binned Pro for both the M3 and M4. Further, you can see that, especially for Monster and Junkshop (and especially Junkshop), the binned models of the Max and Pro do much better per TFLOP than the full ones. To some extent this is expected for the Pro models as the full chips don't get any extra bandwidth to go with their extra compute, but it also holds true for the Max chips too which absolutely do get extra bandwidth, quite a percentage increase too! So I'm not sure what to make of that. Further I was expecting a bigger uplift in M4 relative to M3 given the new ray tracers, but that doesn't really show up in this data. Most of the improvement in perf/TFLOPS to my eye looks explainable by increases in bandwidth rather than newfangled ray tracers.
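For anyone who wants to reproduce the score-per-TFLOPS comparison without the chart, here is a minimal sketch. The scores are rounded from the table above and the TFLOPS column is taken as posted, so the output inherits whatever uncertainty those TFLOPS figures carry; the `chips` dictionary and `per_tflops` helper are just illustrative names.

```python
# chip: (Monster, Junkshop, Classroom, FP32 TFLOPS), rounded from the posted table
chips = {
    "M4 Max (40 Core)": (2462.08, 1322.11, 1302.28, 15.5),
    "M4 Max (32 Core)": (2069.45, 1207.09, 1067.13, 12.44),
    "M4 Pro (20 Core)": (1212.14, 622.66, 655.99, 7.78),
    "M4 Pro (16 Core)": (1110.12, 655.28, 579.17, 6.22),
    "M4 (10 Core)": (524.37, 236.75, 296.82, 3.89),
    "M3 Max (40 Core)": (2006.57, 1048.99, 1064.57, 14.1),
    "M3 Max (30 Core)": (1609.34, 951.05, 829.45, 10.6),
    "M3 Pro (18 Core)": (873.53, 422.84, 438.09, 6.4),
    "M3 Pro (14 Core)": (781.59, 399.95, 413.59, 4.98),
    "M3 (10 Core)": (443.62, 212.39, 241.87, 3.5),
}

def per_tflops(chip):
    """Return [Monster, Junkshop, Classroom] scores divided by the chip's TFLOPS."""
    *scores, tflops = chips[chip]
    return [s / tflops for s in scores]

print(f"{'chip':<18}{'Monster':>9}{'Junkshop':>10}{'Classroom':>11}")
for name in chips:
    m, j, c = per_tflops(name)
    print(f"{name:<18}{m:>9.1f}{j:>10.1f}{c:>11.1f}")
```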

Anyway what do you all make of it? Why is the binned Max better per FLOPS than the full Max? What is going on with Junkshop? and why might we not have seen a bigger M3 to M4 uplift per FLOPS given the new ray tracers?

MAJOR EDIT: Screwed up TFLOPS for the M4s, plugged in the wrong clock speed. The M4s do a little better now relative to the M3s in terms of performance/TFLOPS. I still contend that performance improvements appear largely bandwidth driven. Take the base M4 vs M3: its bandwidth to TFLOPS ratio improved by about 5% and the score/TFLOPS improvements in Monster, Junkshop, and Classroom are 7%, 0%, and 10% respectively. Meanwhile for the M4 Max vs M3 Max 40 core, the BW/TFLOPS shows a 21% improvement and the uplift in the score/TFLOPS for Monster, Junkshop, and Classroom is 11%, 14%, and 11%. While I don't expect performance to improve linearly with bandwidth, there isn't a lot of room here for saying new ray tracing cores are having a large effect on the final score.
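The percentages in the edit can be recomputed from the table with a short sketch. It uses rounded table values and reads the second comparison as M4 Max (40 Core) against M3 Max (40 Core), which is an assumption about the intended pairing. The results land within about a percentage point of the figures quoted, as expected given rounding.

```python
# (Monster, Junkshop, Classroom, bandwidth GB/s, FP32 TFLOPS), rounded from the table
chips = {
    "M4": (524.37, 236.75, 296.82, 120.0, 3.89),
    "M3": (443.62, 212.39, 241.87, 102.4, 3.5),
    "M4 Max 40": (2462.08, 1322.11, 1302.28, 546.0, 15.5),
    "M3 Max 40": (2006.57, 1048.99, 1064.57, 409.6, 14.1),
}

def uplift_per_tflops(new, old):
    """Percent change, new vs. old chip, of each column after dividing by TFLOPS."""
    *new_vals, new_tf = chips[new]
    *old_vals, old_tf = chips[old]
    return [((n / new_tf) / (o / old_tf) - 1) * 100
            for n, o in zip(new_vals, old_vals)]

for new, old in [("M4", "M3"), ("M4 Max 40", "M3 Max 40")]:
    monster, junkshop, classroom, bw = uplift_per_tflops(new, old)
    print(f"{new} vs {old}: BW/TFLOPS {bw:+.0f}%, Monster {monster:+.0f}%, "
          f"Junkshop {junkshop:+.0f}%, Classroom {classroom:+.0f}%")
```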

Jimmyjames · Jan 28, 2025

dada_dave said:
Inspired by conversations with @leman and OptimusGrime

I like that guy. Even if his python isn’t good enough to share!

dada_dave said:
at the other place I took a deeper look into Blender benchmarks for Apple Silicon:

	Monster	Junkshop	Classroom	Total Score	Bandwidth GB/s	FP32 TFLOPS
M4 Max (40 Core)	2462.07638375756	1322.10820108297	1302.27569870296	5086.46028354349	546	15.5
M4 Max (32 Core)	2069.45050595834	1207.0921655042	1067.13432988062	4343.67700134316	410	12.44
M4 Pro (20 Core)	1212.1372188498	622.664482836412	655.990101194567	2490.79180288078	273	7.78
M4 Pro (16 Core)	1110.11827782035	655.284051736463	579.166581942269	2344.56891149908	273	6.22
M4 (10 Core)	524.36837536322	236.747660119091	296.818109862276	1057.93414534459	120	3.89
M3 Max (40 Core)	2006.5650469189	1048.99339927041	1064.5654517731	4120.12389796241	409.6	14.1
M3 Max (30 Core)	1609.33848662039	951.054299518103	829.451646599097	3389.84443273759	307.2	10.6
M3 Pro (18 Core)	873.526014019736	422.836002956528	438.089215474811	1734.45123245108	153.6	6.4
M3 Pro (14 Core)	781.586407609112	399.948252134939	413.590422200261	1595.12508194431	153.6	4.98
M3 (10 Core)	443.621508494821	212.386703520502	241.873586551229	897.881798566552	102.4	3.5

These are relatively close to Blender's median values. Data here: https://opendata.blender.org/download/. I also took a look at Nvidia cards - however, that data is not shown because after playing with the data set and similar user-generated data sets, I'm fairly convinced that exact numerical analysis using stock Bandwidth/TFLOPS would be erroneous as so many of the Nvidia cards in the data will be overclocked variants. A lot of people who submit benchmarks are going to be people who built their own systems or bought premium ones, and AIBs for desktop dGPUs and OEMs for laptop GPUs love to overclock their offerings to differentiate themselves both from each other and Nvidia's FE models as well as to give a reason why they charge more than MSRP. Since Blender doesn't record GPU core/memory clocks, it is impossible to weed those out and the median likely reflects their presence. Side note: this means Apple likely does even better than one might think in Blender benchmarks versus Nvidia when comparing their relative stock numbers against their observed performance.

Let's take a look at score per TFLOPS (click to expand):

View attachment 33611

We can see visually that Classroom shows the most stable performance behavior across Apple Silicon (Monster is almost as stable, but all the numbers are bigger, making it look more variable), it and Junkshop vie for the most demanding while Monster is clearly the least demanding. Junkshop would appear to be the more sensitive to bandwidth (backed up by the Nvidia data but see above for caveats on that), but there are a lot of oddities with Junkshop. Especially with performance actually going down, not just normalized performance, for Junkshop moving between M4 Pro 16 cores and 20 cores. I checked this against multiple data entries and this was relatively consistent - the full data might tell a different story but at the very least it's no better. Overall the biggest jump in performance across all scenarios is from the base to binned Pro for both the M3 and M4. Further, you can see that, especially for Monster and Junkshop (and especially Junkshop), the binned models of the Max and Pro do much better per TFLOP than the full ones. To some extent this is expected for the Pro models as the full chips don't get any extra bandwidth to go with their extra compute, but it also holds true for the Max chips too which absolutely do get extra bandwidth, quite a percentage increase too! So I'm not sure what to make of that. Further I was expecting a bigger uplift in M4 relative to M3 given the new ray tracers, but that doesn't really show up in this data. Most of the improvement in perf/TFLOPS to my eye looks explainable by increases in bandwidth rather than newfangled ray tracers.

Anyway what do you all make of it? Why is the binned Max better per FLOPS than the full Max? What is going on with Junkshop? and why might we not have seen a bigger M3 to M4 uplift per FLOPS given the new ray tracers?

MAJOR EDIT: Screwed up TFLOPS for the M4s, plugged in the wrong clock speed. The M4s do a little better now relative to the M3s in terms of performance/TFLOPS. I still contend that performance improvements appear largely bandwidth driven. Take the base M4 vs M3: its bandwidth to TFLOPS ratio improved by about 5% and the score/TFLOPS improvements in Monster, Junkshop, and Classroom are 7%, 0%, and 10% respectively. Meanwhile for the M4 Max vs M3 Max 40 core, the BW/TFLOPS shows a 21% improvement and the uplift in the score/TFLOPS for Monster, Junkshop, and Classroom is 11%, 14%, and 11%. While I don't expect performance to improve linearly with bandwidth, there isn't a lot of room here for saying new ray tracing cores are having a large effect on the final score.

I don’t know. It’s an interesting question. I recall asking a similar question when the M4 first came out. Apple advertises 2x RT perf, but I don’t know the extent to which Blender Benchmark is purely a RT test. I recall Geekerwan had their own test which did show a significant RT uplift for M4 vs M3.

I think you are probably correct that bandwidth is the most likely reason for the discrepancies. When comparing M4 vs M3 scores, the M4 Pro has a bigger uplift vs the M3 Pro than the M4 Max has vs the M3 Max. The M4 Pro also has a larger memory bandwidth increase: M4 Pro = 273GB/s vs M3 Pro = 150GB/s, an 82% increase. The M4 Max increased its bandwidth by 34%. It may be that the RT cores can consume larger amounts of bandwidth than the chips can currently provide. Total guess though.
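A quick sketch of that comparison, using the total scores and bandwidth figures from the table earlier in the thread (rounded). Note it uses the table's 153.6GB/s for the M3 Pro rather than the rounded 150GB/s, so the Pro bandwidth increase comes out a few points lower than the figure above; the conclusion is the same either way.

```python
# (total Blender score, bandwidth GB/s) taken from the table earlier in the thread
chips = {
    "M4 Pro (20 Core)": (2490.79, 273.0),
    "M3 Pro (18 Core)": (1734.45, 153.6),
    "M4 Max (40 Core)": (5086.46, 546.0),
    "M3 Max (40 Core)": (4120.12, 409.6),
}

def pct(new, old):
    """Percent increase going from old to new."""
    return (new / old - 1) * 100

for new, old in [("M4 Pro (20 Core)", "M3 Pro (18 Core)"),
                 ("M4 Max (40 Core)", "M3 Max (40 Core)")]:
    score = pct(chips[new][0], chips[old][0])
    bw = pct(chips[new][1], chips[old][1])
    print(f"{new} vs {old}: score {score:+.0f}%, bandwidth {bw:+.0f}%")
```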

dada_dave · Jan 28, 2025

Jimmyjames said:
I like that guy. Even if his python isn’t good enough to share!

I don’t know. It’s an interesting question. I recall asking a similar question when the M4 first came out. Apple advertises 2x RT perf, but I don’t know the extent to which Blender Benchmark is purely a RT test. I recall Geekerwan had their own test which did show a significant RT uplift for M4 vs M3.

I think you are probably correct that bandwidth is the most likely reason for the discrepancies. When comparing M4 vs M3 scores, the M4 Pro has a bigger uplift vs the M3 Pro than the M4 Max has vs the M3 Max. The M4 Pro also has a larger memory bandwidth increase: M4 Pro = 273GB/s vs M3 Pro = 150GB/s, an 82% increase. The M4 Max increased its bandwidth by 34%. It may be that the RT cores can consume larger amounts of bandwidth than the chips can currently provide. Total guess though.

Indeed I remember you did. And I think Solar Bay shows similar uplift in ray tracing performance:

According to this the 18 core M3 Pro scores around 22,460. Another got over 21,000. The M3 in the MacBook Pro seems to score around 13,500-13,700 while the M4 scored 16,828. The 13,500 is from Notebookcheck, who didn't test the M4 in the MacBook Pro but did test the M4 Pro 16/20-core variants, which score 27,422 and 30,730. Unfortunately no one as far as I can tell tested the M4 Max. With respect to increases in performance per TFLOPS in the M4 generation over the M3, it's about 10-12% by my calculations.
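As a rough check on the 10-12% figure, the sketch below divides the Solar Bay scores quoted here by the FP32 TFLOPS values from the Blender table earlier in the thread. Pairing these chips and using 13,600 as the midpoint of the quoted M3 range are my assumptions, not something stated in the post.

```python
# (Solar Bay score quoted above, FP32 TFLOPS from the Blender table in this thread)
solar_bay = {
    "M3 (10 Core)": (13600, 3.5),      # midpoint of the quoted 13,500-13,700 (assumption)
    "M4 (10 Core)": (16828, 3.89),
    "M3 Pro (18 Core)": (22460, 6.4),
    "M4 Pro (20 Core)": (30730, 7.78),
}

def per_tflops(chip):
    """Solar Bay score divided by the chip's FP32 TFLOPS."""
    score, tflops = solar_bay[chip]
    return score / tflops

def gain(new, old):
    """Percent change in score per TFLOPS, new chip vs. old chip."""
    return (per_tflops(new) / per_tflops(old) - 1) * 100

for new, old in [("M4 (10 Core)", "M3 (10 Core)"),
                 ("M4 Pro (20 Core)", "M3 Pro (18 Core)")]:
    print(f"{new} vs {old}: {gain(new, old):+.1f}% Solar Bay score per TFLOPS")
```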

dada_dave · Jan 29, 2025

dada_dave said:
MAJOR EDIT: Screwed up TFLOPS for the M4's, plugged in wrong clockspeed.

Sigh ... I might've been right the first time ... or at least closer. It's also possible that the M4 Max has a different clock speed from the base M4, and I'm not sure about the Pro. Geekerwan reported that the base M4 had just under a 10% increase in clocks (to roughly 1.5GHz). But that was in the iPad Pro I think, so it's also possible it's higher in the mini and the Pro. I notice asitop reporting 1.577GHz in their M4 Max video, which would put the Max at what I had originally, 16 TFLOPS. Not sure about the others.
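The clock-to-TFLOPS relationship at issue can be sketched as below. It assumes 128 FP32 ALUs per GPU core and 2 FLOPs per ALU per clock (one fused multiply-add), which is the commonly cited figure for Apple GPUs but is not stated anywhere in this thread; both are exposed as parameters so other assumptions can be plugged in.

```python
def fp32_tflops(gpu_cores, clock_ghz, alus_per_core=128, flops_per_alu_per_clock=2):
    """Peak FP32 TFLOPS = cores x ALUs/core x FLOPs/ALU/clock x clock (GHz) / 1000."""
    return gpu_cores * alus_per_core * flops_per_alu_per_clock * clock_ghz / 1000

def implied_clock_ghz(gpu_cores, tflops, alus_per_core=128, flops_per_alu_per_clock=2):
    """Invert the formula: the clock a given TFLOPS figure would imply."""
    return tflops * 1000 / (gpu_cores * alus_per_core * flops_per_alu_per_clock)

# 40-core Max at the asitop-reported clock, and the clock implied by the table's 15.5
print(f"40 cores at 1.577 GHz: {fp32_tflops(40, 1.577):.2f} TFLOPS")
print(f"15.5 TFLOPS on 40 cores implies {implied_clock_ghz(40, 15.5):.3f} GHz")
```

Under those assumptions, the gap between the table's 15.5 and the original ~16 TFLOPS for the 40-core Max corresponds to a clock difference of roughly 60MHz.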
