M3 core counts and performance

Yes, I’m pretty sure the setting allows you to turn MetalRT on or off.
The benchmark tool doesn't give you the option, unfortunately.
I had a poke around to check if it accepts any command line flags, but that doesn't appear to be the case. The only options are Blender version and render device (just CPU or GPU).
If you know a workaround, let me know 👍

Side note: I've been using the machine constantly since I ran the Blender tests earlier (about 3.5 hours, mostly YouTube and browsing). IIRC, the battery was at around 93% when I finished the Blender tests. It's at 80% now and iStat's remaining time estimate is still 20 hours. Freakin' love this thing!
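
As a rough sanity check on that 20-hour estimate, the quoted figures (93% down to 80% over about 3.5 hours) can be extrapolated linearly. This is only a back-of-envelope sketch: it assumes a constant drain rate, and the function name is made up for illustration.

```python
# Back-of-envelope battery projection from the figures quoted in the post.
# Assumes a constant (linear) drain rate, which real usage will not follow exactly.

def projected_hours(start_pct: float, end_pct: float, elapsed_h: float) -> tuple[float, float]:
    """Return (hours left at end_pct, hours for a full 100% charge)."""
    drain_per_hour = (start_pct - end_pct) / elapsed_h  # percentage points per hour
    return end_pct / drain_per_hour, 100.0 / drain_per_hour

remaining_h, full_charge_h = projected_hours(start_pct=93, end_pct=80, elapsed_h=3.5)
print(f"remaining: {remaining_h:.1f} h, full charge: {full_charge_h:.1f} h")
```

Under that assumption the projection comes out a little above 21 hours remaining, which is in the same ballpark as the 20 hours iStat reports.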

I think PCMag reported 30 hours of 720p offline playback? Well, I think 30 hours of 4K YouTube could be doable (maybe a perk of the AV1 decoder, not sure)

Purely anecdotal, but my feeling (so far) is M3 Pro makes an appreciable difference to light load efficiency vs. prior M SoCs. I’m not hypermiling - I’m deliberately not cleaning up after myself (leaving lots of background apps etc. running) and it just isn’t fazed. Maybe some of it is thanks to software improvements not specific to this hardware (e.g. I noticed the scheduler and/or VMware and/or virtualisation framework is making better choices with VMs, like placing Windows on the six E-cores while it runs updates instead of always on the P-cores).

Just leaving powermetrics polling in the background, the average SoC power draw seems lower than I’m used to with my M1 work Mac. I would need to run a controlled test to be sure, but I could swear M1 typically averages out higher.
Like, just writing this message, the SoC hasn’t exceeded 350mW (with Safari, Music, Telegram, Ivory, Terminal, Sublime Text, Messages, Notes, iStat and all the system services like Time Machine, iCloud daemons etc. still doing things silently behind the scenes; not everything is “App Nap”’d). That’s nuts. M1 would typically at least bounce up to 1-5W from time to time, but this doesn’t.

Wish I had the time to create some kind of automated tests to evaluate these questions properly 😫
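
A minimal sketch of what such an automated comparison could start from: capture powermetrics text output to a file on each machine (for example with something like `sudo powermetrics --samplers cpu_power -i 1000 -n 60`), then summarise the combined power samples. The line format and the numbers in `SAMPLE_LOG` below are assumptions made up for illustration, so the regex may need adjusting against real output.

```python
import re
from statistics import mean

# Hypothetical sample in the style of powermetrics text output on Apple silicon.
# The exact line format is an assumption and the numbers are invented.
SAMPLE_LOG = """\
Combined Power (CPU + GPU + ANE): 230 mW
Combined Power (CPU + GPU + ANE): 310 mW
Combined Power (CPU + GPU + ANE): 1450 mW
Combined Power (CPU + GPU + ANE): 250 mW
"""

# Capture the integer milliwatt value from each "Combined Power" line.
COMBINED_RE = re.compile(r"^Combined Power.*?:\s*(\d+)\s*mW", re.MULTILINE)

def summarize(log_text: str) -> dict:
    """Return sample count, mean, and peak combined power in mW."""
    samples = [int(value) for value in COMBINED_RE.findall(log_text)]
    if not samples:
        raise ValueError("no combined-power samples found")
    return {"n": len(samples), "mean_mw": mean(samples), "peak_mw": max(samples)}

summary = summarize(SAMPLE_LOG)
print(summary)
```

Running the same capture on the M1 and the M3 Pro with the same set of apps open would give two summaries to compare, which is closer to a controlled test than eyeballing the live readout.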
 

Attachments

  • Screenshot 2023-11-15 at 22.51.47.png
  • Screenshot 2023-11-15 at 22.51.54.png
Huh. I thought if you turned off MetalRT within Blender itself, it would honour those settings within the benchmark. Perhaps that’s wrong.
 
Gave that a try but it didn't seem to work (e.g. start Blender, change preference, quit, start benchmark tool)
I could still be missing something though. I'm a total noob to blender 🙂
 
Thanks for trying. I’m not especially experienced with Blender myself. It must be possible, I imagine, given there are results for CPU as well as GPU. For NVIDIA there are results for CUDA and OptiX on the same GPU.
 
I ran the benchmark tool under Linux on a PC a while back, and I vaguely recall the render devices list having separate options for different APIs (for NVIDIA I guess it would show OPTIX, CUDA etc.). It seems there’s only one option for Apple GPUs though (just Metal, not Metal and MetalRT).

I’ll play around with it more tomorrow. Maybe it’s possible to just render these benchmark scenes in Blender without the dedicated benchmark tool. To be honest, I’m not even 100% sure the benchmark tool is enabling MetalRT with version 4.0 😅
 
No problem. Thank you for your efforts!
 
If this ask is too much, don't hesitate to say no, but you could check some of this using the Xcode profiling tools, like RT core utilization, memory bandwidth being used, etc.

 
You can download the “offline” (i.e. it can’t upload the results) python benchmark script from https://opendata.blender.org/about/ and you can use any blender settings you please :)
 
Looking back at the RAM discussion: One interesting thing to consider is that Apple's starting memory for its signature consumer laptop, the Air, hasn't changed in six years (all 2017 models came with 8 GB). And its starting memory for its prosumer MBPs (ignoring the 13" Intel versions, and the base M# AS versions) hasn't changed in nine (all versions of the mid-2014 15" MBP came with 16 GB).

The difference is those were the only memory choices—thus they were both the starting and max RAM—while today you have the option to get more. So Apple's position is effectively that the minimum you need hasn't changed in many years, while the maximum may have.
 
Thanks! I gave that a shot

I had to grab the latest build from here https://ftp.nluug.nl/pub/graphics/blender/release/BlenderBenchmark2.0/script/ as the download link on that page is for an older version (2.0, latest is 3.1).

Got it to run but the benchmark script doesn't appear to use the preferences set in Blender. I ran it both with MetalRT set to on and off in Blender prefs but the result was the same (~30s for junkshop).

Script args were:
Code:
/Applications/Blender.app/Contents/MacOS/Blender --background \
  --factory-startup \
  -noaudio \
  --debug-cycles \
  --enable-autoexec \
  --engine \
  CYCLES \
  ~/Benchmarks/junkshop/main.blend \
  --python \
  main.py \
  -- \
  --device-type METAL

If you're curious, the script accepts 'CPU', 'HIP', 'OPTIX', 'CUDA', 'METAL' and 'ONEAPI' as valid device types.
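
For anyone wanting to repeat the run with other device types, here is a small sketch that assembles the same argument list as the command above and rejects device types the script does not list. The helper name `build_benchmark_cmd` is hypothetical; only the flags and paths already shown in the post are used.

```python
import os

# Device types the post says the benchmark script accepts.
VALID_DEVICE_TYPES = {"CPU", "HIP", "OPTIX", "CUDA", "METAL", "ONEAPI"}

def build_benchmark_cmd(blender: str, scene: str, script: str, device_type: str) -> list[str]:
    """Assemble the argument list from the post (hypothetical helper, not part of the script)."""
    if device_type not in VALID_DEVICE_TYPES:
        raise ValueError(f"unsupported device type: {device_type!r}")
    return [
        blender, "--background", "--factory-startup", "-noaudio",
        "--debug-cycles", "--enable-autoexec",
        "--engine", "CYCLES",
        os.path.expanduser(scene),  # the shell would normally expand "~" for us
        "--python", script,
        "--", "--device-type", device_type,
    ]

cmd = build_benchmark_cmd(
    "/Applications/Blender.app/Contents/MacOS/Blender",
    "~/Benchmarks/junkshop/main.blend",
    "main.py",
    "METAL",
)
print(" ".join(cmd))
# To run and time it, pass cmd to subprocess.run(cmd, check=True)
# between two time.perf_counter() readings.
```

One unverified guess worth testing: `--factory-startup` starts Blender with factory settings, so it may be the reason the MetalRT preference set in the GUI has no effect on the script's result.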

Rendering a frame directly in Blender (classroom) does show a difference, though
MetalRT off: 1m35s
MetalRT on: 53s

(side note: look at the memory usage in the screenshots, hmm... I wonder how those Pro™ 8GB SKUs would perform here 😜)
 

Attachments

  • script_metalrt_on.png
  • script_metalrt_off.png
  • blender_rton.png
  • blender_rtoff.png
Ah good call, I'll give that a look at the weekend
 
That looks like a very nice improvement in performance with RT on. Time nearly halved? Very impressive.
 
Cinebench itself uses 7GB RAM. From what I saw online, the 8GB model couldn't even start the Blender benchmark.

8GB on Pro Macs needs to die.
 
I would seriously love to see “ultra” benchmarks from these companies with larger scenes (15-30GB) that take a lot longer to render. From my understanding, those would be more indicative of larger workloads, and would naturally showcase GPUs that have access to large amounts of memory 🙃. I do wonder how the M3 Max or Pro would fare against their competitors then.

Hell even higher steps for testing even bigger scenes.
 
I may have missed it if mentioned, but can anyone confirm the power usage of the 40 core Max gpu?
According to this article by Anton Avdyushkin (translated by Andreas Osthoff) at notebookcheck.net ( https://www.notebookcheck.net/Apple...allenges-HX-CPUs-from-AMD-Intel.766414.0.html ), the max power consumption of the M3 Max's 40-core GPU is 60 W.

One of the nice things about the following comparative assessment is that they included power consumption figures for both the M3 GPU and various NVIDIA 4000-series Laptop GPUs. In particular, they found a 4070 Laptop GPU configured for the same 60 W power consumption as the M3 Max's.

GPU Performance
The high-end M3 Max chip is equipped with a 40-core GPU, which means there are two additional cores compared to the previous 38-core unit. We also suspect the core clock is higher. The performance is between 13-25 % higher depending on the test, but the maximum power consumption dropped from 64 to 60 Watts. The big advantage of the GPU is still the shared memory, especially during video editing. We edit our review videos for YouTube with DaVinci Resolve and occasionally run into the 16 GB VRAM limit on the RTX 4090 during editing.

Compared to current laptop GPUs from Nvidia, the M3 Max GPU can clearly beat the RTX 4070 Laptop in the cross-platform 3DMark Wildlife Unlimited test and is only 9 % behind the GeForce RTX 4090 Laptop at 125 Watts and 23 % [behind] the 4090 Laptop at 175 Watts, respectively. The GFXBench test shows a deficit of 4 % for the M3 Max compared to the RTX 4080 Laptop at 175 Watts. The OpenCL performance is worse and the faster RTX GPUs are clearly ahead, but the RTX 4070 Laptop at comparable 60 Watts TGP (Galaxy Book3 16 Ultra) is 11 % slower in Geekbench.

The GPU performance is completely stable during longer workloads as well as on battery power, which is not the case for the powerful GeForce GPUs. More GPU benchmarks are listed in our tech section.


They also give the sustained power consumption of the CPU as 56 W:

Processor
Apple M3 Max 16-Core 16 x 2.7 - 4.1 GHz, 56 W PL1 / Sustained


Unfortunately, I don't know the methodology they used to measure either.
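
For a rough sense of what the quoted GPU figures imply for efficiency, the 13-25 % performance gain and the drop from 64 W to 60 W can be combined into a performance-per-watt ratio. This sketch assumes both generations were actually drawing their stated maximum power during the benchmarks, which the article does not confirm.

```python
def perf_per_watt_gain(perf_gain_pct: float, old_watts: float, new_watts: float) -> float:
    """Relative perf/W of new vs. old, given a performance gain in percent."""
    return (1 + perf_gain_pct / 100) * old_watts / new_watts

# Figures quoted from the article: +13 % to +25 % performance, 64 W down to 60 W.
low = perf_per_watt_gain(13, old_watts=64, new_watts=60)
high = perf_per_watt_gain(25, old_watts=64, new_watts=60)
print(f"perf/W gain: {low:.2f}x to {high:.2f}x")
```

Under that assumption, the 40-core GPU would deliver roughly 1.2x to 1.3x the performance per watt of the previous 38-core unit.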
 
Very interesting. Performance looks great for the Max, although their figures for Wildlife Extreme are quite low for the 4090. It’s possible there are some laptops that let the power run wild and skew the average, which is 42935. From here:

It seems they use wall plug measurement.

Details of the multimeter here:
Test methodology here:

While I’m sure it makes sense to follow the standard methods of measuring power, I’m not enamoured of this approach. It would have been nice if they could have included powermetrics results as another data point. They claim 140 watts as the maximum power draw under load, which makes sense once other components are included. Unfortunately, I’m more interested in just the CPU/GPU.
 

Attachments

  • 1700245667426.png
I had a chance to go to an Apple Store today and got to play with the M3 Max (40-core GPU) for a few minutes. I took the opportunity to download Blender (version 4.0.1) and the Classroom and Barbershop scenes.

Results are from memory so only accurate to the second.

Classroom: RT off = 55 seconds. RT on = 20 seconds.
Barbershop: RT off = 3:30. RT on = 1:50.

Really nice performance increases from RT.
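
To put numbers on the increases, here is a small sketch that turns the render times reported in this thread into speedup factors. The timings are hand-copied from the posts (including the earlier Classroom run of 1m35s vs. 53s), so they carry the same "from memory" caveat; the parsing helper is just for illustration.

```python
def to_seconds(text: str) -> int:
    """Parse '53s', '1m35s', or '3:30' style durations into seconds."""
    text = text.strip().lower()
    if ":" in text:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + int(seconds)
    minutes, _, rest = text.rpartition("m")
    return int(minutes or 0) * 60 + int(rest.rstrip("s") or 0)

# (label, RT off, RT on) as reported in the thread.
RESULTS = [
    ("Classroom, earlier post", "1m35s", "53s"),
    ("Classroom, M3 Max 40-core", "55s", "20s"),
    ("Barbershop, M3 Max 40-core", "3:30", "1:50"),
]

# Speedup = time with MetalRT off divided by time with MetalRT on.
speedups = {
    label: to_seconds(rt_off) / to_seconds(rt_on)
    for label, rt_off, rt_on in RESULTS
}

for label, factor in speedups.items():
    print(f"{label}: {factor:.2f}x faster with MetalRT on")
```

That works out to roughly 1.8x for the earlier Classroom run, 2.75x for Classroom on the M3 Max, and about 1.9x for Barbershop.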
 