M3 core counts and performance

Yes I’m pretty sure the setting allows you to turn MetalRT on or off.
The benchmark tool doesn't give you the option unfortunately.
I had a poke around to check whether it accepts any command line flags, but that doesn't appear to be the case. The only options are Blender version and render device (just CPU or GPU).
If you know a workaround let me know 👍

Side note: I've been using the machine constantly since I ran the Blender tests earlier (about 3.5 hours, mostly YouTube and browsing). IIRC, the battery was at around 93% when I finished the Blender tests. The battery is at 80% now and iStat's remaining time estimate is still 20 hours. Freakin' love this thing!

I think PCMag reported 30 hours of 720p offline playback? Well, 30 hours of 4K YouTube could be doable too (maybe a perk of the AV1 decoder, not sure).

Purely anecdotal, but my feeling (so far) is that the M3 Pro makes an appreciable difference to light-load efficiency vs. prior M SoCs. I’m not hypermiling - I’m deliberately not cleaning up after myself (leaving lots of background apps etc. running) and it just isn’t fazed. Maybe some of it is thanks to software improvements not specific to this hardware (e.g. I noticed the scheduler and/or VMware and/or the virtualisation framework is making better choices with VMs, like placing Windows on the six E-cores while it runs updates instead of always on the P-cores).

Just leaving powermetrics polling in the background, the average SoC power draw seems lower than I’m used to with my M1 work Mac. I would need to run a controlled test to be sure, but I could swear M1 typically averages out higher.
Like, just writing this message, the SoC hasn’t exceeded 350 mW (with Safari, Music, Telegram, Ivory, Terminal, Sublime Text, Messages, Notes, iStat and all the system services like Time Machine, iCloud daemons etc. still doing things silently behind the scenes, and not everything is App Nap’d). That’s nuts. M1 would typically bounce up to 1-5 W from time to time, but this one doesn’t.

Wish I had the time to create some kind of automated tests to evaluate these questions properly 😫
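(For anyone wanting to try the same casual comparison: a minimal sketch of that kind of background polling, assuming the cpu_power sampler documented in `man powermetrics` on Apple Silicon; the interval, sample count and log path are arbitrary choices.)

Code:
# Sketch: sample SoC package power once a second for an hour, keep only the
# summary power lines and append them to a log. Exact line labels can differ
# between macOS releases, so adjust the grep pattern to match your output.
sudo powermetrics --samplers cpu_power -i 1000 -n 3600 \
  | grep -i "power" \
  | tee -a ~/soc_power.log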
 

Attachments

  • Screenshot 2023-11-15 at 22.51.47.png (569.8 KB)
  • Screenshot 2023-11-15 at 22.51.54.png (574.7 KB)
Huh. I thought if you turned off MetalRT within Blender itself, it would honour those settings within the benchmark. Perhaps that’s wrong.
 
Gave that a try but it didn't seem to work (i.e. start Blender, change the preference, quit, start the benchmark tool).
I could still be missing something though. I'm a total noob to Blender 🙂
 
Thanks for trying. I’m not especially experienced with Blender myself. It must be possible, I imagine, given there are results for CPU as well as GPU. For Nvidia there are results for CUDA and OptiX on the same GPU.
 
I ran the benchmark tool under Linux on a PC a while back, and I vaguely recall the render devices list having separate options for different APIs (for Nvidia I guess it would show OptiX, CUDA etc.). It seems there’s only one option for Apple GPUs though (just Metal, not Metal and MetalRT).

I’ll play around with it more tomorrow. Maybe it’s possible to just render these benchmark scenes in Blender without the dedicated benchmark tool. To be honest, I’m not even 100% sure the benchmark tool is enabling MetalRT with version 4.0 😅
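(As a rough sketch of the "without the benchmark tool" route - assumptions: the scene path is just a placeholder for wherever a downloaded benchmark scene was extracted, and your saved Blender preferences, including the MetalRT toggle, should apply because --factory-startup isn't passed.)

Code:
# Sketch: render one frame of a downloaded benchmark scene headlessly,
# bypassing the benchmark launcher entirely. The .blend path is a placeholder.
/Applications/Blender.app/Contents/MacOS/Blender --background \
  ~/Benchmarks/classroom/classroom.blend \
  --engine CYCLES \
  --render-output /tmp/classroom_ \
  --render-frame 1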
 
No problem. Thank you for your efforts!
 
If this ask is too much, don't hesitate to say no, but you could check some of this using the Xcode profiling tools - like RT cores utilization, memory bandwidth being used, etc ...
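(In case it saves some clicking: one possible way to capture that kind of data outside the Xcode GUI is Instruments' command line front end. A sketch, assuming the 'Metal System Trace' template name and a placeholder scene path; see `xcrun xctrace help record` for the actual options.)

Code:
# Sketch: record an Instruments trace of a headless Cycles render, then open
# it to inspect the GPU counters (occupancy, bandwidth, etc.).
xcrun xctrace record \
  --template 'Metal System Trace' \
  --output blender_render.trace \
  --launch -- /Applications/Blender.app/Contents/MacOS/Blender \
    --background ~/Benchmarks/classroom/classroom.blend \
    --engine CYCLES --render-frame 1

open blender_render.trace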

 
You can download the “offline” (i.e. it can’t upload the results) Python benchmark script from https://opendata.blender.org/about/ and you can use any Blender settings you please :)
 
Looking back at the RAM discussion: One interesting thing to consider is that Apple's starting memory for its signature consumer laptop, the Air, hasn't changed in six years (all 2017 models came with 8 GB). And its starting memory for its prosumer MBPs (ignoring the 13" Intel versions and the base M-series versions) hasn't changed in nine (all versions of the mid-2014 15" MBP came with 16 GB).

The difference is those were the only memory choices—thus they were both the starting and max RAM—while today you have the option to get more. So Apple's position is effectively that the minimum you need hasn't changed in many years, while the maximum may have.
 
You can download the “offline” (i.e. it can’t upload the results) Python benchmark script from https://opendata.blender.org/about/ and you can use any Blender settings you please :)
Thanks! I gave that a shot

I had to grab the latest build from here https://ftp.nluug.nl/pub/graphics/blender/release/BlenderBenchmark2.0/script/ as the download link on that page is for an older version (2.0, latest is 3.1).

Got it to run, but the benchmark script doesn't appear to use the preferences set in Blender. I ran it with MetalRT set to both on and off in the Blender prefs, but the result was the same (~30 s for junkshop).

Script args were:
Code:
/Applications/Blender.app/Contents/MacOS/Blender --background \
  --factory-startup \
  -noaudio \
  --debug-cycles \
  --enable-autoexec \
  --engine CYCLES \
  ~/Benchmarks/junkshop/main.blend \
  --python main.py \
  -- \
  --device-type METAL

If you're curious, the script accepts 'CPU', 'HIP', 'OPTIX', 'CUDA', 'METAL' and 'ONEAPI' as valid device types.

Rendering a frame directly in Blender (classroom) does show a difference, though
MetalRT off: 1m35s
MetalRT on: 53s

(side note: look at the memory usage in the screenshots, hmm... I wonder how those Pro™ 8GB SKUs would perform here 😜)
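(Possibly relevant to the prefs not sticking: the invocation above passes --factory-startup, which per Blender's documentation skips the saved user preferences, so the MetalRT toggle may simply never be loaded for these runs. A sketch of a possible workaround follows; note that use_metalrt is an assumed property name that may differ between Blender versions, and the benchmark script may still override device settings itself.)

Code:
# Sketch: set the Cycles Metal preference in-line for this run only, since
# --factory-startup means the saved prefs are ignored.
# NOTE: 'use_metalrt' is an assumed property name; check your Blender version.
/Applications/Blender.app/Contents/MacOS/Blender --background \
  --factory-startup \
  --python-expr "import bpy; bpy.context.preferences.addons['cycles'].preferences.use_metalrt = True" \
  --engine CYCLES \
  ~/Benchmarks/junkshop/main.blend \
  --python main.py \
  -- \
  --device-type METAL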
 

Attachments

  • script_metalrt_on.png (493.6 KB)
  • script_metalrt_off.png (487.6 KB)
  • blender_rton.png (732.4 KB)
  • blender_rtoff.png (666.1 KB)
If this ask is too much, don't hesitate to say no, but you could check some of this using the Xcode profiling tools - like RT cores utilization, memory bandwidth being used, etc ...

Ah good call, I'll give that a look at the weekend
 
Rendering a frame directly in Blender (classroom) does show a difference, though
MetalRT off: 1m35s
MetalRT on: 53s
That looks like a very nice improvement in performance with RT on. Time nearly halved? Very impressive.
 
(side note: look at the memory usage in the screenshots, hmm... I wonder how those Pro™ 8GB SKUs would perform here 😜)
Cinebench itself uses 7 GB of RAM. From what I saw online, the 8 GB model couldn't even start the Blender benchmark.

8 GB on Pro Macs needs to die.
 
I would seriously love to see “ultra” benchmarks from these companies with larger scenes (15-30 GB) that take a lot longer to render. From my understanding that would be more indicative of larger workloads, and it would naturally showcase GPUs with access to large amounts of memory 🙃. I do wonder how the M3 Max or Pro would fare against their competitors then.

Hell, even higher tiers above that for even bigger scenes.
 
I may have missed it if it was mentioned, but can anyone confirm the power usage of the 40-core Max GPU?
According to this article by Anton Avdyushkin (translated by Andreas Osthoff) at notebookcheck.net ( https://www.notebookcheck.net/Apple...allenges-HX-CPUs-from-AMD-Intel.766414.0.html ), the max power consumption of the M3 Max's 40-core GPU is 60 W.

One of the nice things about the following comparative assessment is that they included power consumption figures for both the M3 GPU and various NVIDIA 4000-series Laptop GPUs. In particular, they found a 4070 Laptop GPU configured for the same 60 W power consumption as the M3 Max's.

GPU Performance
The high-end M3 Max chip is equipped with a 40-core GPU, which means there are two additional cores compared to the previous 38-core unit. We also suspect the core clock is higher. The performance is between 13-25 % higher depending on the test, but the maximum power consumption dropped from 64 to 60 Watts. The big advantage of the GPU is still the shared memory, especially during video editing. We edit our review videos for YouTube with DaVinci Resolve and occasionally run into the 16 GB VRAM limit on the RTX 4090 during editing.

Compared to current laptop GPUs from Nvidia, the M3 Max GPU can clearly beat the RTX 4070 Laptop in the cross-platform 3DMark Wildlife Unlimited test and is only 9 % behind the GeForce RTX 4090 Laptop at 125 Watts and 23 % [behind] the 4090 Laptop at 175 Watts, respectively. The GFXBench test shows a deficit of 4 % for the M3 Max compared to the RTX 4080 Laptop at 175 Watts. The OpenCL performance is worse and the faster RTX GPUs are clearly ahead, but the RTX 4070 Laptop at comparable 60 Watts TGP (Galaxy Book3 16 Ultra) is 11 % slower in Geekbench.

The GPU performance is completely stable during longer workloads as well as on battery power, which is not the case for the powerful GeForce GPUs. More GPU benchmarks are listed in our tech section.


They also give the sustained power consumption of the CPU as 56 W:

Processor
Apple M3 Max 16-Core 16 x 2.7 - 4.1 GHz, 56 W PL1 / Sustained


Unfortunately, I don't know the methodology they used to measure either.
 
Very interesting. Performance looks great for the Max, although their figures for Wildlife Extreme are quite low for the 4090. It’s possible there are some laptops that let the power run wild and skew the average, which is 42935. From here:

Unfortunately, I don't know the methodology they used to measure either.
It seems they use wall plug measurement.

Details of the multimeter here:
Test methodology here:

While I’m sure it makes sense to follow the standard methods of measuring power, I’m not enamoured of this way of doing it. It would have been nice if they had included powermetrics results as another data point. They claim 140 watts as the maximum load power draw, which makes sense once the other components are included. I’m more interested in just the CPU/GPU, unfortunately.
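(For the SoC-only view, something along these lines could run alongside a render; a sketch, with sampler names per `man powermetrics` and an arbitrary 500 ms interval.)

Code:
# Sketch: log CPU and GPU package power while a render is running, for
# comparison against wall-plug measurements.
sudo powermetrics --samplers cpu_power,gpu_power -i 500 \
  | grep -iE "cpu power|gpu power|combined" \
  | tee render_power.log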
 

I had a chance to go to an Apple Store today and got to play with the M3 Max (40-core GPU) for a few minutes. I took the opportunity to download Blender (version 4.0.1) and the Classroom and Barbershop scenes.

Results are from memory so only accurate to the second.

Classroom: RT off = 55 seconds. RT on = 20 seconds.
Barbershop: RT off = 3:30. RT on = 1:50.

Really nice performance increases from RT.
 