M3 core counts and performance

Hmmm, MaxTech claims the most power they saw used by the GPU in the M3 Max was 33 watts.
[attached screenshot]
 
Does that largely account for the memory architecture difference? Is it possible to calculate the transaction cost of non-UMA in power and speed, or is it too trivial to matter?
Most of these benchmarks are too small to require constant swapping between CPU and GPU RAM, so you've really only got the initial movement of data onto the GPU, which, yes, will be too trivial to have much effect on these benchmarks.
 
I thought about posting this in the gaming forum but given our discussions it makes sense here too:



Summary: Basically, you do run into the limits of the 8GB of RAM on the base M3 MacBook, especially for higher-end games, but it was surprisingly decent at running a number of games. Upgrading to 16GB of RAM would probably net you better performance, but then it's only $200 more for the lower M3 Pro and much better gaming performance overall.
 
Odd. Do you remember what the highest they measured for the M2 Max was? I could've sworn it was a 50W GPU.
Correct
[attached screenshot]


This measurement was taken while running Wildlife Extreme, and the M3 one was taken while running the Cinebench 2024 GPU test. Unless the RT cores are more efficient than we thought, it's a suspicious claim.
 
Maybe CB2024 doesn't stress the GPU that much? Doesn't seem likely ... could be another weird powermetrics output? This is why for formal reviews sanity checking with wall power is necessary even though wall power has its own issues. I take it they didn't run Wildlife Extreme on the M3 again?
 
I think they did run Wildlife Extreme, but afaik didn’t measure power. I mainly skim their videos tbh so perhaps I missed it.
 
Here is what I am wondering about: can HRT be accessed directly, for other types of uses (and how specialized is it)?

There is this app that I sometimes find entertaining/informative, and its price is pretty good. The graphics and animation are lovely, but the observer's motion is always linear, which is exactly what never happens in nature.

Some users have asked for gravitational effects upon the observer – the curator responds that gravity is too difficult to calculate. But it seems to me that RT node traversal would be great at this task. You need a local gradient for the observer, aggregated from all the measurable bodies in the area. Each measurable body has a location relative to the observer (which reduces to distance and direction), a motion vector relative to the observer's vector of motion, and a mass, which resolves by inverse square to the body's gravitational effect on the observer.

The local gradient is the sum of the gravitational effects of all measurable bodies (if you are, say, very near Saturn, the effect of Tau Ceti will not be measurable, at least per the observer's frame of reference). Seems like HRT could help make this kind of computation real-time useful, including the dynamics (because everything is always moving, the local gradient value is very non-static).
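The summation described above is straightforward to sketch. Here is a minimal, illustrative version of the "local gradient" idea: the net gravitational acceleration on the observer is the vector sum of GM/r² contributions from each measurable body. All names (`local_gradient`, the example body) are mine, not from any real app's API.

```python
# Sketch of the "local gradient" idea: net gravitational acceleration on an
# observer is the vector sum of GM/r^2 pulls from every measurable body.
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def local_gradient(observer, bodies):
    """Sum each body's inverse-square pull on the observer (m/s^2)."""
    ax = ay = az = 0.0
    ox, oy, oz = observer
    for mass, (bx, by, bz) in bodies:
        dx, dy, dz = bx - ox, by - oy, bz - oz
        r2 = dx * dx + dy * dy + dz * dz
        r = math.sqrt(r2)
        a = G * mass / r2      # acceleration magnitude toward the body
        ax += a * dx / r       # project onto each axis via the unit direction
        ay += a * dy / r
        az += a * dz / r
    return ax, ay, az

# Example: observer ~1000 km above Earth's surface, Earth at the origin.
earth = (5.972e24, (0.0, 0.0, 0.0))
obs = (7.371e6, 0.0, 0.0)  # metres from Earth's centre
print(local_gradient(obs, [earth]))  # x component ≈ -7.3 m/s^2, toward Earth
```

Per the Saturn/Tau Ceti point, a real implementation would cull bodies whose contribution falls below some threshold before summing.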

Also, the app allows the observer to transit space very fast. Lightyears-per-second, even. It would be an interesting study, to see how a FTL observer interacts with gravity, as based on raw maths.
 

If you can formulate this problem as an RT problem (that is, rays intersecting a 3D bounding volume hierarchy), then sure, you can use the hardware to accelerate your task. But it sounds to me like you are really looking for a nearest-neighbours/region-query kind of thing, which is a different problem.

Hardware RT in its current form doesn't appear to implement a general-purpose spatial indexing structure. It's really all about optimizing ray traversal on parallel hardware. For example, Nvidia and Apple use very wide trees to reduce the number of nodes they need to traverse (fewer traversed nodes = fewer loads). And the fixed-function circuitry is optimized to do ray/box intersections, and just that.
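To make the distinction concrete, here is a minimal sketch of the kind of structure a region query actually wants: a uniform grid (spatial hash) rather than an RT-style BVH. The class and names are illustrative, not any real library's API.

```python
# A region query ("which bodies are within radius R of the observer?") is
# naturally served by a spatial index like a uniform grid or k-d tree,
# not by ray/box intersection hardware. Minimal uniform-grid sketch:
from collections import defaultdict
import math

class UniformGrid:
    def __init__(self, cell_size):
        self.cell = cell_size
        self.buckets = defaultdict(list)  # cell coordinates -> points

    def _key(self, p):
        # Map a 3D point to integer cell coordinates.
        return tuple(int(math.floor(c / self.cell)) for c in p)

    def insert(self, p):
        self.buckets[self._key(p)].append(p)

    def query(self, center, radius):
        """Return all inserted points within `radius` of `center`."""
        r_cells = int(math.ceil(radius / self.cell))
        kx, ky, kz = self._key(center)
        out = []
        # Scan only the cells that could overlap the query sphere.
        for dx in range(-r_cells, r_cells + 1):
            for dy in range(-r_cells, r_cells + 1):
                for dz in range(-r_cells, r_cells + 1):
                    for p in self.buckets.get((kx + dx, ky + dy, kz + dz), []):
                        if math.dist(p, center) <= radius:
                            out.append(p)
        return out

g = UniformGrid(cell_size=1.0)
for p in [(0.2, 0.2, 0.2), (0.9, 0.1, 0.0), (5.0, 5.0, 5.0)]:
    g.insert(p)
print(g.query((0.0, 0.0, 0.0), 1.5))  # the two nearby points, not (5,5,5)
```

None of this maps onto fixed-function ray/box intersection units, which is exactly the mismatch described above.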
 
Do you know what were they using to stress the GPU? 🤔

Power draw varies quite a lot with M3 Pro 18C
Blender RT off = ~19.5W peak
Blender RT on = ~18.5W peak
Cinebench 2024 = ~17.5W peak
Cities Skylines = ~6.5W (2560x1440 high preset - it's frame limited so the GPU is just chilling here. Just including this as an extreme "other end of the scale" example)
 
It was Cinebench 2024 iirc.
 
Regardless of Max's max 33W claim for the Max (couldn't resist 🙃), it makes sense that RT on uses less power than off on the same workload. After all, it still has *to do the ray tracing* regardless, but now it has fixed-function hardware to accelerate it, which not only increases performance but should also reduce energy consumption. I think I previously said that 3.6's higher energy cost relative to 4.0 might be due to the bandwidth effects I had mentioned. And though that might still be partially true, I forgot that *3.6 still requires ray tracing to complete the scene*, so if the GPU doesn't have RT cores to offload that work (or, in the case of Apple, 3.6 doesn't use them), that's going to not just hit performance but also increase energy consumption.
 
Here are the results posted by Andrei Frumusanu at AnandTech (https://www.anandtech.com/show/17024/apple-m1-max-performance-review/3) for the package power draw of the M1 Max (32 GPU cores). His tests maxed out at 44 W for the CPU and 57 W for the GPU.

Given that the 32-core M1 Max GPU can draw 57 W, I'd say Notebookcheck's ≈60 W (≈55 W if you convert from wall power to package power?) is a more plausible maximum for the 40-core M3 Max than MaxTech's ≈33 W. Too bad AnandTech didn't do a similar analysis for the M2.

[attached screenshot]
 
To be honest I’m not sure I believe either MaxTech’s figure, or Notebookcheck’s! I’m quite prepared to believe powermetrics doesn’t always tell the truth, and that wall power can mislead. We’ll see I guess.
 
Several things.

TSMC claims either better performance or lower power on N3 compared to earlier processes. At least two sites, of uncertain reliability, say that the M3 GPU runs at the same clock speed as in M2 (1.4GHz), which would suggest lower power consumption by dint of the process. It could be that Apple chose 1.4GHz as the sweet spot for keeping the GPU well-fed with optimal throughput, and that any higher clock would lead to power-wasting delays (if the cores are waiting for data, they are drawing power to no good use).

N3 is said to have poor SRAM scaling, which may well relate to the dynamic cache design, meaning fewer transistors are needed to hold data (L1 and the register file are in one block). In addition, data may go directly to its destination: instead of copying a register to the cache, the result is stored directly in the correct cache line (especially effective if instructions can coalesce), to be cast out to memory later on. That is less work for the GPU to do (less power used).

Finally, there is FinFlex, which allows the engineers to fine tune the layout gate-by-gate. The GPU is most likely highly optimized for performance and efficiency, putting faster gates in the places where they are really needed and lower-power gates where switching speed is less critical.

It is not inconceivable that Apple has tweaked M3 to the point that it draws a third less peak power than its predecessors. Seems unlikely, but not unthinkable.
 
Apple's own graph—at least to the extent we can view this somewhat nebulous graph quantitatively—puts the peak power consumption of the 10-core M3 GPU at ≈17 W, which would extrapolate to ≈68 W for the 40-core M3 Max, if run at the same clocks.

Plus to the extent Apple is fudging this, they would be under-representing, rather than over-representing, the actual peak power consumption. Thus the actual peak consumption for the M3 GPU should be at least 17 W.

[attached screenshot]


And, just for completeness, we can add the M3 CPU. Its peak power consumption is shown as ≈16 W, which extrapolates to ≈40–43 W for the upper-spec M3 Max*.

*The upper-spec M3 Max has 12P+4E, while the M3 has 4P+4E. Thus we can estimate the M3 Max's peak power consumption as:
16 W x (12 + 4 e)/(4 + 4 e), where e is the fractional peak power consumption of the E cores relative to the P cores.
For e values of 1/3, 1/4, 1/5, and 1/8, this gives extrapolated values of 40 W, 42 W, 43 W, and 44 W, respectively—which is nice, because it means you don't need to know the precise value of e to get a good estimate.
[attached screenshot]
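The footnote's formula is easy to check numerically. A small sketch reproducing the extrapolation above (the function name and default core counts are just for illustration):

```python
# Reproducing the extrapolation: scale the M3's ~16 W CPU peak to the
# 12P+4E M3 Max, treating each E core as a fraction `e` of a P core.
def extrapolate(base_watts, e, base=(4, 4), target=(12, 4)):
    bp, be = base      # P and E core counts of the base chip (M3)
    tp, te = target    # P and E core counts of the target chip (M3 Max)
    return base_watts * (tp + te * e) / (bp + be * e)

for e in (1/3, 1/4, 1/5, 1/8):
    print(f"e = {e:.3f}: {extrapolate(16, e):.1f} W")
# e=1/3 -> 40.0 W, e=1/4 -> 41.6 W, e=1/5 -> 42.7 W, e=1/8 -> 44.4 W
```

As noted, the result is insensitive to the exact value of e, landing in the 40–44 W band throughout.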
 
While the tests @Aaronage ran may not represent the GPU at its peak power levels, we know that, according to his GPU powermetrics readings under Cinebench and Blender, the 18-core Pro GPU required 17.5-19.5W, which would put the 40-core Max at roughly 38-43W for those workloads. Now, we may have to account for the fact that Andrei was reporting package power while I believe @Aaronage is reporting just GPU power (correct me if I'm wrong). Thus the two results may be less different than they first appear.

For just GPU power, it's possible that CB24 really did only draw 33W. In that case it isn't a very strenuous GPU test, and even larger GPUs should show poor scaling with it. That hypothesis should be testable with data already available, especially for Nvidia GPUs. Too late and too tired to do it now.

While all this may be different from the maximum possible power draw, I gotta think Blender with RT off has to be close. And a reading of ~45W GPU could entail over 50W of package power, which could in turn entail 60W at the wall. So the only really big discrepancy would be if CB24 on the Max chip really was 33W GPU and actually represented a maximum power draw. At least that last seems unlikely given @Aaronage's results. There may still be some small discrepancies, but unless @Aaronage was reporting package power I think we're okay.
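For what it's worth, the per-core scaling arithmetic above works out like this (a trivial sketch; the function name is mine and the scaling assumes linear behaviour with core count at equal clocks):

```python
# Scale the 18-core M3 Pro GPU powermetrics readings linearly to the
# 40-core M3 Max, assuming equal clocks and linear scaling with core count.
def scale_by_cores(watts, cores_from=18, cores_to=40):
    return watts * cores_to / cores_from

low, high = scale_by_cores(17.5), scale_by_cores(19.5)
print(f"{low:.1f}-{high:.1f} W")  # -> 38.9-43.3 W
```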
 