Apple: M1 vs. M2

Given the performance offered by the Studio, it is not entirely obvious to me that Apple even needs to produce an M-series Mac Pro. You can still get a $50K+ Mac Pro, but the top-end Studio at $8K pantses it already, so why would Apple bother? They can clearly sell more of the latter than the former: getting more Macs out there seems like a better strategy than selling a tiny number of niche products.
They've explicitly said a new Mac Pro is coming, though.
 
The lure of the Mac Pro had mainly been about modularity. It will be interesting to see what flavor of expansion these new boxes provide. I doubt it will be RAM. M.2 slots, sure. Graphics cards? I dunno, but I tend to doubt it. Multiple CPU board slots? Maybe?

Will be interesting to see how they position these as something above the ultra, other than double the multi thread performance and double the maximum RAM.
 
A couple decades back I was gifted my first digital camera. To facilitate its use, I got a USB/1394 card for my 7500, because that stuff was newer than my machine. These days, though, what do they put on cards? I priced a Mac Pro and saw they offered a media accelerator card, but that is already in the SoC. Other than GPUs, what are slots used for anymore?
 
I think we will see some flavour of MPX. I feel it'd be odd for Apple to have created this whole MPX thing for just the 2019 Mac Pro, even if it is almost just normal PCIe.
With iPadOS 16 they brought DriverKit to iPadOS, allowing M-series iPads to have apps with PCIe device drivers for external Thunderbolt-to-PCIe enclosures. It seems like a niche use case, but it shows a willingness to do PCIe-based expansion on Apple Silicon devices, and it helps the Mac platform too. With Afterburner, Apple also showed that they are ready to produce their own MPX-based expansion cards. They could easily continue similarly with an Apple Silicon Mac Pro, even if the old Afterburner specifically will probably be obsolete because the main SoC in a future Mac Pro will have better ProRes acceleration already. If they are ready to go NUMA, which I kinda doubt but hey, we could potentially see MPX-based M1 Max add-in cards, so you could just add four extra M1 Max cards or whatever.
Regardless, I think they need something that can support internal PCIe-based hardware like iLok software license keys for a Mac Pro. I really look forward to seeing how they'll execute it. If nothing else, it'll have a "halo product" effect. A Mac Pro that shows up in benchmarks as outclassing most if not all other workstations will be good marketing for the Mac as a platform.
 
I recently used a slot to install a card adding M.2 sockets and higher-speed Ethernet to a server. I suppose some people do stuff like that. I dunno.
 
Running the first (and much shorter) of @theorist9 ’s test cases, here is the result:

[attached screenshot: graphing/image-processing benchmark results]
Note: I didn’t quit all the other open apps on my Mac while running this (on a fully-loaded M1 Max MBP), but none of them were using more than a percent or 2 of CPU.

Second, longer test is running now, and I’ll report when it’s done (I have to leave the house for awhile and it will probably finish while I am gone).
 
Some background for those viewing this:

I created two Mathematica benchmarks, and sent them to @Cmaier (and @casperes1996). These calculate the %difference in wall clock runtime between whatever they're run on and my 2019 i9 iMac (see config details below).

Symbolic benchmark: Consists of six suites of tests: three integration suites, a simplify suite, a solve suite, and a miscellaneous suite. There are a total of 58 calculations. On my iMac, this takes 37 min, so an average of ~40 s/calculation. It produces a summary table at the end, which shows the percentage difference in run time between my iMac and whatever device it's run on. Most of these calculations appear to be single-core only (Wolfram Kernel shows ~100% CPU in Activity Monitor). However, the last one (polynomial expansion) appears to be multi-core (CPU ~ 500%).

Graphing and image processing benchmark (the one posted above): Consists of five graphs (2D and 3D) and one set of image processing tasks (processing an image taken by JunoCam, which is the public-outreach wide-field visible-light camera on NASA’s Juno Jupiter orbiter). It takes 2 min on my 2019 i9 iMac. As with the above, it produces a summary table at the end. The graphing tasks appear to be single-core only (Wolfram Kernel shows ~100% CPU in Activity Monitor). However, the image processing task appears to be multi-core (CPU ~ 250% – 400%).

Here's how the percent differences in the summary tables are calculated (ASD = Apple Silicon Device, or whatever computer it's run on):

% difference = (ASD time/(average of ASD time and iMac time) – 1)*100.

Thus if the iMac takes 100 s and the ASD takes 50 s, the ASD would get a value of –33, meaning the ASD is 33% faster; if the ASD takes 200 s, it would get a value of 33, meaning it is 33% slower. By dividing by the average of the iMac and ASD times, we get the same absolute percentage difference regardless of whether the two-fold difference goes in one direction or the other. For instance, if we instead divided by the iMac time, we'd get 50% faster and 100% slower, respectively, for the above two examples.

I also provide a mean and standard deviation for the percentages from each suite of tests. I decided to average the percentages rather than the times so that all processes within a test suite are weighted equally, i.e., so that processes with long run times don't dominate.
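The formula and the per-suite averaging can be sketched in a few lines (a minimal illustration in Python rather than Mathematica; the times are the made-up examples from the text, not real benchmark data):

```python
from statistics import mean, stdev

def pct_diff(asd_time, imac_time):
    """Symmetric percent difference: negative means the ASD is faster."""
    return (asd_time / mean([asd_time, imac_time]) - 1) * 100

# The two worked examples above:
print(round(pct_diff(50, 100)))   # -33: ASD twice as fast
print(round(pct_diff(200, 100)))  # 33: ASD twice as slow

# Per-suite summary: average the percentages (not the times), so that
# calculations with long run times don't dominate the suite mean.
suite = [pct_diff(a, i) for a, i in [(50, 100), (200, 100), (90, 100)]]
print(round(mean(suite), 1), round(stdev(suite), 1))
```

Note the symmetry: a two-fold difference gives ±33 no matter which machine is faster, which is the whole point of dividing by the average.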

iMac details:
2019 27" iMac (19,1), i9-9900K (8 cores, Coffee Lake, 3.6 GHz/5.0 GHz), 32 GB DDR4-2666 RAM, Radeon Pro 580X (8 GB GDDR5)
Mathematica 13.0.1
MacOS Monterey 12.4

****************
Looking at the results Cmaier posted, his M1 Max is ~20% faster at generating and displaying the graphs than my 2019 i9 iMac, but nearly 50% slower at the image processing task (~80 s vs ~30 s). When we were discussing this, Cmaier suggested a reason, but I'll leave that to him to post if he wishes.
 
My theory was that image processing likely uses the math library, which is not optimized for M1.
 
Ran the graphing one again. Activity Monitor said the WolframKernel never got above around 240%. Mostly it was around 110% CPU, with a couple of very short bursts in the 220% region, and a very quick peak at 240%. So it’s clearly not making use of all the cores, at least on M1.
 
While it's fun to compare the latest and greatest CPUs, both M1-series and high-end x86, to the M2, that's not what the average user, who just wants a decent everyday computer, is using. I previously compared the leaked M2 benchmarks to the latest Mac Pro, which uses a Cascade Lake Xeon W, from 8 cores to 28 cores. It's remarkable how the M2 nearly doubles the 8-core Mac Pro in single-core, and bests it in multi-core. However, very few PCs ship with Xeons, and fewer still are Mac Pros.

I realize that I'm about to trade my nerd street cred in for a humbling experience, all in the name of benchmarking. Much like our Neanderthal ancestors, who lived off the land, foraged for sustenance, and raided local tribes for resources, I too have learned to suffer through my daily existence, using a technological fossil from the before times, an ancient relic of a bygone era, the scraps off the digital heap.

Not only do I still use an Intel Mac mini as my daily machine, it's a base model i3, manufactured in the dark days of the stagnant 14nm++++ epoch. Many generations of innovation have come and gone in the four years since I purchased my Mac mini, yet I still persevere in silence, waiting for TSMC to move their chess pieces forward, allowing Apple to bring me to the M3 promised land.

I normally purchase high-end Mac minis, this being the fourth that I've owned since 2005, and keep them for as long as realistically possible. However, the rumors of the switch to Arm were strong in 2018, so I decided to settle for a base model, upgrading from a 2011 unit, the last mini to feature optional discrete AMD GPUs. This unit would be a "stopgap" until Apple heralded the arrival of Arm Macs. My 2018 Mac mini includes such innovations as a 4-core 3.6 GHz i3-8100B, 8 GB of 2667 MHz system memory, and a spacious 120 GB internal SSD, not to mention Intel's integrated graphics. (If anyone is wondering what the "B" next to the 8100 stands for, it denotes the ability to use DDR4-2666 instead of DDR4-2400. I wouldn't be surprised if Intel made this exception at Apple's request. I'm sure that 10% higher bandwidth makes a huge difference.)

Then, once the M1 was announced, I realized that the transition would be slightly different than I had anticipated, and decided to hang on to my 2018 Mac mini, at least until the M3 generation. Once the M3 is in production, I'll likely purchase a high-end Mac mini or a mid-range Mac Studio, depending on features and M3 variants. Until then, I'm holding my i3 Mac mini together with "sticks and bubble gum". I upgraded the system RAM to 64 GB, added a BlackMagic RX 580 8GB eGPU and a Samsung 500 GB USB-C SSD, and purchased a brand-new 21.5-inch LG UltraFine off eBay last year, which somebody was evidently hiding under their mattress, since it was discontinued two years prior. Add to this other doohickeys, doodads and gewgaws to keep my lowly x86 Frankenstein's monster sustainably running. I would note that, other than the peculiar acquisition of the LG, everything was refurbished or previously owned.

So, my long-winded explanation aside, it's time for the blatant self-flagellation, as I throw my Intel Mac mini on the pyre, hoping for mercy among my fellow nerds on this forum. I just ran Geekbench, so that I can compare my base model Intel i3-8100B, to the base model M2, which we now have benchmarks for. The slaughter was nigh, the gladiatorial pit bloodied, and my Mac mini had nary a chance for victory, cleaved asunder, felled by Apple's superior semiconductors. Still, I found it instructive to compare an "everyday" Mac from four years ago, to Apple's latest and greatest.

Hence, with substantial trepidation, I bring to you, ladies and gentlemen, the aftermath of the skirmish, thrown into the fray once more, a one-sided conflagration comparing my x86 Mac mini to the M2:

Geekbench 5.4.5 results:

My i3 Mac mini:

Single-core: 912
Multi-core: 3554

M2:

Single-core: 1919
Multi-core: 8929

My Mac mini with RX 580 eGPU:

Metal: 36800

M2 Metal: 30627

In summation:

The M2 is a 110% performance increase in single-core.
The M2 is a 151% performance increase in multi-core.
The M2 is a 17% performance decrease in Metal compared to the RX 580 eGPU.
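Those three percentages are straightforward ratio arithmetic on the scores above; a quick sanity check:

```python
def pct_change(new, old):
    """Percent change of `new` relative to `old` (negative = decrease)."""
    return (new / old - 1) * 100

print(round(pct_change(1919, 912)))    # 110  (single-core)
print(round(pct_change(8929, 3554)))   # 151  (multi-core)
print(round(pct_change(30627, 36800))) # -17  (Metal, vs. the RX 580 eGPU)
```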

Keep in mind that Apple currently sells the BlackMagic RX 580 eGPU on their website for $699, the exact same price as an M1 Mac mini, and I assume the eventual M2 unit. (I got my BlackMagic eGPU for $400, but that's still a lot for an older GPU.) Considering that the BlackMagic eGPU looks like a small nuclear reactor, and has the power requirements necessary for one, then the small shortfall with the M2 is understandable, and a pyrrhic victory for my lowly "sticks and bubble gum" Mac mini.

The whole purpose of this exercise was to compare the base Intel model from four years ago, to the base Apple Silicon model from today. These aren't CPUs that are used by professional graphics artists, animators, mathematicians, astrophysicists, engineers, and rich people who don't need high-end tech but own it anyway. I have a regular, everyday, peasant configuration, which is what the vast majority of Mac owners are using in their day-to-day computing lives. I've done everything I can to spruce it up, fake mustache and all, in an attempt to put lipstick on this x86 pig, but even then it doesn't compare to the M2. This doesn't even include the substantially improved thermals, energy usage, and reduction in noise that Apple Silicon brings. The i3 Mac mini gets surprisingly hot, bafflingly noisy, as does the eGPU, when even moderately stressed. Even at full load, the M-series are silent, cool running little beasts, compared to the supposedly energy efficient Intel designs of yesteryear.

While benchmarks against M1 Maxes and Xeon Mac Pros show that the M2 is impressive, it's not even a close contest against the preceding Intel models that the M2 is destined to replace. The M2 continues Apple Silicon's tectonic shift in performance, energy usage, noise levels, weight, and form factors. When I do finally upgrade to Apple Silicon, perhaps during the M3 generation, the difference I experience is going to be ridiculous. For what it is worth, I've enjoyed my little Intel Mac mini with its quaint i3, but whatever Apple Silicon Mac I do upgrade to will be a titanic shift compared to what I currently use, no matter how much bubble gum, sticks, and thermal paste I apply. Until then, I will suffer through my grievous blight of chip envy, bedazzled by those of you who have already made the switch to Apple Silicon.
 
And here are the results from the second, much more time consuming, test.

[attached screenshot: symbolic benchmark results]
It appears you were correct :) — it's never slower than the i9 for these symbolic calculations.

In addition, the % differences between the two for the Simplify/Solve/Misc calculations are about what you'd expect based on the differences in their single-core GB scores:
1750/[mean(1300, 1750)] ≈ 1.15 (with the caveat that the latter is for scores instead of runtimes; I don't know how GB transforms one into the other).

...and the additional caveat that GB also uses libraries which, if not yet optimized for AS, may be reducing the AS processor's score relative to what it could be.
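As a quick check of that ratio (using ~1300 and ~1750 as the i9 and M1 single-core GB scores, and treating runtime as inversely proportional to score, which is only a rough assumption):

```python
from statistics import mean

i9, m1 = 1300, 1750  # approximate single-core Geekbench scores
print(round(m1 / mean([i9, m1]), 2))  # ~1.15

# In the summary tables' symmetric convention (negative = faster),
# with time taken as 1/score, the expected value works out to:
expected = (i9 / mean([i9, m1]) - 1) * 100
print(round(expected))  # ~ -15, i.e. M1 roughly 15% faster
```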

Here are the ones GB says it uses:

[attached screenshots: the libraries Geekbench reports using]


For many of the integrals, OTOH, it seems more software optimization remains to be had.
 

Presumably, over time, these libraries will all get optimized. But it looks like, for non-numerical stuff, M1 is functioning as expected.
 
Thanks for checking that. [For others: I had asked Cmaier to check this because on my machine the image processing task—the one that was much slower on his M1 Max—gets Activity Monitor to 200%-400%, sustained. So I was wondering if the reason for this might be reduced core utilization.]

If Mathematica is missing SIMD vectorization and/or available numerical libraries for image processing on the M1, could this be causing the reduced core utilization (i.e., would the SIMD instructions and/or numerical libraries be run on separate cores)? Or is this likely independent of whether Mathematica is using these on M1? [I assume vectorization is an inherent part of the code and thus would be run on the same core(s), but I don't know.]
 

Each core would support SIMD, so doubtful it’s related in that sense. But the math library may also not be properly optimized to use as many cores as it should on M1. It could be that there is some weird logic that uses no more than 2 physical cores, and on the Intel chip that looks like 4 cores because of hyperthreading? Or it could just be lack of optimization where for whatever reason it doesn’t launch as many threads as it should on M1, or it is not allowing M1 to dispatch the threads intelligently, or something else entirely.
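The hyperthreading guess can be made concrete with a toy sketch (purely hypothetical sizing logic, not Wolfram's actual code): a library that caps its thread pool at two physical cores would show up differently in Activity Monitor depending on SMT, since the monitor counts 100% per logical CPU.

```python
def apparent_utilization(physical_cores, smt_ways, cap=2):
    """Hypothetical buggy sizing: use at most `cap` physical cores.
    Activity Monitor counts 100% per *logical* CPU, so the same cap
    looks bigger on a hyperthreaded chip."""
    workers = min(cap, physical_cores) * smt_ways
    return workers * 100  # percent shown in Activity Monitor

print(apparent_utilization(8, 2))   # 400 -> i9-9900K (8 cores, 2-way HT)
print(apparent_utilization(10, 1))  # 200 -> M1 Max (10 cores, no SMT)
```

Under that (unverified) hypothesis, the ~200-400% seen on the i9 versus ~240% peak on the M1 Max would fall out of the same two-core cap.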
 
In Mathematica one has the option to limit how many threads are used for MKL. The default on my machine is 8:

[attached screenshot: MKL thread-count setting, default of 8]

But you can set it to 1:
[attached screenshot: MKL thread count set to 1]

I reran the image processing task with it set to 1, and it had no effect on either run time or core utilization. But perhaps there are other libraries besides MKL that it uses for image processing.

I'll send an email to Wolfram technical support mentioning the difference in run times for the image processing task as a potential target for future optimization.
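For anyone wanting to try the analogous experiment outside Mathematica: with NumPy/SciPy builds linked against MKL, the standard knob is the MKL_NUM_THREADS environment variable, which must be set before the library is first loaded in the process (a general MKL mechanism; whether it has any effect on a given build depends on how that build was linked):

```python
import os

# Cap MKL (and OpenMP-based BLAS builds) at a single thread.
# These must be set BEFORE the MKL-linked library is first imported,
# since thread pools are sized at initialization.
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

# ...then `import numpy` (if linked against MKL) picks up the limits.
print(os.environ["MKL_NUM_THREADS"])  # 1
```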
 
I have now also run @theorist9 's first benchmark. And when I say first, I mean what is here called the second, but it was the first in my mailbox, haha.

I want to point out that on my 16" MacBook Pro with M1 Max the fans remained entirely off for almost the entire duration of the test, and the hottest CPU core hovered around 50-60°C for that time. While I have not tried running it on my 10700K iMac, I have a feeling the fan would be loud and the CPU would go near 100°C there.

Unsurprisingly, my numbers aren't too different from Cmaier's. And I think almost any M1 chip, whether Pro, Max or Ultra, would do about the same here, given that my CPU usage also seemed very single-threaded: mostly just 100% CPU usage (where 1,000% would represent all 10 cores).
The exception is near the end of the test, where usage was mostly ~450%, and this is also when the fans did kick in, but only at the minimum RPM and basically still entirely silent. The laptop did heat up, but nothing like my old Intel MacBook Pro; the hottest core measurement was still just 80°C, and the fans had a lot of headroom, given they still ran at minimum speed.

I ran it on the balanced power profile on battery, though neither should matter particularly much, since the high power profile only really makes a difference in all-core + GPU workloads, and being on battery doesn't hurt performance until the battery gets really low. But it is noteworthy that even with screen brightness at maximum during the test, with the bright white background of Mathematica and the constant 100% CPU workload (+ a few other things running with minimal CPU overhead, but still), the battery only dropped by around 10 percentage points.
 