X86 vs. Arm

Agent47 · Apr 15, 2023

dada_dave said:
My dad’s favorite was, and probably still is, VAX/VMS. He vastly preferred it to the Unix family of systems.

So he ain't an anti-VAXer?

Yoused · Apr 28, 2023

Over at TOP, Xiao_Xi posted a link to a GB comparison of a Zen 4 Lenovo against an M2 Air. Mostly, the numbers are pretty close. Then I noticed that the Lenovo has nearly eight times as much RAM. I would be curious how well it would do with 8, or even 16GB.

casperes1996 · Apr 28, 2023

Yoused said:
Over at TOP, Xiao_Xi posted a link to a GB comparison of a Zen 4 Lenovo against an M2 Air. Mostly, the numbers are pretty close. Then I noticed that the Lenovo has nearly eight times as much RAM. I would be curious how well it would do with 8, or even 16GB.

I mean, memory is generally something you either have enough of or don’t. If you already have enough for your active data set, adding more doesn’t help at all. Geekbench 5, at least, was not a memory-heavy test suite, so I don’t think it matters all that much here.

dada_dave · Apr 28, 2023

Yoused said:
Over at TOP, Xiao_Xi posted a link to a GB comparison of a Zen 4 Lenovo against an M2 Air. Mostly, the numbers are pretty close. Then I noticed that the Lenovo has nearly eight times as much RAM. I would be curious how well it would do with 8, or even 16GB.

Interesting!

casperes1996 said:
I mean, memory is generally something you either have enough of or don’t. If you already have enough for your active data set, adding more doesn’t help at all. Geekbench 5, at least, was not a memory-heavy test suite, so I don’t think it matters all that much here.

Aye, I think actual power measurements would be more relevant. From TDP we can get a rough idea that the M2 probably uses a good deal less power in single-core (SC) and somewhat less in multi-core (MC), depending on settings and workload. That would need to be confirmed, though: if I remember right, the M2 uses a little more power than the M1, and AMD (though not as badly as Intel) can blow past TDP on bursty workloads like GB.

Joelist · Apr 29, 2023

Remember, the Air is also fanless; that Lenovo decidedly is not. To really compare them, run this stuff with both laptops on battery. Also, as it turns out, GB unfortunately did not fix the bug where their tests are too short and bursty and finish before Apple Silicon ramps up.

dada_dave · Apr 29, 2023

Joelist said:
Remember, the Air is also fanless; that Lenovo decidedly is not. To really compare them, run this stuff with both laptops on battery. Also, as it turns out, GB unfortunately did not fix the bug where their tests are too short and bursty and finish before Apple Silicon ramps up.

That’s an AS/GB GPU problem, not a CPU one, and only the CPU is being tested here. Also, I thought GB did improve that? The GB release notes implied it did; whether it actually did, I don’t know.

Because the tests are so short, the fan won’t be that big a deal. But yes, battery vs. power cord could make a difference, though for the newest AMD processors I’m not sure it would be as big as it was in the past. Another thing to test. This also depends on the laptop’s settings: PC laptops can have a dizzying number of controls for this, which sometimes conflict with each other. I remember Ian Cutress, I think, doing a video/article on it when he was still at AnandTech. But that’s really why measuring power consumption during the test is the deciding factor: it determines your efficiency on and off battery when you’re running full tilt (and whether the manufacturer decides to throttle you on battery for that reason).
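To make the "measure power during the test" point concrete: efficiency is just the score divided by the average power drawn while producing it. Below is a minimal sketch, assuming you have evenly spaced package-power samples from some external logger. The function names are hypothetical, and the sample values in the usage note are made up purely for illustration.

```python
def energy_joules(samples_w, interval_s):
    """Trapezoidal integration of evenly spaced power samples (watts) into joules."""
    total = 0.0
    for a, b in zip(samples_w, samples_w[1:]):
        total += (a + b) / 2 * interval_s
    return total


def points_per_watt(score, samples_w, interval_s):
    """Benchmark score divided by the average power over the sampled window."""
    duration_s = interval_s * (len(samples_w) - 1)
    if duration_s <= 0:
        raise ValueError("need at least two power samples")
    avg_w = energy_joules(samples_w, interval_s) / duration_s
    return score / avg_w
```

For example, `points_per_watt(1000, [30, 30, 10, 10], 1.0)` integrates to 60 J over 3 s (a 20 W average) and returns 50.0. On a short, bursty run the average is dominated by the burst, which is exactly why a nominal TDP figure alone can mislead.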

Yoused · Apr 29, 2023

AFAICT, Geekbench is about "this is what your CPU is capable of", not so much "this is what you should expect from it".

leman · Apr 29, 2023

Joelist said:
Also, as it turns out, GB unfortunately did not fix the bug where their tests are too short and bursty and finish before Apple Silicon ramps up.

Do you mean the GPU compute tests? From what I’ve seen, GB6 appears to have addressed it.

Aaronage · May 2, 2023

The 7840U looks great, but it’s not really in the same league, IMO.

The 15W TDP rating doesn’t mean much on its own. The actual peak and sustained power draw level will vary significantly depending on how the OEM tunes it. It could be allowed to consume 40W+ for an extended period, for example.

There’s also problems like:
1. Peak single-thread performance at 5.1GHz. There’s no way the fan isn’t running to maintain that frequency. M1/M2 can hold peak single-thread performance for a long time (forever?) without a fan (it’s only 5-6W).
2. There’s a huge difference between single-thread and multi-thread frequency. 1T-to-16T scaling is relatively poor for 8 high-performance cores (the M2 has only 4 P-cores). If all the cores are loaded, single-thread performance drops (e.g. when multitasking, your web browsing performance will suffer). The M2 Air will obviously hit a thermal limit eventually, but a like-for-like fan-cooled M2 (the 13″ MBP) wouldn’t.
3. The CPU and GPU share a power budget, meaning it’s not possible to get peak CPU and GPU performance at the same time: the CPU will clock down when the GPU is working. Apple doesn’t seem to do this. In my testing, M1 and M2 machines use as much power as needed and only scale back clocks when the thermal limit is reached. E.g. the M1’s CPU will hold 3GHz with all cores loaded regardless of what the GPU is doing (a great advantage for gaming).

^ Not to say Zen 4 APUs are bad. They look way more impressive than what Intel is offering at the moment.

leman · May 2, 2023

Aaronage said:
There’s also problems like:
1. Peak single-thread performance at 5.1GHz. There’s no way the fan isn’t running to maintain that frequency. M1/M2 can hold peak single-thread performance for a long time (forever?) without a fan (it’s only 5-6W).

Zen 4 is actually quite good here. If I remember correctly, it only needs around 10 watts to sustain 5.2GHz (as opposed to Intel, which needs 2-3x as much).

Aaronage said:
2. There’s a huge difference between single-thread and multi-thread frequency. 1T-to-16T scaling is relatively poor for 8 high-performance cores (the M2 has only 4 P-cores). If all the cores are loaded, single-thread performance drops (e.g. when multitasking, your web browsing performance will suffer). The M2 Air will obviously hit a thermal limit eventually, but a like-for-like fan-cooled M2 (the 13″ MBP) wouldn’t.

AMD has an advantage here because it has more P-cores, which lets it clock them more conservatively with regard to the power curve.
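A rough way to see why more cores at lower clocks helps, under a simplified textbook model that is an assumption here rather than anything measured in this thread: dynamic power is roughly C·V²·f, and near the top of the curve voltage has to rise roughly with frequency, so per-core power grows roughly with f³. The function names and the constant `k` are hypothetical, and the model ignores leakage, uncore power, and imperfect multi-thread scaling.

```python
def per_core_freq(budget_w, n_cores, k=1.0):
    """Frequency each core can sustain if per-core power is modelled as k * f**3
    (P ~ C*V^2*f with V assumed roughly proportional to f) and the budget is split evenly."""
    return (budget_w / (n_cores * k)) ** (1.0 / 3.0)


def relative_throughput(budget_w, n_cores, k=1.0):
    """Perfectly parallel workload: throughput modelled as cores * frequency."""
    return n_cores * per_core_freq(budget_w, n_cores, k)
```

In this model throughput works out to B^(1/3) · n^(2/3) for a budget B, so doubling the core count at a fixed budget gives about 2^(2/3) ≈ 1.59x the throughput rather than 2x, but still a clear win over pushing fewer cores harder.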

Aaronage said:
3. The CPU and GPU share a power budget, meaning it’s not possible to get peak CPU and GPU performance at the same time: the CPU will clock down when the GPU is working. Apple doesn’t seem to do this. In my testing, M1 and M2 machines use as much power as needed and only scale back clocks when the thermal limit is reached. E.g. the M1’s CPU will hold 3GHz with all cores loaded regardless of what the GPU is doing (a great advantage for gaming).

The thermal limit puts a hard constraint here, though. At least in the MBA, the effective TDP is 15 watts, so there’s not much difference between Apple and AMD here. Of course, Apple can deliver that performance in a passively cooled chassis; AMD cannot.

Yoused · May 2, 2023

leman said:
AMD has an advantage here because it has more P-cores.

AIUI, Zen 4 doesn’t have any actual E-cores.

Aaronage · May 2, 2023

leman said:
Zen 4 is actually quite good here. If I remember correctly, it only needs around 10 watts to sustain 5.2GHz (as opposed to Intel, which needs 2-3x as much).

That’s pretty good! (Intel’s latest cores will push over 25W on mobile, last I checked, but maybe my memory is fuzzy.)

leman said:
AMD has an advantage here because it has more P-cores, which lets it clock them more conservatively with regard to the power curve.

True, it’s a valid option.

Another perspective: the 7840U is likely using similar power to an Mx Pro/Max in multi-threaded loads thanks to boosting (maybe a touch less?), but its performance doesn’t come close to an 8P+2/4E Apple SoC.

leman said:
The thermal limit puts a hard constraint here, though. At least in the MBA, the effective TDP is 15 watts, so there’s not much difference between Apple and AMD here. Of course, Apple can deliver that performance in a passively cooled chassis; AMD cannot.

I’m not sure the MBA being fanless levels the playing field here. The 7840U is almost certainly boosting well beyond its 15W TDP, so a true 15W face-off wouldn’t look so good for AMD.

(Just to acknowledge: in all my comments I’m assuming this 7840U system has a generous boost budget. That could be wrong, but given current trends it seems unlikely that it doesn’t.)

casperes1996 · May 2, 2023

Yoused said:
AIUI, Zen 4 doesn’t have any actual E-cores.

In general, yes, they just have one core design. But their 3D V-Cache parts can act a bit like asymmetric chips at times, and they do require a scheduler that's aware of the CCD topology to make the best use of them: one CCD clocks higher than the other but has less cache, and vice versa. They've also had the "favoured core" concept for a while, which the scheduler needs some knowledge of too.
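As a rough illustration of why the scheduler needs that topology knowledge, here is a hypothetical placement heuristic: put a thread on the big-cache CCD only when its working set fits in that L3 but not in the other CCD's; otherwise prefer the higher-clocked CCD. Everything here (the `Ccd` type, the heuristic itself, the numbers) is an assumption for illustration, not how Windows, AMD's drivers, or any real scheduler actually decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ccd:
    name: str
    l3_mb: int      # L3 cache available to this CCD
    max_ghz: float  # peak boost clock on this CCD


def pick_ccd(ccds, working_set_mb):
    """Hypothetical heuristic for a cache-asymmetric part."""
    big_cache = max(ccds, key=lambda c: c.l3_mb)
    fast = max(ccds, key=lambda c: c.max_ghz)
    # Only the big-cache CCD can hold the working set: assume cache hits beat clocks.
    if fast.l3_mb < working_set_mb <= big_cache.l3_mb:
        return big_cache
    # Fits in either cache, or in neither: assume clock speed is the better bet.
    return fast


# Illustrative numbers only, not a product spec.
CCDS = [Ccd("vcache", 96, 5.0), Ccd("standard", 32, 5.7)]
```

With these made-up values, a 64 MB working set lands on the "vcache" CCD, while an 8 MB one (fits anywhere) or a 500 MB one (fits nowhere) goes to the faster "standard" CCD. A real scheduler has to estimate the working set, which is the hard part.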

casperes1996 · May 8, 2023

Today I Learned:

Apple's M-series chips have a bit that can be toggled to enable TSO (total store ordering), so that memory behaves as it does on x86. I always thought Rosetta 2 just inserted memory fences to achieve this, but no: it can actually switch the CPU into TSO mode so that the chip itself orders memory accesses like an x86.
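A toy illustration of what the TSO bit buys Rosetta, using the classic message-passing litmus test: one thread writes `data` and then `flag`, the other reads `flag` and then `data`. x86-TSO preserves store→store and load→load order, so observing `flag == 1` with `data == 0` is impossible; Arm's weaker model allows it unless the code uses barriers or acquire/release accesses. This sketch only enumerates interleavings. Modelling "weak" as "each thread's two independent accesses may swap" is a deliberate simplification, not the Arm memory model, and TSO's one relaxation (store→load) isn't exercised by this test.

```python
from itertools import permutations

# Each op is ("st", address, value) or ("ld", address, register).
WRITER = [("st", "data", 1), ("st", "flag", 1)]
READER = [("ld", "flag", "r1"), ("ld", "data", "r2")]


def interleavings(a, b):
    """Every merge of a and b that preserves each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(trace):
    """Execute one global order against a single shared memory; return (r1, r2)."""
    mem = {"data": 0, "flag": 0}
    regs = {}
    for kind, addr, x in trace:
        if kind == "st":
            mem[addr] = x
        else:
            regs[x] = mem[addr]
    return regs["r1"], regs["r2"]


def outcomes(allow_reordering):
    """False: each thread keeps program order (what TSO guarantees for this test).
    True: each thread's two accesses may swap, a crude stand-in for weak ordering."""
    writers = [list(p) for p in permutations(WRITER)] if allow_reordering else [WRITER]
    readers = [list(p) for p in permutations(READER)] if allow_reordering else [READER]
    return {run(t) for w in writers for r in readers for t in interleavings(w, r)}
```

`outcomes(False)` yields only (0, 0), (0, 1) and (1, 1), while `outcomes(True)` also contains (1, 0): the reader saw the flag but stale data. Without the hardware mode, a translator would have to prevent that outcome with extra ordering on ordinary loads and stores.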

Cmaier · May 8, 2023

casperes1996 said:
Today I Learned:

Apple's M-series chips have a bit that can be toggled to enable TSO (total store ordering), so that memory behaves as it does on x86. I always thought Rosetta 2 just inserted memory fences to achieve this, but no: it can actually switch the CPU into TSO mode so that the chip itself orders memory accesses like an x86.

Yep. There was one other hardware addition to support Rosetta, but I can’t remember what it was.

dada_dave · May 8, 2023

Cmaier said:
Yep. There was one other hardware addition to support Rosetta, but I can’t remember what it was.

Which makes the “Rosetta 2 will be sunsetted right away!” panic at the beginning extra silly. It was silly for many, many reasons (the transition wasn’t, and still isn’t, even over), but the fact that Apple controlled the software AND had built hardware acceleration for it should’ve been a clue that Rosetta would be around for at least a little while. That was one of the most eye-roll-worthy panics, and all because someone discovered a string that said something on the order of “Apple reserves the right to not allow Rosetta in any given country”. Sigh.

KingOfPain · May 8, 2023

Cmaier said:
Yep. There was one other hardware addition to support Rosetta, but I can’t remember what it was.

According to this blog post, in Rosetta 2 mode Apple Silicon also computes certain x86 flags (PF, AF) that the ARM architecture doesn't normally use. For the ADDS/SUBS/CMP instructions they're just a byproduct and don't cost any additional cycles, whereas computing these flags otherwise would probably take several instructions.

Why is Rosetta 2 fast?

Rosetta 2 is remarkably fast when compared to other x86-on-ARM emulators. I’ve spent a little time looking at how it works, out of idle curiosity, and found it to be quite unusual, so I figur…

dougallj.wordpress.com
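For a sense of what computing these flags in software involves, here is a sketch of the x86 definitions applied to a 32-bit add and subtract: PF is set when the low byte of the result has an even number of 1 bits, and AF is the carry (or borrow) between bits 3 and 4, recoverable as bit 4 of `a ^ b ^ result`. The function names are hypothetical, and this isn't how Rosetta 2 or the hardware does it; it's just the arithmetic an emulator would otherwise spend extra instructions on.

```python
MASK32 = 0xFFFFFFFF


def parity_flag(result):
    """x86 PF: set if the low 8 bits of the result contain an even number of 1 bits."""
    return bin(result & 0xFF).count("1") % 2 == 0


def adjust_flag(a, b, result):
    """x86 AF: carry out of (ADD) or borrow into (SUB/CMP) bit 3, read off bit 4."""
    return ((a ^ b ^ result) & 0x10) != 0


def add32_with_x86_flags(a, b):
    result = (a + b) & MASK32
    return result, parity_flag(result), adjust_flag(a, b, result)


def sub32_with_x86_flags(a, b):
    # CMP computes the same flags as SUB and just discards the result.
    result = (a - b) & MASK32
    return result, parity_flag(result), adjust_flag(a, b, result)
```

For example, `add32_with_x86_flags(0x0F, 0x01)` gives `(0x10, False, True)`: the low nibble overflowed (AF set), and 0x10 has a single 1 bit (PF clear). That's a popcount, a few XORs, masks and compares per flag-setting instruction, which is why getting them for free in hardware matters.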

Cmaier · May 8, 2023

KingOfPain said:
According to this blog post, in Rosetta 2 mode Apple Silicon also computes certain x86 flags (PF, AF) that the ARM architecture doesn't normally use. For the ADDS/SUBS/CMP instructions they're just a byproduct and don't cost any additional cycles, whereas computing these flags otherwise would probably take several instructions.


Ah yeah that was it.

I hated dealing with those flags when I was designing those ALUs.

KingOfPain · May 9, 2023

Cmaier said:
I hated dealing with those flags when I was designing those ALUs.

I hope I haven't offended you with my "just a byproduct" comment.
Of course it's a hassle to implement the generation of the additional flags, but once it's there, it shouldn't really have an impact on the execution time of the instruction.

casperes1996 · May 9, 2023

This is incredible. I had no idea there was this much x86 helper logic in the hardware itself. That's pretty awesome. I wonder how much die space it costs, though.

As for the PF and AF flags: I've never really used them manually, though I can't speak for anything my compilers have done on my behalf. I can see use cases for PF in quick hash testing or something, but I never really saw a use case for AF.
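For what it's worth, the classic consumer of AF is packed-BCD arithmetic: after an 8-bit `ADD`, x86's `DAA` instruction uses AF and CF to turn the binary sum back into two valid decimal digits. (DAA isn't valid in 64-bit mode, which says something about how rarely it's needed.) Below is a sketch following the DAA pseudocode in Intel's manual, dropping an intermediate CF assignment that is always overwritten, and applied through a hypothetical `bcd_add` helper. It's an illustration, not a verified emulator.

```python
def daa(al, af, cf):
    """Decimal-adjust AL after an 8-bit ADD of two packed-BCD bytes."""
    old_al, old_cf = al, cf
    # Low digit overflowed past 9 (or carried out of bit 3, which is what AF records).
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF
        af = True
    else:
        af = False
    # High digit overflowed past 9, or the binary add carried out of bit 7.
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, af, cf


def bcd_add(a, b):
    """Hypothetical helper: ADD two packed-BCD bytes, then DAA. Returns (AL, CF)."""
    raw = a + b
    al = raw & 0xFF
    cf = raw > 0xFF
    af = ((a ^ b ^ al) & 0x10) != 0  # carry out of bit 3
    al, _, cf = daa(al, af, cf)
    return al, cf
```

`bcd_add(0x29, 0x19)` shows why AF is needed: the binary sum 0x42 looks like valid BCD, and only AF reveals that 9 + 9 carried out of the low digit, so DAA adds 6 and returns `(0x48, False)`, i.e. 29 + 19 = 48.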
