Yoused
up
I just saw one of those performance charts over on that other site comparing CPUs. It had some Intels, up to the 11900, and was topped by a couple of 5900-series Ryzens.
The M1 Maxes placed 3rd and 6th on the chart. However, there were two bars for each test, and the chart was sorted by SpecInt – both M1s absolutely blew everyone else out of the water on the SpecFP.
Now, I can understand that FP dispatch is faster on ARM, since FP/NEON is baked right into the instruction set, whereas FP/SSE/AVX-512/etc. is essentially grafted onto x86, and perhaps the few ns gained in dispatch can add up (apparently to quite a lot) over the many cycles of a SpecFP test. But is there something else going on there?
Improving integer performance (at least insofar as Spec tests go) seems like it would ultimately yield diminishing returns. Integer ops are pretty basic stuff, and speeding them up makes the easy stuff faster. But the real test of a processor is the hard stuff, which tends to lean toward the FP/SIMD realm.
Does the M1 have better efficiency at the logic level in performing FP, or is the three-operand A=B+C design simply that much more efficient than the two-operand A=A+B design?
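To make the A=B+C vs. A=A+B distinction concrete, here is a toy sketch in Python. It is not a real assembler: the `add`/`mov` strings and both function names are made up for illustration, and it assumes every named register has to keep its value unless it is the destination. All it shows is that the two-operand form needs an extra copy when the result goes somewhere other than one of the inputs. It says nothing about what that copy costs on real hardware, or whether it explains the chart.

```python
def lower_three_operand(dst, src1, src2):
    """A = B + C form: one instruction, both sources left untouched."""
    return [f"add {dst}, {src1}, {src2}"]


def lower_two_operand(dst, src1, src2):
    """A = A + B form: the destination doubles as the first source.

    Illustrative assumption: src1 and src2 must keep their values
    unless one of them is also the destination.
    """
    if dst == src1:
        return [f"add {dst}, {src2}"]
    if dst == src2:
        # Addition commutes, so the operands can simply be swapped.
        return [f"add {dst}, {src1}"]
    # The result goes to a third register: copy first, then accumulate.
    return [f"mov {dst}, {src1}", f"add {dst}, {src2}"]


if __name__ == "__main__":
    for dst, a, b in [("d", "a", "b"), ("a", "a", "b")]:
        print(f"{dst} = {a} + {b}")
        print("  three-operand:", lower_three_operand(dst, a, b))
        print("  two-operand:  ", lower_two_operand(dst, a, b))
```

In the `d = a + b` case the two-operand lowering emits a `mov` plus an `add`, while the three-operand lowering emits a single `add`. When the destination is already one of the sources, both forms need just one instruction.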