NotEntirelyConfused
Power User
- Joined: May 15, 2024
- Posts: 71
One issue is that the extremes are so extreme that plotting them distorts the graph to the point where most of the subtler patterns get obscured. Also, generally speaking, with long-tail distributions, medians are often, though not always, more useful than means for descriptive statistics, precisely because they're less sensitive to outliers.
In normal statistics, outliers are usually bad data, or just so insignificant and random that knowing about them is unhelpful - as dada_dave says, they can obscure more interesting information. I can certainly repost these charts with no omissions if anyone wants, but you are correct that the outliers are craaaaaazy.
My point is that the information we're interested in may actually be better represented by the maxima rather than either mean or median. But of course the problem of bad data remains, whether "bad" means falsified, or just representing cases that purposefully obscure the answers we're looking for. "Falsified data" has an obvious meaning; the other kind of "bad data" would be an answer to a question we're not really trying to answer, and that's where it gets a little sticky. If we're trying to gain insight into how chips perform as designed, in normal use contexts, that's one thing. If we're trying to discern things about microarchitecture, that might wind up looking very similar, but not exactly the same. If we want to know how fast you can push the chip with unlimited power, unlimited cooling, and no concern for lifespan, then that's a very different question with a very different answer.
My impression is that for most of us, we're interested in design details, and also in performance under "normal" conditions. We also care to some extent about what the chip is capable of under "optimal normal conditions" - that is, no special accommodations like fancy cooling, but also removing any extraneous adverse influences to the maximum extent possible (e.g., background tasks that reduce benchmark scores).
If that's true, then *if* we can remove bad data, score maxima may be better, because they would represent legitimate measures of the chips under normal but optimal conditions. That's a big if, though. Possibly taking the top score after discarding the top n% of results (n = 2..5?) would work - I don't really know. And I'm guessing scores for x86 chips will have far more bad data, which is a problem - or at least x86 con/prosumer chips will. Large server platform chips (Epyc/Xeon) are probably much less affected.
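Just to make the "discard the top n% then take the max" idea concrete, here's a minimal sketch in Python. The `trimmed_max` helper and the sample score list are entirely made up for illustration - real benchmark databases would need more careful outlier handling than a flat percentage cut.

```python
import statistics

def trimmed_max(scores, discard_frac=0.03):
    """Highest score after discarding the top discard_frac fraction of
    results, as a crude guard against falsified or abnormal runs.
    discard_frac of 0.02-0.05 corresponds to the n = 2..5% guess above."""
    if not scores:
        raise ValueError("no scores")
    ranked = sorted(scores)
    n_discard = int(len(ranked) * discard_frac)
    kept = ranked[: len(ranked) - n_discard] if n_discard else ranked
    return kept[-1]

# Hypothetical benchmark scores: a tight cluster plus a long right
# tail with a couple of implausible outliers (exotic cooling, bad data).
scores = [2450, 2480, 2500, 2510, 2520, 2530, 2540, 2550, 2560, 2570,
          2580, 2590, 2600, 2610, 2620, 2630, 2640, 2650, 3900, 5200]

print(statistics.mean(scores))    # dragged upward by the tail
print(statistics.median(scores))  # robust, but ignores the top end entirely
print(max(scores))                # dominated by the most extreme outlier
print(trimmed_max(scores, 0.10))  # best remaining score after trimming
```

With a 10% trim on this toy data, the two extreme entries are dropped and the trimmed maximum lands on the best "plausible" score, while the median tells you nothing about what the chip can do at its best. Choosing the right trim fraction is exactly the sticky part - too small and bad data survives, too large and you throw away the legitimate top runs you were after.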