Assuming all 10 can run at the same time for long periods of time without causing hot spots or saturating the memory bus.
It had better with 10 performance cores.
But you want to optimize memory bus saturation, based on the workload, just like you want EU saturation inside a core. There should be a unit that specifically assesses throughput efficiency and adjusts the clocks to minimize stalls while keeping everyone that has work to do busy. Where I used to work, we ran our machines much slower than top speed, because every fault stop was wasted productivity: often, you can get more work done at a slower pace by running steadily, just like you can get through town more efficiently by driving slower so that you are not stopping for every red light.
I'm having trouble getting powermetrics to display the old format (cluster, CPU, DRAM, package). I guess it looks like this now?
View attachment 29165
@leman did they change the format? I looked at the man page but couldn't figure out how to access the previous data. I tried --unhide-info ("<samplers> comma separated list of samplers to unhide (backwards compatibility)") with various values such as "dram_power" or "package_power", to no avail. EDIT: it seems they have removed some of the old sensors?
You can only slow the clock so far before you run into hold-time violations and start producing wrong answers.
Over at Reddit, Andrei F seemingly posted this:
“The table is misinterpreted and wrong as how it's portrayed - it's not per-SKU power variance, you should just wait for actual products. The workload is also not something realistic.”
I’m not sure it clears things up.
I don't think it clears up things at all.
Of course, it should be obvious that the AndroidAuthority article is a pile of poop: they got basic specs wrong, their language is confusing, and there is no discussion of methodology or what the numbers actually mean. It could be that this is some sort of dumb stress test, which would be worthless.
We need to wait for the final products.
I would have thought hold time violations are frequency independent, what's the mechanism behind this? Clock tree effects? If clock edges arrive at two flops involved in a potential hold time violation with the same skew across the whole frequency range, it seems to me like it shouldn't matter what the frequency is.
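For reference, the textbook single-cycle timing checks for a flop-to-flop path can be sketched as below. This is a minimal illustration, not a model of any real chip: the function names and all delay values are made up, and skew is taken as capture-clock arrival minus launch-clock arrival. In this simplified model the clock period appears only in the setup check.

```python
def setup_ok(period_ns, clk_to_q_ns, max_logic_ns, setup_ns, skew_ns=0.0):
    """Setup check: data launched on one edge must settle before the NEXT edge.

    The clock period is part of this constraint, so it depends on frequency.
    """
    return clk_to_q_ns + max_logic_ns + setup_ns <= period_ns + skew_ns


def hold_ok(clk_to_q_ns, min_logic_ns, hold_ns, skew_ns=0.0):
    """Hold check: new data must not reach the capturing flop too soon after the SAME edge.

    There is no period term here, so in this model it does not depend on frequency.
    """
    return clk_to_q_ns + min_logic_ns >= hold_ns + skew_ns


# Hypothetical delays in nanoseconds, chosen only for illustration.
CLK_TO_Q, MAX_LOGIC, MIN_LOGIC, SETUP, HOLD, SKEW = 0.05, 0.15, 0.02, 0.03, 0.03, 0.01

for period in (0.2, 0.25, 1.0, 10.0):
    print(
        f"period={period:>5} ns  "
        f"setup_ok={setup_ok(period, CLK_TO_Q, MAX_LOGIC, SETUP, SKEW)}  "
        f"hold_ok={hold_ok(CLK_TO_Q, MIN_LOGIC, HOLD, SKEW)}"
    )
```

Anything that makes the skew or the cell delays vary with the operating point (clock tree effects, voltage changes that accompany a frequency change) falls outside this simple model.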
While the numbers from Android Authority seem suspect, for my own edification, I was wondering if I could plumb this direction further. If I understand correctly, not only is P ~= f x V^2 a feature of simple circuits, but the blurb I found earlier that made it seem like voltage and frequency were collinear was also an oversimplification for their toy example. In reality, full chips have a more complex relationship between the two, and even for simple circuits an increase in frequency may necessitate a range of possible voltage increases, including no increase at all or potentially a greater percentage increase in voltage than in frequency?
Power is proportional to frequency x voltage^2, yes. (There’s a C and a ½ in there, too.) But I'm not sure I understand the rest of your post. Voltage and frequency are, in a sense, independent. You can, in theory, increase frequency without increasing voltage, and vice versa. (Though to achieve more than a little frequency gain you likely need to increase voltage, because higher voltage causes transistors to switch faster.)
That relation describes independently switching circuits. When amalgamated into a large chip, we usually see the kind of curve I drew above. As the curve gets more and more horizontal (there’s a horizontal asymptote), a small increase in performance can require a huge increase in power.
I got to admit I am a little frustrated. While Apple can be annoyingly vague in their product announcements at least you generally only have to wait a few weeks tops to see results in the wild. Qualcomm announced in October, and I get why they did that, but they are trying to have their cake and eat it too with really early product announcements while simultaneously being Apple-like (or worse actually, Apple is more forthcoming which is saying a lot). They then have to send Andrei (presumably he got clearance to say things, generally companies don't like engineers just spouting off the cuff) to clean up their own communications with "well actually that's not correct but I can't tell you what's correct, wait for final product release". That's a little aggravating. I mean launch is not *that* far away now but ... still ...
While the numbers from Android Authority seem suspect, for my own edification, I was wondering if I could plumb this direction further. If I understand correctly, not only is P ~= f x V^2 a feature of simple circuits, but the blurb I found earlier that made it seem like voltage and frequency were collinear was also an oversimplification for their toy example. In reality, full chips have a more complex relationship between the two, and even for simple circuits an increase in frequency may necessitate a range of possible voltage increases, including no increase at all or potentially a greater percentage increase in voltage than in frequency?
For example, assuming the numbers from Android Authority were correct and assuming a simple circuit, where the observed ratio of power of the top tier to the second tier chip is 2x (80W/40W) and the increase in all-core frequency is 3.8/3.4, the voltage ratio needed to explain the power increase is about 1.34.
(P1/P0) = (F1/F0)*(V1/V0)^2 => V1/V0 = sqrt((P1/P0)/(F1/F0)) = sqrt(2*3.4/3.8) = 1.34
So to explain the apparent 2x power draw (which, as @Jimmyjames wrote, Andrei is seemingly disputing) and the stated clock increase, they would have needed to increase the voltage through the chip by 34%, presumably to cause the transistors to switch fast enough to keep up with the clocks (again, assuming a simple circuit, which it is not).
Do I have that right? or am I still not understanding something?
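The arithmetic in this post can be checked with a few lines of Python. This is only a sketch under the post's own assumptions (constant C, simple-circuit scaling P ~ C f V^2, and the disputed 80W/40W and 3.8/3.4 GHz figures); the function name is made up for illustration.

```python
import math


def voltage_ratio(power_ratio, freq_ratio):
    """Return V1/V0 implied by P ~ C * f * V^2 when C is held constant.

    Rearranging P1/P0 = (f1/f0) * (V1/V0)^2 gives
    V1/V0 = sqrt((P1/P0) / (f1/f0)).
    """
    return math.sqrt(power_ratio / freq_ratio)


# Figures quoted in the thread; the power numbers are disputed.
ratio = voltage_ratio(power_ratio=80 / 40, freq_ratio=3.8 / 3.4)
print(f"V1/V0 = {ratio:.3f}")
```

Under those assumptions the result comes out at roughly 1.34, i.e. about a 34% higher voltage.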
Right. The issue is that P=½CfV^2 holds, but V and f are not independent variables at the chip level. At a given voltage, there is a range of frequencies that works, but if you want to increase the frequency beyond that range, you need to increase V. So if you zoom in close on that curve I drew, it would be made up of lots of tiny f=2P/CV^2 sections, where different parts of the curve have different V’s. Because as you move to the right V has to get higher and higher, and you square it to get P, the curve flattens out toward an asymptote as you move to the right.
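A toy version of that curve can be sketched as follows. Every number here is hypothetical: the operating points (frequency, minimum stable voltage) and the capacitance are invented purely to show the shape, and are not measurements of any real chip.

```python
# Hypothetical operating points: (frequency in GHz, minimum stable voltage in V).
# Voltage has to rise faster and faster to reach the higher frequencies.
OPERATING_POINTS = [
    (2.0, 0.70),
    (2.6, 0.75),
    (3.0, 0.80),
    (3.4, 0.90),
    (3.8, 1.05),
    (4.0, 1.20),
]

C_EFF = 1.0  # effective switched capacitance, arbitrary units (assumed constant)


def dynamic_power(freq, volts, c_eff=C_EFF):
    """Dynamic power P = 1/2 * C * f * V^2, in arbitrary units."""
    return 0.5 * c_eff * freq * volts ** 2


def marginal_cost(points):
    """Extra power per extra GHz between consecutive operating points."""
    costs = []
    for (f0, v0), (f1, v1) in zip(points, points[1:]):
        delta_p = dynamic_power(f1, v1) - dynamic_power(f0, v0)
        costs.append(delta_p / (f1 - f0))
    return costs


for (f, v), cost in zip(OPERATING_POINTS[1:], marginal_cost(OPERATING_POINTS)):
    print(f"up to {f:.1f} GHz at {v:.2f} V: {cost:.2f} power units per extra GHz")
```

With these made-up points, each additional GHz costs more power than the previous one, which is the flattening performance-versus-power curve described above.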
I’ve sort of lost track of all the benchmark numbers, so I’ll go by my understanding of what you just said. Increasing the clock 12% ((3.8-3.4)/3.4) would, ceteris paribus, cause a 12% increase in power dissipation. The “top tier” chip would be expected to have smaller C than the second tier chip - that’s one of the things that makes it bin faster. But if you assume C is the same, then the rest must be voltage. So to achieve 12% faster clock, they had to raise the voltage by 9 or 10% or so? (9^2 + 12 approximating 100% power increase?)
P=½CfV^2, not P=½C(V^2+f)
Oof. I remember back when Qualcomm announced the Snapdragon X Elite we already had a hard time making sense of the numbers. These allegations don't help. I guess we'll see when real products hit the market, but this doesn't bode well for Qualcomm. I'm guessing that the results from real laptops will make a lot more sense while being significantly slower (except for maybe the single core one).
These seem like serious allegations. Anyone know if this site is reputable?
Qualcomm Is Cheating On Their Snapdragon X Elite/Pro Benchmarks
Qualcomm is cheating on the Snapdragon X Plus/Elite benchmarks given to OEMs and the press. (semiaccurate.com)
Sorry, you’re right! Brain fart. I should have just done it on paper first.
Got it, that's really cool, thanks!
Shouldn't we use ratios rather than percentages in the equation, since f and V are multiplied rather than added? (btw how did you write half 1/2 as a symbol?)
Assuming constant C (which, as you pointed out, it probably isn't) and where subscript 1 is the top tier chip and subscript 0 is the second tier chip:
P1/P0 = (½Cf1V1^2)/(½Cf0V0^2) = (f1/f0)(V1/V0)^2
P1/P0 = 2
f1/f0 = 1.12
V1/V0 = Vdelta
Vdelta = sqrt(2/1.12) = 1.34 ... a 34% increase in voltage?
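To see why the percentages cannot simply be added, the ratio form can be run forward. This is a rough sketch assuming constant C and P ~ C f V^2; the function name is made up, and the 10% voltage figure is just an example value.

```python
def power_ratio(freq_ratio, voltage_ratio):
    """Return P1/P0 for given f1/f0 and V1/V0, assuming constant C and P ~ C * f * V^2."""
    return freq_ratio * voltage_ratio ** 2


# A 12% clock increase combined with a 10% voltage increase.
modest = power_ratio(1.12, 1.10)
print(f"12% clock, 10% voltage -> power x{modest:.2f}")

# A 12% clock increase combined with a 34% voltage increase.
large = power_ratio(1.12, 1.34)
print(f"12% clock, 34% voltage -> power x{large:.2f}")
```

With a 10% voltage bump the power only rises by roughly a third, so a voltage increase in the neighbourhood of 34% is what this model needs to reach about 2x.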
Yes. CV=q. Switching transistors requires moving charge (q).
I have a related question: in the above power equation, V is a function of f, but is it also a function of C? Do chips of a lower quality (higher C) also require more V for the same f?
What appears to be the issue here is physical design, not architectural or microarchitectural design.
I'm also struggling to accept my own analysis to some extent, as it just seems so odd that Apple operates in much the same frequency domain for the M2 (~3.7GHz), in what at first glance appears to be a largely similar core design, on a very similar node (N5P vs N4).
A little bit of good information in this file. I don’t know the equivalent metrics for Apple’s chips, because I don’t memorize that sort of stuff anymore.
llvm-project/llvm/lib/Target/AArch64/AArch64SchedOryon.td at 8aebe46d7fdd15f02a9716718f53b03056ef0d19 · llvm/llvm-project (github.com)
@Cmaier, can you please explain the important bits here?
Some say there is a 14-wide decoder.