What does Apple need to do to catch Nvidia?

I’d have to look into it. It’s been a while.

Edit: Yeah, you need to use my hacked version. Bitwise is nice code, but they don't follow normal UNIX conventions. Mine does.

Ah, sorry, I misunderstood. I thought the brew version was already your hacked version. Regardless, I was not able to build the tool; most likely autoconf is not detecting readline correctly, and it was too much hassle for me to try and figure out what went wrong.

A question about "InstantAmperage": it's supposed to return the current in milliamps, correct? On my machine the values reported are in the ballpark of 18446744073709550994, which seems a bit high for a laptop current, right?
 
You installed that 14900K? Spicy!
 
If interpreted as a signed 64-bit integer, that's -622 mA, which is quite reasonable. It's normal for negative current to be a thing in laptop batteries - one direction of current flow for charging, the other for powering the computer, and which one is negative and which one positive is up to whoever designed the current measurement circuit.
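For reference, here's a minimal sketch of that reinterpretation in Python (assuming the reading really is a 64-bit register; the raw value is just the one reported above):

import struct

raw = 18446744073709550994  # unsigned 64-bit value reported for InstantAmperage

# Reinterpret the same 64 bits as a signed (two's complement) integer:
# pack as unsigned 64-bit, unpack as signed 64-bit.
signed = struct.unpack("<q", struct.pack("<Q", raw))[0]
print(signed)  # -622 (mA, with whatever sign convention the measurement circuit uses)

# Equivalent arithmetic form: values at or above 2**63 have wrapped around.
assert signed == raw - 2**64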
 
Could you explain how one does that conversion?

18446744073709550994 (≈1.84 x 10^19) appears to be a base-10 representation of a value almost exactly twice the max for a signed 64-bit integer (+9223372036854775807 ≈ 9.2 x 10^18), and I'm wondering how you get –622 from that.
 
2’s complement. Software should really do that for you though.
 
What everyone else said, but in practical terms I googled "64-bit calculator", slammed the first hit, and it was a fairly crappy but adequate webpage programmer's calculator. You put it in unsigned mode, enter the decimal number, put it in signed mode, and you're done. Here's a link with the number already entered:


NOT(X)+1 is how to multiply X by -1 in 2's complement - play around with the calculator to see this in action.
 
Sorry, wasn't near a computer this weekend. The project wasn't really meant for public distribution, so I don't have a README with build instructions; I've included them below. I can also provide a built executable of bitwise if you trust that it is valid.

As others have already replied, the amperage is a negative value, stored as 2's complement, in mA. That's one of the reasons you need bitwise: to do the 2's complement conversion.

Install Instructions

Install dependencies:

brew install automake
brew install autoconf
brew install readline

# note that these flags are for the Apple silicon install of homebrew
export LDFLAGS="-L/opt/homebrew/opt/readline/lib"
export CPPFLAGS="-I/opt/homebrew/opt/readline/include"

./bootstrap.sh
./configure
make
sudo make install # installs bitwise in /usr/local/bin

Edit to add:
You can use bitwise to do the 2's complement conversion to signed decimal

# echo 18446744073709550994 |bitwise -od -wd
# -622
 
Could you explain how one does that conversion?

Here is the historical logic of it:

Computers cannot subtract, they can only add. You could go to the trouble of including a sign with every value, and include elaborate logic to perform subtraction based on the sign, but that would be a serious PITA (especially where there is heterogeneous data), so the early designers went with simple math.

To wit, if a + b = 0 then a = -b

If you look at 8-bit numbers, FB and 05 add up to zero (the carry out of the top bit is discarded), so the signed value of FB is -5. The logical NOT of 05 is FA, which means the fastest way to get -a is NOT( a ) + 1. This is extremely handy, because it allows for extended precision, using the inverse of the carry bit as a borrow (the 6502 cheaped out here by using the straight carry bit, so the program first had to set C to do a proper subtraction).

Ultimately, there are no "signed" integers, only integers that you can evaluate as signed or as unsigned, as needs dictate. Traditionally, the highest-order bit of an integer has been treated as the sign value, and every byte from FF to 80 (extending all the way to 64-bit integers) is a negative value, which corresponds to its NOT + 1 positive value (though 80 is an odd duck, since its negative value is 80). Math operations usually use the high-order bit of the result to indicate a negative (or less-than) value when setting flags.
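A quick Python illustration of the 8-bit arithmetic above (masking with 0xFF stands in for the register discarding the carry out):

a = 0x05

# Negate via NOT(a) + 1, keeping only the low 8 bits.
neg_a = (~a + 1) & 0xFF
print(hex(neg_a))                  # 0xfb

# FB + 05 wraps around to zero in 8 bits (the carry out is discarded).
print(hex((0xFB + 0x05) & 0xFF))   # 0x0

# The odd duck: negating 80 gives 80 back.
print(hex((~0x80 + 1) & 0xFF))     # 0x80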

Floating point is a whole nother animal, which does have a dedicated sign bit, but the fundamental mechanism for subtraction works the same way.
 
Floating point is a whole nother animal, which does have a dedicated sign bit, but the fundamental mechanism for subtraction works the same way.
The fun consequence of this dedicated sign bit is that -0.0 is a legal IEEE 754 floating point value. Conforming implementations can and will generate it as the result of real computations. Ideally, your programming language's equality test should insert special checks to make sure that comparing -0.0 to +0.0 results in TRUE - it's not enough to make sure that the bit pattern of the FP data is the same.
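A small Python illustration of that point (struct is used only to expose the raw bit patterns):

import math
import struct

neg_zero = -0.0
pos_zero = 0.0

# Equality treats them as the same value...
print(neg_zero == pos_zero)               # True

# ...but the IEEE 754 bit patterns differ in the sign bit.
print(struct.pack(">d", neg_zero).hex())  # 8000000000000000
print(struct.pack(">d", pos_zero).hex())  # 0000000000000000

# The sign is still observable, e.g. via copysign.
print(math.copysign(1.0, neg_zero))       # -1.0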
 
I hope that as GPU capabilities develop and feature sets converge, cross-vendor development will become easier. The big issue of course is optimization. Apple and Nvidia already need different approaches to get best performance out of their respective hardware...
To that point, check out Chris Lattner’s (and crew’s) Modular Mojo talk. Good things are afoot. Also note that ASi was mentioned quite discreetly 🙂

Edit: There is still an important piece missing as shown by the last question, but that is not too hard to model and solve.
 

Many thanks, I missed the fact that one has to set the linker flags manually. Works now! I noticed that the IO registry update interval is rather slow. I'll look into reading the sensors directly when I have the time (need to deal with the day office job piling up first :) )
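For anyone who wants to poke at it in the meantime, here's a rough Python sketch of pulling the value straight from ioreg and applying the signed conversion - not the tool above, and it assumes the battery publishes "InstantAmperage" under the AppleSmartBattery class with the usual quoted-key decimal output, which may differ by machine:

import re
import subprocess

# Query the IO registry for the battery entry and its properties.
# Assumption: the battery shows up under the AppleSmartBattery class.
out = subprocess.run(
    ["ioreg", "-r", "-c", "AppleSmartBattery"],
    capture_output=True, text=True, check=True,
).stdout

match = re.search(r'"InstantAmperage"\s*=\s*(-?\d+)', out)
if match:
    raw = int(match.group(1))
    # Values at or above 2**63 are the wrapped (unsigned) form of a negative reading.
    amperage_ma = raw - 2**64 if raw >= 2**63 else raw
    print(f"InstantAmperage: {amperage_ma} mA")
else:
    print("InstantAmperage not found in ioreg output")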
 
Frankly, I am not sure whether Nvidia's gamer and professional cores are even that different. They differ in memory subsystem, sure, and the professional stuff seems to have some extra features enabled. But at the core architecture level I think they are very similar.

It depends on which workstation GPU. Nvidia currently produces two: Ada and Hopper. Ada workstation cards share an identical architecture with Ada consumer GPUs. Hopper, meanwhile, is a pure compute/AI GPU and does not. TechPowerUp incorrectly lists Hopper as a successor to Ada, but in reality they are parallel lines. Hopper has 64-bit FP pipelines, no ray-tracing cores, and no encoding engines. While I suspect it is still similar in many respects (it is still an Nvidia GPU, after all), it is a different architecture.

Nvidia does it to reduce costs, and there are marketing reasons too (gamers are easily impressed by bigger numbers). Apple's design approach is very different, and so they don't need to maintain two hardware lines.

I agree, and indeed that was largely the point I was trying to make - Apple has a hybrid, general-purpose approach that in some ways mixes Nvidia’s capabilities together. The one downside (so far) is limited TFLOPS on Apple desktop computers, which I think both of us would like to see them address somehow.

But still, to get the kind of memory capacity a Max/Ultra offers, you have to pay Nvidia thousands more for just the GPU than the entire cost of a Studio. And Apple’s memory bandwidth-to-compute ratio is pretty good for these parts, especially compared to Nvidia’s consumer offerings - more on that in a bit, as I would also argue that Nvidia may have over-relied on the belief that gamers wouldn’t mind them cheaping out on memory capacity and bandwidth. As it turns out, gamers noticed this time.

Interestingly, VRAM size has become a hot-button topic among PC gamers. The articles linked below report that, at the higher-quality settings, some modern games require more than 8 GB—thus you can run into stuttering with cards like the 3070, which is relatively strong computationally, but has only 8 GB VRAM.
With the 4000 series, Nvidia took a lot of flak for their low base VRAM and slow VRAM increases across product tiers (as well as low memory bandwidth on some of those tiers and overall price increases). So this is very definitely an issue in the PC space.

As was predicted when the 4000 series first came out and the backlash it received got going (much like with the 2000 series), Nvidia released a SUPER lineup to mitigate some of these problems. The 4080 SUPER got a small computational spec bump and a big price reduction relative to the 4080, while the 4070 Ti SUPER and 4070 SUPER got memory capacity, bandwidth, and computational spec bumps but stayed at the same prices as their original models. The original 4070 will remain for sale at a slightly reduced price. Still, these are relatively small changes compared to, say, adopting UMA, and given that, I’d still say memory capacity/bandwidth isn’t necessarily something Apple needs to improve to catch up to Nvidia - though I will never say no to more, especially on the low end, but I would describe that more as amplifying Apple’s advantages rather than catching up. Perhaps I’m being overly pedantic, but I would argue it’s an important distinction given the thread subject.
 
I’m amused that MR has its own thread on this general topic, though there it’s framed as “Apple is behind, so they should just give up and use Nvidia GPUs instead”. 🤷‍♂️

However, there are actually some good discussions going on there (@leman, @theorist9, @diamond.g, @Xiao_Xi and a few others managed to steer it in a more productive direction) - mostly what we’ve already discussed here, but with a couple of other interesting tangents.

One thing I wouldn’t mind is if Apple added the ability to do PCIe (+ Thunderbolt) passthrough for VMs. We could hook up an eGPU, or a dGPU on the Mac Pro, and use it directly in VMs. Linux has that. This may not be a priority for Apple, or it may even be against their interests - I don’t know - but I think it would be a very useful tool.

One of the better tangents in that earlier thread was: why is AMD so far behind Nvidia? In a word, software (and a bit of hardware too). AMD simply lacks the ability to develop and maintain all the drivers and software for even its own product stack. In contrast, Nvidia’s software division is massive. Again, the joke from Ryan Smith repeated here: Nvidia is a software company masquerading as a hardware company.

To clarify a couple of points, @theorist9 and @leman: I haven’t delved too deeply into it, but from what I can see ZLUDA is built on top of HIP/ROCm, where HIP was meant to be nearly identical to CUDA; through ZLUDA the GPU is just running CUDA, but it’s not an entirely separate effort from HIP. Previously ZLUDA was built on top of oneAPI, so it probably has some modularity. It’s unclear if AMD has hardware feature parity with Nvidia - I don’t think they do. Rather, just as earlier Nvidia processors might be supported by CUDA 12 without being able to support every feature in it, the same is probably true for AMD processors. These efforts do not represent any agreement between AMD and Nvidia as far as I am aware; rather, it’s just AMD (and previously Intel) reverse engineering CUDA.
 
I’ve been reading that thread too. The usual suspects are still determined to communicate the futility of Apple competing.
 
To that point, check out Chris Lattner’s (and crew’s) Modular Mojo talk. Good things are afoot. Also note that ASi was mentioned quite discreetly 🙂

Edit: There is still an important piece missing as shown by the last question, but that is not too hard to model and solve.
While I think it looks really neat and promising, reading their website I can’t help but think of the XKCD cartoon:


[xkcd cartoon image]


🙃
 
I’ve been reading that thread too. The usual suspects are still determined to communicate the futility of Apple competing.
They do have a point. I mean, it took Apple 5 years to add mesh shading to their GPUs after Nvidia did in 2018. Sure, they have RT and mesh shaders now. Is there any software that can utilise both?

I know games using both are very rare - only one so far, Alan Wake 2. Apple not only needs to keep improving more regularly but also needs to improve their software catalog. To showcase GPU improvements to the masses, games are the best way to do it.

Let’s make it easier: have there been any games released with ray tracing support since Apple released the M3?
 
They do have a point. I mean, it took Apple 5 years to add mesh shading to their GPUs after Nvidia did in 2018. Sure, they have RT and mesh shaders now. Is there any software that can utilise both?
I don’t believe that’s their point though. Their point is that Apple is wrong and bad if any single component by any other manufacturer is faster than Apple’s solution, no matter the efficiency or the cost. I think there are people on there saying Apple has lost because they don’t compete with datacenter solutions like the H100.
I know games using both are very rare - only one so far, Alan Wake 2.
If that’s true, then it’s ridiculous to complain about Apple lacking software that uses both.
Apple not only needs to keep improving more regularly but also needs to improve their software catalog. To showcase GPU improvements to the masses, games are the best way to do it.

Let’s make it easier: have there been any games released with ray tracing support since Apple released the M3?
Why just games? Why would you exclude Blender or Redshift? Given the short time it’s been available, I have no idea why you would think it’s bad, and no idea why you would think games would be the first software to support it.
 
I don’t believe that’s their point though. Their point is that Apple is wrong and bad if any single component by any other manufacturer is faster than Apple‘s solution, no matter the efficiency or the cost. I think there are people on there saying Apple has lost because they don’t compete with datacenter solutions like the H100.
Indeed and if earnings reports and market caps were the judge, no one should use anything other than Apple products in any market Apple competes in.

If that’s true, then it’s ridiculous to complain about Apple lacking software that uses both.
I think he was saying that was the only Apple software using both that he knew of, but I'm not sure.
Why just games? Why would you exclude Blender or Redshift? Given the short time it’s been available, I have no idea why you would think it’s bad, and no idea why you would think games would be the first software to support it.
And like you, I agree that the above point is largely immaterial. While Apple is definitely trying to get native games onto its platform, the M3 has only been out a few months, and Apple's bread and butter has been professional applications, which have shown pretty reasonable - shockingly speedy, in fact - adoption of new features from Apple so far. I don't think anyone here would disagree with @exoticspice1's sentiment that Apple needs to continue to improve at regular intervals, but giving up just as they are making substantial progress is silly. As someone who uses Nvidia's CUDA, I'm excited by Apple's entrance, and there are definitely ways in which Apple can push Nvidia to compete in the consumer laptop and even desktop space. As someone who also uses Apple products, sure, I miss the days when I could use both in the same product, but times change (though again, PCIe passthrough!). Finally, I wouldn't want to see either company with a monopoly regardless of how much better/faster/etc. they are.
 