3DMark’s new benchmarks: Steel Nomad and Steel Nomad Light.

What are Apple silicon GPUs like for FP64? Is it 1:2 compared to FP32, or closer to GeForce’s 1:32?
The latter. FP64 on Apple silicon GPUs is software-only, with no hardware acceleration, so you have to use a library. My memory is that you can get roughly 1:32 of FP32 throughput (at the cost of code size).
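
For anyone wondering what "software only" looks like in practice, here is a minimal sketch of one common trick: carrying each value as an unevaluated sum of two FP32 numbers (often called double-float or double-single arithmetic). This is an illustration only and is not taken from any particular library; the helper names (f32, two_sum, df_add, split) are made up for the example, and FP32 rounding is emulated on the CPU with Python's struct module. It gives roughly 48 bits of significand rather than true IEEE FP64.

```python
import struct

def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Error-free addition: returns (s, e) with s + e == a + b exactly.
    Every intermediate result is rounded to FP32."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def split(x):
    """Represent x as a (hi, lo) pair of FP32 values."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo

def df_add(x, y):
    """Add two (hi, lo) pairs. This costs about eleven FP32 operations
    for what would be a single hardware FP64 add."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    hi = f32(s + e)                 # renormalise so that hi carries
    lo = f32(e - f32(hi - s))       # the leading bits and lo the rest
    return hi, lo

# Plain FP32 drops the small term; the pair keeps it in the low word.
plain = f32(1.0 + 1e-9)
hi, lo = df_add(split(1.0), split(1e-9))
```

A single addition already expands into about eleven FP32 operations here, and multiplication and division are more expensive still, which is consistent with the throughput ratio and code-size cost mentioned above.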

I was under the impression UL writes the benchmark to an API and doesn't include vendor-specific paths.


I wonder why I cannot find any Vulkan scores for the 5060 Ti in Steel Nomad (Light as well).

The Notebookcheck link above reported a Vulkan score; it was a couple of hundred points higher than the DX12 score.
 
Philip Turner has a project for software fp64 here, with some scores.
 
I saw, but I couldn't find any scores on the results browser when I looked. So I assume they didn't upload them, but it isn't clear whether that is because the scores were not considered valid or because they simply didn't upload them. The scores for all the other cards seem to be far below the average for the GPU type (class?) as well. Supposedly the CPU doesn't matter for Steel Nomad (or Speed Way), so the scores are just straight FPS times 100. The Steel Nomad Light score is FPS times 135 (I think); I don't know why it isn't 100.
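
If those multipliers are right, converting a score back to an average frame rate is a one-liner. A rough sketch, assuming the 100 and 135 figures from the post above (the 135 is from memory and unverified; the constant and function names are made up for illustration):

```python
# Points-per-FPS multipliers as recalled in the post above (assumptions,
# not confirmed against UL's documentation).
STEEL_NOMAD_POINTS_PER_FPS = 100
STEEL_NOMAD_LIGHT_POINTS_PER_FPS = 135

def score_to_fps(score, points_per_fps):
    """Convert a benchmark score to average FPS under the assumed multiplier."""
    return score / points_per_fps

def fps_to_score(fps, points_per_fps):
    """Inverse of score_to_fps."""
    return fps * points_per_fps
```

This makes it easier to compare a Steel Nomad result against a Steel Nomad Light result as frame rates rather than as raw points.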
 
I am sure that’s true. But even when writing general DirectX or Metal or whatever code, there are still usage patterns that are more or less friendly to a given microarchitecture.
If they are not doing the same work, wouldn't that defeat the purpose of having the scores be comparable?
 
How do you define the same work?
Arguably merge sort and quick sort do the same work.

In a less extreme way, you can have two implementations of the exact same algorithm tuned to different expectations about cache capacities, thread counts, and ALU configurations.
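
As a toy illustration of that point (not anything from 3DMark itself), here is a blocked matrix multiply where the tile size is the only tuning knob. The function name and the tile parameter are made up for the example. Every tile size produces the same result, but on real hardware the best choice depends on cache sizes, so the "same work" can favour one architecture or another purely through that one parameter.

```python
def matmul_tiled(a, b, tile):
    """Multiply square matrices a and b, walking the loops in tile x tile blocks.
    The result does not depend on `tile`; only the memory access order does."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, n)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + tile, n)):
                            c[i][j] += aik * b[k][j]
    return c

# Integer inputs keep the comparison exact regardless of summation order.
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
same = matmul_tiled(a, b, 1) == matmul_tiled(a, b, 2) == matmul_tiled(a, b, 3)
```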

We can make a benchmark that requires 30 GB of VRAM to run optimally, and while you could, for all intents and purposes, code it the same for Metal and DirectX, this would definitely make Nvidia look worse than Apple for most product tiers.

In a less extreme case, a task can lean more on vector multiplications than integer additions, or be less suited to tile memory or whatever, with any given implementation of effectively the same thing.

That doesn’t mean that the benchmark is per se written to optimize more for one thing than another, but the benchmark in particular may still be better suited to one architecture than another, and you could potentially achieve pixel-by-pixel equivalence with a different pipeline that would run better on a different architecture. That doesn’t invalidate the benchmark, but it’s an argument for having multiple benchmarks. A lot of real software out there is also going to be written more with the architectural expectations of Nvidia than Apple, if for no other reason than that a lot of experienced graphics programmers have more experience with that architecture.
 
Aye, that's the one (Philip Turner's software FP64 project). Unfortunately it's not being worked on anymore (I'm not sure what was left undone), though maybe someone else could finish it if basic things like division are indeed still undone. However, I find the note at the end of his first GPT-4 conversation illuminating, particularly the final sentence:

Update: I made a working proof-of-concept for eFP64, using a different approach. It is not IEEE-compliant but solves the same problems. However, I realized that my particular computational nanotech use case can probably get away with 100% FP32. And quantum chemistry can get away with the massive latency of switching to CPU + AMX for the FP64 part.

One advantage of unified memory is that you can use the CPU + AMX to do FP64, and depending on your use case that can just work and still be performant. I very much doubt that would be the case in a dGPU setup, although you never know - at least I suspect it is true a lot more often when one has an SoC with everything sharing the same unified memory.

Really? I just checked the results browser right now and the scores seemed reasonable for Steel Nomad. One thing to keep in mind is that while some of the devices in the review list will be overclocked, many more of the GPUs (and their paired CPUs) in the 3DMark database will be, and by more than review units, even those of third-party GPUs. If there's a big discrepancy, check the memory and core clocks of the GPU in the benchmark list against the stock configuration or the published Notebookcheck review to see if there is a big difference. Could you give me an example of one where you found the discrepancy too large?

I assume the reviewer simply didn't upload their scores. But in general, when you compare the Vulkan and DirectX Steel Nomad scores, the Vulkan score is higher. It's unclear whether that is a result of something intrinsic to the two APIs or simply UL's programming in this particular benchmark.

So 3DMark doesn't show the vendor of the card in the main view. I was looking at the overall average scores. Right now the average is actually lower than what they got for the 5060 Ti. Below is the result with the closest core clock, and its DX12 score is lower than what they got. I am assuming it is a driver difference.
[Attached screenshot: 1744971354942.png]


There are only 119 results in the database right now, so it will take some time for the average to balance out. Their AMD card results are actually pretty close to the average.


Side note: it is a shame you cannot see any of the mobile chip results for Steel Nomad Light in the browser. It makes comparisons harder than they need to be. I wonder if they will at least add macOS to the browser.
 
You should be able to see mobile chips, though I don’t know about the newest ones yet. But yes, it’s a pity Mac results aren’t available (Android results aren’t available either). I don’t know why they don’t show them for those tests that have Metal versions (or Android versions, which I presume are Vulkan). Maybe it’s a free/paid thing? I haven’t really dug into their business model.

EDIT: Sorry, by mobile you probably meant phone, not laptop; the reuse of that term always causes (me) confusion. PC laptop chips are in the browser, but phone and Mac chips are not.
 
I have my criticisms of Geekbench, but it’s hard to argue against their site and database being the easiest to use.
 
Actually I forgot the mobile website is here:

It depends. The actual search UI for 3DMark is easier (Geekbench gets confused by device and component names really easily) and much more powerful (you can search for combinations of various parts and even clocks - that's way more powerful than Geekbench and basically any other benchmark site I've found). But splitting the mobile and PC scores into different websites, and seemingly having no way to find Mac scores at all, is worse (and the 3DMark mobile search is not as good as the PC one). So for Windows PCs the 3DMark browser is the best benchmarking browser I've seen, but for everyone else ...
 
Must be just me then. I was never able to find what I was looking for on there.
 

New Nvidia drivers boosting benchmark performance but causing game instability.
It appears that non-factory overclocks are unstable because of the temperature sensor bug. Boost clocks are going higher than they should, causing some games to crash.
 