Intel finally releasing discrete GPU

Cmaier · Sep 18, 2022

Colstan said:
Thinking about this further, both @Yoused and @Cmaier have explained to me why Apple may see value in implementing SMT with their E-cores, but it makes little sense with the P-cores. From what folks have said here, x86 by nature, can benefit from SMT much more than RISC ISAs. (I won't rehash that discussion here, it's buried somewhere in the x86 vs. Arm thread.) In the back of my mind, I did find it curious that IBM found value in SMT with POWER, implementing 8-way SMT, as you point out.

After having rummaged through @Cmaier's brain about the future of the M-series, and where Apple may take the Mac in the next half-decade, it does make me wonder if there is a scenario in which it makes sense for Apple to implement SMT in both the P-cores and E-cores? (Or even "Middle cores" if such a thing ever materializes, if that scenario even makes logical sense?) As has also been pointed out, there are only so many ways to increase IPC, and Apple is going to need to get creative to find ways to do so. This is entirely speculative, as I said in my original question, but are there changes to Apple Silicon that Apple could implement that would then make it so that some form of SMT makes sense? Apparently it doesn't right now, but the M-series is going to look much different in a half-decade than it does today, in whatever form it takes. (Of course, this has absolutely nothing to do with me needing a new Mac around that time period, total coincidence.) Any thoughts from knowledgeable folks here would be most welcome.

Anything’s possible, but I imagine the meeting went like this:

“Let’s go all out, not worry about power or die area, and maximize MP performance!”

Response: “ok. Let’s double the number of cores and not have to worry about implementation bugs or side channel attacks.”

Yoused · Sep 21, 2022

Colstan said:
… it does make me wonder if there is a scenario in which it makes sense for Apple to implement SMT in both the P-cores and E-cores?

P cores are like a postcard compared to the postage stamp of an E core. It is much easier to just add more E cores than to try to split them with SMT, because they are so small. Anthill still does it with the P cores because one core is so big that splitting its pipe gives you two cores with only a slight die space increase (a two-thread core is much smaller than two single-thread cores).

Apple's P cores are so absurdly OoOE that adding SMT would be more of an effort than the gain would offer (and might tangle up with some POWER patents). Instead, Apple is improving E core performance by a lot. What I could see down the road for most of M-series would be more of E cores handling the threads with some kind of mechanism that would allow them to recruit HPC circuitry for high-load (mainly SIMD) work. Kind of like having coprocessor semicores.

That is because Apple is moving toward more heterogeneous computing, with frequently used jobs handled by dedicated units that are fast and energy efficient. This is kind of like unfolding the '70s microcode ethos into satellite logic, which is facilitated by having those jobs abstracted away by the OS.

Colstan · Sep 29, 2022

Perhaps too little, too late, but Intel is officially releasing Arc graphics cards on Oct. 12th. That just so happens to be the exact same day that Nvidia's 4000-series becomes available. So, perhaps they are trying to show how much value they are able to offer compared to Nvidia's space heaters, or more cynically, hiding the launch at a time when Nvidia is going to get all of the press attention, thus sparing them some measure of embarrassment. Regardless, this is way too late, and they could have done well during the GPU craze with pandemic buying and crypto insanity, but that's over. I just ordered a 6900XT for $659, and these things were going for $1,600 a year ago. I guess it's something, but I don't see this making an impact, other than being a novelty product. It's kinda like owning an i740, a project which Gelsinger was in charge of, assuming history repeats itself.

exoticspice1 · Sep 29, 2022

Yoused said:
P cores are like a postcard compared to the postage stamp of an E core. It is much easier to just add more E cores than to try to split them with SMT, because they are so small. Anthill still does it with the P cores because one core is so big that splitting its pipe gives you two cores with only a slight die space increase (a two-thread core is much smaller than two single-thread cores).

Apple's P cores are so absurdly OoOE that adding SMT would be more of an effort than the gain would offer (and might tangle up with some POWER patents). Instead, Apple is improving E core performance by a lot. What I could see down the road for most of M-series would be more of E cores handling the threads with some kind of mechanism that would allow them to recruit HPC circuitry for high-load (mainly SIMD) work. Kind of like having coprocessor semicores.

That is because Apple is moving toward more heterogeneous computing, with frequently used jobs handled by dedicated units that are fast and energy efficient. This is kind of like unfolding the '70s microcode ethos into satellite logic, which is facilitated by having those jobs abstracted away by the OS.

A bit off topic, but do you see a shift in Apple's GPU late next year with M3 if they add hardware ray tracing?

throAU · Sep 30, 2022

Nvidia/AMD will just drop prices and kill this.

Yoused · Sep 30, 2022

exoticspice1 said:
A bit off topic, but do you see a shift in Apple's GPU late next year with M3 if they add hardware ray tracing?

I suspect Apple will go one better and put in hardware acceleration of path tracing. Everyone accelerates ray tracing, they should stand out from the pack, and path tracing can yield more realistic output. Perhaps logic that accelerates both but favors PT.

Colstan · Sep 30, 2022

Yoused said:
I suspect Apple will go one better and put in hardware acceleration of path tracing. Everyone accelerates ray tracing, they should stand out from the pack, and path tracing can yield more realistic output. Perhaps logic that accelerates both but favors PT.

On a conceptual level, I know what ray tracing is, and what its benefits are. It's hard not to because Nvidia won't shut up about it, even though it often makes games look worse. However, I'm unfamiliar with path tracing. Could you elaborate?

Yoused · Sep 30, 2022

Colstan said:
On a conceptual level, I know what ray tracing is, and what its benefits are. It's hard not to because Nvidia won't shut up about it, even though it often makes games look worse. However, I'm unfamiliar with path tracing. Could you elaborate?

Path tracing is like the reverse of ray tracing, following light paths to the camera rather than following the viewpoint to the light sources. As it stands, the process takes a lot longer and has some weaknesses, but it has the potential to yield more photo-realistic results. And there are ways to make it faster and to alleviate some of the weaknesses. Hardware acceleration combined with a tailored approach managed by ML logic could make some form of it truly practical.

Nycturne · Sep 30, 2022

I’m not sure that’s the case.

Path tracing can be done in either direction. The key difference as I understand it is the sampling method of the rays, and how they are cast through a scene. Ray tracing is more limited in how rays are cast through a scene, while with path tracing, rays are cast through multiple bounces until they hit the other “side” of the scene (a light or screen) or exhaust the number of bounces.

The other big thing is that path tracing will fire many more rays. When casting from the camera, each pixel will have multiple rays fired out, each taking a different path through the scene and then averaging the results. Certain techniques can cut down on the number of rays needed, but the technique relies on random sampling of multiple rays to get good results.
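As a rough illustration of the "many random rays per pixel, then average" idea, here is a minimal Python sketch. The scene is a toy assumption and not a real renderer: on every bounce a path reaches a light with probability `p_light`, otherwise it loses energy by a factor of `albedo` and keeps going until it runs out of bounces. The names `trace_path`, `render_pixel`, and `expected_pixel` are hypothetical.

```python
import random

def trace_path(rng, max_bounces=4, p_light=0.3, albedo=0.7, emission=1.0):
    """Follow one random path and return the light it carries back to the pixel."""
    throughput = 1.0
    for _ in range(max_bounces):
        if rng.random() < p_light:
            # The path reached a light: return what survived the earlier bounces.
            return throughput * emission
        # The path hit a surface instead and lost some energy.
        throughput *= albedo
    return 0.0  # bounce budget exhausted without finding a light

def render_pixel(rng, samples, **scene):
    """Average many independent random paths fired through the same pixel."""
    return sum(trace_path(rng, **scene) for _ in range(samples)) / samples

def expected_pixel(max_bounces=4, p_light=0.3, albedo=0.7, emission=1.0):
    """Analytic value for this toy scene, used only to judge the noise."""
    return sum(((1 - p_light) * albedo) ** k * p_light * emission
               for k in range(max_bounces))

if __name__ == "__main__":
    rng = random.Random(1)
    for n in (4, 64, 4096):
        print(n, render_pixel(rng, n), expected_pixel())
```

With only a handful of samples the estimate for a pixel jumps around from run to run, and it settles toward the analytic value as the sample count grows. That gap is what a denoiser is asked to cover when the ray budget is low.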

But ray tracing as I understand it is not set up to be able to do any forward tracing (starting from the light), so that is something that path tracing can do which ray tracing cannot, but you can fire the rays from the camera in both cases and still get more realistic results from path tracing.

You can do path tracing on ray tracing accelerators because of the similarities in the techniques (Microsoft and Nvidia have been doing it on RTX and DXR-compatible cards), but you take a performance hit due to rays with longer lifetimes and larger numbers of rays being cast.

Path tracing will become more common for sure though. Techniques used already that are ML based (denoiser and DLSS in particular) can help in both ray tracing and path tracing to improve the result.

diamond.g · Oct 1, 2022

Nycturne said:
I’m not sure that’s the case.

Path tracing can be done in either direction. The key difference as I understand it is the sampling method of the rays, and how they are cast through a scene. Ray tracing is more limited in how rays are cast through a scene, while with path tracing, rays are cast through multiple bounces until they hit the other “side” of the scene (a light or screen) or exhaust the number of bounces.

The other big thing is that path tracing will fire many more rays. When casting from the camera, each pixel will have multiple rays fired out, each taking a different path through the scene and then averaging the results. Certain techniques can cut down on the number of rays needed, but the technique relies on random sampling of multiple rays to get good results.

But ray tracing as I understand it is not set up to be able to do any forward tracing (starting from the light), so that is something that path tracing can do which ray tracing cannot, but you can fire the rays from the camera in both cases and still get more realistic results from path tracing.

You can do path tracing on ray tracing accelerators because of the similarities in the techniques (Microsoft and Nvidia have been doing it on RTX and DXR-compatible cards), but you take a performance hit due to rays with longer lifetimes and larger numbers of rays being cast.

Path tracing will become more common for sure though. Techniques used already that are ML based (denoiser and DLSS in particular) can help in both ray tracing and path tracing to improve the result.

Quake 2 RT and Minecraft (bedrock) are path traced. Denoising is used to hide low ray count (and/or low bounce count).

Yoused · Oct 1, 2022

I believe the noise problem could be greatly mitigated by aggressive path-weighting, using analytic ML-type logic to assess the overall complexity of path regions so that less complex regions could receive differential gradation, requiring less work overall, and the tracer could concentrate its efforts on the more complex regions. This could improve early-pass performance and efficiency for a large fraction of rendering jobs and is not really very far out of reach of contemporary tech.
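To make the "spend the effort where the image is complex" idea concrete, here is a small Python sketch. It is only an assumption-laden stand-in: instead of any ML logic it uses the variance of a cheap pilot pass as the complexity measure, and the function name `allocate_samples` and the region names are hypothetical.

```python
import statistics

def allocate_samples(pilot, budget, floor=1):
    """Split `budget` follow-up samples across regions by pilot-pass variance.

    `pilot` maps a region name to the sample values seen in a cheap first pass.
    Flat regions get only `floor` samples; noisy regions get most of the budget.
    Because of the floor and rounding, the total is only approximately `budget`.
    """
    variances = {name: statistics.pvariance(vals) for name, vals in pilot.items()}
    total = sum(variances.values())
    if total == 0:
        # Nothing looks complex: spread the budget evenly.
        share = budget // len(pilot)
        return {name: max(floor, share) for name in pilot}
    return {name: max(floor, round(budget * v / total))
            for name, v in variances.items()}

if __name__ == "__main__":
    pilot = {
        "sky": [0.5] * 8,          # flat region, every pilot sample agrees
        "edge": [0.0, 1.0] * 4,    # busy region, pilot samples disagree
    }
    print(allocate_samples(pilot, budget=100))
```

A learned estimator could replace the variance measure without changing the shape of the allocation step, which is the part this sketch is meant to show.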

Nycturne · Oct 2, 2022

diamond.g said:
Quake 2 RT and Minecraft (bedrock) are path traced. Denoising is used to hide low ray count (and/or low bounce count).

I thought that was the case, but I wasn’t certain, so didn’t want to make any claims. I just knew that the hardware was already being used for it in at least a “this is how you do it” capacity.

leman · Oct 3, 2022

Yoused said:
I suspect Apple will go one better and put in hardware acceleration of path tracing. Everyone accelerates ray tracing, they should stand out from the pack, and path tracing can yield more realistic output. Perhaps logic that accelerates both but favors PT.

RT APIs are just about casting rays and specifying geometry that these rays might intersect. How exactly you use that is up to you. Some folks use RT APIs to do scene node culling or collision detection and that's fair too.
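A toy Python sketch of that point: the same "cast a ray against some geometry" primitive answers a collision question with no rendering involved. Everything here is hypothetical (the names `cast_ray` and `will_collide`, spheres as the only geometry, a unit-length direction, and a ray origin outside every sphere); it is not modeled on any real RT API.

```python
import math

def cast_ray(origin, direction, spheres):
    """Return (distance, name) of the nearest sphere hit, or None.

    Assumes `direction` is unit length and `origin` lies outside every sphere.
    Each sphere is a (name, center, radius) tuple.
    """
    nearest = None
    for name, center, radius in spheres:
        oc = tuple(o - c for o, c in zip(origin, center))
        b = sum(d * v for d, v in zip(direction, oc))
        c = sum(v * v for v in oc) - radius * radius
        disc = b * b - c  # discriminant of t^2 + 2bt + c = 0
        if disc < 0:
            continue  # the ray misses this sphere
        t = -b - math.sqrt(disc)  # nearer of the two intersection points
        if t >= 0 and (nearest is None or t < nearest[0]):
            nearest = (t, name)
    return nearest

def will_collide(position, heading, step, obstacles):
    """Collision query: is anything in the way within the next `step` units?"""
    hit = cast_ray(position, heading, obstacles)
    return hit is not None and hit[0] <= step

if __name__ == "__main__":
    obstacles = [("wall", (5.0, 0.0, 0.0), 1.0)]
    print(cast_ray((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), obstacles))
    print(will_collide((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 4.5, obstacles))
```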

My understanding of these things is fairly limited but it seems to me that RT acceleration (of any kind) is really about compacting and reordering work. Graphics is a massively parallel task where you apply exactly the same sequence of operations to multiple objects (originally vertices or pixels). That's why GPUs are set up as massive SIMD machines with execution width of 32 or more. But RT is divergent in its nature. Rays tend to scatter in different directions and often hit different objects. This not only means memory divergence (which already kills GPU performance) but also execution divergence as different objects need to invoke different shaders (which is why Metal now has function pointers and recursive GPU calls). To get the performance back on track one somehow needs to compact the work so that it becomes local again. Of course, using additional information (like the ML heuristics you describe) is also valuable, but that's more like a bonus.
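A back-of-the-envelope Python model of the execution divergence part, under loud assumptions: a SIMD group has to run once for every distinct shader its rays need, and "compacting" is modeled as a plain sort by shader name. Real hardware does not work this simply, and the names `simd_groups`, `shader_passes`, and `compact_by_shader` are hypothetical.

```python
def simd_groups(items, width):
    """Chop a list of per-ray work items into SIMD groups of `width` lanes."""
    return [items[i:i + width] for i in range(0, len(items), width)]

def shader_passes(hit_shaders, width):
    """Count passes, assuming a group runs once per distinct shader it holds."""
    return sum(len(set(group)) for group in simd_groups(hit_shaders, width))

def compact_by_shader(hit_shaders):
    """Reorder rays so that rays needing the same shader sit next to each other."""
    return sorted(hit_shaders)

if __name__ == "__main__":
    # 32 scattered rays that land on four different materials in turn.
    hits = ["glass", "metal", "skin", "cloth"] * 8
    print("scattered:", shader_passes(hits, width=8))
    print("compacted:", shader_passes(compact_by_shader(hits), width=8))
```

In this model the scattered order makes every group of 8 lanes run all four shaders, while the compacted order lets each group run exactly one. Memory divergence is not modeled at all.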

I tried to get some info on how Nvidia's RT acceleration works but there is a surprising lack of concrete information. There was a paper that claimed that the RT acceleration is closely integrated with the texturing unit and relies on reordering memory accesses to improve work locality. Which would explain why Nvidia's RT is so performant. AMD's RT is simpler — it's just a bandaid that uses fixed-function intersection instructions, which helps to speed up computations but does nothing for the divergence problem, which is also a reason why AMD's implementation is so much slower in practice.

There is a faint hope that Apple is late with its hardware RT because they aim to provide a comprehensive general solution for programmable work compacting. A feature like that would be a huge deal. I wouldn't even know how to approach such a problem though...

leman · Oct 9, 2022

Since there is interest in the topic, I'll leave here the links to some raytracing-related patents filed by Apple. There might be more, but this is all I was able to find. I find patents very difficult to digest, but so far the impression I have is that Apple is focusing on area- and energy-efficient approaches to RT which appear to be fairly novel. Their patents describe hardware that does geometry traversal using low-precision calculations and then delegates the final checks to the shaders.

US20220036630A1 - SIMD Group Formation Techniques during Ray Intersection Traversal
US20220207690A1 - Primitive Testing for Ray Intersection at Multiple Precisions
CN114092614A - Ray intersection circuit with parallel ray testing
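For what a "cheap coarse test first, exact test only on the survivors" scheme could look like in general, here is a Python sketch. It is not taken from the patents and does not claim to match Apple's design. As a stand-in for low-precision arithmetic it snaps each bounding box outward to a coarse grid, so the coarse pass may let false positives through but (in exact arithmetic) never rejects a true hit. The names `ray_hits_box`, `widen`, and `two_stage_hits` are hypothetical.

```python
import math

def ray_hits_box(origin, direction, lo, hi):
    """Standard slab test of a ray against an axis-aligned box."""
    t_near, t_far = 0.0, math.inf
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of slabs: it must start between them.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def widen(lo, hi, step):
    """Snap a box outward to a coarse grid (stand-in for reduced precision)."""
    return (tuple(math.floor(v / step) * step for v in lo),
            tuple(math.ceil(v / step) * step for v in hi))

def two_stage_hits(origin, direction, boxes, step=0.5):
    """Coarse pass on widened boxes, then the exact test only on survivors."""
    survivors = [b for b in boxes
                 if ray_hits_box(origin, direction, *widen(*b, step))]
    exact = [b for b in survivors if ray_hits_box(origin, direction, *b)]
    return exact, len(survivors)

if __name__ == "__main__":
    boxes = [
        ((1.1, -0.1, -0.1), (1.4, 0.1, 0.1)),  # truly on the ray
        ((1.1, 0.2, -0.1), (1.4, 0.4, 0.1)),   # near miss, passes coarse test
        ((1.1, 3.0, -0.1), (1.4, 3.3, 0.1)),   # far away, rejected cheaply
    ]
    print(two_stage_hits((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), boxes))
```

The far-away box is discarded by the cheap pass, and only the near miss costs an extra exact test, which is the kind of saving a multi-precision scheme is after.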

dada_dave · Mar 16, 2023

So … ASRock’s ARC commercial. Yikes

https://mobile.Twitter or X not allowed/IanCutress/status/1636288916369317890

Renzatic · Mar 17, 2023

dada_dave said:
So … ASRock’s ARC commercial. Yikes

https://mobile.Twitter or X not allowed/IanCutress/status/1636288916369317890

Aw, it's been removed. I'd love to know what was so terrible about it.

Yoused · Mar 17, 2023

Renzatic said:
Aw, it's been removed. I'd love to know what was so terrible about it.

It compared a guy reacting to a pretty lass to the same guy reacting to their video card in much the same way, only more so.

Renzatic · Mar 18, 2023

Yoused said:
It compared a guy reacting to a pretty lass to the same guy reacting to their video card in much the same way, only more so.

Sounds hilariously awkward. Shame I missed it.

Yoused · Mar 18, 2023

Renzatic said:
Sounds hilariously awkward. Shame I missed it.

Not really, though. They showed a guy reacting to a female, showing much interest in her. Then it was like a full restart, same guy, but the GPU card, no girl. At the end, he was nuzzling the card with his cheek (in the first part, he never so much as touched the female).
