# Intel finally releasing discrete GPU



## Cmaier

Intel’s first Arc GPUs are launching today for laptops. More powerful Arc GPUs will arrive this summer. (www.theverge.com)

This is very, very late in the game, and is something that Intel has been working on for a very long time.  Finally releasing these now is thinking small.  Yeah, Intel will likely make enough money on this to recoup the 15 years of R&D that they’ve put into it, but this is not very forward-looking.  What Intel should have learned from Apple is that the future is likely to be integrated CPU/GPU packages, probably with a unified memory architecture.  Discrete GPUs are already somewhat niche, and that’s going to be even more the case in 5 years.


----------



## Citysnaps

Seems Intel should have pivoted back when they learned what Apple was doing.


----------



## Yoused

One of the stories that links off of that story, from a year ago, says that the Intel Arc cards will not work with an AMD based system. And I can understand that Intel wants to protect/promote their own brand, but unless Arc is stunningly better than the alternatives, they seem to be doing themselves a bit of a disservice with such a strategery.


----------



## Cmaier

Yoused said:


> One of the stories that links off of that story, from a year ago, says that the Intel Arc cards will not work with an AMD based system. And I can understand that Intel wants to protect/promote their own brand, but unless Arc is stunningly better than the alternatives, they seem to be doing themselves a bit of a disservice with such a strategery.




Seems confusing how that could even be the case.


----------



## Yoused

Cmaier said:


> Seems confusing how that could even be the case.



here:


> “The Iris Xe discrete add-in card will be paired with 9th gen (Coffee Lake-S) and 10th gen (Comet Lake-S) Intel® Core™ desktop processors and Intel(R) B460, H410, B365, and H310C chipset-based motherboards and sold as part of pre-built systems,” says an Intel spokesperson in a statement to Legit Reviews. “These motherboards require a special BIOS that supports Intel Iris Xe, so the cards won’t be compatible with other systems.”



which may be different with this year's models


----------



## Cmaier

Yoused said:


> here:
> 
> which may be different with this year's models



I wonder what the bios is doing.


----------



## Yoused

Cmaier said:


> I wonder what the bios is doing.



someone else would better know

_Intel – making my head hurt since at least 1976_


----------



## Colstan

Cmaier said:


> Intel will likely make enough money on this to recoup the 15 years of R&D that they’ve put into it, but this is not very forward-looking.



If we really want to reach back in history, you could say they've been waiting much longer, since the i740 was released 24 years ago.

Listening to the more reliable leakers on the PC side, the problem seems to be less with the hardware and more due to immature drivers. I'm sure Intel would have liked to have launched much sooner, but one of the reasons why the performance desktop parts are being held back is because game performance and stability just isn't there yet. Intel is only getting one shot at making a good impression, and they don't want a repeat of what Matrox did with the Parhelia, which they never recovered from.

Regardless of what caused the delay, the timing is terrible. Graphics cards are just now coming down in price, so the window to take advantage of shortages to gain market share is closing. The performance is likely to be, at best, between a 3060 and 3070 at the top end. By the time Intel does launch discrete cards, Nvidia Lovelace and AMD RDNA3 will be close to release, not to mention refreshes of the current lines.

I appreciate Intel not wanting to rush product to market, but it seems that they are playing catch-up, as is tradition.



Cmaier said:


> What Intel should have learned from apple is that the future is likely to be integrated CPU/GPU packages, probably with a unified memory architecture.  Discrete GPUs are already somewhat niche, and that’s going to be even more the case in 5 years.



I was skeptical that we'd see performance GPUs inside the M1 series. We have spent years being conditioned that integrated graphics = bad. A lot of folks just refused to believe that Apple would dump Intel, forget about leaving AMD and having desktop level GPU performance on the same package. When asked about it, I remember Lisa Su meekly stating, "We are the graphics partner of Apple", which is technically still true with add-in cards for the Mac Pro.

The latest 3090 Ti is pushing 450w, and Nvidia allegedly has plans to go up to 600w with Lovelace, with some specialty cards hitting 800w. That's simply unsustainable and something is going to have to give. Maybe there will always be standalone GPUs, like you can still get a sound card, but everything is becoming more integrated, not less.


----------



## diamond.g

Yoused said:


> One of the stories that links off of that story, from a year ago, says that the Intel Arc cards will not work with an AMD based system. And I can understand that Intel wants to protect/promote their own brand, but unless Arc is stunningly better than the alternatives, they seem to be doing themselves a bit of a disservice with such a strategery.



That was for DG1 (which was never released to the general public). DG2 (Desktop) won't have that limitation. I'm still skeptical about the drivers, but I guess we will see how that goes.


----------



## diamond.g

So far off to a meh start...


----------



## Cmaier

diamond.g said:


> So far off to a meh start...



Commenters say this is good


----------



## diamond.g

Cmaier said:


> Commenters say this is good



I didn't even look at the comments. But now that I have, it seems they did, until pricing came into the picture, lol. 

I am not sure how I feel about the results given this is the top of the bottom end. I have to keep reminding myself that Alchemist is only supposed to compete against the "Enthusiast" tier at best. We won't see anything from Intel that can touch the 3080 series cards until either Battlemage or Celestial.


----------



## Cmaier

I think it‘s “fine,” but I don’t get why they want to compete in the market at the level of “fine.”


----------



## diamond.g

Cmaier said:


> I think it‘s “fine,” but I don’t get why they want to compete in the market at the level of “fine.”



Captive audience. Intel can make pricing deals with OEMs (like they did for CPUs) so that they choose Intel's dGPUs instead of AMD's or Nvidia's.

And for most people there won't be any difference, because most people don't play games, and for those that do it is close enough to get the job done, I guess.


----------



## Huntn

Colstan said:


> If we really want to reach back in history, you could say they've been waiting much longer, since the i740 was released 24 years ago.
> 
> Listening to the more reliable leakers on the PC side, the problem seems to be less with the hardware and more due to immature drivers. I'm sure Intel would have liked to have launched much sooner, but one of the reasons why the performance desktop parts are being held back is because game performance and stability just isn't there yet. Intel is only getting one shot at making a good impression, and they don't want a repeat of what Matrox did with the Parhelia, which they never recovered from.
> 
> Regardless of what caused the delay, the timing is terrible. Graphics cards are just now coming down in price, so the window to take advantage of shortages to gain market share is closing. The performance is likely to be, at best, between a 3060 and 3070 at the top end. By the time Intel does launch discrete cards, Nvidia Lovelace and AMD RDNA3 will be close to release, not to mention refreshes of the current lines.
> 
> I appreciate Intel not wanting to rush product to market, but it seems that they are playing catch-up, as is tradition.
> 
> 
> I was skeptical that we'd see performance GPUs inside the M1 series. We have spent years being conditioned that integrated graphics = bad. A lot of folks just refused to believe that Apple would dump Intel, forget about leaving AMD and having desktop level GPU performance on the same package. When asked about it, I remember Lisa Su meekly stating, "We are the graphics partner of Apple", which is technically still true with add-in cards for the Mac Pro.
> 
> The latest 3090 Ti is pushing 450w, and Nvidia allegedly has plans to go up to 600w with Lovelace, with some specialty cards hitting 800w. That's simply unsustainable and something is going to have to give. Maybe there will always be standalone GPUs, like you can still get a sound card, but everything is becoming more integrated, not less.



As someone who is not learned on the topic, but who has a gaming PC with an i5 and an RTX 2070, doesn't this all boil down to technology and physics? In other words, achieving the desired graphical effects requires the circuitry and power whether it is in a standalone card or integrated? All that will alter this are breakthroughs in technology.

Back 20 years ago, I never felt the need to have the top-of-the-line graphics card, because I never could justify the expense in my head, and that was when cards ran about $200!  I suspect this is a marketing distortion, but back then it seemed that you needed the new expensive cards to run the latest games you wanted to play without struggling. The most I’ve spent on a card is about $500, which is too much imo.


----------



## diamond.g

Huntn said:


> Back 20 years ago, I never felt the need to have the top of the line graphic card, because I never could justify the expense in my head and that was when cards ran about $200!



Yeah, it really depends on what kinds of games you want to play. Well, that and how pretty you want said games to look (and how many frames you want).


----------



## Huntn

diamond.g said:


> Yeah, it really depends on what kinds of games you want to play. Well, that and how pretty you want said games to look (and how many frames you want).



When playing with lesser cards, the games seemed to play well enough.  It was just a few of the over the top brand new games that were melting GPUs, and look at New World, my understanding is there are some pissed players out there with a glob that used to be their graphic card.


----------



## diamond.g

Huntn said:


> When playing with lesser cards, the games seemed to play well enough.  It was just a few of the over the top brand new games that were melting GPUs, and look at New World, my understanding is there are some pissed players out there with a glob that used to be their graphic card.



The joys of scalability!


----------



## Yoused

Curious thing here, AMD's graphics card drivers seem to be causing the companion Ryzen processor to overclock, without asking. Or, maybe not so curious, because they do not have that effect when the CPU is an Intel.


----------



## gollum

Intel Arc laptop GPUs are only available in South Korea for now

I find this disappointing, I was curious to see how it measures up against AMD and Nvidia


----------



## Cmaier

gollum said:


> Intel Arc laptop GPUs are only available in South Korea for now
> 
> I find this disappointing, I was curious to see how it measures up against AMD and Nvidia




I find this fun. I recall many times while at AMD where we couldn’t convince OEMs to sell products with our parts in them, due to Intel’s anti-competitive practices.  Let Intel squirm a bit.


----------



## diamond.g

gollum said:


> Intel Arc laptop GPUs are only available in South Korea for now
> 
> I find this disappointing, I was curious to see how it measures up against AMD and Nvidia



For the reported TDP it isn't that bad. Apparently the drivers are still trash though.

Intel Arc A350M GPU has finally been tested, slower than GTX 1650, up to 2.2 GHz clock (videocardz.com): Intel Arc A350M tested on Samsung Galaxy Book2 Pro. As reported earlier today, Intel Arc A-Series mobile GPUs are currently exclusively available in the South Korean market, where one may find the Samsung Galaxy Book2 Pro laptop optionally outfitted with Arc A350M graphics.

----------



## Renzatic

Cmaier said:


> Commenters say this is good




It's not terrible. They're both entry level cards, with the Arc being within spitting distance of matching the 1650. If the Arc is the less expensive option, I could see it being a compelling purchase.


----------



## Cmaier

Renzatic said:


> It's not terrible. They're both entry level cards, with the Arc being within spitting distance of matching the 1650. If the Arc is the less expensive option, I could see it being a compelling purchase.




I guess I wonder what the market is. If you are not happy with integrated graphics, but don’t need TOO much more performance?


----------



## Renzatic

Cmaier said:


> I guess I wonder what the market is. If you are not happy with integrated graphics, but don’t need TOO much more performance?




You've pretty much nailed it. Do you want a machine with a good bit more oomph than what you get from an integrated GPU, but don't want to have to deal with an overheating laptop sporting fans that sound like a jet engine spinning up? If so, Intel Arc might be just for you!


----------



## Cmaier

Renzatic said:


> You've pretty much nailed it. Do you want a machine with a good bit more oomph than what you get from an integrated GPU, but don't want to have to deal with an overheating laptop sporting fans that sound like a jet engine spinning up? If so, Intel Arc might be just for you!



I figure most people willing to live with arc performance could probably save a few bucks and live with integrated graphics performance. I dunno. I guess I’m not the market for this.


----------



## Renzatic

Cmaier said:


> I figure most people willing to live with arc performance could probably save a few bucks and live with integrated graphics performance. I dunno. I guess I’m not the market for this.




It's a good card for people who want to play games, do some 3D work, or some light to moderate video editing on the cheap. I'd say it's aimed roughly at the same market as the 13" MBP.


----------



## JayAgostino

Cmaier said:


> Intel’s first Arc GPUs are launching today for laptops. More powerful Arc GPUs will arrive this summer. (www.theverge.com)
> 
> This is very, very late in the game, and is something that Intel has been working on for a very long time.  Finally releasing these now is thinking small.  Yeah, Intel will likely make enough money on this to recoup the 15 years of R&D that they’ve put into it, but this is not very forward-looking.  What Intel should have learned from Apple is that the future is likely to be integrated CPU/GPU packages, probably with a unified memory architecture.  Discrete GPUs are already somewhat niche, and that’s going to be even more the case in 5 years.



The Intel Arc A750 is being hammered by media outlets.

Intel is setting expectations low for its Arc GPUs (www.theverge.com): Intel says its Arc A750 performs slightly better than the RTX 3060.


----------



## Jimmyjames

The only thing I can add to this discussion is the opinion of someone who was a Distinguished Architect at Nvidia for nearly two decades, and has now moved to be Director of GPU Architecture at Apple.

https://www.twitter.com/i/web/status/1538341810887532544/
https://www.twitter.com/i/web/status/1538362629818703872/


----------



## Colstan

While definitely a niche product, one area where I think Intel has done some quality engineering is with their NUC enthusiast products. They're basically high-end laptops without the screen, and are Intel's answer to the Mac mini. The last version, marketed as NUC11PHKi7C, is better known under its code name, Phantom Canyon. (There were complaints about the M1 Pro/Max/Ultra names being confusing, yet somehow Intel product names are okay.) It had a 4C/8T Tiger Lake CPU with an Nvidia mobile RTX 2060. You can replace the RAM and SSD, but not much else. So, it's similar to the Intel generation of Mac mini. Here's Phantom Canyon:





Other than the silly glowing skull gimmick, I think it's a respectable piece of engineering, and Intel was able to keep it surprisingly quiet. It's not a mass-market device, but wasn't meant to be. With similar specs, it also happens to be twice the price of the M1 Mac mini, $700 vs. $1,400, without an operating system. Still, if you want a small, quiet PC that can play Windows games, it probably fits in with the "enthusiast" crowd that it was marketed toward.

It must be financially successful, because Intel has been making versions of it for years. The bizarro-world Hades Canyon, which was released in 2018 when hell froze over, was co-designed with AMD, featuring an 8th gen Intel CPU paired with a custom AMD Vega mobile GPU.





Now, Intel has been working on the latest version, namely Serpent Canyon, and the leaked specs look much improved compared to its predecessor. The CPU is a Core i7-12700H (6 P-cores and 8 E-cores, 20 threads, up to 4.7 GHz, 24 MB cache). That's a nice improvement over the Tiger Lake CPU in the previous model.

However, Intel is dumping Nvidia in favor of its own graphics, namely an Arc A770M GPU with the full ACM-G10 processor, with 32 Xe-cores and a total of 16GB of GDDR6 memory. Since this is a small form factor, they aren't competing with desktop cards, but there are a lot of questions surrounding drivers. I'm sure Intel would prefer to push its own solution, but if the drivers aren't ready, then they could significantly impact this product.

I would note that they increased the case size, and therefore the cooling, in Serpent Canyon:



Personally, I think it looks better when laid horizontally:





Regardless, you can see that Intel had to increase case volume, but at least you get full-size DisplayPort connectors, instead of mini-DP. Originally, Serpent Canyon was supposed to launch in Q2 of this year, along with the majority of the Arc dGPU line, but it's been pushed back until about Q4, with Q1 of 2023 being more likely for general availability.

No word on pricing, but it's probably going to be in the same $1,500 range as its predecessor, which is, again, twice the price of the Mac mini. That being said, there are a number of competitors in the small-desktop form factor space, and almost all of them suck. The performance is typically crap and the fan noise is unbearable. Thus far, Intel's engineers have done a good job of keeping these enthusiast NUC devices quiet, which, considering they've got a high-end laptop CPU and dGPU inside, is an achievement.

As someone who is currently using a 2018 Intel Mac mini, my fourth mini since I switched to Mac in 2005, I want to see more competition in this space. I plan on replacing my 2018 mini with an M3 generation Mac, most likely either another Mac mini with an M3 Pro, if such a device ever gets released, or a Mac Studio with an M3 Max. While I'd never give up my Mac for an x86 Windows machine, I'd consider getting one of these for PC games on the side. It would be reasonably easy to share peripherals between the two, and neither would take up much desk space. If Apple's push with Metal 3 for Mac gaming is successful, then I won't have to consider the NUC, but it's always good to have options.

However, Intel is having a difficult time getting its act together with Arc. The problem seems to be with drivers, while AMD and Nvidia have many decades of experience, and of course Apple controls the entire stack. It would be a shame for Intel's enthusiast mini PC offering to be DOA because they pushed Arc into it too soon. On paper, it looks great, but it remains an enthusiast product. If those enthusiasts can't take advantage of the GPU because of shoddy drivers with first-gen silicon, then it could kill the entire product line.

I have to wonder, if by the time Intel gets its drivers sorted out, the PC chip guys will realize that Apple has the right of it and will move to a fully integrated SoC, thus rendering this concept obsolete. Sometimes it's fun to gawk at the PC guys and wonder what they were thinking, but I do want this product to succeed, because the Mac mini and Mac Studio could use the competition. However, Intel's GPU choices may kill the product before it even launches.


----------



## exoticspice1

Cmaier said:


> Intel’s first Arc GPUs are launching today for laptops. More powerful Arc GPUs will arrive this summer. (www.theverge.com)
> 
> This is very, very late in the game, and is something that Intel has been working on for a very long time.  Finally releasing these now is thinking small.  Yeah, Intel will likely make enough money on this to recoup the 15 years of R&D that they’ve put into it, but this is not very forward-looking.  What Intel should have learned from Apple is that the future is likely to be integrated CPU/GPU packages, probably with a unified memory architecture.  Discrete GPUs are already somewhat niche, and that’s going to be even more the case in 5 years.



Intel's 14th gen Meteor Lake and 15th gen Arrow Lake will have large iGPUs like the M1 Max and M2 Max, but they won't use unified memory.


----------



## exoticspice1

Jimmyjames said:


> The only thing I can add to this discussion is the opinion of someone who was a Distinguished Architect at Nvidia for nearly two decades, and has now moved to be Director of GPU Architecture at Apple.
> 
> https://www.twitter.com/i/web/status/1538341810887532544/
> https://www.twitter.com/i/web/status/1538362629818703872/



Wtf, this is a huge find!! You know, it's funny with authors like Dylan Patel stating that Apple has a brain drain, when Apple hiring Oliver is huge.

Oliver worked at Nvidia for 19 years, he knows his shit. Pardon the French. Apple GPUs IMO in the coming years will be fun to test.


----------



## Huntn

With graphic cards running $500-1000,  I’ll be thrilled when integrated graphics can replace dedicated graphics and get the job done.


----------



## diamond.g

Huntn said:


> With graphic cards running $500-1000,  I’ll be thrilled when integrated graphics can replace dedicated graphics and get the job done.



Is that a matter of how much of a rendering quality tradeoff you are willing to put up with?


----------



## throAU

So that didn’t take long… great success etc.


----------



## Cmaier

throAU said:


> So that didn’t take long… great success etc.




Itanium (itanic), Arc,… Intel seems to pick names that can easily be modified into sinking boat memes.


----------



## Yoused

Cmaier said:


> Intel seems to pick names that can easily be modified into sinking boat memes.



What about the great new Microsoft UI front end that was named for a drowning man?

("New" meaning new at the time, 30 years ago.)


----------



## Cmaier

Yoused said:


> What about the great new Microsoft UI front end that was named for a drowning man?
> 
> ("New" meaning new at the time, 30 years ago.)



The one with no arms and no legs?


----------



## throAU

Yoused said:


> What about the great new Microsoft UI front end that was named for a drowning man?
> 
> ("New" meaning new at the time, 30 years ago.)



I thought that was Cairo?


----------



## Cmaier

throAU said:


> I thought that was Cairo?



I assume he meant Bob.


----------



## throAU

Cmaier said:


> I assume he meant Bob.



I forgot about Bob


----------



## Colstan

Despite the above video from Moore's Law is Dead, who usually has very trustworthy sources, we have an alternate universe linked from this article, in which Intel has published its entire Arc lineup and a Q&A to go along with it. There's a lot of contradictory information on Arc. I think there's a turf war going on within the company. My suspicion is that Tom's sources are outside the Arc division, and those fiefdoms smell blood in the water and see an easy scapegoat to get whacked, sparing them the executioner's axe. Meanwhile, the Arc team is pretending it's business as usual, desperately attempting to save face publicly, while privately trying to avoid ending up in a creek wearing cement shoes.

In other related news, during the company's Tech Tour, Intel has strongly hinted that 13th-gen Molten Lake will hit 6 GHz under mysterious circumstances and unknown wattages. This is conveniently 300 MHz higher than Zen 4. Team Blue also hit 8 GHz, showing off a world overclocking record. Remember when NetBurst was supposed to hit 10 GHz back in 2005?

In more vague Intel news, they are apparently working on the next version of Thunderbolt, reaching speeds of 80Gbps, the same as the idiotically named USB4 Version 2.0. Thunderbolt has gone to hell ever since Apple stopped actively contributing and transferred the trademark to Intel. I think that was an early sign that Apple really was serious about ridding itself of them in every way possible.

So, it looks like everything is just sunshine, dandelions, and frolicking nymphs within Intel's GPU division. Team Blue and Team Red, meanwhile, are engaging in the Second Gigahertz War, which will probably go as well as the Second Punic War. Apple, for its part, continues to make strides in reducing power consumption and increasing performance/watt, and is doing it across their entire product stack, from the latest iPhones to, very soon, high-end workstations.


----------



## Andropov

Colstan said:


> In other related news, during the the company's Tech Tour, Intel has strongly hinted that 13th-gen Molten Lake will hit 6Ghz under mysterious circumstances and unknown wattages. This is conveniently 300Mhz higher than Zen 4. Team Blue also hit 8Ghz, showing off a world overclocking record. Remember when NetBurst was supposed to hit 10Ghz back in 2005?



Hah, I was just going to say that it's 2003 all over again, and then I read your next sentence.

Maximum turbo boost power usage jumps from 190 to 253W on the i9, a 63W (33%) increase in power consumption. This looks like a perfectly reasonable path to follow for their upcoming processors. With a 33% YoY increase in power consumption, 4 years from now their i9 counterpart would only reach... *checks notes* 791W. Brilliant.

Adding more 'E' cores is a genuinely good decision, though. I think.
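
That projection is just compound growth applied to the quoted power numbers; a minimal Python sketch, using the 253 W figure and the 33% year-over-year rate from the post:

```python
# Compound-growth projection of i9 max turbo power.
# Figures from the post: 190 W -> 253 W is a 63 W (~33%) jump.
base_watts = 253      # current i9 max turbo power (W)
yoy_rate = 0.33       # assumed year-over-year increase
years = 4

projected = base_watts * (1 + yoy_rate) ** years
print(f"Projected i9 power in {years} years: {projected:.0f} W")
```

253 × 1.33⁴ works out to about 791.6 W, which is where the "*checks notes* 791W" figure above comes from.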


----------



## Colstan

Andropov said:


> Adding more 'E' cores is a genuinely good decision, though. I think.



It depends. If Intel is targeting the high-end productivity crowd, then more E-cores will help. If they are targeting gamers, then thus far, the E-cores have been worthless. I'm not in the market for Intel's tarpit chips, but if I were, then I'd go for the ones which forgo the E-cores, and feature only P-cores. Intel has to convince content creators that they need a crazy number of middling cores, while convincing gamers to ignore those same cores, to concentrate on the P-cores, which will primarily be shoved into the top-end "K" SKUs. Captive fanboys will eat it up, but I suspect most other PC enthusiasts will be looking at Zen 4 with V-Cache. It's not a good place to be in for Intel, and that doesn't include power consumption, as you have rightly pointed out.


----------



## throAU

Colstan said:


> then thus far, the E-cores have been worthless.




Not really; Hardware Unboxed shows the difference with them on/off, and they do make a difference.


----------



## Colstan

throAU said:


> Not really; Hardware Unboxed shows the difference with them on/off, and they do make a difference.



I wasn't aware of that. I seem to recall Gamers Nexus finding little benefit, but Hardware Unboxed does quality work. I wonder how much of this has to do with it simply being new technology. I remember Intel recommending the use of Win11 because of enhancements to the scheduler, which also happened to handicap AMD's CPUs in the process. I'm not a believer in conspiracies, I think it's more likely that Microsoft botched something, as is tradition.

I admit that I haven't been following the "little cores" saga as much as other product features. It's not exactly the most exciting implementation. I think the 8+16 arrangement in high-end Raptor Lake is a bit nuts. Even if the efficiency cores help with some games, I can't see this making a huge difference, yet that is how Intel is marketing it. Again, I apologize for not recalling the exact details, this is hardly my area of expertise, but @leman explained why Intel needs to use this efficiency cores implementation, instead of just continuing to use only performance cores. Also, I believe the rumor is that Zen 5 will adopt the same strategy, with AMD using a variant of Zen 4 as the little cores.

It does appear that Intel, at least, are doing this for a reason other than as a way to process background tasks for energy efficiency, like Apple is doing. If that were the case, then the 13900K wouldn't have an 8+16 implementation, 24C/32T. Apple has thus far done the opposite, with performance M1 models having fewer e-cores, not more. As has been pointed out, Apple hasn't seen value in SMT, either.

I imagine this issue has been resolved by now, but Andrew Tsai did some benchmarks with Parallels when they released a version supporting M1. He found that certain configurations performed worse with games, because Parallels was unable to distinguish between the M1's p-cores and e-cores. I wonder how many teething issues caused problems for Intel, just on a much wider scale, since they have no control over the software platform.


----------



## Andropov

Colstan said:


> I remember Intel recommending the use of Win11 because of enhancements to the scheduler, which also happened to handicap AMD's CPUs in the process. I'm not a believer in conspiracies, I think it's more likely that Microsoft botched something, as is tradition.



Yup. Hanlon's razor is the best razor.


----------



## Yoused

Colstan said:


> It does appear that Intel, at least, are doing this for a reason other than as a way to process background tasks for energy efficiency, like Apple is doing. If that were the case, then the 13900K wouldn't have an 8+16 implementation, 24C/32T.




AIUI, Anthill seems to have the SMT principle backwards. If they had it on the E-cores but not the P-cores, they would get the best performance, because two heavy-work threads sharing one core (as one would expect with P-cores and an elaborate scheduler) tend to have a net negative performance impact, while light work flows better under SMT.

On the die, though, a P-core looks like a postcard next to an E-core's postage stamp, so it is easier to fit 6 or 8 E-cores in the space of one P-core. This prevents the die from getting huge from adding too many P-cores, and hyperthreading the big cores is also a space saver.



Colstan said:


> Apple has thus far done the opposite, with performance M1 models having fewer e-cores, not more. As has been pointed out, Apple hasn't seen value in SMT, either.




IBM, though, does see value in it. POWER9 server CPUs had cores that could run 4 threads at a time; POWER10 has 8-way SMT. Presumably they have studied loads and properly outfitted the cores with enough EUs to make it worthwhile. But IBM is targeting high-level performance while Apple is mostly going for consumer-grade efficiency, so maybe SMT is better when you have a big straw in the juice.

Though, I did see an article that said one client replaced over a hundred x86-64 servers with less than a dozen POWER9 servers, which sounds like a non-small energy savings. SMT, done well, allows more virtual cores to fit in only slightly more die space, but I think Apple may be seeing CPUs starting to plateau, in the sense that they are observably good enough at what they do that making them faster will yield minimal results; the real heavy work occurs in the GPU and the ASICs, so that is where you put your effort.

Compared to deeply out-of-order execution, SMT is probably not much harder to implement (though far from easy). Apple, however, makes systems, not CPUs, so they have the luxury of not having to rely on awesome processor performance when the SoC offers even better results that are more noticeable to the user.


----------



## Andropov

Colstan said:


> I admit that I haven't been following the "little cores" saga as much as other product features. It's not exactly the most exciting implementation. I think the 8+16 arrangement in high-end Raptor Lake is a bit nuts. Even if the efficiency cores help with some games, I can't see this making a huge difference, yet that is how Intel is marketing it. Again, I apologize for not recalling the exact details, this is hardly my area of expertise, but @leman explained why Intel needs to use this efficiency cores implementation, instead of just continuing to use only performance cores. Also, I believe the rumor is that Zen 5 will adopt the same strategy, with AMD using a variant of Zen 4 as the little cores.
> 
> It does appear that Intel, at least, are doing this for a reason other than as a way to process background tasks for energy efficiency, like Apple is doing. If that were the case, then the 13900K wouldn't have an 8+16 implementation, 24C/32T. Apple has thus far done the opposite, with performance M1 models having fewer e-cores, not more. As has been pointed out, Apple hasn't seen value in SMT, either.



The thing is, Intel's E cores have a totally different philosophy than Apple's E cores. Apple's E cores come from the efficiency-focused iPhone and Apple Watch (the E cores are the main cores in the S7/S8 SoC). They're designed to be as efficient as possible, so the Watch can have longer battery life (performance is not much of a concern there; you can't even install Geekbench on it), and the iPhone can have something super efficient for background tasks where performance doesn't matter, so those use the least possible amount of juice.

Intel's E cores are a whole different beast. They are not really efficiency-focused cores, nor are they designed with the lowest possible power consumption in mind. Intel's E cores are actually much closer to Apple's P cores than to Apple's E cores. They're more of a 'middle core' thing, really.

From the looks of it, Intel's most performant core designs started getting quickly diminishing returns a few years ago. To compensate, more die area and power was thrown at the problem, managing to slightly increase single-core performance every year. But since they are getting diminishing returns, the P cores are now HUGE and use a lot of power to extract those final bits of performance. With Alder Lake, they realised that the P cores were using so much die area that they could fit many of the less performant cores from the Intel Atom line into the same die area as a few P cores.

But the Gracemont (Alder Lake's E) cores are not really that small; in fact, there are PCs shipping with Atom processors that have only Gracemont cores, while it would be unthinkable for a Mac to ship with only Apple's E cores (not even the iPhone is E-cores-only).

I don't think it's a bad idea. After all, if you can parallelize a task into 8 threads, chances are it's trivially parallelizable to 32 threads too. There are practical limits, of course; it's not like they can double the number of cores every year.


----------



## Cmaier

Andropov said:


> The thing is, Intel's E cores have a totally different philosophy than Apple's E cores. Apple's E cores come from the efficiency focused iPhone and Apple Watch (the E cores are the main cores in the S7/S8 SoC). Designed to be as efficient as possible, so the watch can have longer battery life (performance is not that much of a concern in the Watch, you can't even install Geekbench on it ), and the iPhone can have something super efficient for background tasks where performance does not matter so those use the least possible amount of juice.
> 
> Intel's E cores are a whole different beast. Those are not really efficiency focused cores, nor are designed with the lowest possible power consumption in mind. Intel's E cores are actually much closer to Apple's P cores than to Apple's E cores. They're more of a 'middle core' thing, really.
> 
> From the looks of it, Intel's most performant core designs started getting quickly diminishing returns a few years ago. To compensate, more die area and power was thrown at the problem, managing to slightly increase single-core performance every year. But since they are getting diminishing returns, the P cores are now HUGE and use a lot of power to extract those final bits of performance. With Alder Lake, they realised that the P cores were using so much die area that they could fit many of the less performant cores from the Intel Atom line into the same die area as a few P cores.
> 
> But the Gracemont (Alder Lake's E) cores are not really that small; in fact, there are PCs shipping with Atom processors that have only Gracemont cores, while it would be unthinkable for a Mac to ship with only Apple's E cores (not even the iPhone is E-cores-only).
> 
> I don't think it's a bad idea. After all, if you can parallelize a task into 8 threads, chances are it's trivially parallelizable to 32 cores too. There are practical limits to this, of course, it's not like they can double the number of cores every year.



This. It’s all about the power/performance curve. Intel’s philosophy puts the P cores way out past the knee, where small performance improvements cost big increases in power consumption. Their E cores, meanwhile, sit somewhere around the knee.

Apple’s P-cores are around the knee, and their E-cores are around the linear region. Recently their E-cores have been moving up the curve, or, more accurately, they have been gaining the ability to operate higher in the linear region (we see that in the M2 vs. M1 benchmarks, where the E-cores got a lot of performance improvement without costing a lot of power consumption).  

If you want to win performance benchmarks (without regard to power consumption), you do what Intel is doing.
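For illustration, the "knee" can be sketched with a toy dynamic-power model. All the constants here are invented for the sketch, not real silicon numbers:

```python
# Toy model: dynamic power P ~ V^2 * f. Below some frequency the part
# runs at its base voltage; past the "knee" the voltage has to rise
# with frequency to keep switching edges fast enough.
def power(f_ghz: float, v_base: float = 0.8, f_knee: float = 3.0,
          v_per_ghz: float = 0.15) -> float:
    v = v_base + max(0.0, f_ghz - f_knee) * v_per_ghz
    return v * v * f_ghz  # arbitrary units

def perf_per_watt(f_ghz: float) -> float:
    # In this toy model, performance scales linearly with frequency.
    return f_ghz / power(f_ghz)

# Efficiency is flat in the linear region, then falls off past the knee:
for f in (2.0, 3.0, 4.0, 5.0):
    print(f, round(perf_per_watt(f), 2))
```

In this sketch, perf/watt is constant up to the knee and then drops, which is the shape being described: Apple parks its cores low on the curve, Intel's P cores sit far out on the falling part.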


----------



## Yoused

Cmaier said:


> If you want to win performance benchmarks (without regard to power consumption), you do what Intel is doing.




What I want to know: somewhere there was talk of the specs for the LaBrea Lake processor, saying the P-cores would have "turbo boost" to 6GHz. What exactly does one do with that?

Perusing through the M1 reverse-engineering article I found, the author says that an L3 miss to DRAM can cost a complete fill of the 600+ entry ROB before the offending op can be retired, and that is at about half of 6GHz. Thus, the x86 P-cores at "turbo" would probably do a fair bit of stalling, even with very good cache load speculation.
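The arithmetic behind that: assuming a DRAM round trip of roughly 100 ns (a made-up but plausible figure for the sketch), the miss penalty in core cycles scales directly with clock speed:

```python
# A fixed-latency DRAM access costs more core cycles the faster the
# clock runs: ns * GHz = cycles.
def miss_cycles(dram_ns: float, f_ghz: float) -> float:
    return dram_ns * f_ghz

ROB_ENTRIES = 600  # roughly the figure from the M1 article

# At ~3.2 GHz a 100 ns miss is ~320 cycles; at 6 GHz it's 600 cycles,
# plenty of time for a wide front end to fill a 600-entry ROB while
# the load is outstanding.
print(miss_cycles(100, 3.2), miss_cycles(100, 6.0))
```

Doubling the clock doesn't shorten the DRAM trip at all; it just doubles the number of cycles the core has to find useful work for, which is exactly where the stalling comes from.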

I compare it with the city driving style I was taught, where you drive through downtown at about 5 under the speed limit, getting to almost every traffic light just after it turns green while the car next to you leaps off every line and has to brake hard because they hit every red, and at the edge of town, at the last light, you blow past them because it is quicker to accelerate from 20 to 45 than from a stop.

What I mean is that Apple gets the work done at a steady 3.2 while the other guys are in a sprint-stall-sprint-stall mode that is slightly less productive, but they have to do it that way because "5.2GHz" and "TURBO Boost" are really cool and impressive sounding (i.e., good marketing).


----------



## throAU

Cmaier said:


> we see that in the M2 vs. M1 benchmarks, where the E-cores got a lot of performance improvement without costing a lot of power consumption




I really suspect the E cores on M1 were bandwidth starved (or not enough cache to deal with the smaller memory bandwidth).

On the M1 Pro/Max they used half the E cores but the 2 E cores in Pro/Max get the same performance as the 4 on original M1.


----------



## Cmaier

Yoused said:


> What I want to know: somewhere there was talk of the specs for the LaBrea Lake processor, saying the P-cores would have "turbo boost" to 6GHz. What exactly does one do with that?
> 
> Perusing through the M1 reverse-engineering article I found, the author says that a L3 miss to DRAM can cost a complete fill of the 600+ entry ROB before the offending op can be retired, and that is at about half of 6GHz. Thus, the x86 P-cores at "turbo" would probably do a fair bit of stalling, even with very good cache load speculation.
> 
> I compare it with the city driving style I was taught, where you drive through downtown at about 5 under the speed limit, getting to almost every traffic light just after it turns green while the car next to you leaps off every line and has to brake hard because they hit every red, and at the edge of town, at the last light, you blow past them because it is quicker to accelerate from 20 to 45 than from a stop.
> 
> What I mean is that Apple gets the work done at a steady 3.2 while the other guys are in a sprint-stall-sprint-stall mode that is slightly less productive, but they have to do it that way because "5.2GHz" and "TURBO Boost" are really cool and impressive sounding (i.e., good marketing).




It’s the never-ending microarchitecture trade-off. You can split the job into more steps by adding pipeline stages, which allows you to increase clock frequency (I think of it as REQUIRING you to increase clock frequency: if you don’t, you are losing performance). If you can keep the pipelines full, great. But if not, the cost of dealing with an exception increases tremendously.

And, of course, higher clocks mean a linear increase in power (though so does more IPC, usually), plus a squared increase from however much you needed to raise the voltage so that the signal switching edge rates are fast enough to get the job done.
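Concretely, with dynamic power scaling as P ~ C·V²·f, a 10% clock bump that also needs a 10% voltage bump costs about a third more power. Illustrative numbers only:

```python
# Dynamic switching power: P = C * V^2 * f
def dyn_power(v: float, f: float, c: float = 1.0) -> float:
    return c * v * v * f

base = dyn_power(v=1.0, f=1.0)
boosted = dyn_power(v=1.1, f=1.1)  # +10% clock that needs +10% voltage
print(round(boosted / base, 3))    # 1.1^3 ~ 1.331: ~33% more power for ~10% more speed
```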


----------



## Colstan

Cmaier said:


> Apple’s P-cores are around the knee, and their E-cores are around the linear region. Recently their E-cores have been moving up the curve, or, more accurately, they have been gaining the ability to operate higher in the linear region (we see that in the M2 vs. M1 benchmarks, where the E-cores got a lot of performance improvement without costing a lot of power consumption).



I'm about to ask an entirely unfair question, so feel free to tell me to sod off if you'd like. Most folks here are aware that I just purchased a Mac Pro; I've got an entire thread where I've been bloviating about it, so it's hard to miss. So, unless something breaks, my next Mac is probably going to have something like an M6 or M7 inside it, which is difficult for me to wrap my head around. I don't expect any specifics this far out, since divining the basics of the M3 is already hard enough, but do you have any expectations or see any general trends for where Apple may take the M-series in the next half-decade? Like I said, unfair question, but I figured it wouldn't hurt to ask.


----------



## Cmaier

Colstan said:


> I'm about to ask an entirely unfair question, so feel free to tell me to sod off if you'd like. Most folks here are aware that I just purchased a Mac Pro, I've got an entire thread where I've been bloviating about it, so it's hard to miss. So, unless something breaks, my next Mac is probably going to have something like an M6 or M7 inside it, which is difficult for me to wrap my head around. I don't expect any specifics this far out, divining the basics of the M3 is already hard enough, but do you have any expectations or see any general trends for where Apple may take the M-series in the next half-decade? Like I said, unfair question, but I figured it wouldn't hurt to ask.




Obviously it would be complete speculation, and it’s not really something I’ve even thought about. But that many generations out, if I had to guess, I think, at least for Macs, we’d see much bigger packages with much more powerful GPUs that live on their own die. I’d expect more heterogeneous compute units across the board, but I don’t know how that will actually shake out because I don’t know enough about trends in software. Maybe there will be three levels of CPU, maybe there will be much bigger ML components, etc. More and more of the auxiliary dies are going to end up in the package. Really, I expect the packaging to become as important as the die. Bandwidth is Apple’s focus, and they will find ways to get data into and out of the package much faster, and across the package faster too.


----------



## Yoused

Cmaier said:


> I’d expect more heterogenous compute units across-the-board, but I don’t know how that will actually shake out because I don’t know enough about trends in software.



Over at TOP there is a thread linking to an article about how Apple is migrating its CPs over to RISC-V from embedded ARM, in order to reduce licensing costs. Seems a bit baffling to me: there is software out there that can take a high-level-language program and build you a dedicated circuit that performs a specific range of tasks (e.g., image processing, data compression, etc.) far faster and more efficiently than a program running on some sort of core, so why would they not do it that way, especially with the attendant improvement in battery life? All the heterogeneous units are abstracted behind APIs, so in theory, huge swaths of OS services could be accelerated just by pulling the functions out of software libraries and baking them into hardware.

IOW, the article looks like FUD to me.


----------



## diamond.g

Cmaier said:


> Obviously it would be complete speculation, and it’s not really something I’ve even thought about.  But that many generations out, if I had to guess, I think, at least for Macs, we’d see much bigger packages with much more powerful GPUs that live on their own die.  I’d expect more heterogenous compute units across-the-board, but I don’t know how that will actually shake out because I don’t know enough about trends in software.  Maybe there will be three levels of CPU, maybe there will be much bigger ML components, etc.  More and more of the auxiliary die are going to end up in the package.   Really, I expect the packaging to become as important as the die.  Bandwidth is Apple’s focus, and they will find ways to get data into and out of the package much faster, and across the package faster too.



Monolithic die or chiplet?


----------



## Cmaier

diamond.g said:


> Monolithic die or chiplet?



I would guess, for economic reasons, multi-chip packages. (I refuse to say “chiplet”.) Cerebras is pretty interesting, though, isn’t it? A lot of my friends are over there. I stopped by and looked around a couple years ago when they were all squeezed into a little office behind Lulu’s in Los Altos.


----------



## Cmaier

Yoused said:


> Over at TOP there is a thread linking to an article about how Apple is migrating its CPs over to RISC-V from embedded ARM, in order to reduce licensing costs. Seems a bit baffling to me: given that there is software out there that can take a high-level-language program and build you a dedicated circuit that performs a specifc range of tasks (e.g., image processing, data compression, etc) far faster and more efficiently than a program running in some sort of core, so why would they not do it that way, especially with the attendant improvement in battery life? All the heterogenous units are obfuscated behind APIs, so in theory, huge swaths of OS services could be accelerated just by pulling the functions out of software libraries and baking them in.
> 
> IOW, the article looks like fud, to me.




I don’t know whether Apple pays Arm licensing fees, or if they pay fees that are based on volume, or what.  Obviously Apple is a special case in the Arm ecosystem, having actually paid to form Arm.  I also don’t know if they are switching anything to RISC-V (though I know there were some employment ads placed).  If they are, it could just as easily be to get their feet wet and gain familiarity with the technology as anything else. Who knows.

As for why they don’t just create dedicated circuits, it sort of depends on the complexity of the problem.  A general purpose CPU spends extra power and may have less performance, but performance may not matter, the power may be so small as not to matter, and it’s a hell of a lot easier to fix a bug in software than in a chip mask (especially once the chip is sitting in a customer’s device).


----------



## Colstan

Yoused said:


> IBM, though, does see value in it. POWER9 server CPUs had cores that could run 4 threads at a time, POWER10 has 8-way SMT. Presumably they have studied loads and properly outfitted the cores with enough EUs to make it worthwhile. But IBM is targeting high-level performance while Apple is mostly going for consumer-grade efficiency, so maybe SMT is better when you have a big straw in the juice.



Thinking about this further, both @Yoused and @Cmaier have explained to me why Apple may see value in implementing SMT with their E-cores, but it makes little sense with the P-cores. From what folks have said here, x86, by nature, can benefit from SMT much more than RISC ISAs can. (I won't rehash that discussion here; it's buried somewhere in the x86 vs. Arm thread.) In the back of my mind, I did find it curious that IBM found value in SMT with POWER, implementing 8-way SMT, as you point out.

After having rummaged through @Cmaier's brain about the future of the M-series, and where Apple may take the Mac in the next half-decade, it does make me wonder if there is a scenario in which it makes sense for Apple to implement SMT in both the P-cores and E-cores. (Or even "middle cores", if such a thing ever materializes and that scenario even makes logical sense.) As has also been pointed out, there are only so many ways to increase IPC, and Apple is going to need to get creative to find them. This is entirely speculative, as I said in my original question, but are there changes to Apple Silicon that Apple could implement that would then make some form of SMT worthwhile? Apparently it isn't right now, but the M-series is going to look much different in half a decade, whatever form it takes. (Of course, this has absolutely nothing to do with me needing a new Mac around that time period; total coincidence.) Any thoughts from knowledgeable folks here would be most welcome.


----------



## Cmaier

Colstan said:


> Thinking about this further, both @Yoused and @Cmaier have explained to me why Apple may see value in implementing SMT with their E-cores, but it makes little sense with the P-cores. From what folks have said here, x86 by nature, can benefit from SMT much more than RISC ISAs. (I won't rehash that discussion here, it's buried somewhere in the x86 vs. Arm thread.) In the back of my mind, I did find it curious that IBM found value in SMT with POWER, implementing 8-way SMT, as you point out.
> 
> After having rummaged through @Cmaier's brain about the future of the M-series, and where Apple may take the Mac in the next half-decade, it does make me wonder if there is a scenario in which it makes sense for Apple to implement SMT in both the P-cores and E-cores? (Or even "Middle cores" if such a thing ever materializes, if that scenario even makes logical sense?) As has also been pointed out, there are only so many ways to increase IPC, and Apple is going to need to get creative to find ways to do so. This is entirely speculative, as I said in my original question, but are there changes to Apple Silicon that Apple could implement that would then make it so that some form of SMT makes sense? Apparently it doesn't right now, but the M-series is going to look much different in a half-decade than it does today, in whatever form it takes. (Of course, this has absolutely nothing to do with me needing a new Mac around that time period, total coincidence.) Any thoughts from knowledgable folks here would be most welcome.




Anything’s possible but I Imagine the meeting went like this:

“Let’s go all out, not worry about power or die area, and maximize MP performance!”

Response: “ok. Let’s double the number of cores and not have to worry about implementation bugs or side channel attacks.”


----------



## Yoused

Colstan said:


> … it does make me wonder if there is a scenario in which it makes sense for Apple to implement SMT in both the P-cores and E-cores?



P cores are like a postcard next to the E cores' postage stamp. It is much easier to just add more E cores than to try to split them with SMT, because they are so small. Anthill still does it with the P cores because one core is so big that splitting its pipe gives you two virtual cores with only a slight die space increase (a two-thread core is much smaller than two single-thread cores).

Apple's P cores are so absurdly out-of-order that adding SMT would be more effort than the gain would justify (and might tangle with some POWER patents). Instead, Apple is improving E core performance by a lot. What I could see down the road for most of the M-series would be more E cores handling the threads, with some kind of mechanism that would allow them to recruit HPC circuitry for high-load (mainly SIMD) work. Kind of like having coprocessor semicores.

Because Apple is moving toward more heterogeneous computing, with frequent-use jobs handled by dedicated units that are fast and energy efficient. This is kind of like unfolding the '70s microcode ethos into satellite logic, which is facilitated by having those jobs abstracted away by the OS.


----------



## Colstan

Perhaps too little, too late, but Intel is officially releasing Arc graphics cards on Oct. 12th. That just so happens to be the same day Nvidia's 4000-series becomes available. So perhaps they are trying to show how much value they can offer compared to Nvidia's space heaters, or, more cynically, hiding the launch at a time when Nvidia will get all of the press attention, sparing themselves some measure of embarrassment. Regardless, this is way too late; they could have done well during the GPU craze of pandemic buying and crypto insanity, but that's over. I just ordered a 6900XT for $659, and these things were going for $1,600 a year ago. I guess it's something, but I don't see this making an impact beyond being a novelty product. It's kind of like owning an i740, a project Gelsinger was in charge of, assuming history repeats itself.


----------



## exoticspice1

Yoused said:


> P cores are like a postcard next to the E cores' postage stamp. It is much easier to just add more E cores than to try to split them with SMT, because they are so small. Anthill still does it with the P cores because one core is so big that splitting its pipe gives you two virtual cores with only a slight die space increase (a two-thread core is much smaller than two single-thread cores).
> 
> Apple's P cores are so absurdly OoOE that adding SMT would be more of an effort than the gain would offer (and might tangle up with some POWER patents). Instead, Apple is improving E core performance by a lot. What I could see down the road for most of M-series would be more of E cores handling the threads with some kind of mechanism that would allow them to recruit HPC circuitry for high-load (mainly SIMD) work. Kind of like having coprocessor semicores.
> 
> Because Apple is moving toward more heterogenous computing, with frequent use jobs handled by dedicated units that are speed and energy efficient. This is kind of like unfolding the '70s microcode ethos into satellite logic, which is facilitated by having those jobs obfuscated by the OS.



A bit off topic, but do you see a shift in Apple's GPU late next year with the M3 if they add dedicated ray-tracing hardware?


----------



## throAU

Nvidia/AMD will just drop prices and kill this.


----------



## Yoused

exoticspice1 said:


> A bit off topic, but do you see a shift in Apple's GPU late next year with the M3 if they add dedicated ray-tracing hardware?




I suspect Apple will go one better and put in hardware acceleration of path tracing. Everyone accelerates ray tracing, they should stand out from the pack, and path tracing can yield more realistic output. Perhaps logic that accelerates both but favors PT.


----------



## Colstan

Yoused said:


> I suspect Apple will go one better and put in hardware acceleration of path tracing. Everyone accelerates ray tracing, they should stand out from the pack, and path tracing can yield more realistic output. Perhaps logic that accelerates both but favors PT.



On a conceptual level, I know what ray tracing is, and what its benefits are. It's hard not to because Nvidia won't shut up about it, even though it often makes games look worse. However, I'm unfamiliar with path tracing. Could you elaborate?


----------



## Yoused

Colstan said:


> On a conceptual level, I know what ray tracing is, and what its benefits are. It's hard not to because Nvidia won't shut up about it, even though it often makes games look worse. However, I'm unfamiliar with path tracing. Could you elaborate?




Path tracing is like the reverse of ray tracing, following light paths to the camera rather than following the viewpoint to the light sources. Just as it is, the process takes a lot longer and has some weaknesses, but it has the potential to yield more photo-realistic results. And there are ways to make it faster and to alleviate some of the weaknesses. Hardware acceleration combined with a tailored approach managed by ML logic could make some form of it truly practical.


----------



## Nycturne

I’m not sure that’s the case.

Path tracing can be done in either direction. The key difference, as I understand it, is the sampling method of the rays and how they are cast through a scene. Ray tracing is more limited in how rays are cast through a scene, while with path tracing, rays are cast through multiple bounces until they hit the other “side” of the scene (a light or screen) or exhaust their bounce budget.

The other big thing is that path tracing will fire many more rays. When casting from the camera, each pixel will have multiple rays fired out, each taking a different path through the scene and then averaging the results. Certain techniques can cut down on the number of rays needed, but the technique relies on random sampling of multiple rays to get good results. 

But ray tracing, as I understand it, is not set up to do any forward tracing (starting from the light), so that is something path tracing can do which ray tracing cannot; still, you can fire the rays from the camera in both cases and get more realistic results from path tracing.

You can do path tracing on ray tracing accelerators because of the similarities in the techniques (Microsoft and Nvidia have been doing it on RTX and DXR-compatible cards), but you take a performance hit due to rays with longer lifetimes and larger numbers of rays being cast. 

Path tracing will become more common for sure though. Techniques used already that are ML based (denoiser and DLSS in particular) can help in both ray tracing and path tracing to improve the result.
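The "many randomly bouncing rays per pixel, averaged" structure described above can be sketched in a few lines. This toy model ignores geometry entirely; the probabilities and energy loss are invented, and it only illustrates the Monte Carlo shape of the sampling:

```python
import random

# Toy "scene": estimate a pixel's brightness by averaging many random
# ray samples. Each sample bounces until it reaches a light or runs out
# of bounces, losing energy (throughput) at every bounce.
def trace_sample(max_bounces: int, hit_light_prob: float = 0.3,
                 albedo: float = 0.8, rng=random) -> float:
    throughput = 1.0
    for _ in range(max_bounces):
        if rng.random() < hit_light_prob:  # ray reached a light source
            return throughput
        throughput *= albedo               # energy lost at the bounce
    return 0.0                             # bounce budget exhausted

def render_pixel(samples: int, max_bounces: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(trace_sample(max_bounces, rng=rng)
               for _ in range(samples)) / samples

# More samples per pixel means less variance, i.e. less visible noise.
```

This also makes the cost structure obvious: longer bounce budgets and more samples per pixel both multiply the work, which is exactly the performance hit mentioned above for running path tracing on ray-tracing hardware.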


----------



## diamond.g

Nycturne said:


> I’m not sure that’s the case.
> 
> Path tracing can be done in either direction. The key difference as I understand it is the sampling method of the rays, and how they are cast through a scene. Ray tracing is more limited in how rays are cast through a scene, while with path tracing. rays are cast through multiple bounces until they hit the other “side” of the scene (a light or screen) or exhaust the number of bounces.
> 
> The other big thing is that path tracing will fire many more rays. When casting from the camera, each pixel will have multiple rays fired out, each taking a different path through the scene and then averaging the results. Certain techniques can cut down on the number of rays needed, but the technique relies on random sampling of multiple rays to get good results.
> 
> But ray tracing as I understand is not setup to be able to do any forward tracing (starting from the light), so that is something that path tracing can do which ray tracing cannot, but you can fire the rays from the camera in both cases and still get more realistic results from path tracing.
> 
> You can do path tracing on ray tracing accelerators because of the similarities in the techniques (Microsoft and Nvidia have been doing it on RTX and DXR-compatible cards), but you take a performance hit due to rays with longer lifetimes and larger numbers of rays being cast.
> 
> Path tracing will become more common for sure though. Techniques used already that are ML based (denoiser and DLSS in particular) can help in both ray tracing and path tracing to improve the result.



Quake 2 RTX and Minecraft (Bedrock) are path traced. Denoising is used to hide the low ray count (and/or low bounce count).


----------



## Yoused

I believe the noise problem could be greatly mitigated by aggressive path-weighting: analytic ML-type logic would assess the overall complexity of path regions, so that less complex regions could receive coarser treatment, requiring less work overall, and the tracer could concentrate its efforts on the more complex regions. This could improve early-pass performance and efficiency for a large fraction of rendering jobs, and it is not really very far out of reach of contemporary tech.
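A minimal sketch of that idea: run a cheap first pass, then hand out the ray budget in proportion to each region's estimated complexity. The variance figures here are made up for illustration:

```python
# Adaptive sample allocation: regions that look noisier (higher variance)
# in a cheap first pass get a larger share of the total ray budget.
def allocate_samples(variances, total_budget):
    total = sum(variances)
    return [round(total_budget * v / total) for v in variances]

# A flat sky region vs. a geometrically busy, glossy region:
budgets = allocate_samples([0.1, 0.9], 1000)
print(budgets)  # the busy region gets ~9x the samples
```

Real adaptive samplers use fancier error estimates than raw variance, but the allocation principle is the same.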


----------



## Nycturne

diamond.g said:


> Quake 2 RT and Minecraft (bedrock)  are path traced. Denoising is used to hide low ray count (and/or low bounce count).




I _thought_ that was the case, but I wasn’t certain, so didn’t want to make any claims. I just knew that the hardware was already being used for it in at least a “this is how you do it” capacity.


----------



## leman

Yoused said:


> I suspect Apple will go one better and put in hardware acceleration of path tracing. Everyone accelerates ray tracing, they should stand out from the pack, and path tracing can yield more realistic output. Perhaps logic that accelerates both but favors PT.




RT APIs are just about casting rays and specifying the geometry those rays might intersect. How exactly you use that is up to you. Some folks use RT APIs for scene-node culling or collision detection, and that's fair too.

My understanding of these things is fairly limited, but it seems to me that RT acceleration (of any kind) is really about compacting and reordering work. Graphics is a massively parallel task where you apply exactly the same sequence of operations to many objects (originally vertices or pixels). That's why GPUs are built as massive SIMD machines with an execution width of 32 or more. But RT is divergent by nature. Rays tend to scatter in different directions and often hit different objects. This means not only memory divergence (which already kills GPU performance) but also execution divergence, as different objects need to invoke different shaders (which is why Metal now has function pointers and recursive GPU calls). To get performance back on track, one somehow needs to compact the work so that it becomes local again. Of course, using additional information (like the ML heuristics you describe) is also valuable, but that's more of a bonus.
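A toy model of that execution-divergence cost (the 32-wide group and the unit cost per shader are assumptions for the sketch): with lockstep SIMD, a batch's cost is driven by how many distinct shaders appear in it, so compaction amounts to regrouping rays until batches are coherent again.

```python
# Lockstep SIMD: within one 32-wide group, each distinct shader (branch
# path) has to run in turn while the other lanes sit idle.
def batch_cost(shader_ids, cost_per_shader=1):
    return len(set(shader_ids)) * cost_per_shader

coherent = ["metal"] * 32                    # all rays hit the same material
divergent = [f"mat{i}" for i in range(32)]   # every ray hits a different one
print(batch_cost(coherent), batch_cost(divergent))  # 1 vs 32
```

Sorting or binning rays by hit shader before dispatch turns the divergent case back into (mostly) coherent batches, which is the kind of reordering the hardware is rumored to do.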

I tried to get some info on how Nvidia's RT acceleration works, but there is a surprising lack of concrete information. There was a paper claiming that the RT acceleration is closely integrated with the texturing unit and relies on reordering memory accesses to improve work locality, which would explain why Nvidia's RT is so performant. AMD's RT is simpler: it's just a band-aid that uses fixed-function intersection instructions, which helps speed up computations but does nothing for the divergence problem, which is also a reason why AMD's implementation is so much slower in practice.

There is a faint hope that Apple is late with its hardware RT because they aim to provide a comprehensive, general solution for programmable work compaction. A feature like that would be a huge deal. I wouldn't even know how to approach such a problem, though...


----------



## leman

Since there is interest in the topic, I'll leave here links to some raytracing-related patents filed by Apple. There might be more, but this is all I was able to find. I find patents very difficult to digest, but so far my impression is that Apple is focusing on area- and energy-efficient approaches to RT, which appear to be fairly novel. Their patents describe hardware that does geometry traversal using low-precision calculations and then delegates the final checks to the shaders.

 US20220036630A1 - SIMD Group Formation Techniques during Ray Intersection Traversal - Google Patents 
 US20220207690A1 - Primitive Testing for Ray Intersection at Multiple Precisions - Google Patents 
 CN114092614A - Ray intersection circuit with parallel ray testing - Google Patents


----------

