Intel Lunar Lake thread

Hmmm … the next time someone argues that Apple puts the RAM on package because they are cheap:


Relatedly, Intel is planning to get off TSMC, but won’t be able to do so completely for the first batch of Intel’s 18A processors (30% will still be TSMC), which to me indicates low volume at first. Perhaps that’s expected, but it’s important to remember that Intel’s need for 18A to be profitable comes not from manufacturing its own chips but from being able to serve third parties.

Intel hasn’t executed a node anywhere near on time and meeting expectations in ~15 years at this point. And that’s when those expectations were measured against Intel’s own prior performance as having the world’s best fabs.

I’ll believe it when I see it, and as per @Cmaier I have no confidence in Gelsinger being any more effective at leading Intel than he is at converting me to Christianity.
 

Not all the factors that hurt Intel here would hurt Apple, though. When packaging memory itself, Intel has to buy the memory and wants a profit on selling it on; OEMs don't want to pay that, and having both pass the cost on to customers would make the overall chip too expensive, so Intel effectively sells the memory portion of the chip to the OEM at cost. Apple wouldn't suffer as much from this, since it is both the chipmaker and the OEM and its pricing is already high. Still, some of it does hurt Apple too.
 
Again, this is me trying to drag something out of my brain's long-term storage, but I could swear that Intel's limit for GPU pages has been in the 50% range for a while.

Instead of expecting a third party to be precise in their language, we can go to Intel's and Microsoft's own pages on the topic: https://www.intel.com/content/www/us/en/support/articles/000020962/graphics.html

It's not clear whether this is a Windows limit or an Intel one, but it has been the case since the HD 5300 (2014).

From what I understand, that is an OS limitation. Intel offers APIs to convert regular allocations to GPU-accessible ones (much like Apple does); no idea whether those also count towards this limit. I wonder what happens if one tries to allocate more.
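On the Apple side, the analogue is presumably Metal's makeBuffer(bytesNoCopy:), which adopts an existing page-aligned CPU allocation as a GPU-visible buffer without a copy. A minimal Swift sketch of that; the 256 MB size is just an illustrative value:

```swift
import Metal
import Darwin

// Sketch: wrap an existing page-aligned CPU allocation as a Metal buffer without copying it.
guard let device = MTLCreateSystemDefaultDevice() else { fatalError("no Metal device") }

let pageSize = Int(getpagesize())
let length = 256 * 1024 * 1024   // 256 MB; bytesNoCopy needs a multiple of the page size

// Ordinary, page-aligned CPU memory, written by the CPU first.
var raw: UnsafeMutableRawPointer?
precondition(posix_memalign(&raw, pageSize, length) == 0, "allocation failed")
let cpuMemory = raw!
cpuMemory.storeBytes(of: UInt32(0xDEADBEEF), as: UInt32.self)

// makeBuffer(bytesNoCopy:) adopts those pages instead of copying them;
// the deallocator runs when the buffer is destroyed.
let buffer = device.makeBuffer(bytesNoCopy: cpuMemory,
                               length: length,
                               options: .storageModeShared,
                               deallocator: { ptr, _ in free(ptr) })

// On unified memory, buffer?.contents() points at the same pages the CPU just wrote.
print(buffer != nil ? "wrapped \(length) bytes without a copy" : "makeBuffer(bytesNoCopy:) failed")
```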

Apple doesn't seem to have a practical GPU memory limit — I had no trouble allocating buffers much larger than the system RAM size. Metal reports a value documented as "maximal recommended allocation size that won't affect performance", which on my machine is around 75% of the total RAM.

Edit: see post #76 for more details, you can allocate very large buffers, but you can't actually bind more than 75% of total RAM worth of data in a single pass.
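For anyone who wants to poke at this themselves, here is a minimal Swift sketch of querying those numbers. The documented value is presumably MTLDevice.recommendedMaxWorkingSetSize, and maxBufferLength caps a single buffer; the extra 8 GB used to go past the working set is an arbitrary illustrative amount.

```swift
import Metal

guard let device = MTLCreateSystemDefaultDevice() else { fatalError("no Metal device") }

// The two limits being discussed, in bytes.
print("recommendedMaxWorkingSetSize:", device.recommendedMaxWorkingSetSize)  // ~75% of RAM on the poster's machine
print("maxBufferLength:             ", device.maxBufferLength)

// Ask for more than the recommended working set (capped at maxBufferLength).
// Metal may still hand the buffer back; "recommended" only means it isn't
// guaranteed to stay resident or perform well beyond that point.
let oversized = min(Int(device.recommendedMaxWorkingSetSize) + (8 << 30),
                    device.maxBufferLength)
if let big = device.makeBuffer(length: oversized, options: .storageModeShared) {
    print("allocated \(big.length) bytes")
} else {
    print("allocation of \(oversized) bytes was refused")
}
```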
AMD Strix Halo is here, and interestingly they do things a little differently:


(at least according to Tom's)

For instance, if you have 128GB of total system memory, up to 96GB can be allocated to the GPU alone, with the remaining 32GB dedicated to the CPU. However, the GPU can still read from the entire 128 GB memory, thus eliminating costly memory copies via its unified coherent memory architecture. However, it can only write to its directly allocated 96GB pool.
 
I find their GPU chart a little confusing. I first thought it was comparing GPU ray tracing performance, but at the bottom it says Cinebench 2024 nT, which is the n-threads, a.k.a. multi-core, CPU score. Are they comparing GPU ray tracing performance or CPU-based ray tracing performance?

 
I'm confused by that as well. At first I thought it was obvious that those would all be GPU comparisons; now I'm not so sure.

EDIT: They might even all be CPU? Especially the Mac comparisons? V-Ray's default is CPU, a lot of people still run Blender Classroom on the CPU, and nT is usually how the multithreaded CPU results for Cinebench are denoted, as you already noted. And Corona is purely CPU from what I can tell.

V-Ray doesn't even work on the Mac GPU, according to this:


And the requirement of SSE2 compatibility definitely makes it sound non-native, or at least non-optimized.

Edit 2: V-Ray and Corona appear to be Apple silicon native, but I'm not sure how well optimized they are:


 
I think you are correct and they’re using CPU scores to boast about ray tracers. I know Apple has put a lot of work into the Blender GPU renderers. I’m not sure how much attention the Arm CPU renderer has received.
 
Yeah, that's very disappointing. With the new integrated GPU I, apparently naively, thought that's what they would've focused on, even when comparing against the Mac. Oh well, I'm sure reviews will come out soon enough.

Having said that, the Cinebench 2024 CPU results fall pretty well within my original predictions of being roughly M4 Pro-tier, which is interesting. It has the multi-die structure of the desktop CPU dies, but so far there's no word on which process it is using, N4P or N4X. (There's also AMD's Fire Range, which again has the multi-die structure, but it's likewise unclear whether it uses the same process node as Strix Halo, and which node that might be.) Fascinating that the GPU is essentially on the I/O die; I wonder what process that is using, if true, since on the desktop the I/O die was on an older N6 process.

EDIT: TechPowerUp is reporting that Strix Halo's SoC die (I/O, NPU, GPU) is 5nm (though not which 5nm node specifically).

 