Intel Lunar Lake thread

exoticspice1

Site Champ
Posts
328
Reaction score
132
“Lol, the recent events make me more convinced than ever that Eric Quinnell knows what he is talking about and uop caches are going the way of Hyperthreading.

Think about it. We had to back off from pipeline stages because adding stages is a guaranteed loss, while prediction is not.

Netburst architectures demonstrated that additional stages cost way more transistors than initially anticipated.

Since the process technologies keep getting better and better still, the idea is to cut the stages again:
-Which improves performance just by itself
-Simplifies design, thus less area and power use, thus more efficient
-Which allows for higher performance by using it elsewhere

Apple has the shortest pipeline at 9. It's that simple.”

Quote from the AnandTech forums.


Is that true, @Cmaier? He says Apple's IPC advantage comes from Apple having the shortest pipeline, whereas Intel’s Skymont has 14 pipeline stages.
 

Jimmyjames

Site Champ
Posts
867
Reaction score
999
I’m not sure they did actually … not in ST anyway - it's hard to know given what's reported (see below). Maybe they're competitive in MT perf/W though.

==========

Also don’t get me wrong, I think both Zen 5 and Lunar Lake look like really nice upgrades, but for Lunar Lake, particularly Skymont, this shit’s hilarious:







And for both Skymont and Lion Cove the results are apparently all simulated (hence the error bars). They don’t have testing of actual products. I think @Cmaier has said something about that in the past 😉. Now who knows? Maybe it’ll be just as good, maybe better, in actual silicon, but this is yet another case where marketing takes something that is actually really damn cool, the new Skymont cores, and in my opinion mucks it up with weird comparisons that make it look desperate rather than awesome.




It gets worse ... see above.
Jeez, not sure what to say about that. The E-core comparison is disappointing as is the fact that the results are simulated. Yikes.
 

exoticspice1

Site Champ
Posts
328
Reaction score
132
My question is why simulate them? They have actual silicon to test against.
 

exoticspice1

Site Champ
Posts
328
Reaction score
132
What is the IPC difference between Skymont and M4, if we take Intel's claim at face value that Skymont has 2% higher IPC than Raptor Cove?
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,621
Reaction score
9,238
Is that true, @Cmaier? He says Apple's IPC advantage comes from Apple having the shortest pipeline, whereas Intel’s Skymont has 14 pipeline stages.
Well, this is the sort of thing that you do performance modeling on. We had entire teams doing that. (They always told me every change I asked for made a 2% difference. Didn’t matter what the change was.) My gut feel is that a uop cache is a net positive on x64 and not at all necessary on a RISC architecture. But it depends on how big the cache is. I feel like the tipping point is that accessing the cache has to take fewer cycles than decoding, and it has to have a very high hit rate (95%+). I don’t have a feel for how big a cache you need on x64 to get that hit rate, but anything more than 1024 entries probably takes more than 1 cycle to access. I don’t know how many pipe stages are dedicated to decode on current x64 chips. If it’s 4 cycles, I feel like the cache will be worth it. If it’s 2, then probably not.

On RISC like ARM you can almost always decode in a cycle, so a cache does nothing useful.
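
To put rough numbers on that tipping point, here is a minimal back-of-the-envelope sketch. The function names and the cost model are assumptions for illustration only (a hit costs just the cache access, a miss costs the cache access plus the full decode), and it ignores the area and power the cache itself consumes. It is not a model of any real core.

```python
def avg_frontend_cycles(hit_rate, cache_cycles, decode_cycles):
    """Average cycles to deliver uops with a uop cache in front of the decoders.

    Assumed model: a hit pays only the cache access; a miss pays the cache
    access and then the full decode.
    """
    hit_cost = cache_cycles
    miss_cost = cache_cycles + decode_cycles
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


def cycles_saved(hit_rate, cache_cycles, decode_cycles):
    """Cycles saved versus always decoding; negative means the cache hurts."""
    return decode_cycles - avg_frontend_cycles(hit_rate, cache_cycles, decode_cycles)


# Hypothetical inputs echoing the post: 95% hit rate, 1-cycle cache access,
# and decode depths of 4, 2, and 1 cycles.
for decode_cycles in (4, 2, 1):
    saved = cycles_saved(0.95, 1, decode_cycles)
    print(f"decode={decode_cycles} cycles -> saved {saved:+.2f} cycles on average")
```

Under these assumed numbers the saving shrinks quickly as decode gets shorter, and with a 1-cycle decode the cache is a small net loss, which matches the intuition above. Once the area and power cost of the cache is counted, a small saving at a 2-cycle decode may well not be worth it.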
 

Yoused

up
Posts
5,876
Reaction score
9,479
Location
knee deep in the road apples of the 4 horsemen
Think about it. We had to back off from pipeline stages because adding stages is a guaranteed loss, while prediction is not.

Intel added stages in order to perform decode overlay adjustments. With x86-64, you tune the decoder for the most frequent types of instructions, which are 3-5 bytes long, and adjust for the less frequent types (RET is kind of a problem, since it is only one byte and rarely has a prefix). Fixed-length RISC grabs handfuls of Legos and tosses them in the hopper, so the pipeline does not really need to be very long.

x86 is more like scanning a string of text for delimiters and determining what they mean. To me, it seems like a "wide" x86 dispatch has a very different meaning, in which one of the decoders may not be issuing an op but instead an operand (like decoding SIB or an immediate or offset for an adjacent instruction). Hence, most of the extra pipeline stages in x86 are devoted to adjusting the pipeline stream to align with the code stream, something that is not necessary in almost any other modern architecture.
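
A toy sketch of that difference, under loud assumptions: `lengths_at` is a hypothetical stand-in for a length decoder (it gives the length an instruction would have if it started at that byte), and nothing here reflects real x86 encodings. With fixed-width instructions every start offset is known up front. With variable-length instructions each start depends on the length of the one before it.

```python
def fixed_length_starts(code_len, width=4):
    """Fixed-width ISA: start offsets are known without looking at the bytes."""
    return list(range(0, code_len, width))


def variable_length_starts(lengths_at):
    """Variable-length ISA: find starts by walking the stream serially.

    lengths_at[i] is the (hypothetical) length of an instruction starting at
    byte i. Every length must be >= 1 or the walk would never advance.
    """
    starts = []
    pos = 0
    while pos < len(lengths_at):
        starts.append(pos)
        pos += lengths_at[pos]  # the next start is unknown until this is decoded
    return starts


# Toy 11-byte stream holding instructions of length 3, 5, 1, and 2.
# Entries at non-start offsets are arbitrary filler.
toy_lengths = [3, 1, 1, 5, 1, 1, 1, 1, 1, 2, 1]
print(variable_length_starts(toy_lengths))  # [0, 3, 8, 9]
print(fixed_length_starts(12))              # [0, 4, 8]
```

The serial dependency in the `while` loop is the part a wide decoder has to work around in hardware, for example by guessing or pre-marking boundaries, and that extra work is where the additional stages go.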
 