> I assume he meant Bob.

I forgot about Bob.
> In other related news, during the company's Tech Tour, Intel has strongly hinted that 13th-gen Molten Lake will hit 6GHz under mysterious circumstances and unknown wattages. This is conveniently 300MHz higher than Zen 4. Team Blue also hit 8GHz, showing off a world overclocking record. Remember when NetBurst was supposed to hit 10GHz back in 2005?

Hah, I was just going to say that it's 2003 all over again, and then I read your next sentence.
> Adding more 'E' cores is a genuinely good decision, though. I think.

It depends. If Intel is targeting the high-end productivity crowd, then more E-cores will help. If they are targeting gamers, then thus far, the E-cores have been worthless. I'm not in the market for Intel's tarpit chips, but if I were, I'd go for the ones that forgo the E-cores and feature only P-cores. Intel has to convince content creators that they need a crazy number of middling cores, while convincing gamers to ignore those same cores and concentrate on the P-cores, which will primarily be shoved into the top-end "K" SKUs. Captive fanboys will eat it up, but I suspect most other PC enthusiasts will be looking at Zen 4 with V-Cache. It's not a good place for Intel to be in, and that's before you get to power consumption, as you have rightly pointed out.
> then thus far, the E-cores have been worthless.

Not really, Hardware Unboxed shows the difference between on/off and they do make a difference.
> Not really, Hardware Unboxed shows the difference between on/off and they do make a difference.

I wasn't aware of that. I recall Gamers Nexus finding little benefit, but Hardware Unboxed does quality work. I wonder how much of this has to do with it simply being new technology. I remember Intel recommending the use of Win11 because of enhancements to the scheduler, which also happened to handicap AMD's CPUs in the process. I'm not a believer in conspiracies; I think it's more likely that Microsoft botched something, as is tradition.
> I remember Intel recommending the use of Win11 because of enhancements to the scheduler, which also happened to handicap AMD's CPUs in the process. I'm not a believer in conspiracies; I think it's more likely that Microsoft botched something, as is tradition.

Yup. Hanlon's razor is the best razor.
It does appear that Intel, at least, are doing this for a reason other than as a way to process background tasks for energy efficiency, like Apple is doing. If that were the case, then the 13900K wouldn't have an 8+16 implementation, 24C/32T.
Apple has thus far done the opposite, with performance M1 models having fewer E-cores, not more. As has been pointed out, Apple hasn't seen value in SMT, either.
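The developer-facing side of that on the Mac is QoS rather than explicit core assignment; a minimal sketch using Grand Central Dispatch (the API is real, but the core-placement comments describe observed scheduler behaviour on Apple silicon, not a documented guarantee):

```swift
import Foundation

// Background QoS work is eligible to run entirely on the E-cluster,
// which is where the "energy efficiency for background tasks" idea lives.
DispatchQueue.global(qos: .background).async {
    print("housekeeping work, most likely on the E-cores")
}

// User-interactive QoS work is prioritized onto the P-cores.
DispatchQueue.global(qos: .userInteractive).async {
    print("latency-critical work, most likely on the P-cores")
}

// Keep the script alive long enough for the async blocks to run.
Thread.sleep(forTimeInterval: 1)
```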
> I admit that I haven't been following the "little cores" saga as much as other product features. It's not exactly the most exciting implementation. I think the 8+16 arrangement in high-end Raptor Lake is a bit nuts. Even if the efficiency cores help with some games, I can't see this making a huge difference, yet that is how Intel is marketing it. Again, I apologize for not recalling the exact details, this is hardly my area of expertise, but @leman explained why Intel needs to use this efficiency-core implementation, instead of just continuing to use only performance cores. Also, I believe the rumor is that Zen 5 will adopt the same strategy, with AMD using a variant of Zen 4 as the little cores.

The thing is, Intel's E cores have a totally different philosophy than Apple's E cores. Apple's E cores come from the efficiency-focused iPhone and Apple Watch (the E cores are the main cores in the S7/S8 SoC). They are designed to be as efficient as possible, so the Watch can have longer battery life (performance is not that much of a concern in the Watch; you can't even install Geekbench on it), and so the iPhone has something super efficient for background tasks where performance does not matter, using the least possible amount of juice.
> The thing is, Intel's E cores have a totally different philosophy than Apple's E cores. Apple's E cores come from the efficiency-focused iPhone and Apple Watch (the E cores are the main cores in the S7/S8 SoC). They are designed to be as efficient as possible, so the Watch can have longer battery life, and so the iPhone has something super efficient for background tasks where performance does not matter, using the least possible amount of juice.

This. It's all about the power/performance curve. Intel's philosophy tends toward the P cores being way out past the knee, where small performance improvements cost big power consumption changes. Then their E cores are somewhere around the knee. Apple's P-cores are around the knee, and their E-cores are around the linear region. Recently their E-cores have been moving up the curve, or, more accurately, they have been gaining the ability to operate higher in the linear region (we see that in the M2 vs. M1 benchmarks, where the E-cores got a lot of performance improvement without costing a lot of power consumption).
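To put rough numbers on the knee: the textbook dynamic-power relation, plus the usual assumption that voltage has to rise roughly in step with frequency near the top of the DVFS range, gives a back-of-envelope like this:

```latex
% Dynamic power: alpha = activity factor, C = switched capacitance.
P_{\text{dyn}} = \alpha\, C\, V^2 f
% Near the top of the curve V must scale roughly with f, so:
P_{\text{dyn}} \propto f^3
% Worked example: 5.5 GHz -> 6.0 GHz is ~9% more frequency, but
\left(\tfrac{6.0}{5.5}\right)^3 \approx 1.30
% i.e. roughly 30% more dynamic power for that last 9% of clock.
```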
Intel's E cores are a whole different beast. Those are not really efficiency-focused cores, nor are they designed with the lowest possible power consumption in mind. Intel's E cores are actually much closer to Apple's P cores than to Apple's E cores. They're more of a "middle core" thing, really.
From the looks of it, Intel's most performant core designs started hitting quickly diminishing returns a few years ago. To compensate, more die area and power was thrown at the problem, managing to slightly increase single-core performance every year. But since they are getting diminishing returns, the P cores are now HUGE, and use a lot of power to extract those final bits of performance. With Alder Lake, Intel realised that the P cores were using so much die area that they could fit many of the less performant cores they were using for the Intel Atom line into the same die area as a few P cores.
But the Gracemont (Alder Lake's E) cores are not really that small; in fact, there are PCs shipping with Atom processors that have only Gracemont cores, while it would be unthinkable for a Mac to ship with only Apple's E cores (not even the iPhone is E-cores-only).
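Some rough arithmetic on why Intel went this way, using their own (approximate) Architecture Day claim that about four Gracemont cores fit in the area of one Golden Cove core:

```latex
% Intel's figure, approximate: 4 Gracemont E-cores ~ 1 Golden Cove P-core in area.
\text{area}(8P + 16E) \approx 8 + \tfrac{16}{4} = 12\ \text{P-core equivalents}
% An equal-area all-P design would therefore be ~12 P-cores.
% The hybrid wins on multithreaded throughput whenever 16E > 4P,
% i.e. whenever one E-core delivers more than a quarter of a P-core's
% throughput -- which is exactly what Intel claims Gracemont does.
```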
I don't think it's a bad idea. After all, if you can parallelize a task into 8 threads, chances are it's trivially parallelizable across 32 cores too. There are practical limits to this, of course; it's not like they can double the number of cores every year.
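A minimal sketch of what I mean, using GCD's concurrentPerform (the workload and chunking here are made up for illustration; only the API is real):

```swift
import Foundation

let n = 10_000_000
// Split the iteration space across however many cores the machine has;
// the code is identical whether that's 8 or 32.
let workers = ProcessInfo.processInfo.activeProcessorCount
var partials = [Double](repeating: 0, count: workers)

partials.withUnsafeMutableBufferPointer { buf in
    DispatchQueue.concurrentPerform(iterations: workers) { w in
        let lo = w * n / workers
        let hi = (w + 1) * n / workers
        var sum = 0.0
        for i in lo..<hi { sum += Double(i).squareRoot() }  // independent chunks
        buf[w] = sum  // each worker owns one slot: no sharing, no locks
    }
}
print("total ≈ \(partials.reduce(0, +))")
```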
> we see that in the M2 vs. M1 benchmarks, where the E-cores got a lot of performance improvement without costing a lot of power consumption
What I want to know: somewhere there was talk of the specs for the LaBrea Lake processor, saying the P-cores would have "turbo boost" to 6GHz. What exactly does one do with that?
Perusing the M1 reverse-engineering article I found, the author says that an L3 miss to DRAM can cost a complete fill of the 600+ entry ROB before the offending op can retire, and that is at about half of 6GHz. Thus, the x86 P-cores at "turbo" would probably do a fair bit of stalling, even with very good cache-load speculation.
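You can see that stall behaviour with a toy pointer chase: every load depends on the one before it, so once the working set blows past the caches, extra clock speed buys almost nothing. A sketch, with made-up sizes and iteration counts:

```swift
import Foundation

let count = 1 << 24        // 16M Ints (~128 MB): far bigger than any on-chip cache
var next = Array(0..<count)

// Sattolo's algorithm: builds a single-cycle random permutation, so the
// chase visits every slot instead of falling into a short loop, and the
// prefetcher can't guess the next address.
for i in stride(from: count - 1, to: 0, by: -1) {
    let j = Int.random(in: 0..<i)
    next.swapAt(i, j)
}

var p = 0
let steps = 10_000_000
let start = DispatchTime.now()
for _ in 0..<steps {
    p = next[p]            // each load depends on the previous one: ROB fills behind it
}
let ns = Double(DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds)
print("~\(ns / Double(steps)) ns per dependent load (p = \(p))")
```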
I compare it with the city driving style I was taught, where you drive through downtown at about 5 under the speed limit, getting to almost every traffic light just after it turns green while the car next to you leaps off every line and has to brake hard because they hit every red, and at the edge of town, at the last light, you blow past them because it is quicker to accelerate from 20 to 45 than from a stop.
What I mean is that Apple gets the work done at a steady 3.2GHz while the other guys are in a sprint-stall-sprint-stall mode that is slightly less productive, but they have to do it that way because "5.2GHz" and "TURBO Boost" sound really cool and impressive (i.e., good marketing).
> Apple's P-cores are around the knee, and their E-cores are around the linear region. Recently their E-cores have been moving up the curve, or, more accurately, they have been gaining the ability to operate higher in the linear region (we see that in the M2 vs. M1 benchmarks, where the E-cores got a lot of performance improvement without costing a lot of power consumption).

I'm about to ask an entirely unfair question, so feel free to tell me to sod off if you'd like. Most folks here are aware that I just purchased a Mac Pro; I've got an entire thread where I've been bloviating about it, so it's hard to miss. So, unless something breaks, my next Mac is probably going to have something like an M6 or M7 inside it, which is difficult for me to wrap my head around. I don't expect any specifics this far out, divining the basics of the M3 is already hard enough, but do you have any expectations or see any general trends for where Apple may take the M-series in the next half-decade? Like I said, unfair question, but I figured it wouldn't hurt to ask.
> I'm about to ask an entirely unfair question, so feel free to tell me to sod off if you'd like. Most folks here are aware that I just purchased a Mac Pro; I've got an entire thread where I've been bloviating about it, so it's hard to miss. So, unless something breaks, my next Mac is probably going to have something like an M6 or M7 inside it, which is difficult for me to wrap my head around. I don't expect any specifics this far out, divining the basics of the M3 is already hard enough, but do you have any expectations or see any general trends for where Apple may take the M-series in the next half-decade? Like I said, unfair question, but I figured it wouldn't hurt to ask.
Obviously it would be complete speculation, and it's not really something I've even thought about. But that many generations out, if I had to guess, I think, at least for Macs, we'd see much bigger packages with much more powerful GPUs that live on their own die. I'd expect more heterogeneous compute units across the board, but I don't know how that will actually shake out because I don't know enough about trends in software. Maybe there will be three levels of CPU, maybe there will be much bigger ML components, etc. More and more of the auxiliary die are going to end up in the package. Really, I expect the packaging to become as important as the die. Bandwidth is Apple's focus, and they will find ways to get data into and out of the package much faster, and across the package faster too.
Monolithic die or chiplet?
> Monolithic die or chiplet?

I would guess, for economic reasons, multi-chip packages. (I refuse to say "chiplet".) Cerebras is pretty interesting, though, isn't it? Lots of my friends are over there. I stopped by and looked around a couple of years ago when they were all squeezed into a little office behind Lulu's in Los Altos.
> I'd expect more heterogeneous compute units across the board, but I don't know how that will actually shake out because I don't know enough about trends in software.

Over at TOP there is a thread linking to an article about how Apple is migrating its CPs over to RISC-V from embedded ARM, in order to reduce licensing costs. Seems a bit baffling to me: given that there is software out there that can take a high-level-language program and build you a dedicated circuit that performs a specific range of tasks (e.g., image processing, data compression, etc.) far faster and more efficiently than a program running in some sort of core, why would they not do it that way, especially with the attendant improvement in battery life? All the heterogeneous units are obfuscated behind APIs, so in theory, huge swaths of OS services could be accelerated just by pulling the functions out of software libraries and baking them in.
IOW, the article looks like FUD to me.
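Apple's Compression framework is a good example of the "obfuscated behind APIs" point: the API promises only "compress these bytes", so whatever is underneath could be software today and silicon tomorrow. (The framework call below is real; the fixed-function speculation is mine.)

```swift
import Compression

let input = Array("the same bytes, over and over, over and over".utf8)
var output = [UInt8](repeating: 0, count: input.count + 64)

// The caller asks for LZFSE-compressed bytes and nothing more; where the
// cycles are actually spent is entirely the library's business.
let written = compression_encode_buffer(&output, output.count,
                                        input, input.count,
                                        nil, COMPRESSION_LZFSE)
print(written > 0 ? "compressed \(input.count) -> \(written) bytes"
                  : "encode failed (destination too small)")
```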
> IBM, though, does see value in it. POWER9 server CPUs had cores that could run 4 threads at a time, POWER10 has 8-way SMT. Presumably they have studied loads and properly outfitted the cores with enough EUs to make it worthwhile. But IBM is targeting high-level performance while Apple is mostly going for consumer-grade efficiency, so maybe SMT is better when you have a big straw in the juice.

Thinking about this further, both @Yoused and @Cmaier have explained to me why Apple may see value in implementing SMT with their E-cores, but it makes little sense with the P-cores. From what folks have said here, x86, by nature, can benefit from SMT much more than RISC ISAs. (I won't rehash that discussion here; it's buried somewhere in the x86 vs. Arm thread.) In the back of my mind, I did find it curious that IBM found value in SMT with POWER, implementing 8-way SMT, as you point out.
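A toy way to see the "big straw" point (my own simplification, not anything IBM or anyone here has published):

```latex
% Suppose a single thread keeps a core's execution units busy a fraction u
% of the time. An idealized n-way SMT core can then reach at most
T(n) = \min(1,\; n u)
% Stall-heavy server load, u = 0.2:  T(8) = \min(1, 1.6) = 1,
%   a 5x throughput win over T(1) = 0.2 -- IBM's territory.
% Well-fed wide core, u = 0.8:       T(2) = \min(1, 1.6) = 1,
%   only a 1.25x ceiling -- little reason for Apple's P-cores to bother.
```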