Since there doesn’t seem to be much in the way of rumours, I’m going to list my wishes for A18/M4. They are:
1) CPU. A larger IPC increase than A16->A17. The A17/M3 was a nice increase in performance, but iirc that was mostly a result of moving from N5P to N3B. It would be great if this generation concentrated on IPC increases, if that is possible.
I remember you posted the following analysis, suggesting the IPC increases might be larger than was immediately apparent:
And in terms of the core breakdowns, that's 12P+4E for the M3 Max, vs. 8P+16E for the 13900KS. It's interesting how different Intel's and Apple's approach to core hybridization is. The high-performing M-series chips have always had more P-cores than E-cores, while the 12900K's (Intel's first...
techboards.net
I have to admit, though, that I have reservations about it, which I expressed at the time: some of the methods, results, and conclusions seemed odd to me, chiefly (if memory serves) that the power/clock speeds reported in the tests were very low. I know the main thrust of his argument was that the peak frequencies are hardly ever reached, and thus the cores were actually operating at a higher IPC than you might think, but those figures still seemed off.
Over at Anandtech,
Gerard Williams said he worked on Apple Silicon from A7 to A14 and M1.
There have been slowdowns since the A15. The A16 was barely an improvement; its performance increased due to the node and higher clocks. It's the same with the A17.
I am extremely skeptical of the notion that Apple hit a wall because one man left the design team. It's awfully reductive and facile. For one thing, the E-cores have continued to improve generation-on-generation, and maintaining low power alongside those performance improvements is no easy engineering task, so clearly someone designing CPU cores is still showing up for work at Apple.

Secondly, the M3 is the first actually new design, and so far, apart from the aforementioned analysis, its P-core IPC improvements don't appear great. But it's important to remember that either the M3 was originally supposed to be the M2, or the M2 was just going to be a stopgap; when the N3 node was substantially delayed, the M2 became a full generation.

Finally, the Nuvia core that GW3 worked on itself appears to be a reworked M1 with very similar characteristics. It's not like his team came out of the gate and blew Apple away. If the loss of GW3 was really so impactful, then honestly we won't see it for another few years, if Apple continues to struggle to advance their P-cores and Nuvia/Qualcomm overtake them. So far, other explanations seem far more likely.
Partly, I think Apple is stretched more thinly than in the past. True, they're catching up, as they were able to release three SoCs at once this time around, but Apple doesn't necessarily have the resources to redesign and improve every piece of its SoC lineup every generation (especially with an iPhone A-series generation updated every year). Add in node delays, with designs tied to those nodes, and some things are going to get updated less often.
With regard to the CPU in particular, the E-cores have improved, but the P-cores seemingly have not, at least in IPC. The issue with the P-cores may be that the old tricks Apple used for its spectacular, seemingly yearly rise in performance have simply stopped working all that well, or now deliver diminishing returns. To be reductive myself: they kept going wider, and past 8-wide decode that may not provide as much benefit. Supposedly the new A17/M3 P-cores are 9-wide, and I believe ARM has an X-series core at 10-wide? But we know that width isn't the be-all and end-all of even a wide design; if it were, and my memory about ARM being even wider is correct, then they would have the best core, and they don't.

My main point, though, is that supposedly the average number of independent instructions per branch in real code is about 8. Depending on whether that is a median or a mean, and on the shape of that distribution in (benchmarking) code, continuing to go wide might still provide benefits, but one can also see how one might hit a brick wall going wider and wider if there is simply less ILP to squeeze out of most code.
Finally, it could also just be a blip. Apart from the analysis above that claims we're all just measuring things wrong, this could simply be an underwhelming generation in an otherwise promising direction, one that will get optimized and improved such that years from now we'll look back at forum discussions like this and shake our heads with a knowing smile. At the risk of repeating myself too many times: since the M1 we've only had two main generations (plus a third iPhone generation, yes, but see the note above), one of which was an optimized M1, likely because of the node delays. Given the pace of processor development, this isn't something to worry about ... yet, and certainly not something to draw iron-clad conclusions from. Those are best drawn in hindsight, not in the moment, if at all.
3) GPU. In terms of GPU improvements, the M3 has seemingly laid a foundation for future gains; hopefully they can now step on the gas. I am intrigued by the possibility of allowing the ALUs to dual-issue FP32+FP32, FP16+FP16, or INT+INT, rather than only any two of the three different types, as the M3 currently does. Even more enticing would be any three simultaneously!
4) New media engines. The current ones have served the M series well, but they are three years old now, and Nvidia has largely caught up in speed while surpassing them in quality.
Thoughts?
I am more hopeful here. Apple's GPU team can make improvements, considering the recent hires.
Apple needs to add AV1 encode to the media engine in their Mac chips. If Apple cares about streaming, it's a good thing to add.
Most of my wished-for GPU improvements are covered in @Jimmyjames 's earlier thread. Since reviewing @leman 's post on the new Apple GPU L1 caches, I also think another area of improvement would be to increase their performance as well. That is no doubt an incredible challenge, as capacity versus latency is the usual tradeoff: a larger cache often entails a slower one, since it simply takes longer to find and output the right piece of information. The characteristics of the new L1 are really quite interesting, and very different from the usual GPU shared-memory characteristics, including Apple's own previous ones.
2) CPU. The ability to separate desktops from laptops. Does the Studio really need E-cores? It would be preferable to have an all P-core desktop chip, perhaps also with the ability to scale frequency higher.
Yes, it really does need them. Even if only for housekeeping, the E-cores take load off of the P-cores. If you have a major job that will occupy the P-cores for a while, the E-cores can handle whatever else you are doing in the meantime without pushing so much heat onto the chip. And they really have been getting much better with each generation. Apple probably has a new trick up their sleeve for the M4 that no one is expecting.
Agreed about E-cores. While a desktop system may not need a bunch, having them is a good thing. That said, I agree with @Jimmyjames that, overall, a desktop-oriented SoC wouldn't be a bad thing. But I have to caveat my caveat here because, as I mentioned earlier, engineering is a finite resource, and so focusing on mobile makes sense both for Apple and for the overall market.
I suspect the M4 will be on N3P (skipping over N3E), so it will be a lot like the M1->M2 type of advance. The M5 is the one to watch out for: it will probably be on N2, and it will be able to control an entire starship.
Also, N2 brings GAA (gate-all-around) transistors, so the M5 should be a nice uplift.
Agreed about M5/N2. Hopefully it will be on schedule. Backside power delivery on M6 will also be quite a nice boost.