Apple M5 rumors

What I find most surprising is the substantial boost in IPC (looks like 5-10%). What's more, it doesn't seem like the A19 has the same boost.
Yes. I can’t recall where unfortunately, but there was some speculation that the core might be different. Not sure it’s true but I can see why there is some uncertainty.
 

Latest addition to my NBC CB R24 data, the M5! Nothing too surprising: ST efficiency is a tad lower, MT efficiency is a tad higher, and performance is of course higher. A small wrinkle with the efficiency estimate is that previous base M4s had 16GB of RAM, while NBC received the 32GB M5, so the M5's efficiency will look a touch worse due to the extra RAM; a 16GB M5 might use a little less power (so ST efficiency might actually match the M4, and MT efficiency might be even better).

I haven’t read this review, but you raise a good point re: the amount of RAM. I also haven’t seen memory speed come up in efficiency discussions; the faster RAM could explain some of the increase in power usage. In the Geekerwan review of the A19/Pro, the SPEC tests show a difference in power usage between the two chips and a smaller difference in scores for both int and fp, seemingly down to RAM speed.
 


Do they explain their methodology for measuring power consumption? Because I can't see anything.
 
Yes. I can’t recall where unfortunately, but there was some speculation that the core might be different. Not sure it’s true but I can see why there is some uncertainty.

My hypothesis is that the M5 cores might have trace caching enabled, unlike the A19. I mentioned the new trace-caching patents some time ago.
 
Do they explain their methodology for measuring power consumption? Because I can't see anything.

Under Power Consumption. The one I use is:

We also measure some benchmarks and load scenarios when the laptop is connected to an external monitor and the display is deactivated. The values determined here provide information about the efficiency of the computing hardware in the context of the benchmark results.

Where the benchmark in question is CB R24, and I subtract out idle to measure efficiency (they don't; they report idle separately and calculate efficiency on total device power, including idle; there are pros and cons to each approach).
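The two conventions can be sketched in a few lines; all numbers here are made-up placeholders, not NBC's measurements:

```python
# Sketch of the two efficiency conventions discussed above.
# Scores and watts are hypothetical placeholders.

def efficiency_minus_idle(score, total_watts, idle_watts):
    """Points per watt, charging only the power above idle to the benchmark."""
    return score / (total_watts - idle_watts)

def efficiency_total(score, total_watts):
    """NBC's convention: points per watt of total device power, idle included."""
    return score / total_watts

score = 1000     # hypothetical CB R24 MT score
total_w = 40.0   # hypothetical device power during the run
idle_w = 5.0     # hypothetical idle power (panel off, external display)

print(efficiency_minus_idle(score, total_w, idle_w))  # 1000 / 35
print(efficiency_total(score, total_w))               # 1000 / 40
```

Subtracting idle isolates the load power of the run but trusts the idle measurement; using total device power avoids that but charges the benchmark for everything else the machine is doing.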

What I find most surprising is the substantial boost in IPC (looks like 5-10%). What's more, it doesn't seem like the A19 has the same boost.

Yes. I can’t recall where unfortunately, but there was some speculation that the core might be different. Not sure it’s true but I can see why there is some uncertainty.

My hypothesis is that the M5 cores might have trace caching enabled, unlike the A19. I mentioned the new trace-caching patents some time ago.

Geekerwan measured 8% and 4% int/fp IPC improvements for the A19 Pro over the A18 Pro in SPEC, and slightly less for the M5 over the M4: the same 8% and 4% raw (non-clock-adjusted) improvements, but with a clock-speed increase of only ~2.2%. Seems similar?
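For anyone checking the arithmetic, removing the clock-speed change from a raw performance gain is a single division; the figures below are the ones quoted above:

```python
def ipc_gain(perf_gain, clock_gain):
    """IPC improvement implied by a performance gain after dividing out the clock change."""
    return (1 + perf_gain) / (1 + clock_gain) - 1

# M5 vs M4: 8%/4% raw SPEC gains with a ~2.2% higher clock
print(round(ipc_gain(0.08, 0.022) * 100, 1))  # ~5.7% int IPC gain
print(round(ipc_gain(0.04, 0.022) * 100, 1))  # ~1.8% fp IPC gain
```

That puts the M5's implied IPC gains a bit below the A19 Pro's 8%/4%, consistent with "slightly less" above.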





Maybe the regular A19 is different?
 

Geekerwan measured 8% and 4% int/fp IPC improvements for the A19 Pro over the A18 Pro in SPEC, and slightly less for the M5 over the M4: the same 8% and 4% raw (non-clock-adjusted) improvements, but with a clock-speed increase of only ~2.2%. Seems similar?





Maybe the regular A19 is different?

Geekerwan’s M4 Geekbench scores are much higher than the average, and their M5 scores show a much smaller difference than the average. I looked at the M4 scores, and only around 1 or 2% score anywhere near 4000 GB points. The fraction of M5 scores over 4300 is much higher, over 25% IIRC, albeit from far fewer submissions. I’m not sure whether they gave the M5 the LN2 treatment.
 
My hypothesis is that the M5 cores might have trace caching enabled, unlike the A19. I mentioned the new trace-caching patents some time ago.

I’ll confess that I have never built a trace cache so I could be wrong, but I doubt that it’s a situation where trace cache hardware exists in A19 but is not enabled. Just running through my mind what I’d do if my boss said “make me a cache that can act as a trace cache or a regular cache depending on a fuse/switch/whatever” I’d have a hard time making anything other than a bad cache/bad trace cache. I suppose I’d optimize it for trace, and then pretend that, on A19, my “traces” consist of sequential strings of instructions of fixed length. But I can’t think of how to make it not degrade the “regular cache” behavior by a noticeable amount; maybe if I actually had to do it I could figure it out, though.

And I don’t think they put two separate caches on there and just disable one or the other - that would be a massive waste of space.
 
The claim is that they planned and designed for both, but the initial implementation (A19) failed in some small way, so they disabled it. Now, with the M5, it's working.

I don't know if that's correct, but I think it accurately describes the idea.

(Edit: come to think of it, having both does seem massively complicated.)
 
I think my idea would be that there are some features that increase the IPC, but cost additional power, so they are not active on A19. It could be tracing, improved branch prediction, or it could be throttling the backend to conserve power. The cores might also be physically different. As to wasting space, can’t the trace cache use the same memory as the regular L1/L2 cache?

Just throwing ideas out there :) It’s likely that none of it makes sense.
 

It can use the same cache RAM in theory, but there is a big difference in how the CAMs and addressing need to work. Among other things, traces can have variable lengths, which greatly complicates the design. Supporting both types of cache with the same hardware complicates all sorts of things, including the interface to the second-level cache, the cache replacement algorithm/hardware, etc. Even the cache tagging seems pretty different to me, so depending on how you store/access tags (probably using CAMs), things get even more complicated when you try to support both types of cache.
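A toy sketch of the tagging difference described above, using the textbook model of a trace cache (indexed by start address plus predicted branch path) rather than any real Apple design:

```python
# Illustrative only: a regular I-cache is indexed by address alone, while a
# trace cache is indexed by a start address plus the branch outcomes along the
# trace, and its entries are variable-length. Not any real design.

regular_icache = {}  # addr -> fixed-size cache line
trace_cache = {}     # (start_addr, branch_outcomes) -> variable-length trace

regular_icache[0x1000] = ["insn0", "insn1", "insn2", "insn3"]

# Two different traces can start at the same address, depending on the
# predicted path, so the branch outcomes are part of the tag:
trace_cache[(0x1000, (True, False))] = [
    "insn0", "insn1", "br taken", "insn7", "br not-taken", "insn8", "insn9",
]
trace_cache[(0x1000, (False,))] = ["insn0", "insn1", "br not-taken", "insn2"]

print(len(regular_icache[0x1000]))                # fixed line size: 4
print(len(trace_cache[(0x1000, (True, False))]))  # variable length: 7
```

Even this toy version shows why replacement and second-level-cache interfaces get awkward: one address can live in several entries of different sizes.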

One other issue is that I believe trace caches always cache post-decoded instructions. I am not sure whether Apple’s normal caches do that.
 

That would definitely be the case for an x86 design, but I suspect it would not be with ARM. Instruction decoding is so trivial that caching μops is no longer cost-effective. A μop has to have the renames coded into it, so when you get back to running it again, those names have probably changed, and you have to redetermine your sources and destinations, which is a lot easier to do from the original op than from the μop. I think this is part of the dispatch stage, right after initial decode, so you really do not save much by recycling μops: they still have to go through dispatch.
 

Register renaming happens after decoding (at least every time I’ve done it). I designed the register renamer on UltraSparc IV. Of course RISC micro-ops are a different thing than x86 micro-ops, but the micro-ops themselves refer to “architectural” registers. (Architectural, in this case, refers to ISA registers plus any “secret” registers that are only usable by micro-ops that execute sequentially within an ISA instruction and need to pass results to each other without affecting the ISA register file; it’s not always the case that such a mechanism exists. Sometimes your micro-ops have to store a scratch register’s contents and then load the contents back when they are done, depending on the design.)

The scheduler handles register renaming based on the current set of in-flight instructions, which the decoder would have no idea about. The scheduler keeps track of all that, and, as part of that, keeps track of register renaming.
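The rename-after-decode flow can be sketched with a toy rename table; all register and class names here are invented for illustration and bear no relation to any real design:

```python
# Toy register renamer: micro-ops name architectural registers (r0..r3), and
# the rename stage maps them to physical registers (p0..p7) using a mapping
# table and free list the decoder never sees. Purely illustrative.

class Renamer:
    def __init__(self, n_arch=4, n_phys=8):
        self.map = {f"r{i}": f"p{i}" for i in range(n_arch)}  # arch -> phys
        self.free = [f"p{i}" for i in range(n_arch, n_phys)]  # free phys regs

    def rename(self, dst, srcs):
        srcs_phys = [self.map[s] for s in srcs]  # read sources via current map
        new_phys = self.free.pop(0)              # fresh physical destination
        self.map[dst] = new_phys                 # later readers see the new name
        return new_phys, srcs_phys

r = Renamer()
print(r.rename("r0", ["r1", "r2"]))  # ('p4', ['p1', 'p2'])
print(r.rename("r3", ["r0"]))        # ('p5', ['p4']) -- r0 now reads the renamed copy
```

The point being made above falls out of the sketch: the physical names depend on in-flight state, so a cached μop with physical names baked in would go stale.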
 
By the way, something I am very curious about is the massive improvement in memory bandwidth. That must be the newer 9.6 Gbps LPDDR5T, right? Kind of unusual for Apple, they tend to be more conservative when it comes to RAM standards...
Yeah, the math works out: 120 × 9600/7500 = 153.6 GB/s.

But according to this post on Reddit by user -protonsandneutrons- a day ago, 9600 can now be considered a mature variant:

The rumored DRAM the M5 uses is LPDDR5X-9600, while the M4 used LPDDR5X-7500.

...9600 is a very old (read: mature) bin, tbh.


  1. LPDDR5X-9600, when it first launched as LPDDR5T by SK Hynix in Jan 2023, was fabbed on the 1α node. LPDDR5X has already shifted past 1β to now 1γ, so we're well into major node improvements and I could imagine Apple pays for the newest DRAM nodes with the lowest power consumption.

The poster goes on to say that it's LPDDR5X-10700 that's the very new variant.


Since the M4 Pro/Max use LPDDR5X-8533, the improvement with LPDDR5X-9600 on the M5 Pro/Max will be only 9600/8533 = 12.5%, rather than the 9600/7500 = 28% we get with the base M5 (for the same memory configurations).

If they were to instead use LPDDR5X-10700 on the M5 Pro/Max, that would give an improvement of 10700/8533 = 25.4%. But, for the reason you noted, that seems somewhat unlikely.
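The bandwidth arithmetic above, spelled out (the 120 GB/s baseline for the base M4 is from the post; transfer rates in MT/s):

```python
# Bandwidth scales linearly with transfer rate for the same bus width.

def scaled_bw(base_bw, old_rate, new_rate):
    """New bandwidth given the old bandwidth and the rate change."""
    return base_bw * new_rate / old_rate

print(scaled_bw(120, 7500, 9600))     # 153.6 GB/s for the base M5
print(9600 / 7500 - 1)                # 0.28 -> 28% uplift over the base M4
print(round(9600 / 8533 - 1, 3))      # 0.125 -> ~12.5% over the M4 Pro/Max
print(round(10700 / 8533 - 1, 3))     # 0.254 -> ~25.4% if LPDDR5X-10700
```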

LPDDR6 won't be commercially available until 2026, so we probably won't see that until M6 or M7.

Source:
 

I don't have a reddit account, so can't respond directly. This has nothing to do with what theorist9 posted, which was only about whether 9600 RAM is a mature variant, but Protonsandneutrons did make a couple of mistakes (though to be fair NBC doesn't spell out its methodology in every article):

Even more problematically, NBC somehow ignores that the M5 MBP uses a different (or at least differently tuned) panel than the M4 MBP, so battery life measurements may not be purely a CPU / SSD difference, but also a display difference. Displays often can significantly alter energy consumption (as can SSDs).

That's why NBC tests with the built-in display off and connected to an external display; you can also find the idle power of the laptop tested in the original review article.

With regard to changes in RAM:

Watts would be difficult to believe; LPDDR5X power consumption is measured in milliwatts.

Even NBC themselves noted the change in RAM could've affected power draw, but mostly because there was double the amount compared to last year's review models (which had 16GB; this one had 32GB). When corrected for that, the total energy might be very much the same. Then there's the extra energy the bandwidth itself uses on the chip.

Problematically, NBC did not test energy, which is what they should test to correlate the CPU with battery life. Batteries store energy, not power. That is how race-to-idle is theorised to work: more power for a short burst so you can then turn the CPU basically off.

You can match watts with performance to get a sense of this, and NBC does report efficiency in pts/watt, which for a test of set length is equivalent to reporting joules for a speed test, but they are all in separate charts. That's why I prefer my X-Y bubble charts to the disaggregated bar charts in their articles: they make the relationship between performance and power of each processor much clearer (the idea coming from Andrei of Anandtech, though sadly all those articles have now disappeared unless the Internet Archive got them or people saved offline copies).
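The pts/watt-vs-joules equivalence for a fixed-length test is just a constant factor; a quick sketch with hypothetical numbers:

```python
# For a fixed-duration benchmark, energy is power x duration, so points-per-watt
# is proportional to points-per-joule: same ranking, different units.
# All numbers hypothetical, not NBC's measurements.

DURATION_S = 600  # assume a fixed 10-minute run

def pts_per_watt(score, watts):
    return score / watts

def pts_per_joule(score, watts, seconds=DURATION_S):
    return score / (watts * seconds)  # joules = watts x seconds

# Two hypothetical chips on the same fixed-length test:
print(pts_per_watt(1000, 40), pts_per_joule(1000, 40))  # 25.0 and 25.0/600
print(pts_per_watt(1200, 45), pts_per_joule(1200, 45))  # higher on both metrics
```

For a fixed-work test (run until done), the same holds because the runtime scales inversely with the score; either way, the pts/watt ranking matches the energy ranking.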

=============

Finally, in the article itself, at the very bottom, is a comment left by yours truly that, after subsequent analysis and thought, is probably wrong. I was thinking they and a bunch of reviewers had tested the first native Mac version of CP2077 because some of the data seemed to suggest it was, but later I decided nope, that really is the newest version with the SSRQ fix. There is a data point from the Ars Technica review that I have questions about, but by and large this represents the current best state of the game on Mac. Unfortunately.
 
I don't have a reddit account, so can't respond directly, but Protonsandneutrons did make a couple of mistakes (though to be fair NBC doesn't spell out its methodology in every article):
I paid no attention to the rest of Protonsandneutrons' post—I only shared their comments on the maturity status of LPDDR5X-9600 RAM. Were there any mistakes there?

When you responded to my post with a listing of the mistakes in protonsandneutrons' post generally, you potentially gave the misimpression that I was sharing incorrect info, which I'm not aware I was. [Yes, I gave a link to the post as a whole, but that was required to ensure proper attribution.]

If you wished to address portions of that post I didn't quote, I think it would be better if you started your post with something like "This has nothing to do with what theorist9 posted, which was only about whether LPDDR5X-9600 RAM is a mature variant, but I noticed some mistakes in other parts of protonsandneutrons' post...."
 
Uhm … okay. I will edit my post.
 
With regard to ram and power usage, I’d just like to offer this screenshot from Geekerwan’s A19/Pro review

We can see the SPEC17 tests on the A19 and A19 Pro for both int and fp. In both cases the A19 Pro uses over 1 watt more power, the main difference being the amount and speed of the RAM: the A19 uses 8GB of 8533 MT/s RAM vs the A19 Pro's 12GB at 9600 MT/s.

 