M3 core counts and performance

Just saw this. Not sure what GF DGEMM is, or how competitive 800 Gflops of it is, but they seem impressed! Anyone know how it compares to other CPUs?
Looking online at some DGEMM code, this is double-precision matrix multiplication (DGEMM = Double-precision GEneral Matrix-Matrix multiplication). Thus GF DGEMM is the rate at which this operation is executed, in gigaflops (GF).


Specifically, it benchmarks the rate at which this operation is performed:

C := alpha*A*B + beta*C,

where alpha and beta are double-precision scalars, and A, B, and C are square matrices whose elements are double-precision.

The comments say:

"On exit, the array C is overwritten by the m by n matrix ( alpha*op( A )*op( B ) + beta*C )."

I didn't look over the code that closely, but I'm guessing the reason they do C := alpha*A*B + beta*C instead of D := alpha*A*B + beta*C is so that the code can run in a loop, continuously updating C, which allows for a much longer run time than they would get from a single operation.
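To make the units concrete, here is a minimal sketch of that kind of benchmark loop, assuming Accelerate's cblas_dgemm; the matrix size, iteration count, and row-major layout are arbitrary choices for illustration, not the settings used in the benchmark being discussed, and integer widths can differ between Accelerate's older and newer BLAS interfaces:

```swift
import Accelerate
import Foundation

// Sketch of a DGEMM throughput loop: repeatedly compute C := alpha*A*B + beta*C
// on n x n double-precision matrices and report GFLOPS. Values are illustrative.
let n = 2048
let iterations = 20
let alpha = 1.0
let beta = 1.0

let a = (0..<n * n).map { _ in Double.random(in: 0..<1) }
let b = (0..<n * n).map { _ in Double.random(in: 0..<1) }
var c = [Double](repeating: 0.0, count: n * n)

let start = Date()
for _ in 0..<iterations {
    // Accumulating into C (beta = 1) reuses the same output buffer,
    // which is what lets the benchmark run as one long loop.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                Int32(n), Int32(n), Int32(n),
                alpha, a, Int32(n),
                b, Int32(n),
                beta, &c, Int32(n))
}
let seconds = Date().timeIntervalSince(start)

// One n x n multiply is roughly 2*n^3 floating-point operations.
let flops = 2.0 * Double(n) * Double(n) * Double(n) * Double(iterations)
print(String(format: "%.1f GFLOPS", flops / seconds / 1e9))
```

At this (hypothetical) n = 2048, a single multiply is about 17 GFLOP, so 800 GF DGEMM would correspond to roughly 45-50 of these multiplies per second.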

I found these measurements for other processors, but if the performance depends on the size of the matrices (I don't know if it does), these results may not be comparable if a different set of matrices were used:
 
According to this, the M1 Max is already supposed to get close to 800 GFLOPS in DGEMM; you'd hope the M3 would be faster?

So much about Apple Silicon performance seems to be contradictory, wrong or misleading. Even from people working within Apple. It’s very frustrating.
 
Honestly that's true for just about every bit of tech. Possibly more true for Apple Silicon, since there were fewer eyeballs on Macs in general before, and now that more and more people are paying attention, there are more people to post "hey, this cool new thing about Apple Silicon" which isn't new, just new to them (and, to be fair, often to us too, until someone digs further).
 
I'm sure that's correct, and yet I do still have issues with the lack of communication from Apple. One recent example: I wanted to know the 4K HEVC decode performance of the hardware decode block on Apple Silicon. To my knowledge there is no public information on this. You can run benchmarks to get pretty close, but you still don't know whether your benchmark is reaching the maximum possible performance or whether you have a bug in your code. With Nvidia/Intel/AMD this stuff is public and readily available on the data sheets for their hardware.
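(For what it's worth, the kind of benchmark meant here is something like the sketch below: pull decoded frames from a 4K HEVC file as fast as possible and count frames per second. It goes through AVFoundation/VideoToolbox rather than talking to the decode block directly, and the file path is a placeholder, so at best it gives a lower bound on what the hardware can do, which is exactly the problem.)

```swift
import AVFoundation
import Foundation

// Rough decode-throughput sketch: read decoded frames from a 4K HEVC file as
// fast as AVAssetReader will hand them over, and report frames per second.
// The path is a placeholder; whether this saturates the hardware decoder or
// hits some other bottleneck first is exactly what isn't documented.
let url = URL(fileURLWithPath: "/path/to/4k_hevc_sample.mov")
let asset = AVAsset(url: url)
guard let track = asset.tracks(withMediaType: .video).first else {
    fatalError("no video track found")
}

let reader = try! AVAssetReader(asset: asset)
let output = AVAssetReaderTrackOutput(track: track, outputSettings: [
    kCVPixelBufferPixelFormatTypeKey as String:
        kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
])
reader.add(output)
guard reader.startReading() else {
    fatalError("could not start reading")
}

var frames = 0
let start = Date()
while let sample = output.copyNextSampleBuffer() {
    if CMSampleBufferGetImageBuffer(sample) != nil {
        frames += 1
    }
}
let seconds = Date().timeIntervalSince(start)
print("\(frames) frames in \(seconds) s -> \(Double(frames) / seconds) fps")
```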

I don't want to get into it again, but the entire "Whisper performs better on an M1 Pro than a 4090 with MLX" fiasco is very strange to me. Yes, I know the much faster Whisper on the 4090 isn't exactly the same and uses batching, but these are highly paid Apple employees, and I find it weird that they either don't know the 4090 has a faster option or they wilfully ignored it. It plays into the "insular Apple" narrative, which doesn't serve them.
 
Yeah, I should've included that as another reason it's even more true for Apple Silicon and Apple products overall: Apple's penchant for general secrecy and vagueness about its products. You could almost put "communications" on your list of things Apple can do to catch Nvidia. That's griping I've read from professionals using Apple products for years and years, basically back to the old days.
 

Apple's documentation and developer communication are simply atrocious. Sometimes it seems like they don't have anyone responsible for these things and just do the bare minimum. Most of the current documentation appears to be generated from the DocC comments. The Metal shading language reference is very obviously a poorly formatted Word file that doesn't even have hyperlinks and leaves many things unexplained. Apple used to have these great, well-organized articles detailing how their systems and frameworks function (still accessible here: https://developer.apple.com/library/archive/navigation/). Now it's either cryptic one-sentence comments without explanation or WWDC TikToks. Isn't it ridiculous that I have to watch an awkward video to learn how mesh shaders work? This is a cultural shift I do not like one bit. And even Apple's devs seem confused about this stuff, judging by some comments I encountered in MLX Metal shaders...
 
Yeah, it's puzzling when companies mess up on what seems to be no-brainer stuff. How many technical writers do you think Apple would need to hire to correct this?
 

I think this is an issue of strategy and policy, not technical writing as such. They could have hundreds of technical writers, but it won't do them any good unless there is a competent manager who is passionate about improving documentation. It's like Apple's developer forums: there are Apple devs looking at them, but you never know if you'll get an answer. It just depends on whether someone bothers to answer. And I have the feeling the current state of documentation follows the same basic approach.

Regarding your question: I think just one motivated person per topic, working exclusively on documentation, would be able to do a lot. I mean, give me access to the GPU driver team and I am confident I could do a compelling rewrite of the Metal shading language document in six months.
 
Sure--needing management buy-in goes without saying, and was actually the assumption underlying my question, which perhaps I should have made more explicit: assuming Apple management wanted to fix this, and the devs were too busy with the AS transition to do it on their own, what kind of resources (i.e., tech writers) would Apple need to throw at this to get it done?

At the same time, you'd think they'd have at least a few senior tech writers who were passionate enough about their craft to take the initiative, go to management, and say: "Listen, our external documentation is in a sorry state. I'd like to see us fix that." [Which some may have done at Apple.] I've done that sort of thing in large organizations I've been part of, and found management could be surprisingly responsive. So the passion doesn't necessarily need to come from management in order to get something done.
 
Is anyone familiar with the Xcode IDE? I suspect most of the documentation is already built into Xcode and the online version is used for high-level explanations?
 
Nope 🙃

What you see online is exactly the same documentation. I'm trying to work with MusicKit right now and it's amazing how awful it is compared to the older music-related APIs in MediaPlayer.
 

The thing is, Apple used to have more comprehensive documentation, but they completely overhauled it around 2018-2019. Since then they've been using DocC for documentation (instead of actual written documentation), the amount of useful text has shrunk, and the website has been rebuilt in a way that makes documentation discovery really awkward. The only thing I really like is the API diff. My guess is that they wanted to streamline how they create documentation, and DocC was their way to do it. The idea is not bad as such; unfortunately, the way it was implemented leaves much to be desired.
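For anyone who hasn't used DocC: the published reference page for a symbol is built from the triple-slash comments above its declaration, so if the developer writes one sentence, the site shows one sentence. A hypothetical Swift snippet (the type and members are made up purely for illustration):

```swift
// Everything the generated page will say about this type is whatever appears
// in the /// comments below; there is no separate prose article behind it.

/// A request that searches the catalog.
public struct CatalogSearchRequest {

    /// The search term.
    public var term: String

    /// Creates a search request for the given term.
    public init(term: String) {
        self.term = term
    }
}
```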
 



You having to contact the Apple Developer Forums to get what should've been easy-to-find limits on Metal shader resource bindings, if memory serves, is a good example.

I remember 10 years ago reading Apple (and very old Windows) documentation to figure out how to fix Wine's OS X joystick implementation. It was hard and confusing, mostly because of my own unfamiliarity with OS frameworks in general, but I did it. It sounds like I would be even more in the dark for anything newer.

New developer experience is so crucial …
 
I think this is the issue of strategy and policy and not technical writing as such. They could have hundreds of technical writers, but it won't do them any good unless there is a competent manager who is passionate about improving documentation. It's like the Apple's developer forums — there are Apple devs looking at them, but you never know if you get an answer. It just depends on whether someone bothers to answer. And I have the feeling the current state of documentation follows the same basic approach.

Agreed on the bolded bit especially. One thing I've seen a lot of in my end of the "large bureaucratic developer" world is the desire to "get lean". Something Apple has talked about at length as well. The problem is that lean means making the developers own more and more things, and wear more hats. So of course a company trying to "get lean" will make the developers the technical writers and have documentation auto-generated from the source code.
 
Annotated M3 die shots by Twitter/X user High Yield.

For a sharper image than that displayed here (this site seems to compress images), go to:

Locuza's annotated M1 die shots are shown below for comparison (https://twitter.com/Locuza_/status/1450271726827413508/photo/1).

As others have noted, the M3 Max is not a Pro with certain parts doubled (mirror-imaged). Correspondingly, instead of having two Pro NPUs at opposite ends of the chip (as seen on the M1 Max), it has a single larger NPU. Likewise, the Display Engines are clustered together on the I/O end of the chip, instead of being separated.

I'm confused about the number of Display Engines, since in each case it equals the number of external displays the chip supports. The total number of supported displays is one more than that, so did he miss one? Assuming it is one Display Engine per external display, they do take up a decent percentage of the M3's die area, so perhaps Apple's decision not to allow that chip to drive a total of three isn't purely product segmentation.

ADDENDUM: According to @mr_roboto, he did indeed miss the internal display engine on each chip, which are less powerful than the external display engines, and thus don't look the same. See https://techboards.net/threads/m3-core-counts-and-performance.4282/page-21

He also released an interesting video about it. The most interesting parts are towards the end, where he compares the sizes of the three main processors relative to the other parts of the die:



Basically concluding that most of the additional transistors from M2 to M3 appear to be going to the GPU cores. The CPU cores appear to be growing at the same rate as the die but the NPU is getting smaller as a proportion of the die area.

He also reiterates that Apple has always removed the interconnect from their die shots, so its continued absence is not particularly notable. Has any third party released die shots? I don't think Hector has done any kind of deep dive on the M3 yet; he says he waits for the desktops (the mini) first. So while I'm pretty sure he has an M3 Max, I don't think he's looked at it to confirm whether its interrupt controller is the same or at all different. I know a couple of people here wondered if something had changed, and it might have, of course, but as I stated earlier, I think we should assume that things have stayed the same unless we have evidence otherwise.
 

Thoughts on this? Seems like Apple would be deviating from the Ultra being 2x Max if so, and I'm very curious what that'll look like.
Interesting, who is that? Are they a known source of good information? It's odd that no one has released third-party die shots by now. Someone should know for sure by now …

Anyway, if so, it would very likely mean that we'll get a dedicated Ultra die whose characteristics we could only guess at and, most excitingly, it could mean a 2xUltra Extreme for the Mac Pro if the Ultra die has the bridge. That would be cool. Again, if this is true, I'm a little surprised that, given the reported costs of this initial 3nm node, they would do this now (go for an even bigger monolithic die), but I'm sure Apple would have worked out the economics down to the last detail.

Other possibilities? Hmmm … maybe a bifurcation between laptop dies and desktop Max dies? After all, the laptop ones don't need the interconnect, and maybe there is something about the desktop dies that makes having a difference here worth it from an economic perspective. I don't consider this likely.
 
Not really sure myself. I've definitely seen them a bit before on... what I'd call Apple Silicon Twitter? ¯\_(ツ)_/¯

Figured I'd post this for the discussion. According to someone in the replies the schematics for the chips were apparently obtained and sold on the dark web. Not sure how true that is but here we are.

If we're getting 2x Ultra out of this move I'll be quite pleased. I hope the Ultra and 2x are exclusively for the Studio and MP respectively. I think it'd be weird to continue with the Ultra in the MP. I do wonder if they'll continue with 2x the core config of the Max with this move or if it'll be something like 1.5x CPU and 1.5x GPU.
 

I’m sure that rumor is false. I’d also bet there are no schematics for the chips. (There may be schematics for specific circuits, but the chip interconnectivity is probably in netlist form, not schematics).
 