Apple: M1 vs. M2

The Mac Studio with the M1 Max has 4 TB buses, hooked up to the rear ports, and one or more USB controllers driving the front USB-C ports and the rear USB-A ports. The M1 Ultra has 8 TB buses, and so the front USB-C ports are TB-capable.
So the Max has four TB ports and four TB buses, while the Ultra has six and eight, respectively. Does this mean two of the TB buses on the Ultra aren't being utilized, or are the TB buses also used for the non-TB ports, in which case would the Ultra offer more bandwidth per port than the Max when all ports are being utilized, because of reduced sharing?

I.e., I'm not sure how TB buses work but, on the Max, are the signals for the six non-TB ports (2x USB-C Gen 2, 2x USB-A 3.0, HDMI, SDXC) also routed through those four TB buses, and thus need to utilize some of their bandwidth (where, by contrast, on the Ultra, all six of those could be routed through two "surplus" TB buses), or do they interface with the chip through separate pathways?

I wonder if anyone has made a table of all the M-series devices showing the number of TB buses and the ports utilized by each.
 
TB and USB are both fast serial protocols that are convergent: SoC circuitry that handles TB probably also handles USB with minimal extra transistors. AIUI, USB4 is essentially indistinguishable from TB and mandates the C-type connector. It would seem that the SoC has generic serial data handlers, so if you see a C-type hole, it is wired to the serial block that can do either.
 
TB and USB are both fast serial protocols that are convergent: SoC circuitry that handles TB probably also handles USB with minimal extra transistors. AIUI, USB4 is essentially indistinguishable from TB and mandates the C-type connector. It would seem that the SoC has generic serial data handlers, so if you see a C-type hole, it is wired to the serial block that can do either.
It's more like... USB and TB are very different at the protocol level, but convergent at the physical layer. You can build a dual-mode PHY, but you're going to need fundamentally different stuff behind the PHY to handle both modes.

On the other hand, there's the semantics argument, which is that since TB is now kind-of a part of the USB4 spec, technically it's all USB now!
 
14” has 2, I guess, while the 16” has 3.
This seems like a case of Apple squeezing a turnip. It pisses me off to pay top dollar for hardware and get a measly 2 USB ports. :mad:
 
This seems like a case of Apple squeezing a turnip. It pisses me off to pay top dollar for hardware and get a measly 2 USB ports. :mad:
It's not true; he was mistaken. The 14" and 16" M1 MBP have exactly the same IO port types and counts, including USB.
 
So the Max has four TB ports and four TB buses, while the Ultra has six and eight, respectively. Does this mean two of the TB buses on the Ultra aren't being utilized, or are the TB buses also used for the non-TB ports, in which case would the Ultra offer more bandwidth per port than the Max when all ports are being utilized, because of reduced sharing?

I.e., I'm not sure how TB buses work but, on the Max, are the signals for the six non-TB ports (2x USB-C Gen 2, 2x USB-A 3.0, HDMI, SDXC) also routed through those four TB buses, and thus need to utilize some of their bandwidth (where, by contrast, on the Ultra, all six of those could be routed through two "surplus" TB buses), or do they interface with the chip through separate pathways?

I wonder if anyone has made a table of all the M-series devices showing the number of TB buses and the ports utilized by each.

With Apple Silicon, Apple has been using the 1 port = 1 bus approach. No TB ports get shared, unlike with Intel, where Apple used a two-port Thunderbolt controller for each pair of ports, meaning 2 ports = 1 bus on Intel Macs. Do PCIe lanes get shared between TB buses? That I don't know for certain, but I'm inclined to say no.

The SoC has a handful of dedicated PCIe lanes for off-die I/O as well. The M1 Mini has a single PCIe 4.0 lane for ethernet (which can handle 10Gbps ethernet no sweat), and a single PCIe 4.0 lane for the USB-A ports and WiFi. I know less about the lanes dedicated on the M1 Max in the Studio, but the SDXC slot would be using some of this PCIe bandwidth like in the MBP, even if as a USB device. HDMI on both the Studio and the Mini use a DisplayPort to HDMI adapter on the logic board, fed by the SoC’s DisplayPort PHY that is routed externally. This is the same DisplayPort PHY that would be used for the internal display on the Air or MBP.

For something like the M1 Max, you only really need about 12 PCIe 4.0 lanes (or fewer) to handle everything Apple does in the Studio, and only 4 of those would need to be routed off the die, with the rest leaving as Thunderbolt instead. AMD routes 24 lanes off the package for PCIe, M.2, and the logic board chipset, for example. Intel offers 16, IIRC. So it's not like Apple's pushing the limits or anything here.

It's more like... USB and TB are very different at the protocol level, but convergent at the physical layer. You can build a dual-mode PHY, but you're going to need fundamentally different stuff behind the PHY to handle both modes.

On the other hand, there's the semantics argument, which is that since TB is now kind-of a part of the USB4 spec, technically it's all USB now!

Not to mention that TB3/4 was defined as a USB-C alt mode. These sorts of multi-mode PHYs (don't forget DisplayPort's alt mode too) are USB-C's bread and butter at this point. Intel's TB controllers were fundamentally USB-C controllers with support for a few alt modes built into them.
 
It's not true; he was mistaken. The 14" and 16" M1 MBP have exactly the same IO port types and counts, including USB.
Prior to this discussion, my impression was that there were 3 USB ports, but I was too lazy to go look it up. I think it is the Air that has just 2. Thanks!
 
Prior to this discussion, my impression was that there were 3 USB ports, but I was too lazy to go look it up. I think it is the Air that has just 2. Thanks!
The M1 and M2 MacBook Air have 2 USB-C/Thunderbolt 3 ports. The 13" MacBook Pro also only has 2. The 14" and 16" have 3 USB-C/Thunderbolt 4 ports.
 
I.e., I'm not sure how TB buses work but, on the Max, are the signals for the six non-TB ports (2x USB-C Gen 2, 2x USB-A 3.0, HDMI, SDXC) also routed through those four TB buses, and thus need to utilize some of their bandwidth (where, by contrast, on the Ultra, all six of those could be routed through two "surplus" TB buses), or do they interface with the chip through separate pathways?
With Apple Silicon, Apple has been using the 1 port = 1 bus approach. No TB ports get shared, unlike with Intel, where Apple used a two-port Thunderbolt controller for each pair of ports, meaning 2 ports = 1 bus on Intel Macs.
I wasn't asking if the TB ports share TB buses, but rather if signals from the non-TB ports are also routed through the TB buses. It sounds like you're saying they're not, but in that case what does the Ultra do with its 8 – 6 = 2 surplus TB buses? Are they simply not used?
 
I wasn't asking if the TB ports share TB buses, but rather if signals from the non-TB ports are also routed through the TB buses. It sounds like you're saying they're not, but in that case what does the Ultra do with its 8 – 6 = 2 surplus TB buses? Are they simply not used?

Much like the 14”/16” MBP and their surplus TB bus, the extras go unused.

It's easier and cheaper to hook up these sorts of device controllers over PCIe. Tunneling over TB doesn't add anything other than cost, and it eats up space for the TB controller(s) you'd need in front of the USB and Ethernet controllers, which are going to want PCIe lanes anyway.

(And I’m aware I went beyond what your specific question was. It was more meant to be a bit of an overview of the architecture as implemented)
 
New M2 Max geekbench scores:

Single Core - 2027
Multi Core - 14888

More respectable but who knows if they are real or not!
This lists a 3.68 GHz frequency (the last "M2 Max" was at 3.54 GHz; the production M2 in the 13" Pro and Air is 3.49 GHz). The variation in frequency is consistent with these scores (if legit) being from preproduction devices.

Extrapolating what we'd expect the SC score to be from the clock speed and the 1899 average SC value GB lists for the production M2 in the 13" Pro, we get 1899 x 3.68/3.49 = 2002, which is within normal GB variation of 2027.
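If anyone wants to play with that extrapolation, here's a quick Python sketch. The 1899 baseline and the clock speeds are the figures quoted above; the assumption that GB5 single-core scales roughly linearly with clock is mine.

Code:
# Rough single-core extrapolation, assuming Geekbench 5 SC scales ~linearly with clock.
m2_baseline_sc = 1899   # GB5 average SC for the production M2 (13" Pro), per the post above
m2_clock = 3.49         # GHz, production M2
m2_max_clock = 3.68     # GHz, clock listed in the leaked M2 Max result

expected_sc = m2_baseline_sc * (m2_max_clock / m2_clock)
print(f"Expected SC at {m2_max_clock} GHz: {expected_sc:.0f}")  # ~2002, vs. the 2027 reported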
 
New M2 Max geekbench scores:

Single Core - 2027
Multi Core - 14888

More respectable but who knows if they are real or not!
Whoa! Fingers crossed it's true. A 2027 Single Core score for a laptop chip is very respectable. There's a psychological barrier there too, breaking the 2k points mark. 14888 multicore is also ahead of the competition.

This lists a 3.68 GHz frequency (the last "M2 Max" was at 3.54 GHz; the production M2 in the 13" Pro and Air is 3.49 GHz). The variation in frequency is consistent with these scores (if legit) being from preproduction devices.

Extrapolating what we'd expect the SC score to be from the clock speed and the 1899 average SC value GB lists for the production M2 in the 13" Pro, we get 1899 x 3.68/3.49 = 2002, which is within normal GB variation of 2027.
Good points here.
 
Whoa! Fingers crossed it's true. A 2027 Single Core score for a laptop chip is very respectable. There's a psychological barrier there too, breaking the 2k points mark. 14888 multicore is also ahead of the competition.


Good points here.

I’ve said before that I think the primary difference between the M1 and M2 p-cores is that M2 is designed to be scalable to a higher clock. If this score is accurate, looks like that’s what‘s going on.
 
I’ve said before that I think the primary difference between the M1 and M2 p-cores is that M2 is designed to be scalable to a higher clock. If this score is accurate, looks like that’s what‘s going on.
Any speculation on what the max clock would be for the M2 P-cores before one runs into the limitations you mentioned earlier (https://talkedabout.com/threads/apple-m1-vs-m2.3135/page-15#post-125980)? And on what the max, within that envelope, Apple might use for its desktop M2 devices?
 
If the benchmark turns out to be true, it'd mean +15.6% Single Core and +20.8% Multicore over the Mac Studio. The multicore score scaling better than 8 x the P-core score should be due to either the 2 extra E-cores or the improvements in the µarch of the A15-generation E-cores. Maybe both. I'm saying "should" because the M1 Pro/Max had the E-cores running at 2GHz (vs 1GHz on the regular M1) when under high load [source], and now that the M2 Pro/Max apparently has 4 E-cores that design decision may have changed. Maybe the M2 Pro/Max E-cores only go up to 1GHz, in which case the full difference in scores would be due to the µarch improvement in those cores.
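As a quick back-of-the-envelope in Python (the M1 Max Studio baselines here are just the values implied by those percentages, not measured results), the multicore-to-single-core ratio is one crude way to see the improved scaling:

Code:
# Back-of-the-envelope scaling check. The M1 Max baselines are derived from the
# +15.6% / +20.8% figures above, not taken from a specific Geekbench entry.
m2_max_sc, m2_max_mc = 2027, 14888
m1_max_sc = m2_max_sc / 1.156   # ~1753
m1_max_mc = m2_max_mc / 1.208   # ~12324

# MC/SC ratio: a rough proxy for how well the chip scales across its cores.
print(f"M1 Max MC/SC: {m1_max_mc / m1_max_sc:.2f}")  # ~7.0 (8 P + 2 E cores)
print(f"M2 Max MC/SC: {m2_max_mc / m2_max_sc:.2f}")  # ~7.3 (8 P + 4 E cores)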

I’ve said before that I think the primary difference between the M1 and M2 p-cores is that M2 is designed to be scalable to a higher clock. If this score is accurate, looks like that’s what‘s going on.
If true, hopefully that opens the door to desktop chips having higher clocks too. Although the id of this particular benchmark already looks like a desktop model name (Mac14,6). What kind of changes are needed to make a core scalable to higher frequencies? I assume shortening the critical path(s) is involved as you say, but is anything else required? I think you've also mentioned in the past that the highest clock of a chip also has some variability from chip to chip or from wafer to wafer. What causes that? I'm trying to make sense of how some chips can almost double their base frequency while Apple's have so little headroom.
 
If the benchmark turns out to be true, it'd mean +15.6% Single Core and +20.8% Multicore over the Mac Studio. The multicore score scaling better than 8 x the P-core score should be due to either the 2 extra E-cores or the improvements in the µarch of the A15-generation E-cores. Maybe both. I'm saying "should" because the M1 Pro/Max had the E-cores running at 2GHz (vs 1GHz on the regular M1) when under high load [source], and now that the M2 Pro/Max apparently has 4 E-cores that design decision may have changed. Maybe the M2 Pro/Max E-cores only go up to 1GHz, in which case the full difference in scores would be due to the µarch improvement in those cores.


If true, hopefully that opens the door to desktop chips having higher clocks too. Although the id of this particular benchmark already looks like a desktop model name (Mac14,6). What kind of changes are needed to make a core scalable to higher frequencies? I assume shortening the critical path(s) is involved as you say, but is anything else required? I think you've also mentioned in the past that the highest clock of a chip also has some variability from chip to chip or from wafer to wafer. What causes that? I'm trying to make sense of how some chips can almost double their base frequency while Apple's have so little headroom.

To answer your first question: shortening the critical paths (while accounting for hold-time requirements), designing for the currents required to reach the higher clock speed (accounting for electromigration, hot-carrier effects, etc.), and just a lot of physical design and verification nitty-gritty are all required.

As for your second question, the variability comes from variability in each process step. Each mask layer has tolerances. For example, you need to align the masks. So in step 1, say you use a mask to determine where photoresist goes. Then you etch. Then you deposit metal. Then you mask again so you can etch away some of the metal. But the new mask may not be perfectly aligned with where the first mask was. The tolerances are incredibly tight.

You are also doping the semiconductor. It’s impossible to get it exactly the same twice. The wafer has curvature to it (imperceptible to a human eye). So chips at the edges are a little different than chips in the middle. Etc. etc.

The dimensions and number of atoms we are talking about are so small that it's hard to keep everything identical at all times. Small changes in humidity, slight differences in the chemical composition of etchants or dopants, maybe somebody sneezed in the clean room. So many things can affect the end result. Vertical cross-sections of wires are never the same on two chips (if you look at them with a powerful-enough microscope). Etc. etc.

In the end, btw, Apple's chips undoubtedly have more headroom than they've used, presumably because Apple doesn't feel they need to sacrifice user experience to use it. The higher the frequency, the more heat, the worse the battery life, the worse the chip reliability, etc. It also makes the bus circuitry more complicated - just because you scale the CPU doesn't mean other parts of the chip can scale, so you need to account for them being on vastly different clocks, which gets more complicated the wider the spread and if the clocks aren't integer multiples of each other.

My gut is telling me they simply never bothered to make a scalable chip until now, because, honestly, they didn’t need to.
 
I’ve said before that I think the primary difference between the M1 and M2 p-cores is that M2 is designed to be scalable to a higher clock. If this score is accurate, looks like that’s what‘s going on.
Interesting. I thought previously it might've been a deliberate design decision to keep the clocks the same on all M1 chips, but if this is accurate then indeed it could've been simply a limitation of the Firestorm core design, rectified in Avalanche.
 
I’ve said before that I think the primary difference between the M1 and M2 p-cores is that M2 is designed to be scalable to a higher clock. If this score is accurate, looks like that’s what‘s going on.
Yeah, M1 is the first generation. These are core designs shared with iPhone, and the yearly phone release cycle is a big cash cow for Apple, so a conservative approach makes sense. They would not have wanted the Mac projects to add much risk before they were fully committed to the Mac transition, and at kickoff time for the A14/M1 generation of Apple Silicon, they probably did not know yet whether they were fully committed.
 