The Bits Between The Bits XNU-Edition

casperes1996

Since there's a great level of combined technical knowledge on these forums, I was hoping you could help me (or we could help each other) understand something that I've long been trying to get a better understanding of.

Some of you may have seen Matt Godbolt, of Compiler Explorer fame, present his excellent talk "The Bits Between the Bits", an overview of everything that happens before we reach main in a C program. In essence, it covers all the work that happens to run a program as simple as
int main () {}
and what makes up its size, since just compiling the above on a modern system takes up more space than was available in total on machines from a couple of decades ago.

I have a pretty good sense of what happens on Linux and what generally needs to happen (as part of my bachelor project I wrote my own OS), but there is a lot of mystery about exactly how macOS loads a Mach-O binary.

For one thing, if you inspect a regular Mach-O binary, it lists /usr/lib/libSystem.B.dylib as being the program interpreter, but nothing exists at the specified path.
Second, what is actually the entry point for a binary as far as the kernel is concerned? On Linux you can assemble a binary with _start as the entry point to circumvent loading the C runtime that calls main. On macOS I've found no way to go any earlier in the program loading chain than C's main. I've seen a __dyld_start and dyld_startup around that I feel should relate to program loading, but while lldb finds the procedures when I attempt to set breakpoints, it does not actually stop at any point when I run my program with the breakpoints set.
Perhaps part of this also ties into the fact that fully static linking isn't a thing on modern macOS and every program load goes through a dynamic program loader.

Really, I want to better understand the process from when the execve system call is made until main is executed. What confuses me the most right now is the program interpreter listed in the Mach-O that doesn't exist, and what the first thing the kernel calls outside of kernel space is.
 
I hope this thread goes someplace - I’d really like to understand this as well, and have never had the time to investigate.
 
Thanks, Cmaier. I hope so too. I'll keep looking into things myself when I have enough spare time available and will update if I discover anything.
 
For one thing, if you inspect a regular Mach-O binary, it lists /usr/lib/libSystem.B.dylib as being the program interpreter, but nothing exists at the specified path.

This is because libSystem is being pulled from the dyld cache rather than as a stand-alone library these days. It’s a pretty recent change starting with Big Sur. But the core thing about libSystem is that it is really an umbrella library, exposing other standard libraries out. This includes libc, and I suspect even includes dyld these days, based on what you say here.

Valgrind hit this particular quirk: https://github.com/LouisBrunner/valgrind-macos/issues/21

On Linux you can assemble a binary with _start as the entry point to circumvent loading the C runtime that calls main. On macOS I've found no way to go any earlier in the program loading chain than C's main. I've seen a __dyld_start and dyld_startup around that I feel should relate to program loading, but while lldb finds the procedures when I attempt to set breakpoints, it does not actually stop at any point when I run my program with the breakpoints set.

Unlike Linux, the entry point doesn't exist in the binary, as you've surmised, but in dyld. And because Apple has been taking advantage of clang's modules support for a while, how you set a breakpoint to get at "dyld`start" differs from how you do it on Linux, or even macOS of the mid-00s. The symbol these days is called "start" and lives in the "dyld" module. And as you have probably already guessed, dyld is the macOS dynamic linker.

So dyld is the one responsible for the pre-main environment, and isn't something as easily overridden. It handles making sure the needs of the executable are met before executing main in terms of the dynamic linker, and it looks like "dyld`dyld4::prepare" is ultimately responsible for getting the address of main so it can jump into it. And unfortunately, this is where my knowledge of app boot starts to fall off. This is an area of macOS that is under constant development by Apple, for various reasons. Complex applications can spend a lot of time in "dyld`start", and so there are opportunities for Apple to improve app launch performance. It also means what I knew a decade ago isn't even remotely accurate now.

But at the very least, one advantage of dyld owning app launch is that Apple can make these sorts of changes without requiring developers to recompile their apps.
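
To make the "entry point lives in dyld" point a bit more concrete, here is a minimal sketch of walking a Mach-O image's load commands. As far as I understand the format, an ordinary executable names its dynamic linker in an LC_LOAD_DYLINKER command (normally /usr/lib/dyld), lists libSystem in an LC_LOAD_DYLIB command, and records only the file offset of main in LC_MAIN, which dyld later jumps to. The constants and struct layouts are copied by hand from <mach-o/loader.h> so the code compiles anywhere; `summarize_macho` and `struct macho_summary` are hypothetical names made up for this example, and the code assumes a thin, native-endian 64-bit image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values and layouts mirrored by hand from <mach-o/loader.h>;
   on macOS you would include that header instead. */
#define MH_MAGIC_64      0xfeedfacfu
#define LC_REQ_DYLD      0x80000000u
#define LC_LOAD_DYLIB    0xcu
#define LC_LOAD_DYLINKER 0xeu
#define LC_MAIN          (0x28u | LC_REQ_DYLD)

struct mach_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

struct load_command {
    uint32_t cmd, cmdsize;
};

/* Hypothetical result type for this sketch. Pointers refer into buf. */
struct macho_summary {
    const char *dylinker;    /* path from LC_LOAD_DYLINKER, or NULL */
    const char *first_dylib; /* path from the first LC_LOAD_DYLIB, or NULL */
    int has_main;            /* 1 if an LC_MAIN command was found */
    uint64_t entryoff;       /* file offset of main, taken from LC_MAIN */
};

/* Returns 0 on success, -1 if buf is not a native-endian 64-bit Mach-O
   image or a load command does not fit inside the buffer. */
int summarize_macho(const uint8_t *buf, size_t len, struct macho_summary *out)
{
    struct mach_header_64 hdr;
    size_t off = sizeof hdr;

    assert(buf != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, buf, sizeof hdr);
    if (hdr.magic != MH_MAGIC_64)
        return -1;

    for (uint32_t i = 0; i < hdr.ncmds; i++) {
        struct load_command lc;

        if (len - off < sizeof lc)
            return -1;
        memcpy(&lc, buf + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - off)
            return -1;

        if (lc.cmd == LC_LOAD_DYLINKER || lc.cmd == LC_LOAD_DYLIB) {
            /* Both commands keep a string offset, relative to the start
               of the command, directly after cmd and cmdsize. */
            uint32_t name_off;

            if (lc.cmdsize < 12)
                return -1;
            memcpy(&name_off, buf + off + 8, sizeof name_off);
            if (name_off >= lc.cmdsize)
                return -1;
            /* Require a NUL terminator inside the command. */
            if (memchr(buf + off + name_off, 0, lc.cmdsize - name_off) == NULL)
                return -1;
            if (lc.cmd == LC_LOAD_DYLINKER)
                out->dylinker = (const char *)(buf + off + name_off);
            else if (out->first_dylib == NULL)
                out->first_dylib = (const char *)(buf + off + name_off);
        } else if (lc.cmd == LC_MAIN) {
            /* entry_point_command: cmd, cmdsize, entryoff, stacksize. */
            if (lc.cmdsize < 24)
                return -1;
            memcpy(&out->entryoff, buf + off + 8, sizeof out->entryoff);
            out->has_main = 1;
        }
        off += lc.cmdsize;
    }
    return 0;
}
```

On a Mac, one could read a thin 64-bit executable into memory and hand it to `summarize_macho`; a universal (fat) binary would need its fat header skipped first, which this sketch does not attempt.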
 

Attachment: Screen Shot 2021-12-01 at 8.28.14 AM.png
This is because libSystem is being pulled from the dyld cache rather than as a stand-alone library these days. It’s a pretty recent change starting with Big Sur. But the core thing about libSystem is that it is really an umbrella library, exposing other standard libraries out. This includes libc, and I suspect even includes dyld these days, based on what you say here.
OK, but why isn't it in the path? I mean, fair enough, it's cached, but it must still exist somewhere on disk, right? It's clearly not in the path the Mach-O thinks it is. Doesn't matter much if macOS knows how to point it to the right memory for it anyway or something, but it has to be somewhere on disk; do you know where?
Unlike Linux, the entry point doesn't exist in the binary, as you've surmised, but in dyld. And because Apple has been taking advantage of clang's modules support for a while, how you set a breakpoint to get at "dyld`start" differs from how you do it on Linux, or even macOS of the mid-00s. The symbol these days is called "start" and lives in the "dyld" module. And as you have probably already guessed, dyld is the macOS dynamic linker.

So dyld is the one responsible for the pre-main environment, and isn't something as easily overridden. It handles making sure the needs of the executable are met before executing main in terms of the dynamic linker, and it looks like "dyld`dyld4::prepare" is ultimately responsible for getting the address of main so it can jump into it. And unfortunately, this is where my knowledge of app boot starts to fall off. This is an area of macOS that is under constant development by Apple, for various reasons. Complex applications can spend a lot of time in "dyld`start", and so there are opportunities for Apple to improve app launch performance. It also means what I knew a decade ago isn't even remotely accurate now.

But at the very least, one advantage of dyld owning app launch is that Apple can make these sorts of changes without requiring developers to recompile their apps.
Hey, I managed to get a breakpoint in dyld`start. I have been so darn close, trying to break at dyld`_start. I thought for sure there'd be an underscore based on how names are usually mangled/demangled.
PS. You probably know, but not all Linux binaries contain the entry point in the binary either. You can make them have it in the binary, but they can also use a similar system to what macOS does here, where the ELF just contains a field that says to use the system libc loader as entry point.
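
The ELF field in question is, as far as I know, the PT_INTERP program header: for a dynamically linked binary the kernel maps the loader named there and starts executing in it, and the loader later jumps to the binary's own e_entry (normally _start). Below is a rough sketch of pulling that path out of an in-memory ELF64 image. The struct layouts are copied by hand from the ELF64 definitions in <elf.h> so it compiles on any platform; `elf_interpreter` is a hypothetical helper name, and the code assumes the image has the same byte order as the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrored by hand from <elf.h>; on Linux you would include that instead. */
#define PT_INTERP 3u

struct elf64_ehdr {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

struct elf64_phdr {
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr;
    uint64_t p_filesz, p_memsz, p_align;
};

/* Returns a pointer into buf holding the PT_INTERP path, or NULL if the
   image has no interpreter (e.g. statically linked) or is not a
   well-formed ELF64 image. If entry is non-NULL and the header is valid,
   *entry receives e_entry, the address the loader eventually jumps to. */
const char *elf_interpreter(const uint8_t *buf, size_t len, uint64_t *entry)
{
    struct elf64_ehdr eh;

    assert(buf != NULL);
    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, buf, sizeof eh);
    /* Magic bytes, then EI_CLASS (index 4) must be ELFCLASS64 (2). */
    if (memcmp(eh.e_ident, "\177ELF", 4) != 0 || eh.e_ident[4] != 2)
        return NULL;
    if (entry != NULL)
        *entry = eh.e_entry;
    if (eh.e_phentsize < sizeof(struct elf64_phdr) || eh.e_phoff > len)
        return NULL;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        struct elf64_phdr ph;
        uint64_t off = eh.e_phoff + (uint64_t)i * eh.e_phentsize;

        if (off > len || len - off < sizeof ph)
            return NULL;
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;
        /* The segment must lie inside the buffer and be NUL-terminated. */
        if (ph.p_offset > len || ph.p_filesz > len - ph.p_offset)
            return NULL;
        if (memchr(buf + ph.p_offset, 0, ph.p_filesz) == NULL)
            return NULL;
        return (const char *)(buf + ph.p_offset);
    }
    return NULL;
}
```

A binary assembled with its own _start and linked statically has no PT_INTERP at all, so the function returns NULL for it and the kernel jumps straight to e_entry. That is the case that seems to have no everyday equivalent on modern macOS.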

And yeah, I've mucked about a fair deal with dyld so have some knowledge of how it works. Though I'm also unsure if it's possible to create a shim library intercepting calls to the real deal without forcing a flat namespace. I at least couldn't figure that out in my testing, and it seems like it'd be impossible given the two-tier namespace resolution system. But that's totally orthogonal to this thread and just a fun little extra.

I'll be sure to look for dyld4::prepare then.

You've helped loads making my investigations easier, especially giving me what I needed to get a breakpoint at dyld`start :)
thanks
 
OK, but why isn't it in the path? I mean, fair enough, it's cached, but it must still exist somewhere on disk, right? It's clearly not in the path the Mach-O thinks it is. Doesn't matter much if macOS knows how to point it to the right memory for it anyway or something, but it has to be somewhere on disk; do you know where?

The dyld cache itself is what is on disk. Apple doesn’t ship the dylibs for these libraries anymore, as they are pre-packaged into the cache that resides on the sealed system partition. So when you ask dyld for the library, the path matches the pre-packaged version and you only ever need the cache.

https://mjtsai.com/blog/2020/06/26/reverse-engineering-macos-11-0/

As the system partition is now sealed, Apple’s starting to do more and more things like this. I think the kernel is getting to the point where they ship a fully linked kernel, bypassing the kext cache.

PS. You probably know, but not all Linux binaries contain the entry point in the binary either. You can make them have it in the binary, but they can also use a similar system to what macOS does here, where the ELF just contains a field that says to use the system libc loader as entry point.

It’s been a long time, so I had forgotten. Honestly, my job has moved me further away from this low level stuff in the last few years, although even then it was mostly poking and prodding as required to provide better bug details to Apple when we found ugly spots.
 
As the system partition is now sealed, Apple’s starting to do more and more things like this. I think the kernel is getting to the point where they ship a fully linked kernel, bypassing the kext cache.

Hm. But third party kexts still have official support so I'm not sure that's entirely viable
I also wonder if all of this with a sealed system volume and all is going to make macOS upgrades, even minor point updates, require essentially a full image overwrite instead of delta patching.

Oh, and thanks for the insight and links. Also reminded me I forgot to read the GitHub issue for Valgrind. Valgrind is like *the* only thing I'm missing on macOS relative to other development platforms (i.e. Linux).
 
Hm. But third party kexts still have official support so I'm not sure that's entirely viable

So it does look like they backtracked on prelinked kernels in Big Sur (Catalina would rebuild a prelinked kernel when a kext is added/removed). Explains why my memory is a little foggy there. With the sealed volume it does create conflicts that Catalina didn’t have to deal with. That said, I wouldn’t be surprised to see it come back soon enough. M1 systems already block 3rd party kexts unless you lower your security settings.

I also wonder if all of this with a sealed system volume and all is going to make macOS upgrades, even minor point updates, require essentially a full image overwrite instead of delta patching.

The system volume is “sealed” in the sense that an APFS snapshot is made and that snapshot signed so that the OS can detect tampering with the snapshot. The snapshot is what you boot as a read-only volume. So a system update just needs to write the updates to the underlying system volume, create a new snapshot and sign it. Neat thing about this is that you can roll back the volume to the latest snapshot as part of the update process, which makes it even harder to tamper with the system volume.
 
So it does look like they backtracked on prelinked kernels in Big Sur (Catalina would rebuild a prelinked kernel when a kext is added/removed). Explains why my memory is a little foggy there. With the sealed volume it does create conflicts that Catalina didn’t have to deal with. That said, I wouldn’t be surprised to see it come back soon enough. M1 systems already block 3rd party kexts unless you lower your security settings.
Indeed. I remember a tech talk with Apple though where they said something about not removing the support for third party kexts until they had an alternative in place for all use cases. In the talk I'm thinking of they showed a checklist where they had non-kext solutions with IOKit or other frameworks in place and some red crosses for things that still required kexts to be achieved. Wish I could remember more than that though to reference.
I think the last time I installed a third-party kext was for FUSE, to add ext file system support, but the FUSE package I got for it was flaky anyway.
The system volume is “sealed” in the sense that an APFS snapshot is made and that snapshot signed so that the OS can detect tampering with the snapshot. The snapshot is what you boot as a read-only volume. So a system update just needs to write the updates to the underlying system volume, create a new snapshot and sign it. Neat thing about this is that you can roll back the volume to the latest snapshot as part of the update process, which makes it even harder to tamper with the system volume.
Yeah. I'll be honest, I forgot the logic, but I saw someone mention somewhere that what Apple were doing on that front was preventing delta updates to a certain level of granularity, possibly to do with the way signatures were generated or something, and then they showed how small OS point updates had grown in size since the read-only system volume was introduced. But I don't know. Overall it's a pretty neat setup. I imagine it's also a key part of the new restore/factory reset feature in macOS: instead of reinstalling the OS from recovery or an OS install media, you can just start fresh from System Preferences, probably just clearing out everything not part of the system volume.
 
Yeah. I'll be honest, I forgot the logic, but I saw someone mention somewhere that what Apple were doing on that front was preventing delta updates to a certain level of granularity, possibly to do with the way signatures were generated or something, and then they showed how small OS point updates had grown in size since the read-only system volume was introduced. But I don't know. Overall it's a pretty neat setup. I imagine it's also a key part of the new restore/factory reset feature in macOS: instead of reinstalling the OS from recovery or an OS install media, you can just start fresh from System Preferences, probably just clearing out everything not part of the system volume.

There's nothing about the seal itself that hampers small updates. There are a number of variables that can feed into update size: https://eclecticlight.co/2021/07/28/why-are-big-sur-updates-so-large/

Yeah, as someone who's watched Apple over the last 20 years or so, it's been interesting to watch their incremental approach. It took many years to get to Monterey's "Erase All Content and Settings" feature, step by step.
 