casperes1996
Since there's a lot of combined technical knowledge on these forums, I was hoping you could help me (or we could help each other) understand something I've long been trying to get a better grasp of.
Some of you may have seen Matt Godbolt, of Compiler Explorer fame, present his excellent talk "The Bits Between the Bits", an overview of everything that happens before main is reached in a C program. In essence, it covers all the work needed to run a program as simple as
int main () {}
and what makes up its size. Just compiling the above on a modern system produces something that takes up more space than some machines from a couple of decades ago had available in total.
I have a pretty good sense of what happens on Linux and of what generally needs to happen (as part of my bachelor project I wrote my own OS), but there is a lot of mystery to me about how exactly macOS loads a Mach-O binary.
For one thing, if you inspect a regular Mach-O binary, it lists /usr/lib/libSystem.B.dylib as the program interpreter, but nothing exists at that path.
Second, what is actually the entry point of a binary as far as the kernel is concerned? On Linux you can assemble a binary with _start as the entry point to bypass the C runtime code that calls main. On macOS I've found no way to get any earlier in the program-loading chain than C's main. I've come across symbols named __dyld_start and dyld_startup that I suspect are involved in program loading, but although lldb resolves them when I set breakpoints on them, execution never stops at either one when I run the program.
Perhaps part of this also ties into the fact that fully static linking isn't really a thing on modern macOS, so every program load goes through the dynamic loader.
What I really want is a better understanding of the process from the moment the execve system call is made until main executes. What confuses me most right now is the program interpreter listed in the Mach-O that doesn't exist on disk, and what the first code is that the kernel runs outside kernel space.