-
Static Linking
- File size increases
- Library updates require relinking, because the library's object code gets copied into the executable
-
Dynamic Linking
- File size stays small
- Library updates are picked up without relinking, because only a reference to the library goes into the binary
-
macOS
- static: `.a`
- dynamic: `.dylib`
-
Linux
- static: `.a`
- dynamic: `.so`
-
Windows
- static: `.lib`
- dynamic: `.dll`
-
A linker converts object files into executables and shared libraries. Let’s look at what that means.
-
When a linker is involved, development starts with writing program code in a language such as C, C++, or Fortran (but typically not Java, which normally works differently, using a loader rather than a linker).
-
A compiler translates this program code, which is human-readable text, into another form of human-readable text known as assembly code.
-
Assembly code is a readable form of the machine language which the computer can execute directly.
-
An assembler is used to turn this assembly code into an object file. For completeness, some compilers include an assembler internally and produce an object file directly.
-
Steps
.cpp -> Preprocessor -> Compiler -> Assembler -> Linker -> Loader
- The compile flag `--save-temps` should keep all intermediate files (preprocessed source, assembly, and object files)
-
After each translation unit has been compiled independently into an object file, the linker connects the object files into an executable.
-
Static libraries (`.lib`, `.a`) are used by the linker; dynamic libraries (`.dll`, `.so`) are used by the loader.
-
CppCon 2017: Michael Spencer “My Little Object File: How Linkers Implement C++”
-
Object files
- Ranges of unsplittable data (sections)
- Names that reference that data (symbols)
- A list of modifications to that data (relocations)
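As a rough mental model of the three ingredients above, an object file can be pictured as plain data. This is only an illustrative sketch: the class names (`Section`, `Symbol`, `Relocation`, `ObjectFile`) and the contents of the sample `main.o` are made up here and do not mirror any real object file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Section:
    name: str        # e.g. ".text" or ".data"
    data: bytearray  # an unsplittable range of bytes


@dataclass
class Symbol:
    name: str
    section: Optional[str]  # None marks a symbol that is referenced but not defined here
    offset: int = 0         # position of the symbol inside its section


@dataclass
class Relocation:
    section: str  # section whose bytes must be modified
    offset: int   # where in that section the modification goes
    symbol: str   # symbol whose final address is needed


@dataclass
class ObjectFile:
    sections: Dict[str, Section]
    symbols: List[Symbol]
    relocations: List[Relocation]

    def undefined_symbols(self) -> List[str]:
        """Names this object expects some other object or library to define."""
        return [s.name for s in self.symbols if s.section is None]


# Hypothetical main.o: a call instruction (0xE8) with a 4-byte placeholder operand.
main_o = ObjectFile(
    sections={".text": Section(".text", bytearray(b"\xE8\x00\x00\x00\x00"))},
    symbols=[Symbol("main", ".text", 0), Symbol("printf", None)],
    relocations=[Relocation(".text", 1, "printf")],
)
print(main_o.undefined_symbols())
```

The dump tools listed in these notes print the same three kinds of information for real files.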
-
Dump tools
- objdump
- nm
- llvm-readobj
- readelf (ELF)
- otool (Mach-O)
- dumpbin (PE/COFF)
-
- Atom Model
- Generic
- Name
- Scope
- Ordinal
- Target Specific Attribute
- Driver Model
-
BFD Linker
-
Gold Linker
-
LLD Port
- ELF (Unix) - Replacing (`/usr/bin/ld` / `-fuse-ld=lld`)
- COFF (Windows)
- Mach-O (Mac)
-
- Linking and extracting from `.a` files.
-
- Visiting the same archive file repeatedly makes linking slower.
- Mutually dependent `.a` files are harder to resolve.
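The two archive problems can be reproduced with a toy model of a traditional left-to-right linker. This is a minimal sketch under simplifying assumptions: every object or archive member is just a pair of sets `(defined names, referenced names)`, and the `link` function and the sample inputs are hypothetical, not any real linker's code.

```python
def link(inputs):
    """Process inputs left to right and return the symbols left unresolved."""
    defined, undefined = set(), set()

    def add(defs, refs):
        defined.update(defs)
        undefined.update(refs)
        undefined.difference_update(defined)

    for kind, payload in inputs:
        if kind == "obj":
            add(*payload)  # plain object files are always linked in
            continue
        members = list(payload)  # archive: extract only members that are needed now
        progress = True
        while progress:          # rescan until no further member gets extracted
            progress = False
            for member in list(members):
                defs, refs = member
                if defs & undefined:
                    add(defs, refs)
                    members.remove(member)
                    progress = True
    return undefined


main_o = ("obj", ({"main"}, {"a"}))
lib_a = ("archive", [({"a"}, {"b"}), ({"c"}, set())])  # a needs b; lib_a also holds c
lib_b = ("archive", [({"b"}, {"c"})])                   # b needs c, which lives in lib_a

print(link([main_o, lib_a, lib_b]))         # c is requested only after lib_a was visited
print(link([main_o, lib_a, lib_b, lib_a]))  # visiting lib_a a second time resolves it
```

In this model the mutually dependent pair only links when `lib_a` is visited twice, which is the extra work the two bullets above describe.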
-
Address Relocation: The compiler just leaves a placeholder wherever it needs an address it does not know yet; the placeholder gets populated in the linking stage.
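To make the placeholder idea concrete, here is a small sketch of patching one x86-64 `call` instruction. It assumes a 32-bit PC-relative relocation computed as S + A - P (symbol address, plus addend, minus the address of the patched field), with an addend of -4. The function name `apply_pc32` and all addresses are invented for illustration.

```python
import struct


def apply_pc32(code: bytearray, section_addr: int, field_offset: int,
               symbol_addr: int, addend: int = -4) -> None:
    """Overwrite a 4-byte placeholder with S + A - P, little-endian."""
    place = section_addr + field_offset   # P: final address of the placeholder
    value = symbol_addr + addend - place  # S + A - P
    struct.pack_into("<i", code, field_offset, value)


# `call rel32` is opcode 0xE8 followed by a 4-byte displacement.
text = bytearray(b"\xE8\x00\x00\x00\x00")  # what the compiler leaves behind
apply_pc32(text, section_addr=0x401000, field_offset=1, symbol_addr=0x401020)

displacement = struct.unpack_from("<i", text, 1)[0]
# The CPU adds the displacement to the address of the next instruction.
target = 0x401000 + len(text) + displacement
print(text.hex(), hex(target))
```

The addend of -4 accounts for the displacement being measured from the end of the 4-byte field rather than from its start.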
Register operands in 64-bit mode can be any of the following:
• 64-bit general-purpose registers (RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, or R8-R15)
• 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP, or R8D-R15D)
• 16-bit general-purpose registers (AX, BX, CX, DX, SI, DI, SP, BP, or R8W-R15W)
• 8-bit general-purpose registers: AL, BL, CL, DL, SIL, DIL, SPL, BPL, and R8L-R15L are available using REX prefixes; AL, BL, CL, DL, AH, BH, CH, and DH are available without using REX prefixes.
• Segment registers (CS, DS, SS, ES, FS, and GS)
• RFLAGS register
• x87 FPU registers (ST0 through ST7, status word, control word, tag word, data operand pointer, and instruction pointer)
• MMX registers (MM0 through MM7)
• XMM registers (XMM0 through XMM15) and the MXCSR register
• Control registers (CR0, CR2, CR3, CR4, and CR8) and system table pointer registers (GDTR, LDTR, IDTR, and task register)
• Debug registers (DR0, DR1, DR2, DR3, DR6, and DR7)
• MSR registers
• RDX: RAX register pair representing a 128-bit operand
-
2016 EuroLLVM Developers' Meeting: R. Ueyama "New LLD linker for ELF"
- Basic linking: concatenating object files and applying relocations
- Simulation of linking with undefined, defined, and lazy symbols
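The three symbol states can be played through with a toy symbol table. This sketch is only loosely modelled on the idea from the talk and is not LLD's actual code: the `SymbolTable` class, its method names, and the sample inputs are assumptions made for illustration. Archive members are registered as lazy and loaded only when something references them.

```python
class SymbolTable:
    def __init__(self):
        # name -> "defined" | "undefined" | ("lazy", member)
        self.state = {}

    def add_object(self, defs, refs):
        for name in defs:
            self.state[name] = "defined"
        for name in refs:
            current = self.state.get(name)
            if current is None:
                self.state[name] = "undefined"
            elif isinstance(current, tuple):  # lazy: load the archive member now
                self.add_object(*current[1])

    def add_archive(self, members):
        for member in members:
            defs, _refs = member
            for name in defs:
                current = self.state.get(name)
                if current == "undefined":    # already wanted: load immediately
                    self.add_object(*member)
                    break
                if current is None:           # not wanted yet: remember where it lives
                    self.state[name] = ("lazy", member)

    def unresolved(self):
        return {n for n, s in self.state.items() if s == "undefined"}


table = SymbolTable()
table.add_object({"main"}, {"foo"})         # main.o references foo
table.add_archive([({"bar"}, set())])       # libbar.a comes first: bar becomes lazy
table.add_archive([({"foo"}, {"bar"})])     # libfoo.a: foo is loaded and pulls in bar
print(table.unresolved())
```

Because lazy entries remember which member defines a name, this model resolves the reference even though the archive defining `bar` appears before the one that needs it.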
-
- Two system calls from the Linux kernel are relevant. The fork system call (or perhaps vfork or clone) creates a new process similar to the calling one (every Linux user-land process except init is created by fork or friends).
- The execve system call replaces the process address space with a fresh one (essentially by mapping segments from the ELF executable plus anonymous segments, then initializing the registers, including the stack pointer).
- The x86-64 ABI supplement and the Linux Assembly HOWTO give details.
- Dynamic linking happens after execve and involves the /lib/x86_64-linux-gnu/ld-2.13.so file, which for ELF is viewed as an "interpreter".
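The fork-then-exec pattern can be tried from Python on a Unix-like system, since `os.fork` and `os.execv` are thin wrappers over the corresponding system calls. This is a minimal sketch: the helper name `spawn` is hypothetical, and the child program is simply the running Python interpreter asked to exit with code 7.

```python
import os
import sys


def spawn(argv):
    """Fork, exec argv in the child, and return the child's exit code."""
    pid = os.fork()                  # returns 0 in the child, the child's pid in the parent
    if pid == 0:
        try:
            os.execv(argv[0], argv)  # replaces the child's address space; never returns on success
        finally:
            os._exit(127)            # reached only if the exec failed
    _, status = os.waitpid(pid, 0)   # parent waits for the child to finish
    return os.WEXITSTATUS(status)


exit_code = spawn([sys.executable, "-c", "raise SystemExit(7)"])
print(exit_code)
```

Everything after `os.execv` in the child belongs to the old program image, which is why it only runs when the exec fails.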
The segments contain information needed at runtime, while the sections contain information needed during linking.
- Sections contain static data for the linker; segments contain the dynamic data the OS needs at run time
- File Header
- Section Header
- Data
- Magic Number
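The file header starts with the magic number and also records where the program header and section header tables live. The sketch below parses a 64-bit little-endian ELF header with Python's `struct` module. The function name `parse_elf64_header` and the synthetic sample header are assumptions for illustration, and other ELF classes and byte orders are deliberately rejected.

```python
import struct

# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes in total


def parse_elf64_header(blob: bytes) -> dict:
    if blob[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: bad magic number")
    if blob[4] != 2 or blob[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian files")
    (_ident, _type, _machine, _version, entry, phoff, shoff,
     _flags, _ehsize, _phentsize, phnum, _shentsize, shnum,
     _shstrndx) = ELF64_HEADER.unpack_from(blob)
    return {"entry": entry, "e_phoff": phoff, "e_shoff": shoff,
            "e_phnum": phnum, "e_shnum": shnum}


# Synthetic header so the example runs anywhere; the values are made up.
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # magic, 64-bit, little-endian, version 1
sample = ELF64_HEADER.pack(ident, 2, 62, 1, 0x401000, 64, 4096, 0, 64, 56, 2, 64, 5, 4)
info = parse_elf64_header(sample)
print(info)
```

On a Linux machine the same function can be fed the first 64 bytes of a real binary, and the result compared with the output of `readelf -h`.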
-
How OS X Executes Applications
- For library relocation problems, the first thing to do is run `ldd` on the executable
- The ldd tool lists the dependent shared libraries that the executable requires, along with their paths if found
-
CppCon 2018: Matt Godbolt “The Bits Between the Bits: How We Get to main()”
- Procedure Linkage Table
- Global Offset Table
- The compiler emits a static init function for each translation unit
- It puts a pointer to this function into a section called `.init_array`
- ARM branch instructions can't jump very far, so an intermediate jump is used
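The interplay of the Procedure Linkage Table and the Global Offset Table is easiest to see as lazy binding: the first call to a function goes through a resolver, which caches the address so later calls skip the lookup. The following is a deliberately simplified simulation. The `LazyBinder` class is hypothetical, a dictionary stands in for the loaded shared libraries, and a missing dictionary entry stands in for a GOT slot that still points back at the resolver.

```python
class LazyBinder:
    def __init__(self, library):
        self.library = library   # stands in for the symbols of loaded shared objects
        self.got = {}            # name -> resolved function (the "GOT" slots)
        self.resolver_calls = 0  # how often the slow path ran

    def _resolve(self, name):
        """Slow path: what the dynamic linker does on the first call."""
        self.resolver_calls += 1
        return self.library[name]

    def call(self, name, *args):
        """The 'PLT stub': jump through the GOT slot, resolving it if needed."""
        target = self.got.get(name)
        if target is None:
            target = self.got[name] = self._resolve(name)  # patch the slot once
        return target(*args)


binder = LazyBinder({"strlen": len})
first = binder.call("strlen", "hello")   # resolver runs, slot gets patched
second = binder.call("strlen", "linker")  # goes straight through the patched slot
print(first, second, binder.resolver_calls)
```

Only functions that are actually called pay the lookup cost, and they pay it once.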
-
In-depth: ELF - The Extensible & Linkable Format
- Segments are run-time specific; sections are link-time specific
- A segment (described by the program header table at `e_phoff`) contains multiple sections (described by the section header table at `e_shoff`) and also says how to load them into memory
- Sections are only used during linking and by debuggers
- `sstrip` can be used to strip all section information, yet the program still runs
- BSS is the uninitialized data segment; initialized variables go to the data segment
- PIC, ASLR
- The ELF specification: https://refspecs.linuxfoundation.org/elf/elf.pdf
- elf.h from the Linux kernel: https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h
- How programs get run: https://lwn.net/Articles/631631
- TLS: https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter8-5/index.html
- Relocation: https://refspecs.linuxbase.org/elf/gabi4+/ch4.reloc.html
- sstrip: https://github.com/BR903/ELFkickers
-
- Program Header Table
- Section Header Table
- The section header table resides at the bottom of the file and can be stripped
- static link: `gcc -static -fno-pie -no-pie -g -o a.out a.c`
- All problems in computer science can be solved by an additional layer of indirection.
- Position-independent code is used for shared libraries and is implemented with relative addressing
- The linker does two things
- Symbol resolution
- Relocation
- The `.interp` section holds the path of the dynamic linker
- At the linking step only the relocation and symbol table entries are embedded; the real addresses are filled in at load time.
- Global Offset Table
- Procedure Linkage Table
- Resolves procedure addresses
- Test out program execution in GDB
- Running the following on a program with two `printf` calls will keep track of addresses:
- `gdb a.out`
- `b a.c:4`
- `b a.c:5`
- `r`
- `disas 'printf@plt'`
- `p/x (void*)0x60101B`
- `readelf -hW a.out` (a shell command, not a GDB command)
- Making common cases fast is at the heart of system design
- [Before Main: How Executables Work on Linux](https://youtu.be/jR2hUhjcAXI)
- Windows does not have a distinction between sections and segments
- Segments are composed of sections
- [CppCon 2017: Nir Friedman “What C++ developers should know about globals (and the linker)”](https://www.youtube.com/watch?v=xVT1y0xWgww&ab_channel=CppCon)
```bash
ldd /bin/ls
objdump -x /bin/ls
# macOS counterpart of ldd
otool -L /bin/ls
# See Machine Code
objdump -dC /bin/ls
# See Machine Code with Relocation
objdump --reloc -dC /bin/ls
# See symbols of the object file
objdump --syms -C /bin/ls