
Linking

Linker

  • A linker converts object files into executables and shared libraries. Let’s look at what that means.

  • Where a linker is used, the software development process starts with writing program code in a language such as C, C++, or Fortran. This is typically not Java, which normally works differently: it uses a class loader at run time rather than a linker.

  • A compiler translates this program code, which is human-readable text, into another form of human-readable text known as assembly code.

  • Assembly code is a readable form of the machine language which the computer can execute directly.

  • An assembler turns this assembly code into an object file. For completeness, some compilers include an assembler internally and produce an object file directly.

  • Steps

    • .cpp -> Preprocessor -> Compiler -> Assembler -> Linker -> Loader
    • The -save-temps compile flag (GCC and Clang) keeps all intermediate files (.i, .s, .o)
  • After each translation unit is compiled independently into an object file, the linker connects the object files into an executable

  • Static libraries (.lib, .a) are consumed by the linker; dynamic libraries (.dll, .so) are loaded by the loader at run time

  • CppCon 2017: Michael Spencer “My Little Object File: How Linkers Implement C++”

  • Object files

    • Ranges of unsplittable data (sections)
    • Names that reference those data (symbols)
    • List of modifications to that data (relocations)
  • Dump tools

    • objdump
    • nm
    • llvm-readobj
    • readelf (ELF)
    • otool (Mach-O)
    • dumpbin (PE/COFF)
  • LLD Linker

    • Atom Model
      • Generic
        • Name
        • Scope
        • Ordinal
      • Target-specific attributes
    • Driver Model
  • BFD Linker

  • Gold Linker

  • LLD ports

    • ELF (Unix): replacement for /usr/bin/ld, selected with -fuse-ld=lld
    • COFF (Windows)
    • Mach-O (Mac)
  • Linking archive files

    • Linking against .a files and extracting members from them
  • LLD Benchmark

  • LLD Design

    • Visiting the same archive file repeatedly makes linking slower
    • Mutually dependent .a files are harder to resolve
  • Minimal synthetic benchmark: LD vs gold vs LLVM LLD

  • Address relocation: the compiler just leaves a placeholder, which is filled in at the linking stage.

  • Register operands in 64-bit mode can be any of the following:

    • 64-bit general-purpose registers (RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, or R8-R15)
    • 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP, or R8D-R15D)
    • 16-bit general-purpose registers (AX, BX, CX, DX, SI, DI, SP, BP, or R8W-R15W)
    • 8-bit general-purpose registers: AL, BL, CL, DL, SIL, DIL, SPL, BPL, and R8L-R15L are available using REX prefixes; AL, BL, CL, DL, AH, BH, CH, and DH are available without using REX prefixes
    • Segment registers (CS, DS, SS, ES, FS, and GS)
    • RFLAGS register
    • x87 FPU registers (ST0 through ST7, status word, control word, tag word, data operand pointer, and instruction pointer)
    • MMX registers (MM0 through MM7)
    • XMM registers (XMM0 through XMM15) and the MXCSR register
    • Control registers (CR0, CR2, CR3, CR4, and CR8) and system table pointer registers (GDTR, LDTR, IDTR, and task register)
    • Debug registers (DR0, DR1, DR2, DR3, DR6, and DR7)
    • MSR registers
    • RDX:RAX register pair representing a 128-bit operand
  • ELF

  • 2016 EuroLLVM Developers' Meeting: R. Ueyama "New LLD linker for ELF"

    • Basic linking of cat object files and relocation
    • Simulation of linking with undefined, defined, and lazy symbols
  • ELF Standard

    • Two system calls from the Linux kernel are relevant. The fork system call (or perhaps vfork or clone) is used to create a new process, similar to the calling one (every Linux user-land process except init is created by fork or friends).
    • The execve system call replaces the process address space with a fresh one (essentially by mapping segments from the ELF executable plus anonymous segments, then initializing the registers, including the stack pointer).
    • The x86-64 ABI supplement and the Linux Assembly HOWTO give details.
    • Dynamic linking happens after execve and involves the /lib/x86_64-linux-gnu/ld-2.13.so file, which ELF treats as the program "interpreter".
    • Segments contain the information needed at run time (used by the OS), while sections contain the information needed during linking (used by the linker).
    • File Header
    • Section Header
    • Data
    • Magic Number
  • How OS X Executes Applications

    • For library relocation problems, the first thing to do is inspect the executable's library dependencies
    • The ldd tool (Linux) lists the dependent shared libraries that the executable requires, along with their paths if found; on macOS, otool -L gives the equivalent listing
  • CppCon 2018: Matt Godbolt “The Bits Between the Bits: How We Get to main()”

    • Procedure Linkage Table (PLT)
    • Global Offset Table (GOT)
    • A static init function is generated for each translation unit
    • A pointer to this function is placed in a section called .init_array
    • ARM branch instructions have a limited range, so the linker inserts an intermediate jump (a veneer) when the target is too far away
  • In-depth: ELF - The Extensible & Linkable Format

  • Procedure Linkage Table

    • Program Header Table
    • Section Header Table
    • Sections reside at the bottom and can be stripped
    • static link: gcc -static -fno-pie -no-pie -g -o a.out a.c
    • All problems in computer science can be solved by an additional layer of indirection.
    • Position-independent code (PIC), used for shared libraries, relies on relative addressing so the code works at any load address
    • The linker does two things
      • Symbol resolution
      • Relocation
    • The .interp section names the dynamic linker (the program interpreter)
    • For dynamically linked symbols, the link step embeds only relocation and symbol-table entries; the actual addresses are filled in at load time
    • Global Offset Table
    • Procedure Linkage Table
      • Resolves procedure (function) addresses, which can be done lazily on the first call
    • Test out program execution in GDB
      • Run this on a program with two printf calls:
        • gdb a.out
        • b a.c:4
        • b a.c:5
        • r
        • disas 'printf@plt'
        • p/x (void*)0x60101B
      • readelf -hW a.out prints the ELF header, including the entry point address

   - Making common cases fast is at the heart of system design


- [Before Main: How Executables Work on Linux](https://youtu.be/jR2hUhjcAXI)
   - Windows does not have a distinction between sections and segments
   - Segments are composed of sections

- [CppCon 2017: Nir Friedman “What C++ developers should know about globals (and the linker)”](https://www.youtube.com/watch?v=xVT1y0xWgww&ab_channel=CppCon)

```bash
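# List the shared libraries a binary depends on (Linux)
ldd /bin/ls
# Dump all headers: file header, sections, symbol table, relocations
objdump -x /bin/ls

# macOS counterpart of ldd: list linked shared libraries
otool -L /bin/ls

# See machine code (disassemble, demangle C++ names)
objdump -dC /bin/ls

# See machine code with relocations interleaved
# (a linked executable usually keeps only dynamic relocations; list those with -R)
objdump --reloc -dC /bin/ls

# See symbols of the object file
# (/bin/ls is usually stripped; use -T for its dynamic symbols)
objdump --syms -C /bin/ls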