A full C compiler, written in pure C. No 3rd party dependencies or parser generators

john-h-k/jcc

Latest commit: a44e8c0 · Mar 21, 2025

jcc

JCC is designed to be a C11/C18/C23 compiler written in pure C11, with no dependencies.

CI Status

OS     | AArch64 (Arm64) | x64        | RISC-V (32)
Ubuntu | Ubuntu AArch64  | Ubuntu x64 | Ubuntu RISC-V
macOS  | macOS AArch64   | macOS x64  | -

If tests are failing, ignore it! Development is very active (and pushes sometimes break things)

Aims:

  • To be a complete C11/C18/C23 compiler with full functionality (WIP)
  • To use zero third-party dependencies or helper tools (no parser generators, assemblers, lexers, etc.) other than the system linker
  • To follow best practices and have sensible compiler architecture
    • Building the "smallest" C compiler is an explicit non-goal
  • To be useful for learning about compilers
    • Uses proper IRs rather than AST -> ASM
    • Generates machine code, not assembly
    • Builds SSA form and puts values in registers rather than spilling everything
    • Builds object files and invokes the system linker directly (rather than via a compiler or an assembler)
    • Doesn't use hacks (mostly...)

Is it sound?

No, it is text based

Why?

I just wanted to write a C compiler. It happens to be an easily buildable, easily runnable compiler that is still a grokkable size. It is probably too large to be considered a toy compiler, but the core architecture is much more accessible than the shoggoth of Clang/GCC.

Support

AArch64, x64, and RISC-V 32 are supported, although some of the x64 ABI is not yet fully implemented and 64-bit integer support on RISC-V 32 is WIP. Working with RISC-V requires installing a RISC-V linker.

Things that don't work yet

  1. Implementing variadic functions (va_list and friends). Calling variadic functions works fine
  2. Atomics
  3. Linking on musl-based distros. This is relatively simple and should work soon

Requirements

For installation

  • C11-compliant C compiler
  • POSIX shell
  • git, curl, or wget for downloading sources
  • Nothing else!

For development

  • C11-compliant C compiler
  • Bash, version >=3
  • CMake
  • A few other tools are used by jcc.sh commands to make for a more pleasant experience, but are not needed. These include bat (for syntax-highlighting), fd, and rg

Installation

To directly install jcc for playing around with (tested on macOS & various Linux distros):

curl -sSL https://jcc.johnk.dev/install.sh | sh

The above URL is just a direct fetch of ./scripts/install.sh, which you can verify by visiting it. It is NOT a redirect; it forwards the content itself. If you prefer, you can directly curl the script from raw.githubusercontent.com/john-h-k/jcc/refs/heads/main/scripts/install.sh

wget can also be used, or you can clone the repository and run ./scripts/install.sh if you somehow have git but not curl or wget(???).

To install for development (which is realistically what you should do!):

  • Ensure you have bash and cmake installed
  • Fork & clone the repo (exercise left to reader)
  • Run ./jcc.sh for help

Development

The jcc.sh script can be used for common workflows. A key subset of the commands can be seen here (run ./jcc.sh for all commands):

jcc.sh COMMAND

COMMANDS:
    help        Show help
    run         Build, then run JCC with provided arguments
    debug       Build, then run JCC under LLDB/GDB with provided arguments
    test        Run tests
    test-all    Run tests with all optimisation levels
    format      Format codebase

For the test script, run jcc.sh test help.

Design

  • Arg parsing
    • Declarative-style arguments for simplicity. Very macro-heavy
    • Code is args.h and args.c
  • Preprocessor
    • Has two modes
      • Self-contained - when invoked with the -E flag, will run the preprocessor and output the result
      • Streaming - in normal compilation, tokens from the preprocessor are consumed and fed to the lexer
    • Code is preproc.h and preproc.c
  • Frontend - Lexer + Parser
    • These work in lockstep (tokens are provided on-demand by the lexer), and build the AST
    • It is a very loose and untyped AST, to try and parse as many programs as possible, with little verification
    • Lexing code is lex.h and lex.c
      • Lexer takes preproc tokens
    • Parsing code is parse.h and parse.c
  • Semantic analysis - Typecheck
    • Builds a typed AST from the parser output
    • Performs most validation (are types correct, do variables exist, etc)
    • Typechecking code is typechk.h and typechk.c
  • Intermediate Representations and passes
    • All code located in the ir folder
    • IR representation structs and helper methods are in ir/ir.h and ir/ir.c
    • Pretty-printing functionality is in ir/prettyprint.h and ir/prettyprint.c
      • This also includes graph-building functionality with graphviz
    • IR building
      • This stage converts the AST into an SSA IR form
      • It assumes the AST is entirely valid and well-typed
      • Code is ir/build.h and ir/build.c
    • Lowering
      • This converts the IR into the platform-native form
      • Firstly, global lowering is performed. This handles operations that are lowered the same way on all platforms
        • E.g. br.switch ops are converted into a series of if-elses, and load.glb/store.glb operations are transformed to addr GLB + load.addr/store.addr
      • Then, per-target lowering occurs
        • For example, AArch64 has no % instruction, so x = a % b is converted to c = a / b; x = a - (c * b)
      • The code for lowering is within the appropriate backend folders
    • Register allocation
      • Simple linear scan register allocation (LSRA), done separately across floating-point & general-purpose registers
    • Eliminate phi
      • Splits critical edges and inserts moves to preserve semantics of phi ops
  • Code Generation
    • Converts the IR into a list of 1:1 machine code instructions
    • These are all target specific
    • Currently codegen does too much. In the future I would like to move lots of its responsibilities (e.g. prologue/epilogue) into IR passes
  • Emitting
    • Actually emits the instructions from code generation into memory
  • Object file building
    • Writes the object file to disk
    • Currently only macOS Mach-O (in macos) and ELF (in linux) are supported
  • Linking
    • Links using the platform linker
    • Effectively just runs the linker as one would from the command line
    • Platform-specific link code is in macos and linux
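
To illustrate the declarative, macro-heavy argument style mentioned under Arg parsing, here is a minimal sketch using the X-macro pattern. It is not taken from args.h or args.c, and the real code may look entirely different. The names FLAG_LIST, struct flags, and parse_flag, and the --verbose flag, are all hypothetical; only -E appears in this README.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical option table: X(struct field, flag text, description).
 * Only "-E" comes from the README; "--verbose" is made up. */
#define FLAG_LIST(X)                                     \
  X(preprocess_only, "-E", "Only run the preprocessor")  \
  X(verbose, "--verbose", "Print extra diagnostics")

/* One bool field is generated per table entry. */
struct flags {
#define X(field, flag, desc) bool field;
  FLAG_LIST(X)
#undef X
};

/* Sets the matching field and returns true, or returns false if `arg`
 * is not a known flag. The comparison chain is generated from the table. */
bool parse_flag(struct flags *out, const char *arg) {
  assert(out != NULL && arg != NULL);
#define X(field, flag, desc)  \
  if (strcmp(arg, flag) == 0) { \
    out->field = true;          \
    return true;                \
  }
  FLAG_LIST(X)
#undef X
  return false;
}
```

Adding a new flag in this style means adding one line to the table; the struct field and the matching code follow automatically.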
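
The per-target lowering of % described under Lowering can be checked in plain C. The sketch below is not jcc code, and lowered_rem is a hypothetical name. It computes the remainder using only divide, multiply, and subtract, which works because C integer division truncates toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of jcc): computes a % b without a
 * remainder instruction, mirroring the lowered form
 *   c = a / b; x = a - (c * b)                                    */
int32_t lowered_rem(int32_t a, int32_t b) {
  /* Same preconditions as the C % operator. */
  assert(b != 0);
  assert(!(a == INT32_MIN && b == -1));

  int32_t c = a / b;  /* C division truncates toward zero */
  return a - (c * b); /* result takes the sign of the dividend */
}
```

For any inputs that satisfy the preconditions, lowered_rem(a, b) equals a % b, including negative operands.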
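
For readers unfamiliar with linear scan register allocation, the following is a minimal textbook-style sketch, not jcc's implementation. It assumes live intervals are already computed and sorted by start, and it uses a hypothetical two-register file so that spilling is easy to trigger. All names (struct interval, linear_scan, NUM_REGS, SPILLED) are made up for illustration. An allocator that handles floating-point and general-purpose registers separately could run a pass like this once per register class.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 2   /* hypothetical: tiny register file so spills happen */
#define SPILLED (-1)

struct interval {
  int start, end; /* live range [start, end], in instruction indices */
  int reg;        /* output: register index, or SPILLED */
};

/* Assigns registers to `n` intervals sorted by increasing start. */
void linear_scan(struct interval *iv, size_t n) {
  struct interval *active[NUM_REGS]; /* intervals currently in a register */
  size_t num_active = 0;
  int reg_free[NUM_REGS];
  for (int r = 0; r < NUM_REGS; r++) reg_free[r] = 1;

  for (size_t i = 0; i < n; i++) {
    assert(i == 0 || iv[i - 1].start <= iv[i].start);

    /* Expire intervals that ended before this one starts. */
    size_t kept = 0;
    for (size_t j = 0; j < num_active; j++) {
      if (active[j]->end < iv[i].start) {
        reg_free[active[j]->reg] = 1;
      } else {
        active[kept++] = active[j];
      }
    }
    num_active = kept;

    if (num_active == NUM_REGS) {
      /* No free register: find the active interval that ends last. */
      size_t victim = 0;
      for (size_t j = 1; j < num_active; j++) {
        if (active[j]->end > active[victim]->end) victim = j;
      }
      if (active[victim]->end > iv[i].end) {
        /* The victim lives longer: spill it and take its register. */
        iv[i].reg = active[victim]->reg;
        active[victim]->reg = SPILLED;
        active[victim] = &iv[i];
      } else {
        iv[i].reg = SPILLED; /* the new interval lives longest */
      }
      continue;
    }

    /* Take the lowest-numbered free register. */
    for (int r = 0; r < NUM_REGS; r++) {
      if (reg_free[r]) {
        reg_free[r] = 0;
        iv[i].reg = r;
        break;
      }
    }
    active[num_active++] = &iv[i];
  }
}
```

The spill heuristic here (evict whichever interval ends furthest in the future) is the classic choice for linear scan; whether jcc uses the same heuristic is not stated in this README.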
