Skip to content

crumbtoo/sydml

Repository files navigation

SydML

About

SydML is currently planned to be a functional programming language with strict semantics.

Github renders this document very poorly, which conflicts with my style of organisation. See the rendered HTML here.

Notice board

[2024-09-18 Wed] Productivity

I’ve been off my meds since [2024-08-20 Tue], and caffeine lost its kick about a week-and-a-half ago; consequently, my productivity has plummeted.

His colleague Alfréd Rényi said, “a mathematician is a machine for turning coffee into theorems”, and Erdős drank copious quantities; this quotation is often attributed incorrectly to Erdős, but Erdős himself ascribed it to Rényi. After his mother’s death in 1971 he started taking antidepressants and amphetamines, despite the concern of his friends, one of whom (Ron Graham) bet him $500 that he could not stop taking them for a month. Erdős won the bet, but complained that it impacted his performance: “You’ve showed me I’m not an addict. But I didn’t get any work done. I’d get up in the morning and stare at a blank piece of paper. I’d have no ideas, just like an ordinary person. You’ve set mathematics back a month.” After he won the bet, he promptly resumed his use of Ritalin and Benzedrine.

[2024-09-20 Fri] We’re so back!!!

I am medicated again, for the first time in about a month!

Motivation

Demonstrate that I’ve learned from last year’s failures

The entire project collapsed at the last minute, likely as a result of the following points.

Very poor planning.

A lack of thorough planning led to extreme unstability, ultimately leaving me with a broken compiler days before the project was due.

To prevent this, I intend to thoroughly architect the compiler and each pass beforehand, down to the individual signatures of key components. This plan would be sure to account for not only high-level passes, but also

  • where errors are thrown and caught,
  • how line and column information is retained,

Not enough testing.

The volatile codebase did not mesh will with my tests against internal APIs. The few tests that I had could not keep up with the rapidly-changing code, and often didn’t even compile.

To address this, tests will be written against a stable command-line interface, treating the entire compiler as a black box.

  • I plan on following a Golden testing model.
  • For source–to-machine-code compilations, this is a straightforward check that an example input matches an output file, bit-for-bit.
  • Testing other functions, such as type-checking, will require serialisation to (JSON|S-expressions), which is useful feature for debugging in any case.
  • Github CI/CD.

New goals

While last year’s compiler was primarily language-focused, I now intend to focus on compiler techniques.

IRs — ANF and SSA

Progress

HeadlineTime
Total time3d 17:32
Progress3d 17:32
\_ Milestones3d 6:41
\_ 0 Preemptive Planning [6/6]5:29
\_ 1 Research Toys [7/7]1d 18:42
\_ 2 Layer 1 [6/7]1d 0:41
\_ 3 The Empty Module [0/8]5:49
\_ Other10:51
\_ Nix setup7:26
\_ Documentation1:44
\_ GitHub Pages action0:26

Milestones

MilestoneParserElabType-checker->ANFClosure-conv.Hoist->QBE
Empty module
Int literal decl
if 1 then 2 else 3
The identity fn

0 Preemptive Planning [6/6]

MLton-esque overview.

Create a skeleton project.

Milestone table.

Define a CLI.

Write tests for empty files.

  • Current blocker: I’m not sure how to uniformly run tests both against black-box executables, as well as typical unit tests. Ideally, both would be run by cabal test.
  • Potential solution:
      main :: IO ()
    - main = ...
    + main = getArgs >>= main'
    
    + main' :: List String -> IO ()
    + main' xs = do
    +   opts <- execParserPure _ _
    +   ...
        

Define Milestone 2.

1 Research Toys [7/7]

Implement the core algorithms as toys with toothpick scaffolding.
  • After implementing closure-conversion, I think it may be a good idea to initially work uncurried by default, à la SML, and implement the necessary optimisations for curry-heavy code later.
  • The value of statically-typed IRs is not to be undermined.
  • [X] Known issue: There is some problem with closure conversion. Try compiling:
    λ> pretty . upToConvert $ Lam.Prim $ PrimPrintInt $ Lam.zCombinator `Lam.App` ((Lam.Lam "x" $ Lam.Lam "y" $ Lam.Var "x") `Lam.App` Lam.IntVal 3)
        

The Untyped λ-calculus

  • Augmented with tuples.

Rename

LowerANF

Closure-convert

Hoist

ToQBE

  • As of this commit, the toy compiler can compile its first program:
    λ> :m + *Presydc.SSA
    λ> writePipeline "/tmp/t/t.qbe" $ Lam.IntVal 123
        
    $ nix-shell -p qbe gcc
    [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t
    [nix-shell]$ ./t ; echo $?
    123
        
  • As of this commit, it can compile arithmetic primitives and impure print calls:
    λ> :m + *Presydc.SSA
    λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ Lam.Prim $ Lam.PrimAdd (Lam.IntVal 2) (Lam.IntVal 3)
        
    $ nix-shell -p qbe gcc
    [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t
    [nix-shell]$ ./t
    5
        
  • As of this commit, it can compile trivial conditional expressions:
    λ> writePipeline "/tmp/t/t.qbe" $ Lam.IfThenElse (Lam.IntVal 1) (Lam.IntVal 2) (Lam.IntVal 3)
        
    [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t
    [nix-shell]$ ./t ; echo $?
    2
        
  • As of this commit, the factorial program sucessfully compiles and executes. Two hidden bugs were fixed in the process:
    • freeVars did not function as expects on application nodes.
       freeVars (LetApp x f ys m) = toHashSetOf (each . #Var) ys
      +                           <> HS.singleton f
                                 <> (freeVars m & HS.delete x)
              
    • Three typos in the hoist function led to both branches of an if-expression being the unhoisted ‘true’ branch — I have no idea how this happened. :P
       go (IfThenElse c t f) = do
         Hoisted fs js t' <- go t
      -   Hoisted fs' js' t' <- go f
      +   Hoisted fs' js' f' <- go f
         (th,el) <- each mkFresh ("then","else")
      -   let bt = Join th Nothing t
      +   let bt = Join th Nothing t'
      -       bf = Join el Nothing t
      +       bf = Join el Nothing f'
         pure $ Hoisted (fs >< fs') (Seq.fromList [bt,bf] >< js >< js') $
           IfThenElse c (Jump th Nothing) (Jump el Nothing)
              
    -- (λf. (λx. f (λv. x x v)) (λx. f (λv. x x v))) (λ rec n. if n then n * rec (n - 1) else 1) 3
    )
    λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ Lam.zCombinator `Lam.App` (Lam.Lam "rec" $ Lam.Lam "n" $ Lam.IfThenElse (Lam.Var "n") (Lam.Prim $ PrimMul (Lam.Var "n") (Lam.Var "rec" `Lam.App` (Lam.Prim $ PrimSub (Lam.Var "n") (Lam.IntVal 1)))) (Lam.IntVal 1)) `Lam.App` Lam.IntVal 3
        
  • As of this commit, it can compile closures and applications thereupon, marking the completion of the ToQBE pass!
    λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ (Lam.Lam "x" $ Lam.Lam "y" $ Lam.Var "x") `Lam.App` Lam.IntVal 1 `Lam.App` Lam.IntVal 2
        
    [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t
    [nix-shell]$ ./t
    1
        

Demo

2 Layer 1 [6/7]

Round three of planning. Define the core data structures, signatures for functions between them, and tests for those functions.

[#A] Big decisions!

NO Side effects?
Decided against, primarily due to top-level side-effects.
Impure
Pros
  • Rely less on optimisations — use ref types.
  • Easy implementation.
Cons
  • Top-level side effects lead to a lot of pain in implementation and use.
Pure
Pros
  • I have Haskellbrain.
  • More interesting implementation.
Cons
  • Potentially complicated to make effecient.
OKAY ML Modules?
We can postpone this thought.

Core data structures [9/9]

Syd monad
Error effect and ADT
Located
Surface AST
Core AST
ANF AST
KILL Closure-converted AST
KILL Hoisted AST
KILL CodeGen record à la Idris

Pass signatures [6/6]

Parse
Elaborate
TypeCheck
ClosureConvert
Hoist
ToQBE

KILL Debug flags [0/7]

Pushed back. These can be implemented once the ASTs in question are more fleshed out.
KILL -ddump-ast
KILL -ddump-core
KILL -ddump-type-check
KILL -ddump-anf
KILL -ddump-closure-convert
KILL -ddump-hoist
KILL -ddump-qbe

Tests

CI/CD

Record demo

3 The Empty Module [1/8]

Parse

Rename

Type-check

Elaborate

Closure-convert

Hoist

LowerANF

ToQBE

Other

I am not convinced that some of these tasks should count towards “work time;” regardless, I choose to log them.

Nix setup

Documentation

GitHub Pages action

Resources

Reference projects

  • the latter is a rewrite of the former.
  • the leading example of demand-driven compilers.
  • implemented in glorious haskell.
  • implements datalog.
  • targets llvm.
  • demand-driven, using rock.
  • implemented in glorious haskell.
  • an interpreter for the nix language.
  • noted only for its use of recursion schemes.
  • implemented in glorious haskell.

The lovely Hécate’s boreal compiler has been an excelent reference. Many of our design decisions align, which has been invaluably convenient for myself }:3.

  • targets JS.
  • haskell-like.
  • written by a moron. 💔
  • implemented in glorious haskell.
  • haskell-like source language.
  • implemented in glorious haskell.
  • Self-described as a reference implementation.
  • targets LLVM.
  • ANF-based IR.
  • strict.
  • otherwise haskell-like.
  • implemented in glorious haskell.
  • Elegantly supports multiple and third-party backends.
  • Uses ANF.
  • A book about compiling Racket and Python to x86-64 assembly.

Parsing

IRs — SSA, ANF, and CPS

SSA vs ANF: source code, slides.

Appel — Continuation Passing, Closure Passing Style

Describes ANF and join points as used in GHC.

  • talk
  • Formalises join points with typing, operational semantics, and syntax.
  • Describes heavyweight optimisations for join points.
  • The paper’s implementation allows for recursive join points.
Dead link. }:\

This thesis emphasizes a “pay as you go” compilation strategy where simple, monorphic code should not be slower simply because the language supports higher-level code. Also, monomorphic functions should get specialized to machine-type functions. C compilers focus on loops, so functional compilers should focus on recursive functions. Explains lots of optimizations with a λ-calculus IR.

A Correspondence between CPS and SSA

Compiler Architecture

Miscellaneous

Covers a lot of topics that textbooks miss.

Nearly all of their linked resources are included in this document.

  • Will be necessary to build tree-sitter-sydml.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages