SydML is currently planned to be a functional programming language with strict semantics.
Github renders this document very poorly, which conflicts with my style of organisation. See the rendered HTML here.
I’ve been off my meds since [2024-08-20 Tue], and caffeine lost its kick about a week-and-a-half ago; consequently, my productivity has plummeted.
His colleague Alfréd Rényi said, “a mathematician is a machine for turning coffee into theorems”, and Erdős drank copious quantities; this quotation is often attributed incorrectly to Erdős, but Erdős himself ascribed it to Rényi. After his mother’s death in 1971 he started taking antidepressants and amphetamines, despite the concern of his friends, one of whom (Ron Graham) bet him $500 that he could not stop taking them for a month. Erdős won the bet, but complained that it impacted his performance: “You’ve showed me I’m not an addict. But I didn’t get any work done. I’d get up in the morning and stare at a blank piece of paper. I’d have no ideas, just like an ordinary person. You’ve set mathematics back a month.” After he won the bet, he promptly resumed his use of Ritalin and Benzedrine.
I am medicated again, for the first time in about a month!
The entire project collapsed at the last minute, likely as a result of the following points.
A lack of thorough planning led to extreme unstability, ultimately leaving me with a broken compiler days before the project was due.
To prevent this, I intend to thoroughly architect the compiler and each pass beforehand, down to the individual signatures of key components. This plan would be sure to account for not only high-level passes, but also
- where errors are thrown and caught,
- how line and column information is retained,
The volatile codebase did not mesh will with my tests against internal APIs. The few tests that I had could not keep up with the rapidly-changing code, and often didn’t even compile.
To address this, tests will be written against a stable command-line interface, treating the entire compiler as a black box.
- I plan on following a Golden testing model.
- For source–to-machine-code compilations, this is a straightforward check that an example input matches an output file, bit-for-bit.
- Testing other functions, such as type-checking, will require serialisation to (JSON|S-expressions), which is useful feature for debugging in any case.
- Github CI/CD.
While last year’s compiler was primarily language-focused, I now intend to focus on compiler techniques.
Headline | Time | ||
---|---|---|---|
Total time | 3d 17:32 | ||
Progress | 3d 17:32 | ||
\_ Milestones | 3d 6:41 | ||
\_ 0 Preemptive Planning [6/6] | 5:29 | ||
\_ 1 Research Toys [7/7] | 1d 18:42 | ||
\_ 2 Layer 1 [6/7] | 1d 0:41 | ||
\_ 3 The Empty Module [0/8] | 5:49 | ||
\_ Other | 10:51 | ||
\_ Nix setup | 7:26 | ||
\_ Documentation | 1:44 | ||
\_ GitHub Pages action | 0:26 |
Milestone | Parser | Elab | Type-checker | ->ANF | Closure-conv. | Hoist | ->QBE |
---|---|---|---|---|---|---|---|
Empty module | |||||||
Int literal decl | |||||||
if 1 then 2 else 3 | |||||||
The identity fn |
- Current blocker: I’m not sure how to uniformly run tests both against
black-box executables, as well as typical unit tests. Ideally, both would be
run by
cabal test
. - Potential solution:
main :: IO () - main = ... + main = getArgs >>= main' + main' :: List String -> IO () + main' xs = do + opts <- execParserPure _ _ + ...
- After implementing closure-conversion, I think it may be a good idea to initially work uncurried by default, à la SML, and implement the necessary optimisations for curry-heavy code later.
- The value of statically-typed IRs is not to be undermined.
- [X] Known issue: There is some problem with closure conversion. Try compiling:
λ> pretty . upToConvert $ Lam.Prim $ PrimPrintInt $ Lam.zCombinator `Lam.App` ((Lam.Lam "x" $ Lam.Lam "y" $ Lam.Var "x") `Lam.App` Lam.IntVal 3)
- Augmented with tuples.
- As of this commit, the toy compiler can compile its first program:
λ> :m + *Presydc.SSA λ> writePipeline "/tmp/t/t.qbe" $ Lam.IntVal 123
$ nix-shell -p qbe gcc [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t [nix-shell]$ ./t ; echo $? 123
- As of this commit, it can compile arithmetic primitives and impure
print
calls:λ> :m + *Presydc.SSA λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ Lam.Prim $ Lam.PrimAdd (Lam.IntVal 2) (Lam.IntVal 3)
$ nix-shell -p qbe gcc [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t [nix-shell]$ ./t 5
- As of this commit, it can compile trivial conditional expressions:
λ> writePipeline "/tmp/t/t.qbe" $ Lam.IfThenElse (Lam.IntVal 1) (Lam.IntVal 2) (Lam.IntVal 3)
[nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t [nix-shell]$ ./t ; echo $? 2
- As of this commit, the factorial program sucessfully compiles and executes.
Two hidden bugs were fixed in the process:
freeVars
did not function as expects on application nodes.freeVars (LetApp x f ys m) = toHashSetOf (each . #Var) ys + <> HS.singleton f <> (freeVars m & HS.delete x)
- Three typos in the
hoist
function led to both branches of an if-expression being the unhoisted ‘true’ branch — I have no idea how this happened. :Pgo (IfThenElse c t f) = do Hoisted fs js t' <- go t - Hoisted fs' js' t' <- go f + Hoisted fs' js' f' <- go f (th,el) <- each mkFresh ("then","else") - let bt = Join th Nothing t + let bt = Join th Nothing t' - bf = Join el Nothing t + bf = Join el Nothing f' pure $ Hoisted (fs >< fs') (Seq.fromList [bt,bf] >< js >< js') $ IfThenElse c (Jump th Nothing) (Jump el Nothing)
-- (λf. (λx. f (λv. x x v)) (λx. f (λv. x x v))) (λ rec n. if n then n * rec (n - 1) else 1) 3 ) λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ Lam.zCombinator `Lam.App` (Lam.Lam "rec" $ Lam.Lam "n" $ Lam.IfThenElse (Lam.Var "n") (Lam.Prim $ PrimMul (Lam.Var "n") (Lam.Var "rec" `Lam.App` (Lam.Prim $ PrimSub (Lam.Var "n") (Lam.IntVal 1)))) (Lam.IntVal 1)) `Lam.App` Lam.IntVal 3
- As of this commit, it can compile closures and applications thereupon,
marking the completion of the ToQBE pass!
λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ (Lam.Lam "x" $ Lam.Lam "y" $ Lam.Var "x") `Lam.App` Lam.IntVal 1 `Lam.App` Lam.IntVal 2
[nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t [nix-shell]$ ./t 1
- Maybe using godbolt?
- Rely less on optimisations — use
ref
types. - Easy implementation.
- Top-level side effects lead to a lot of pain in implementation and use.
- I have Haskellbrain.
- More interesting implementation.
- Potentially complicated to make effecient.
KILL CodeGen record à la Idris
- w/ omnix
- the latter is a rewrite of the former.
- the leading example of demand-driven compilers.
- implemented in glorious haskell.
- implements datalog.
- targets llvm.
- demand-driven, using rock.
- implemented in glorious haskell.
- an interpreter for the nix language.
- noted only for its use of recursion schemes.
- implemented in glorious haskell.
The lovely Hécate’s boreal
compiler has been an excelent reference. Many of our
design decisions align, which has been invaluably convenient for myself }:3.
- parses with tree-sitter.
- ANF-based IR.
- demand-driven.
- golden testing.
- haskell-like source language.
- client-server.
- targets lua.
- implemented in glorious haskell.
- targets JS.
- haskell-like.
- written by a moron. 💔
- implemented in glorious haskell.
- haskell-like source language.
- implemented in glorious haskell.
- many different IRs.
- whole-program optimising.
- pretty high-level decent documentation.
- implemented in SML.
- Self-described as a reference implementation.
- targets LLVM.
- ANF-based IR.
- strict.
- otherwise haskell-like.
- implemented in glorious haskell.
- Elegantly supports multiple and third-party backends.
- Uses ANF.
- A book about compiling Racket and Python to x86-64 assembly.
SSA vs ANF: source code, slides.
[-] [#A] Compiling without continuations
Describes ANF and join points as used in GHC.
- talk
- Formalises join points with typing, operational semantics, and syntax.
- Describes heavyweight optimisations for join points.
- The paper’s implementation allows for recursive join points.
KILL The CPS/ANF saga
Tarditi — Design and Implementation of Code Optimiziations for a Type-Directed Compiler for Standard ML (ANF)
This thesis emphasizes a “pay as you go” compilation strategy where simple, monorphic code should not be slower simply because the language supports higher-level code. Also, monomorphic functions should get specialized to machine-type functions. C compilers focus on loops, so functional compilers should focus on recursive functions. Explains lots of optimizations with a λ-calculus IR.
[-] [#A] Lessons from Writing a Compiler
Nearly all of their linked resources are included in this document.
- Will be necessary to build tree-sitter-sydml.