Tolk v0.7: overhaul compiler internals and the type system; bool type
#1477
Two months have passed since the announcement of Tolk. You might be wondering what was going on and why there were no releases yet.
Throughout November, I worked on the vision of the future. My goal was to "visualize" what Tolk v1.0 should look like: the language we are all aiming for, one that solves lots of practical problems, avoids manual cell/slice manipulation, and provides sufficient mechanisms for ABI generation, while still being zero-overhead. I have created a giant roadmap (40 PDF pages!) describing the vision and how, step by step, we're going to reach it.
Throughout December, I worked constantly on the compiler's kernel. As you know, Tolk is a fork of FunC. The FunC compiler internals are very challenging to extend and modify. The way FunC looks is just a mirror of its internal implementation. Heading towards the future, I had to partially "untangle" this "legacy FunC core", so that it will be able to "interbreed" with features it was not originally designed for.
Currently I am done with this preparation. Tolk v0.7 contains a fully rewritten semantic analysis kernel (though almost invisible to the end user, huh).
Notable changes in Tolk v0.7
Generic functions fun f<T>(...) and instantiations like f<int>(...)
The bool type
Type casting via value as T
The documentation and IDE plugins have been updated accordingly (see related pull requests below).
Now, let's cover every bullet in detail.
Refactor and revamp compiler internals
In Tolk v0.6, I implemented parsing of source files into an AST (completely missing in FunC), which gave me control over the syntax. That's why the changes in Tolk v0.6 were almost purely syntactical. The AST, once parsed, was transformed into the "legacy core", so that the rest of the forked FunC core kept working.
Heading towards the future, the AST should be converted directly to IR (intermediate representation), with all semantic analysis performed at the AST level.
At the AST level, it is necessary to handle: lvalue/rvalue semantics; mutability analysis; unreachable code detection; symbol resolving; type inference and checks; various other validity checks.
This step is primarily about creating the foundation for semantic analysis, which future enhancements will build on.
Implementation details:
const variables are now calculated NOT based on CodeBlob, but via a newly-introduced AST-based constant evaluator
Rewrite the type system from Hindley-Milner to static typing
You know that FunC is "functional C". But do you know what makes it "functional"? Not the fact that FunC is very close to TVM. Not its peculiar syntax. Not even the ~ tilde. "Functional" is mostly about the Hindley-Milner type system, which had no conceptual changes in earlier Tolk versions, but is fully replaced now.
The Hindley-Milner type system is a common approach for functional languages, where types are inferred from usage through unification. As a result, type declarations are not necessary:
For example,
For example,
In the FunC codebase, te_Indirect is about this, along with forall, which comes with its own nuances.
While this approach works for now, problems arise with the introduction of new types like bool, where !x must handle both int and bool. It will also become incompatible with int32 and other strict integers.
Example. When nullable types are introduced, we want null not to be assignable to int. However, with unification, the following would be valid:
Instead of an error, Hindley-Milner would perform unification and accept it. This will clash with structure methods, struggle with proper generics, and become entirely impractical for union types (despite claims that it was "designed for union types").
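To make the unification problem concrete, here is a toy model in Python. It is only a sketch and not the FunC or Tolk implementation: the TypeVar class and the unify and resolve helpers are hypothetical names, and TypeVar only loosely plays the role the text attributes to te_Indirect. It shows that a null whose type is a fresh type variable is silently accepted where an int is expected.

```python
class TypeVar:
    """An unbound placeholder type that unification may bind later (hypothetical)."""
    def __init__(self):
        self.bound = None


def resolve(t):
    # Follow bindings until we reach a concrete type name or an unbound variable.
    while isinstance(t, TypeVar) and t.bound is not None:
        t = t.bound
    return t


def unify(a, b):
    """Try to make a and b the same type, binding type variables as needed."""
    a, b = resolve(a), resolve(b)
    if a is b:
        return True
    if isinstance(a, TypeVar):
        a.bound = b
        return True
    if isinstance(b, TypeVar):
        b.bound = a
        return True
    return a == b  # two concrete types: they must simply match


# null starts with a fresh type variable, so "null where int is expected" unifies.
type_of_null = TypeVar()
accepted = unify(type_of_null, "int")
type_after = resolve(type_of_null)        # the variable has quietly become int
mismatch_detected = not unify("int", "slice")
```

In this model the only errors unification can report are clashes between two concrete types; anything still represented by a variable is accepted, which is exactly the behaviour we do not want for null.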
A fun fact: this is not noticeable now, because the current type system is very limited. But as soon as we add bool, fixed-width integers, nullability, structures, and generics, these problems will become significant.
The goal is to have predictable, explicit, and positionally-checked static typing. While Hindley-Milner is powerful, it's actually "type inference for the poor": simple to implement when there's no time to design the language fundamentally.
Static typing (similar to TypeScript without any or Rust) is a must-have, even though implementing it is quite complex. Key aspects include:
var i = 0 is int (not "unify int" as now); var c = null is forbidden, use var c: int? = null; var a = b is okay since the type of b is known at that point
no auto types; function parameters must be strictly typed
without te_Indirect, each node's type will be directly inferred during analysis
without forall types, generic functions need to be resolved differently, since types are known during node analysis; saving a generic function into a variable is denied
generic functions such as fun f<T>(a: T) { var b: [T] = [a]; }
null and how it interacts with assignments
t.tupleAt(0) (it's a generic method where T doesn't depend on arguments) should have an "external hint" propagated, see below about generics
Ideally, type inference should rely on a control flow graph, which we currently lack. It will be implemented later. For now, the existing AST representation will suffice.
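The declaration rules listed above can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the compiler's code: infer_var_type is a hypothetical helper, types are plain strings, a trailing "?" marks a nullable type, and the error texts are invented for illustration.

```python
def infer_var_type(init_type, declared_type=None):
    """Return the type of `var x[: declared_type] = <expression of init_type>`."""
    if declared_type is not None:
        # An explicit annotation wins; null is accepted only by a nullable type.
        null_into_nullable = init_type == "null" and declared_type.endswith("?")
        if init_type == declared_type or null_into_nullable:
            return declared_type
        raise TypeError(f"can not assign {init_type} to {declared_type}")
    if init_type == "null":
        # Nothing to infer from: the declaration must spell the type out.
        raise TypeError("type of null can not be inferred, declare it explicitly")
    # No annotation: the variable takes the already-known type of the initializer.
    return init_type


scope = {}
scope["i"] = infer_var_type("int")            # var i = 0
scope["c"] = infer_var_type("null", "int?")   # var c: int? = null
scope["a"] = infer_var_type(scope["i"])       # var a = i, the type of i is known here

try:
    infer_var_type("null")                    # var c = null
    bare_null_rejected = False
except TypeError:
    bare_null_rejected = True
```

Unlike unification, every declaration gets its final type at the point where it is analysed, so later usages can never change it.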
Implementation details:
forall completely removed, generic functions introduced (they actually work like template functions, instantiated while inferring)
<...> syntax, example: t.tupleAt<int>(0)
as keyword, for example t.tupleAt(0) as int
Clear and readable error messages on type mismatch
In FunC, due to Hindley-Milner, type mismatch errors are very hard to understand:
After full reconsideration of the type system, they became human-readable:
Generic functions fun f<T>(...) and instantiations like f<int>(...)
In FunC, there were "forall" functions:
In Tolk v0.6, the syntax changed to resemble mainstream languages:
But the change was only about the syntax. Under the hood, it was transformed to exactly the same representation, since forall was a part of the type system.
To replace the Hindley-Milner type system, I had to implement support for generic functions. When f<T> is called, T is detected (in most cases) from the provided arguments:
The syntax f<int>(...) is also supported:
User-defined functions may also be generic:
Calling replaceLast<int> and replaceLast<slice> will result in TWO generated asm (Fift) functions. They actually resemble "template" functions most: at each unique invocation, the function's body is fully cloned under a new name.
There may be multiple generic parameters:
A generic parameter T may be something complex.
Or even functions, it also works:
Note that while a generic T is mostly detected from arguments, there are less obvious corner cases, when T does not depend on the arguments:
To make this valid, T should be provided externally:
Also note that T for asm functions must occupy 1 stack slot (otherwise, the asm body is unable to handle it properly), whereas for a user-defined function, T could be of any shape.
In the future, when structures and generic structures are implemented, all the power of generic functions will come into play. Implementing them now was a necessary step in getting rid of Hindley-Milner.
bool type, casting boolVar as int
With controlled type checking operating directly on the AST, it became possible to introduce a proper bool type. Under the hood, bool is still -1 and 0 at the TVM level, but from the type system's perspective, bool and int are now different.
Comparison operators == / >= /... return bool. Logical operators && || return bool. Constants true and false have the bool type. Lots of stdlib functions now return bool, not int (having -1 and 0 at runtime):
Operator !x supports both int and bool. The condition of if and similar statements accepts both int (!= 0) and bool. Logical && and || accept both bool and int, preserving compatibility with constructs like a && b where a and b are integers (!= 0).
Arithmetic operators are restricted to integers; only bitwise and logical operators are allowed for bools:
This is a breaking change since in many real-world contracts, values previously treated as integers will now be booleans, and invalid operations on them will result in compilation errors.
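The operator rules above can be summarised as a small type-checking table. The Python sketch below is a toy model, not the compiler's implementation: binary_result_type and not_result_type are hypothetical helpers, the error text is invented, and two points the text does not spell out are assumptions here (comparisons require operands of the same type, and a bitwise operator on two bools yields bool).

```python
ARITHMETIC = {"+", "-", "*", "/", "%"}
COMPARISON = {"==", "!=", "<", ">", "<=", ">="}
BITWISE = {"&", "|", "^"}
LOGICAL = {"&&", "||"}


def binary_result_type(op, lhs, rhs):
    """Return the result type of `lhs op rhs` for operand types 'int' or 'bool'."""
    def fail():
        raise TypeError(f"can not apply operator {op} to {lhs} and {rhs}")

    if op in ARITHMETIC:
        # Arithmetic is restricted to integers.
        return "int" if lhs == rhs == "int" else fail()
    if op in COMPARISON:
        # Comparisons return bool (assumption: operand types must match).
        return "bool" if lhs == rhs else fail()
    if op in BITWISE:
        # Allowed for ints and for bools (assumption: bool & bool stays bool).
        return lhs if lhs == rhs else fail()
    if op in LOGICAL:
        # && and || accept both int and bool, and always return bool.
        return "bool"
    raise ValueError(f"unknown operator {op}")


def not_result_type(operand):
    """`!x` supports both int and bool and returns bool."""
    if operand in ("int", "bool"):
        return "bool"
    raise TypeError(f"can not apply operator ! to {operand}")


try:
    binary_result_type("+", "bool", "int")   # arithmetic on a bool
    bool_arithmetic_rejected = False
except TypeError:
    bool_arithmetic_rejected = True
```

A contract that previously added the result of a comparison to an integer would hit the first branch and fail to compile, which is the breaking change mentioned above.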
The compiler does some optimizations for booleans. Example: boolVar == true -> boolVar. Example: !!boolVar -> boolVar. Example: !x for int results in asm 0 EQINT, but !x for bool results in asm NOT.
Note that logical operators && || (missing in FunC) always use the IF/ELSE asm representation. In the future, for optimization, they could be automatically replaced by & | when it's safe (example: a > 0 && a < 10). To manually optimize gas consumption, you can still use & | (allowed for bools), but remember that they are not short-circuit.
Assigning bool to int is prohibited to avoid unintentional errors:
If you really need it, bool can be cast to int via the as operator:
There are no runtime transformations. bool is guaranteed to be -1/0 at the TVM level, so this is type-only casting. But generally, if you need such a cast, you're probably doing something wrong (unless you're doing a tricky bitwise optimization).
Related pull requests
What's coming next?
I spent lots of time creating the detailed Roadmap and preparing the compiler's kernel for future language changes. Eventually, we'll reach structures with auto packing to/from cells.
There will be several publicly available releases along the way, mostly dedicated to type system enrichment and stack management. The next one will be available quite soon, stay tuned.