The three key things this module provides are:
- InstructionSet
  - a bundle of Opcode definitions
- Optree
  - a tree of Operations (see below)
- SyntaxTree
  - an abstract syntax tree
The two key concepts to understand are:
- Operation
  - generic operation types (Unary, Binary, etc.)
  - compose into a tree
  - associated with an Opcode
- Opcode
  - specific operations (add, pre_inc, etc.)
  - static information such as flags & "prototype" for the opcode
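The following sketch makes the split between the two concepts concrete. It is a minimal, language-neutral illustration, not Allium's actual classes. The class names and fields (`Opcode`, `InstructionSet`, `LeafOp`, `UnaryOp`, `BinaryOp`, `op_class`) are hypothetical. An Opcode holds only static information. An Operation is a generic node shape that points at its Opcode and composes into a tree. The opcode names, descriptions, and B op classes used here are real perl ones.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Opcode:
    """Static information about one specific operation (hypothetical fields)."""
    name: str         # e.g. "add", "pre_inc"
    description: str  # human readable description
    op_class: str     # generic shape it is used with, e.g. "BINOP"


@dataclass
class InstructionSet:
    """A bundle of Opcode definitions, keyed by opcode name."""
    opcodes: dict = field(default_factory=dict)

    def add(self, opcode: Opcode) -> None:
        self.opcodes[opcode.name] = opcode

    def get(self, name: str) -> Opcode:
        return self.opcodes[name]


@dataclass
class Operation:
    """A generic operation node; it is always associated with an Opcode."""
    opcode: Opcode


@dataclass
class LeafOp(Operation):
    value: object = None  # e.g. the constant held by a "const" op


@dataclass
class UnaryOp(Operation):
    first: Optional[Operation] = None


@dataclass
class BinaryOp(Operation):
    first: Optional[Operation] = None
    last: Optional[Operation] = None


iset = InstructionSet()
iset.add(Opcode("const", "constant item", "SVOP"))
iset.add(Opcode("add", "addition (+)", "BINOP"))
iset.add(Opcode("negate", "negation (-)", "UNOP"))

# -(1 + 2): the generic node shapes compose into a tree,
# and every node points back at its specific Opcode.
tree = UnaryOp(
    iset.get("negate"),
    BinaryOp(iset.get("add"),
             LeafOp(iset.get("const"), 1),
             LeafOp(iset.get("const"), 2)),
)
```

Because the Opcode is shared and immutable, many Operations can reference the same definition. Only the tree structure differs from program to program.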
We can use Allium to essentially de-compile the opcode tree in perl.
perl foo.pl -> B optree -> Allium optree -> Allium AST
Now that we have both an optree and an AST, we can do the following with them.
We should have sufficient information to generate source code in almost any language we like (even back to Perl), while retaining semantics.
Allium AST ->
- Perl source code
- Java source code
- etc.
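The sketch below illustrates generating source for more than one language from the same AST. It uses a tiny hypothetical AST (`Num`, `Var`, `Add`, `Declare`), not the node classes of Allium::SyntaxTree, and two emitters. In this toy case, only the spelling of variables and declarations is language-specific.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, Add]


@dataclass
class Declare:
    """Declare a new lexical variable and initialize it."""
    name: str
    value: Expr


def emit_expr(node: Expr, sigil: str) -> str:
    """Emit an expression; `sigil` is "$" for Perl scalars, "" for Java."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return sigil + node.name
    if isinstance(node, Add):
        return f"({emit_expr(node.left, sigil)} + {emit_expr(node.right, sigil)})"
    raise TypeError(f"unknown node: {node!r}")


def to_perl(stmt: Declare) -> str:
    return f"my ${stmt.name} = {emit_expr(stmt.value, '$')};"


def to_java(stmt: Declare) -> str:
    # Assumes every value is an int; a real backend would need type
    # information to preserve Perl's numeric semantics.
    return f"int {stmt.name} = {emit_expr(stmt.value, '')};"


ast = Declare("total", Add(Var("x"), Num(1)))
print(to_perl(ast))  # my $total = ($x + 1);
print(to_java(ast))  # int total = (x + 1);
```

Retaining semantics is the hard part in practice. For example, the Java emitter above is only correct under its stated int assumption.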
Given the Optree, we can re-compile it to target a different low-level representation (IR) or opcode set, possibly taking advantage of existing tooling (JITs, optimizers, etc.) to greatly improve speed & resource usage.
Allium Optree ->
- LLVM IR
- JVM Opcodes
- etc.
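One way to picture re-targeting is a post-order walk that flattens an expression tree into instructions for a stack machine. JVM bytecode has roughly this shape. The node classes and the `PUSH`/`LOAD`/`ADD`/`MUL` mnemonics below are hypothetical. They only loosely resemble JVM opcodes and are not real JVM instructions. The sketch also assumes every value is an integer. The Perl opcode names (`padsv`, `const`, `add`, `multiply`) are real. For a simple expression like this one, the post-order sequence roughly matches the order perl's own execution chain visits the ops.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Const:
    value: int


@dataclass
class PadSv:
    """A lexical variable lookup (named after Perl's padsv op)."""
    name: str


@dataclass
class BinOp:
    opname: str  # Perl opcode name, e.g. "add" or "multiply"
    first: "Node"
    last: "Node"


Node = Union[Const, PadSv, BinOp]
Instr = Tuple[object, ...]

# Hypothetical mapping from Perl opcode names to target mnemonics.
BINOP_MAP = {"add": "ADD", "subtract": "SUB", "multiply": "MUL"}


def compile_to_stack(node: Node) -> List[Instr]:
    """Post-order walk: emit code for the children, then the operation."""
    if isinstance(node, Const):
        return [("PUSH", node.value)]
    if isinstance(node, PadSv):
        return [("LOAD", node.name)]
    if isinstance(node, BinOp):
        return (compile_to_stack(node.first)
                + compile_to_stack(node.last)
                + [(BINOP_MAP[node.opname],)])
    raise TypeError(f"unsupported node: {node!r}")


# ($x + 2) * 3
tree = BinOp("multiply", BinOp("add", PadSv("x"), Const(2)), Const(3))
for instr in compile_to_stack(tree):
    print(*instr)
```

A real backend would also need to handle Perl's dynamic values (strings, undef, overloading). This is where the verification discussed next becomes important.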
Perl does a lot of optimizations at compile time, but it is limited by the flexibility allowed at runtime. If we can verify that this flexibility is not used, then it could be possible to optimize Perl opcodes more aggressively at compile time.
Parsers such as Guacamole can target the Allium AST, which can then be used to
do the following.
Just as ASTs can be de-compiled from Optrees, we can also compile an Optree
from an AST. This is what all compilers do, and the result should be exactly
what perl itself produces when given the same code.
However, if the parser were to add extra information (such as types) to the
AST, it could then be used to optimize the Optree. And since this targets the
Allium Optree, once the code is in that format you can use all of the tools
available for both the Optree and the AST (see above).
Syntax extensions would just be custom AST nodes, which could be transformed into standard AST nodes. In fact, this is the foundation for a macro system similar to those in Scheme or Lisp.
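The idea of syntax extensions as custom nodes can be sketched as a rewrite pass. The pass walks the tree and replaces each custom node with an equivalent built from standard nodes, repeating until none remain. The node names (`Unless`, `If`, `Not`, `Call`) and the `MACROS` table are hypothetical. Perl already has a native `unless`; it is used here only because its expansion is easy to read.

```python
from dataclasses import dataclass, fields, is_dataclass, replace


@dataclass
class Call:
    name: str


@dataclass
class Not:
    expr: object


@dataclass
class If:
    cond: object
    then: object


@dataclass
class Unless:
    """A custom (non-standard) node that a syntax extension might introduce."""
    cond: object
    then: object


# Each macro maps a custom node type to a function returning standard nodes.
MACROS = {
    Unless: lambda n: If(Not(n.cond), n.then),
}


def expand(node):
    """Recursively expand macros until only standard nodes remain."""
    if not is_dataclass(node):
        return node  # plain values such as strings are left alone
    macro = MACROS.get(type(node))
    if macro is not None:
        return expand(macro(node))  # the expansion may itself contain macros
    children = {f.name: expand(getattr(node, f.name)) for f in fields(node)}
    return replace(node, **children)


tree = Unless(Call("ready"), Call("wait"))
print(expand(tree))
# If(cond=Not(expr=Call(name='ready')), then=Call(name='wait'))
```

Everything downstream (Optree generation, analysis) only ever sees standard nodes. Because of that, an extension needs no support beyond its entry in the macro table.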
We've already seen what the Allium Optree makes possible: it can be used to
create an AST, or be re-compiled to JVM opcodes. But using tools like
B::Generate we can also re-target the perl runtime, which would enable the
following:
It would be pretty silly to write a JavaScript compiler that targets perl as
a runtime and just ignore the 20 years of JS engine optimizations. But this
could open up possibilities for DSLs or other "small" languages to be run
within a Perl program and be both controllable and composable by that same
Perl program.
NOTE: Some crazy ideas can fall out of this one if you are not careful ;)
This is a very rough list of the current set of capabilities, much of which is very unrefined.
- Extract opcode information about a specific Perl checkout
  - turn it into an Allium::InstructionSet
    - contains data about:
      - name, description, flags
      - valid Operation types
      - valid Private flags
      - the "prototype" of the opcode
      - etc.
    - dump/load the instruction set as JSON
- Extract the optree from perl using B
  - turn it into an Allium::Optree
    - which is a tree of Allium::Operations
    - uses the Allium::InstructionSet to enrich the data from B
      - accounting for nullified ops, etc.
    - dump/load the optree as JSON
      - retaining all the connections, flags, etc.
- Load an Allium::Optree into perl using B::Generate
  - Round trip from perl to B to Allium to B, and finally back to perl
    - This should allow us to use an Allium optree as a compiler target
      - and open up the possibility for a new Perl parser (see more below)
- An Allium::Optree can be used to create an Allium::SyntaxTree
  - a Visitor interface can be used to traverse the tree
  - dump/load the syntax tree as JSON
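The trickiest part of dumping an optree to JSON is that it is not just a tree. Besides parent/child links, each op also has a "next" link giving execution order, so a naive nested dump loses those connections. A common approach is to give every op an integer id and store every link as an id. The sketch below assumes a simplified op structure (`Op`, `was`, and the record layout are hypothetical, not Allium's JSON format). Nullified ops keep the name `null` plus the name of the op they replaced. This loosely mirrors how perl records a nulled op's original type in its targ field.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity semantics: ops form a graph, not a value
class Op:
    name: str
    children: List["Op"] = field(default_factory=list)
    next: Optional["Op"] = None  # execution-order link
    was: Optional[str] = None    # original name of a nullified op


def dump(root: Op) -> str:
    """Serialize an op tree; assumes every next link stays inside the tree."""
    ids, order = {}, []

    def number(op: Op) -> None:
        ids[id(op)] = len(order)
        order.append(op)
        for child in op.children:
            number(child)

    number(root)
    records = [
        {
            "id": ids[id(op)],
            "name": op.name,
            "was": op.was,
            "children": [ids[id(c)] for c in op.children],
            "next": ids[id(op.next)] if op.next is not None else None,
        }
        for op in order
    ]
    return json.dumps(records)


def load(text: str) -> Op:
    """Rebuild the ops first, then restore every link by id."""
    records = json.loads(text)
    ops = [Op(r["name"], was=r["was"]) for r in records]  # ids are list indices
    for r, op in zip(records, ops):
        op.children = [ops[i] for i in r["children"]]
        op.next = ops[r["next"]] if r["next"] is not None else None
    return ops[0]  # the root was numbered first


# $x + 1 with a global $x: perl nulls the rv2sv op and runs gvsv directly.
gvsv = Op("gvsv")
ex_rv2sv = Op("null", [gvsv], was="rv2sv")  # skipped at run time
const = Op("const")
add = Op("add", [ex_rv2sv, const])
gvsv.next, const.next = const, add  # execution order: gvsv -> const -> add

copy = load(dump(add))
```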
These are the next steps of development for this module, in no particular order. The functionality described here should open up many possibilities for the future.
- Constructing custom instruction sets
  - can be used to limit the set of allowed opcodes, etc.
- Constructing instruction sets for different versions of Perl
  - allowing comparison and change tracking, etc.
- Given an Allium::SyntaxTree, build an Allium::Optree from it
  - this allows a parser to directly target the SyntaxTree
    - making it simpler for a new Perl parser to be written
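Both instruction-set ideas above reduce largely to set operations over opcode names. Perl's core Opcode and Safe modules use the same basic idea with opcode masks. The sketch below is hypothetical and uses plain Python sets rather than Allium::InstructionSet. A restricted set reports the opcodes an optree uses that it does not allow. Comparing two sets reports which opcodes were added or removed. The opcode names in the trees are real perl opcodes. `old_op` and `new_op` are placeholder names, not entries from any actual perl version.

```python
from typing import Iterable, Set, Tuple


def opcodes_used(tree: tuple) -> Set[str]:
    """Collect opcode names from a nested (opname, *children) tuple tree."""
    name, *children = tree
    used = {name}
    for child in children:
        if isinstance(child, tuple):
            used |= opcodes_used(child)
    return used


def check_allowed(tree: tuple, allowed: Set[str]) -> Set[str]:
    """Return the opcodes in `tree` that the restricted set does not allow."""
    return opcodes_used(tree) - allowed


def compare(old: Iterable[str], new: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) opcode names between two instruction sets."""
    old, new = set(old), set(new)
    return new - old, old - new


# A "safe arithmetic" instruction set that forbids I/O and eval.
ARITH_ONLY = {"const", "padsv", "add", "subtract", "multiply", "sassign"}

ok_tree = ("sassign", ("add", ("padsv", "$x"), ("const", 1)), ("padsv", "$y"))
bad_tree = ("print", ("pushmark",), ("entereval", ("padsv", "$code")))

print(check_allowed(ok_tree, ARITH_ONLY))  # set()
print(sorted(check_allowed(bad_tree, ARITH_ONLY)))
# ['entereval', 'print', 'pushmark']

# Placeholder opcode sets standing in for two perl versions.
added, removed = compare({"add", "old_op"}, {"add", "new_op"})
print(added, removed)  # {'new_op'} {'old_op'}
```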
These are the features planned for this module that have yet to be written. In
most cases a proof of concept already exists (in B::MOP), and now it just
needs to be "ported" to use Allium.
Modules like Moose, etc. provide Meta Object Protocols to introspect and
manipulate the package/class system of Perl at runtime. These tools are very
powerful and can be used to manipulate the subroutines in packages, but they
can only manipulate subroutines as a whole and are not able to introspect the
code inside a subroutine.
This module will provide the ability to introspect and manipulate the code inside subroutines at compile time, as well as all the other features of a runtime MOP (such as manipulating package namespaces).
Using the information contained in the Optree and the InstructionSet, we can
determine the types for a reasonable number of SyntaxTree nodes already. From
here we can attempt to infer the remaining types of the program. This type
system will never be like Haskell or Rust, but instead something much less
rigorous and therefore more appropriate for Perl.
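The sketch below shows the kind of less rigorous inference described above. Types are seeded from what an opcode itself guarantees: perl's add always produces a number, and concat always produces a string. They are then propagated through variable assignments in a single forward pass. Anything not understood falls back to an unknown type rather than raising an error. The type names, the `RESULT_TYPE` table, and the statement format are all hypothetical.

```python
from typing import Dict, List, Tuple

UNKNOWN = "Any"

# Result types some opcodes guarantee regardless of their operands.
# Hypothetical table; a real one would be derived from the InstructionSet.
RESULT_TYPE = {"add": "Num", "subtract": "Num", "multiply": "Num", "concat": "Str"}


def infer(expr: tuple, env: Dict[str, str]) -> str:
    """Infer the type of a nested (opname, *args) expression."""
    op, *args = expr
    if op == "const":
        return "Num" if isinstance(args[0], (int, float)) else "Str"
    if op == "padsv":
        return env.get(args[0], UNKNOWN)
    # Fall back to UNKNOWN for anything the table does not cover.
    return RESULT_TYPE.get(op, UNKNOWN)


def infer_program(stmts: List[Tuple[str, tuple]]) -> Dict[str, str]:
    """Single forward pass over (variable, expression) assignments."""
    env: Dict[str, str] = {}
    for var, expr in stmts:
        new = infer(expr, env)
        old = env.get(var)
        # A variable assigned two different types degrades to UNKNOWN.
        env[var] = new if old in (None, new) else UNKNOWN
    return env


program = [
    ("$n", ("const", 41)),
    ("$m", ("add", ("padsv", "$n"), ("const", 1))),
    ("$s", ("concat", ("padsv", "$m"), ("const", "!"))),
    ("$v", ("padsv", "$unknown")),
    ("$n", ("const", "forty-one")),  # $n is reassigned a string
]
print(infer_program(program))
# {'$n': 'Any', '$m': 'Num', '$s': 'Str', '$v': 'Any'}
```

Degrading to `Any` instead of reporting an error reflects the "less rigorous" goal. A checker built on this would only complain where a type is actually known.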
The MOP (described above) could be used to resolve subroutines & methods at
compile time for type checking. We could additionally create some kind of type
description file for Perl modules, similar to how TypeScript uses .d.ts files,
so that signatures do not need to be recompiled every time.
Usually these are done at the source code level, which may be good enough, but this toolset could do them at the opcode level instead. I'm not terribly sure this is useful.
The perl debugger kinda sucks; using this toolset it would be possible to make
a very full-featured debugger.
Since you have access to all the "compiled" stages of the code (AST, Optree, etc.) and they can all be serialized, you can inspect it all without having to run any of it. This means the analyses below can be done in bulk, since they essentially become offline text processing (of JSON). And using techniques like hash consing and Merkle trees (blame Yuval), this could even be done at a very large scale.
- finding duplicated code that has been slightly modified
  - can be done by matching AST nodes and ignoring variable and function names
- code complexity counts
  - this is a simple traversal of the AST and some counters
- dependency tracing
  - it would be possible to build the whole graph actually
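The first two analyses can both be sketched with one post-order traversal. Each node's hash is computed from its kind and its children's hashes, which is a Merkle-tree style structural hash. Variable and function names are deliberately left out, so renamed copies hash identically and identical hashes flag candidate duplicates. The same visitor counts branch nodes as a crude complexity measure. The tuple-based AST, the node kinds, and counting "if"/"while" as branches are assumptions for illustration. This only catches renamed copies; tolerating other small edits would need fuzzier matching.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

# Node kinds whose first argument is a name we want to ignore.
NAMED_KINDS = {"var", "call", "sub"}
BRANCH_KINDS = {"if", "while"}


class Visitor:
    """Post-order visitor over (kind, *children) tuples; leaves are str/int."""

    def __init__(self) -> None:
        self.branches = 0
        self.by_hash: Dict[str, List[tuple]] = defaultdict(list)

    def visit(self, node) -> str:
        if not isinstance(node, tuple):
            return hashlib.sha256(repr(node).encode()).hexdigest()
        kind, *children = node
        if kind in NAMED_KINDS:
            children = children[1:]  # drop the name so renames still match
        child_hashes = [self.visit(c) for c in children]
        if kind in BRANCH_KINDS:
            self.branches += 1
        # A node's hash depends only on its kind and its children's hashes.
        digest = hashlib.sha256(
            (kind + "(" + ",".join(child_hashes) + ")").encode()
        ).hexdigest()
        self.by_hash[digest].append(node)
        return digest

    def duplicates(self, kind: str = "sub") -> List[List[tuple]]:
        """Groups of structurally identical subtrees of the given kind."""
        return [nodes for nodes in self.by_hash.values()
                if len(nodes) > 1 and nodes[0][0] == kind]


code = [
    ("sub", "total", ("if", ("var", "$n"),
                      ("call", "add", ("var", "$a"), ("var", "$b")))),
    ("sub", "sum", ("if", ("var", "$m"),
                    ("call", "plus", ("var", "$x"), ("var", "$y")))),
    ("sub", "other", ("while", ("var", "$n"),
                      ("call", "add", ("var", "$a"), ("var", "$b")))),
]
v = Visitor()
for sub in code:
    v.visit(sub)
print(v.branches)  # 3
print([[n[1] for n in group] for group in v.duplicates()])  # [['total', 'sum']]
```

Since each hash depends only on the subtree below it, the hashes of unchanged code can be cached and compared across a whole codebase. This is what makes the very large scale mentioned above plausible.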