Linking
Linking is an important feature because some programs need to reuse existing code from other programs. In fact, linking support makes up all of Phase 2. Implementing it is nontrivial, however, because it requires merging multiple programs without breaking any code that refers to offsets (such as jumps). This page details thoughts and possible approaches.
A program includes a library by listing that library's path in its #includes section. For example, if the main program is in main.wasm and depends on libraries in math.wlib and memory.wlib, the main program might look like this:
...
#includes
"math.wlib"
"memory.wlib"
...
In the future, there may be a standard library, which may be included with entries like <math> instead of "math.wlib".
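As a rough illustration, the entries of an #includes section could be classified as shown below. This is a minimal Python sketch, not part of wasmc: the function name parse_include and the shape of its return value are hypothetical, and the angle-bracket form is only the possible future syntax mentioned above.

```python
from typing import Tuple


def parse_include(entry: str) -> Tuple[str, str]:
    """Classify one #includes entry as a file path or a standard-library name."""
    entry = entry.strip()
    if len(entry) >= 2 and entry[0] == '"' and entry[-1] == '"':
        # Quoted entry: a path to a library file, e.g. "math.wlib".
        return ("path", entry[1:-1])
    if len(entry) >= 2 and entry[0] == "<" and entry[-1] == ">":
        # Angle-bracket entry: hypothetical standard-library include, e.g. <math>.
        return ("standard", entry[1:-1])
    raise ValueError(f"unrecognized #includes entry: {entry!r}")


print(parse_include('"math.wlib"'))  # ('path', 'math.wlib')
print(parse_include("<math>"))       # ('standard', 'math')
```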
A library has two parts: compiled, unlinked code and metadata. The metadata is a map from each label to the address it points to. The compiled, unlinked code is produced by wasmc using the --library option.
Under the first approach (Approach 1), a library differs from normal code in that its instructions use no absolute addresses; instead, they use offsets, with the appropriate linker flags set. For example, a Store Immediate instruction that writes a register's value to the third value in the #data section will have 010 as the linker flag and 0x2 as the immediate value, representing the address 0x2 positions after the start of the #data section.
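To make the Store Immediate example concrete, here is a minimal Python sketch of how a linker might turn such an offset back into an absolute address. Only the flag value 010 comes from the example above; the names FLAG_DATA_OFFSET and resolve_immediate, and the data section start of 0x100, are assumptions made for illustration. Other flag values are not defined on this page, so the sketch leaves them untouched.

```python
# Flag value taken from the example above: the immediate is an offset into #data.
FLAG_DATA_OFFSET = 0b010


def resolve_immediate(flag: int, immediate: int, data_start: int) -> int:
    """Return the absolute address for an offset-style immediate (hypothetical helper)."""
    if flag == FLAG_DATA_OFFSET:
        return data_start + immediate
    # Any other flag value is left alone in this sketch.
    return immediate


# Assuming the final #data section starts at 0x100, the third value lives at 0x102.
print(hex(resolve_immediate(0b010, 0x2, 0x100)))  # 0x102
```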
An alternative approach (Approach 2) is to ignore the flags and use absolute addresses. During compilation of the main program under this approach, library code is de-linked: its address references are replaced with offsets in a manner similar to Approach 1. (The difference is that the offsets are deduced after the libraries are compiled instead of during library compilation.) One potential pitfall is that the offset calculation may be applied to an immediate value that isn't an address, or skipped for one that is. This issue is less likely under Approach 1 because there the offset calculator has access to the library's full initial parse tree, not just the compiled code, so it knows which immediate values are numeric values and which are label references.
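The pitfall can be shown with a minimal Python sketch of de-linking by address range. It assumes the de-linker only sees compiled immediates and guesses their meaning by comparing them to the section boundaries; the function name delink and the section layout are hypothetical.

```python
from typing import Dict, Optional, Tuple


def delink(immediate: int, sections: Dict[str, Tuple[int, int]]) -> Optional[Tuple[str, int]]:
    """Guess which section an absolute address falls in and return (section, offset).

    `sections` maps a section name to its (start, end) range, end exclusive.
    Returns None when the value lies outside every section.
    """
    for name, (start, end) in sections.items():
        if start <= immediate < end:
            return (name, immediate - start)
    return None


# Hypothetical layout of a compiled library.
layout = {"data": (0x20, 0x40), "code": (0x40, 0x100)}

label_reference = 0x48  # really the address of a label in #code
plain_number = 0x30     # really just the number 48, not an address

print(delink(label_reference, layout))  # ('code', 8)
print(delink(plain_number, layout))     # ('data', 16): wrongly treated as an address
```

With only the compiled code to go on, nothing distinguishes the second call from the first, which is the failure mode described above.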
The third possible approach is a hybrid of the other two, with the insights available under Approach 1 and the post-compilation offset calculation of Approach 2. Library compilation is similar under Approach 3 and Approach 1 in that the flag bits are used to store useful information, with the difference that absolute addresses are used instead of offsets. However, as we use Approach 2's post-compilation offset calculation and its strategy of comparing addresses to the start addresses of the program sections to detect the type of offset, we don't need to store that information in the flag bits. Instead, we prevent some of the potential issues with Approach 2 by using the flags to tell the linker about the immediate values. For example, a flag value of 000 could signify "don't touch this address," whereas a flag value of 001 could signify "recalculate this instruction's immediate value because it's a label reference."
- Libraries are compiled by wasmc with the -l option, enabling flag setting.
- To link a program with libraries, the libraries are loaded, their #meta and #handlers sections are discarded and the rest is concatenated into two arrays.
- Expand pseudoinstructions in the main code, finalizing the starting addresses of all the final output's sections.
- In each statement of each library, check I/J-type instructions for a flag.
  - If no flag is set, do nothing to that instruction.
  - Otherwise, check which section the immediate value would normally be in.
    - If the immediate value originally pointed to the library's data section, add to it the starting address of the library's data section in the final output.
    - If the immediate value originally pointed to the library's code section, add to it the starting address of the library's code section in the final output.
- Read each library's label metadata and adjust the values in the same manner as the previous two steps.
- Expand labels in the main code.
- Finally, compile all the instructions.
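The relocation steps above can be sketched in Python as follows. This is a hedged illustration rather than the linker's actual code: the Instruction and Library classes, the rebase and relocate functions, and all addresses are hypothetical. It assumes the flag values 000 and 001 suggested for Approach 3, assumes the library's #code section directly follows its #data section, and assumes each absolute address is first reduced to an offset from the library's own section start before the final section start is added. Filtering for I/J-type instructions is omitted; every instruction here carries an immediate.

```python
from dataclasses import dataclass
from typing import Dict, List

FLAG_KEEP = 0b000      # "don't touch this address"
FLAG_RELOCATE = 0b001  # "recalculate: the immediate is a label reference"


@dataclass
class Instruction:
    flag: int
    immediate: int


@dataclass
class Library:
    code: List[Instruction]
    labels: Dict[str, int]
    data_start: int  # where #data began when the library was compiled
    code_start: int  # where #code began; assumed to follow #data directly


def rebase(address: int, lib: Library, new_data_start: int, new_code_start: int) -> int:
    """Move an address from the library's own layout into the final output's layout."""
    if address >= lib.code_start:
        return address - lib.code_start + new_code_start
    return address - lib.data_start + new_data_start


def relocate(lib: Library, new_data_start: int, new_code_start: int) -> None:
    """Adjust flagged immediates, then adjust the label metadata the same way."""
    for ins in lib.code:
        if ins.flag == FLAG_RELOCATE:
            ins.immediate = rebase(ins.immediate, lib, new_data_start, new_code_start)
    lib.labels = {
        name: rebase(addr, lib, new_data_start, new_code_start)
        for name, addr in lib.labels.items()
    }


lib = Library(
    code=[
        Instruction(FLAG_RELOCATE, 0x12),  # reference into #data
        Instruction(FLAG_KEEP, 0x12),      # plain number, must stay 0x12
        Instruction(FLAG_RELOCATE, 0x24),  # reference into #code
    ],
    labels={"sqrt": 0x24},
    data_start=0x10,
    code_start=0x20,
)
relocate(lib, new_data_start=0x100, new_code_start=0x200)
print([hex(i.immediate) for i in lib.code])  # ['0x102', '0x12', '0x204']
print(hex(lib.labels["sqrt"]))               # 0x204
```

The unflagged instruction keeps its value even though 0x12 falls inside the data section's range, which is the misclassification Approach 2 alone cannot rule out.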
To produce the final output, a program is merged with its libraries. This involves discarding all the #meta and #handlers sections of the libraries: only the metadata of the main program matters, and each program can have only one set of handlers (if multiple sets were allowed, the VM would have no way to know which to use). All the #data sections are concatenated into one large section used in the final program. The same can't be done with the #code sections right away; the code in all the libraries can be concatenated, but it can't be merged with the main program's code until the main code is further along in its compilation. As soon as the number of statements in the main program's code is finalized (i.e., between expansion and delabeling), all the libraries' label positions have to be readjusted to compensate for the label drift caused by the concatenation. This isn't possible before we know exactly how many statements are in the main code, because all the library code sits later in memory.
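The merge and the later label adjustment might look like the following minimal Python sketch. The dictionary layout of a library, the function names merge_libraries and shift_labels, and the sample values are all hypothetical. For illustration only, it assumes one address unit per statement, so the drift equals the number of statements in the finalized main code.

```python
from typing import Dict, List, Tuple


def merge_libraries(libraries: List[dict]) -> Tuple[list, list]:
    """Concatenate the #data and #code of every library, ignoring #meta and #handlers."""
    data: list = []
    code: list = []
    for lib in libraries:
        data.extend(lib["data"])
        code.extend(lib["code"])
    return data, code


def shift_labels(labels: Dict[str, int], drift: int) -> Dict[str, int]:
    """Move every library label by the drift caused by the main code preceding it."""
    return {name: address + drift for name, address in labels.items()}


math_lib = {"meta": {"name": "math"}, "handlers": [], "data": [1, 2], "code": ["a", "b"]}
memory_lib = {"meta": {"name": "memory"}, "handlers": [], "data": [3], "code": ["c"]}

data, code = merge_libraries([math_lib, memory_lib])
print(data)  # [1, 2, 3]
print(code)  # ['a', 'b', 'c']

# Only once the main code's statement count is final (5 in this example)
# can the library labels be moved to their real positions.
print(shift_labels({"sqrt": 0, "alloc": 2}, drift=5))  # {'sqrt': 5, 'alloc': 7}
```

Keeping the label shift as a separate step mirrors the ordering constraint above: the concatenation can happen early, but the shift has to wait for expansion of the main code.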