High performance brainfuck transpiler/interpreter for Lua with FFI support
luajit fast_brainfuck.lua program.bf
luajit fast_brainfuck.lua program.bf output.lua
Tested on Ryzen 3700x | 32gb ram 2400mhz
Lua version\Implementation | fast_brainfuck.lua | brainfuck.lua transpiler | brainfuck.lua transpiler | bf.lua interpreter | bf for ubuntu |
---|---|---|---|---|---|
LuaJIT 2.1.0 | 1.870s | 38.309s | 34.392s | 53.81s | N/A |
Lua 5.4 | 15.242s | 294.12s | 188.39s | very slow | N/A |
Clanc C -Ofast | 0.7s | N/A | N/A | N/A | 5.8s |
LuaJIT had an instruction limit of 2^16 on JMP
because of size of register D
When the main code chunk is greater or equal to 2^16, it scans for chunks of code it can dispatch to a subfunction.
It's only enabled on LuaJIT and turns itself on only when ins count is >= 2^16
Long story short, I convert brainfuck code to Intermediate Representation that can itself be turned into Lua (it's either way turned into lua at the end so it can be JIT compiled by LuaJIT compiler)
Before being turned into Lua, it gets optimized thru multiple optimisation passes, each optimisation pass has it's own dedicated instruction that got added in the instruction (IR) definitions in the code.
Turning for example
data[i] = data[i] + 1
data[i] = data[i] + 1
data[i] = data[i] + 1
data[i] = data[i] + 1
data[i] = data[i] + 1
data[i] = data[i] + 1
data[i] = data[i] + 1
data[i] = data[i] + 1
data[i] = data[i] + 1
into
data[i] = data[i] + 9
while data[i] ~= 0 do
data[i] = data[i] [-/+] 1
end
into
data[i] = 0
i = i + 1
data[i] = 0
i = i + 1
data[i] = 0
i = i + 1
data[i] = 0
i = i + 1
data[i] = 0
i = i + 1
data[i] = 0
i = i + 1
data[i] = 0
i = i + 1
data[i] = 0
i = i + 1
data[i] = 0
i = i + 1
data[i] = 0
into
ffi.fill(data + i, intSize * 9, 0)
pretty much turning
while data[i] ~= 0 do
i = i - 1 (jmp1)
data[i] = data[i] + 3 (inc1)
i = i + 1 (jmp2)
data[i] = data[i] - 1 (inc2)
end
i = i - 1 (jmp1)
into
data[i+jmp1] = (-(data[i]/inc2))*inc1
data[i] = 0
i = i + jmp1
3rd pass requires second as assignation is created by second.
Benchmarked on Ryzen 3700x on WSL this time
No optimization pass (other than folding) : 2.59s
First optimization pass (assignation) : 0.8125s high effectiveness
Second optimization pass (memset) : 0.84375s ❌INEFFECTIVE HERE
Third optimization pass (dynamic assignation folding) : 0.171875s
Benchmarked on Ryzen 3700x still on WSL
No optimization pass (other than folding) : 3.5s
First optimization pass (assignation) : 2.92s low effectiveness
Second optimization pass (memset) : 2.54s ✔️EFFECTIVE HERE
Third optimization pass (dynamic assignation folding) : 2.01s
Left part is 1st pass only, right is all 4.