This is my writeup for Fools2023, a CTF created by TheZZAZZGlitch.
This repo is an unorganised mess of code, notes, and dumps.
This was primarily a collaboration with penteract (aka tesseract); and I also exchanged a few ideas with Buurazu.
Thanks to penteract for all the help with solving these challenges, TheZZAZZGlitch for creating this, and the GCRI discord.
There are 3 servers available. Server 1, on port 13337, is the most useful for now - it provides several a console with several commands for reading and writing memory, executing code, and reading files. Server 2 (on port 13338) requires a username and password, and server 3 just contains some information.
The primary challenges in the CTF are based around reverse engineering the unknown architecture, the GLVM, that these servers are running on; which server 3 tells us is Z80 based. My prior experience with Z80 based architectures comes from the GameBoy, which previous Fools events by TheZZAZZGlitch have been built around.
However, before needing to do any sort of reverse engineering, two straightforward flags are obtainable from just the resources available. Firstly, using the console allows us to read the files on server 1, one of which is TODO.TXT
, which contains a 50-point flag and some helpful information. Secondly, by reading the memory, and processing the dump a bit so that ascii text can be seen, a flag can be seen nearby some text saying "Wow! Undocumented monitor command!". Printing this text through the console yields a flag that doesn't work, due to it depending on the value of certain memory locations. But scanning through the dump for the names of other commands, the two-letter commands rf
and ls
can be seen, also nearby the one-letter commands; and also nearby UC
- which turns out to indeed be the undocumented command that outputs the correct flag, worth another 50 points.
Quickly, repeatedly typing commands to the console to read, write, and execute memory became tedious. So I started creating some Python code to interface with the server and automate these commands, and some common combinations of them (like writing some code and then immediately executing it - by far my most commonly used utility function). From that point, I could do the majority of the work in my ipython
terminal; alongside ad-hoc code to process data.
Now, to begin reverse-engineering, I had three sample programs to reference - the two files MATHTEST.PRG
and REPORT03.PRG
found from reading the files available to server 1, and the console itself. Upon hitting a breakpoint (opcode 00
), the console would print the values of the four general registers R0-R3, the program counter at the point the breakpoint was hit, the stack pointer, and the contents of the stack. The two program files each had a clear goal to crack - REPORT03.PRG
was decrypting something, and wanted me to figure out how in order to break the encryption; and MATHTEST.PRG
wanted me to understand enough of the architecture to write my own SQRT function.
The first thing I did was to run each opcode in turn, and see where the breakpoint ends up at; this would tell me how many bytes worth of arguments each opcode takes; as well as which ones caused an illegal instruction to occur - either because they were illegal themselves, or they caused code execution to jump somewhere that contained illegal instructions. I then recorded all of this information in a colour-coded spreadsheet, to more easily visualise patterns in similar opcodes near each other.
Then I started studying the programs I had access to, figuring out from context what certain opcodes might mean, and confirming them through tests. For example, there were some function names and addresses listed in TODO.TXT
, so when I saw those addresses I suspected they were preceded by a call
instruction. Addresses that pointed to code were probably jump instructions; and the two program files each had checks to see whether they were executing at address $2000
, so I expected a cmp
instruction with that argument and a conditional jump. Several instructions also come in groups that worked on each combination of registers it affected; e.g. once I'd found an immediate register load instruction (ld R2 $XXXX
) then I would test the adjacent opcodes to confirm that they were ld R1 $XXXX
etc.
Once I had a few opcodes known, I started to build a disassembler; which made it easier to see the structure of the programs and decode more opcodes.
Once I knew enough opcodes to understand how REPORT03.PRG
worked, I could start cracking the decryption. It takes the password you enter, hashes it into a 16-bit value I called X, then the rest of the decryption depends only on the value of X. I re-implemented everything that it was doing in Python, and brute-forced for the value of X that decrypts to something containing a flag (i.e. the string FOOLS2023_
), and got a result. It doesn't actually produce the password that generates this value of X, but it's enough to set up the registers manually and jump to the right point in the program to have it print out the decrypted message.
It contains a flag worth 100 points; as well as some text, which included a hint that the information server (server 3) contains a buffer overflow - which is something useful for later.
The MATHTEST.PRG
program wants me to create my own sqrt program.
At first I wasn't sure that this program had enough code to be capable of verifying that what I'd written was correctly implementing sqrt, but I eventually found that it was using opcode 06
, which I had determined was a sort of syscall instruction, as it did different things depending on it's 1-byte argument. For example, 06 01
would write a byte of output, 06 02
would read a byte of input (which was tricky to figure out; as it would seemingly hang, but then eat the next input that was intended to be part of another command, and cause the server to be in an unexpected state to my interface), 06 03
would halt. This program was using 06 64
, 06 65
, and 06 66
; which I figured were working together to generate and verify test cases for sqrt.
I didn't know that many opcodes at this point, including basic things like register-to-register comparisons and ld R1 [R2]
-type instructions; however penteract pointed out that I was able to work around it using self-modifying code. I created the sqrt program, which used a pre-computed table of square numbers (as I didn't yet know the mul
instruction and it was easier than writing a loop to compute multiplications), and the aforementioned self modifying code tricks to work around instructions that I didn't know.
While I was working on this, penteract also worked on an assembler, which was able to assemble my program into machine code and run it, which worked and caused MATHTEST.PRG
to output a flag worth 150 points. This assembler was also very useful for further opcode research; as it made it easier to write setup code to set up registers and memory values before running unknown opcodes.
The MATHTEST.PRG
program also contains an email address, [email protected]
. The username ax.arwen
is a valid username for server 2. However, a password is still needed.
Hidden file
While disassembling the monitor and learning how each part works, we found that the ls
and rf
commands both use the 06 04
instruction, which loads a file block into memory. Block 0 contains a listing of all the available files on the server, and other blocks contain the files themselves.
However, penteract noticed that when running it manually, block 0 actually contains a listing for a file ASDF.TXT
, which doesn't show up in the ls
output, as its block number is set to 0. Additionally, there's a gap in the used block numbers; going 07
, 08
, and then 0A
. So he tried manually reading block 09
, and it contained a deleted file with a flag worth 150 points.
I also then write some code to dump all possible blocks to a file, but didn't find anything else beyond that.
The next place to look was server 3, the infoserver. The decrypted report had hinted that it has a buffer overflow (and also implemented "stack cookies").
The first thing I tried was putting in a very long username - it claims that the max is 15 characters. I found that it truncates it to 15, and then when you try to select one of the options, it halts - saying "stack smashing detected - 0x{...} != 0x{...}" - where the two hex constants are 4 bytes long, random, and consistently differ by the exactly one byte - the second constant is the first with the first bytes set to 00
. This seemed interesting, and it also happened with a username of exactly 15 characters - it seems like it was reading into a buffer exactly 15 bytes long, then adding a null terminator; overflowing it by one byte. However, this overflow is a red herring - it doesn't actually lead to any useful exploits, besides hint that stack smashing is a thing that the server is checking for.
Then I had the idea to include the special control characters F0
/F1
in the username - I'd determined by studying the monitor code (and how the breakpoint part of it - displaying the values of the registers - is implemented) that these bytes in a string cause the next 2 bytes to be read as a memory location, and then the hexadecimal value of the memory at that location is written in the displayed string - 2 bytes for F0
, or 1 byte for F1
. Initially I thought this might help overflow a buffer when printing the name by including more bytes of output than were present in the input; but then quickly realised that this trick actually allows me to read memory from server 3. So I wrote some code to automate this, reading 8 bytes at a time; and used this to dump the entire server's memory.
This yielded a flag in memory worth 150 points, whose name suggested that this was a format string vulnerability - a common real-world vulnerability. The flag actually wasn't accepted at first, but this was just a bug in the site; which was later patched to work.
This also allowed me to disassemble the code running on the server, so I could look for further vulnerabilities. After dissassembling it, I found that there was a read input function that takes R3 as an input for the maximum number of bytes to read. However, R3 isn't actually set properly before it is called - and after running one command, R3 has a large value. So, this allows for a buffer overflow on the input that's read for a command. I was able to confirm this by entering a lot of A
s, which caused the stack smashing check to come up with a value 0x41414141
, indicating that I'd overwritten the stack canary.
Now all I needed to do was bypass the stack smashing check, which the read input function does before returning. Four random bytes are generated at startup, stored in two locations, and the read input function compares them and halts if they're not equal. But because I was able to read memory upon connecting, and I knew the memory location of the canary from the disassembly, I was able to write some code that reads this value, and crafts an input that will preserve this value, write an arbitrary payload (subject only to the requirememnt that it cannot contain any $0A
(newline) bytes), and overwrite the stack return address to point to this payload. Then the payload I write was one that reads bytes from the input stream, writes them to $F000
, and jumps to it - which I then used to write the console from server 1 onto server 3 and obtain arbitrary code execution.
Finally, the ls
command from the console revealed a file FLAG.TXT
on this server, containing a flag worth 187 points. I also tried dumping all the file blocks for this server, but there was nothing else of interest.
One aspect of the VM that we didn't fully know about was how memory region 0000-0FFF
worked. Reading from it from the console seemed to just yield $58
(ascii X
) - and reading from it with instructions always gave $00FF
. However, there was in fact executable code there - since call instructions to the functions listed in TODO.TXT
do function as advertised. These functions also fail (and return doing nothing) when asked to process any address within this region (e.g. memcpy from it, print a string containing an F0
sequence pointing to it) - which came up during server 3 dumping, causing it to fail when reading from these addresses. And in fact this is where the X
s come from when reading from the monitor console - it actually just memcpys into a buffer that by default contains X
s. If something else was read first then you see that instead.
So, we speculated that code inside protected memory is capable of reading protected memory; and furthermore that the functions there are really implemented via code (rather than, say, the VM treating calls to certain addresses specially). Penteract tried jumping to successive locations in protected memory, and eventually found what appeared to be part of the implementation for PrintStr
. He then found that a few bytes later is a point in the implementation that is after the check for whether we're attempting to read from protected memory; and jumping to that point successfully reads and prints one byte. So he wrote some code to use this to dump the whole protected memory, one byte at a time.
This yielded a flag worth 150 points; and we were also able to disassemble the code found there to see how everything works. Each function was indeed implemented in code, and did have a check before every read or write step that just returns early if it's acting on protected memory; which was bypassable by jumping to the code just after the check. Jumping to the appropriate point in MemCpy
then became an easier way to read memory; as well as test whether the protected region was writable - which it wasn't; so it was correct to refer to it as ROM.
Now all that was left was server 2. The only lead I had was that ax.arwen
was a valid username; and I had tried various strings found in text files as passwords. Buurazu had tried extending their decryption program to generate valid passwords for the report decryption and trie those, but didn't get anywhere. I was hoping that solving server 3 or finding other secrets might reveal some hints; but the only thing I wad to work with was a line in TODO.TXT
saying "upgrade slower processor on grlts02". That combined with the fact that server 2 is quite slow at checking passwords lead me to believe that this was a hint to try a timing attack - i.e. expecting it to take slightly longer the more correct characters of the password I have, and measuring that to crack the password.
The issue was that the results are noisy - things like network latency make it difficult to get consistent results in measuring which password was slightly slower. I tried writing the section that does the timing in rust to minimise any lag caused by python being slow, but it didn't really help. This variance made this challenge very difficult; as I was never fully sure whether the results I had were accurate.
Since I was measuring the response time from when I send the newline that enters the password to the time it accepts or rejects it, latency can only ever cause me to overestimate the true timing, never underestimate it - so that meant that taking the minimum over several runs does improve accuracy. I was also collecting some timings for other parts (like confirming the username, which is expected to be constant), in case I wanted to analyse some other methods; but I didn't end up using that data. At 3 runs per character I still wasn't getting the most consistent results, but Buurazu suggested 10 so I tried that; and I did get convincing evidence that 's' was slower than the others and thus the correct first character - the time between it and the next best character was more than twice the range of the rest of the data. This convinced me that I was at least on the right track to be trying a timing attack.
However, 10 runs each of every possible character would take over 40 minutes to crack each character, which was too slow for my liking. Also, the server went down for a few hours, (after my next run had just finished, where the next character was either 'e' or 'Y' but it wasn't decisively clear which) so I wasn't able to test better ideas or even just let this one run. But I did have the idea to parallelise password attempts - which as my actual connection timings were written in rust, was straightforward to do via the rayon
crate.
When the server was back up we tried again, and penteract had a key insight that we could avoid retrying all the possible passwords each iteration, and instead ignore those that were already observed to be fast - in particular, passwords with times in the fastest 20% of times aren't retried.
This made the process significantly faster; and we were able to crack the password in a reasonable amount of time. Here is the python part of the code for the timing attack. Once we had access to server 2, we were given a monitor console just like in server 1, so were able to read the files on the server, one of which was a flag with 150 points.
After dumping an disassembling server 2's console, we worked out how it validates passwords - the password is stored in an encoded form, and to validate a password, for each character, there are 4 tables (that come from a file ENCTABLE.BIN
on the server) which are read from at that index, and the results are summed and compared to the stored password. Then each table is rotated; which explains why this is slow enough to make the timing attack work. There was a password for a second username, sbw.shadow
, in memory; so penteract wrote some code to reverse this process; and we found that the second password is a flag worth 100 points. There are multiple possible results due to the way this encoding process works, but I entered the one that was entirely ascii and looked the most like some words, and it worked.
Finally, the last thing to do was solve the MIXTEST.PRG
program found on server 2. It mentions "Mix/unmix opcodes", and asks for a password... and then runs into an illegal opcode. Text in the program claimed that the correct password was a flag, however. In fact it turns out that the original version of the challenge actually didn't work, and was patched later, but we didn't know that at first.
Looking at the disassembly of the program, it was normal up until it hits an 07
, which I'd previously said was illegal. Then there were a bunch more code that didn't make sense including illegal opcodes; then at some point it became code that made sense again. Looking at the program, there also seemed to be memory adresses, such as 214C
, the location of the password the user enters. So we guessed that what was going on was that the 07
opcode (which I called 'mix') was mixing the instruction set, so that the instructions each byte refered to was different. Penteract managed to find that running F8
after 07
allowed code execution to continue as normal (so it was an 'unmix' instruction), and then putting other bytes in between had the effects of other known instructions.
I wrote an improved breakpoint handler to print more information such as the memory pointed to by the registers, and even the conditional flag state, to make opcode research much easier. I also wrote a function to try every opcode after a mix to find the one that doesn't make it crash; i.e. the unmixer; expected to be dealing with multiple mixing modes.
However, the mixtest program still didn't make sense. What looked like they should have been two arg instructions were not; and then it unconditionally runs into an $71
instruction, which is unconditionally illegal - it didn;t seem to really be another mix instruction. Buurazu also got stuck on the same point, and determined that the server might be bugged, and contacted TheZZAZZGlitch about it.
In fact, it was indeed bugged. When it was fixed, it was late at night for me,but I was able to use my tools to determine what the new unmixer was ($13
) and decode some of the program; which was actually making sense now (what looked like 2 arg opcodes actually were this time); also it didn't just run into an illegal opcode, and instead terminates with output on whether the password is correct or not. But I ran into some difficulty so decided to stop there and get some sleep.
The next day, Buurazu had beaten me to the top of the leaderboard. But I was able to finish decoding the rest of the program - it was indeed using multiple levels of mixing, which one unmix instruction undoes all of. The program also had a repeating structure that made things easier to decode (check some of the password, if correct then incrememnt R3. The end of the program then says the password is correct if R3=4). Once I'd decoded the full program, figuring out what the correct password was was straightforward (each section simply directly compared some of the bytes of the password, sometimes after a reversible operation like xor). And was finally able to claim the last flag for 150 points, and the total 1337 points available throughout the event, and second place on the leaderboard.
I'd solved all the flags, but there were still a few aspects of the challenge I'd not fully understood. I'd managed to map large chunks of the multiple mixed instruction sets back to the base set, but I don't know how many there are, nor whether there's any kind of pattern in the different instruction sets - i.e. it's not just a matter of composing the same transformation multiple times; there did seem to be a unique table for each one. There were a few opcodes I'd not mapped - I wasn't sure whether the base form of the unmix
instruction was 08
(which was used as a nop somewhere - but is also right next to 07
for mix
) or 0C
- which also seemed to act like nop
. I'm not sure what 0D
was doing; when I tested it then it seemed to be always illegal, but I'd previously marked it in my spreadsheet as legal; and Buurazu speculated that it had something to do with the mixing. Also C4
I'd marked as illegal, but when testing it again it seemed to either be illegal or hang, depending on its argument. All these questions will probably be answered by the source code release. There were also some numbers at the bottom of TODO.TXT
that I didn't know the meaning of; they didn't seem to be memory addresses with anything interesting, nor passwords. There was also a random line of text saying "far red close blue", but I realised that this was a reference to the Doppler effect.
Once again, huge thanks to TheZZAZZGlitch for making this; this was a lot of fun; definitely looking forward to next year's event.