Merge pull request #1 from bontchev/bontchev-vba7
- Implemented VBA7 support.
- Implemented support for documents created by the 64-bit version of Office.
- The opcodes are now stored more efficiently.
- Various bugfixes and optimizations.
bontchev authored Oct 9, 2016
2 parents d492faa + adb65ab commit 62b52cc
Showing 2 changed files with 327 additions and 524 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -6,7 +6,7 @@ It is not widely known, but macros written in VBA (Visual Basic for Applications

- _Source code_. The original source code of the macro module is compressed and stored at the end of the module stream. This makes it relatively easy to locate and extract and most free DFIR tools for macro analysis like [oledump](https://blog.didierstevens.com/programs/oledump-py/) or [olevba](http://www.decalage.info/python/olevba) or even many professional anti-virus tools look only at this form. However, most of the time the source code is completely ignored by Office. In fact, it is possible to remove the source code (and therefore make all these tools think that there are no macros present), yet the macros will still execute without any problems. I have created a [proof of concept](http://bontchev.my.contact.bg/poc2b.doc) document illustrating this. Most tools will not see any macros in it but if opened with Word version 2000 or higher, it will display a message and will launch `calc.exe`. It is surprising that malware authors are not using this trick more widely.

- _P-code_. As each VBA line is entered into the VBA editor, it is immediately compiled into p-code (a pseudo code for a stack machine) and stored in a different place in the module stream. The p-code is precisely what is executed most of the time. In fact, even when you open the source of a macro module in the VBA editor, what is displayed is not the decompressed source code but the p-code decompiled into source. Only if the document is opened under a version of Office that uses a different VBA version from the one that has been used to create the document, the stored compressed source code is re-compiled into p-code and then that p-code is executed. This makes it possible to open a VBA-containing document on any version of Office that suppots VBA and have the macros inside remain executable, despite the fact that the different versions of VBA use different (incompatible) p-code instructions.
- _P-code_. As each VBA line is entered into the VBA editor, it is immediately compiled into p-code (a pseudo code for a stack machine) and stored in a different place in the module stream. The p-code is precisely what is executed most of the time. In fact, even when you open the source of a macro module in the VBA editor, what is displayed is not the decompressed source code but the p-code decompiled into source. Only if the document is opened under a version of Office that uses a different VBA version from the one that has been used to create the document, the stored compressed source code is re-compiled into p-code and then that p-code is executed. This makes it possible to open a VBA-containing document on any version of Office that supports VBA and have the macros inside remain executable, despite the fact that the different versions of VBA use different (incompatible) p-code instructions.

- _Execodes_. When the p-code has been executed at least once, a further tokenized form of it is stored elsewhere in the document (in streams, the names of which begin with `__SRP_`, followed by a number). From there it can be executed much faster. However, the format of the execodes is extremely complex and is specific for the particular Office version (not VBA version) in which they have been created. This makes them extremely non-portable. In addition, their presence is not necessary - they can be removed and the macros will run just fine (from the p-code).
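
As a rough illustration of the naming rule for execode streams described above, the following sketch separates `__SRP_` streams from the rest of a stream listing. It is not taken from the script: the function name `split_execode_streams` and the sample stream names are hypothetical, and it assumes that the number after `__SRP_` is written in decimal digits.

```python
import re

# Assumption: the numeric suffix after "__SRP_" consists of decimal digits only.
SRP_NAME = re.compile(r"^__SRP_\d+$")


def split_execode_streams(stream_names):
    """Partition stream names into (execode streams, all other streams)."""
    execodes = [name for name in stream_names if SRP_NAME.match(name)]
    others = [name for name in stream_names if not SRP_NAME.match(name)]
    return execodes, others


# Hypothetical stream listing, for illustration only.
names = ["PROJECT", "dir", "_VBA_PROJECT", "__SRP_0", "__SRP_1", "ThisDocument"]
execodes, others = split_execode_streams(names)
print(execodes)
```

Since the execodes are optional, a tool that strips them would only need the first list; the p-code and the compressed source live in the module streams, which end up in the second list.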

@@ -20,9 +20,9 @@ The script should work both in Python 2.6+ and 3.x, although I've been using it

## Usage

The script takes as a command-line argument a list of one or more names of files or directories. If the name is an OLE2 document, it will be inspected for VBA code and the p-code of each code module will be disassembled. If the name is a directory, all the files in this directory and its subdirectories will be similarly processed. In addition to the disassembled p-code, by default the script also displays the contents of the `PROJECT` stream (which is ASCII text), the parsed records of the `dir` stream, as well as the identifiers (variable and function names) used in the VBA modules and stored in the `_VBA_PROJECT` stream.
The script takes as a command-line argument a list of one or more names of files or directories. If the name is an OLE2 document, it will be inspected for VBA code and the p-code of each code module will be disassembled. If the name is a directory, all the files in this directory and its subdirectories will be similarly processed. In addition to the disassembled p-code, by default the script also displays the parsed records of the `dir` stream, as well as the identifiers (variable and function names) used in the VBA modules and stored in the `_VBA_PROJECT` stream.
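
The way a list of file and directory names can be expanded into the files to inspect might be sketched as follows. This is a minimal sketch and not the script's actual code: the helper name `collect_files` and its `recurse` parameter are assumptions made for illustration, and no check is made that a file really is an OLE2 document.

```python
import os


def collect_files(names, recurse=True):
    """Expand file and directory names into a flat list of file paths."""
    files = []
    for name in names:
        if os.path.isdir(name):
            for root, dirs, filenames in os.walk(name):
                for filename in sorted(filenames):
                    files.append(os.path.join(root, filename))
                if not recurse:
                    # Emptying dirs in place stops os.walk from descending further.
                    dirs[:] = []
        else:
            # Anything that is not a directory is passed through unchanged.
            files.append(name)
    return files
```

Calling `collect_files(["samples"])` would return every file under a hypothetical `samples` directory, while `collect_files(["samples"], recurse=False)` would return only the files directly inside it.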

The script supports VBA5 (Office 97, MacOffice 98) and VBA6 (Office 2000 and higher).
The script supports VBA5 (Office 97, MacOffice 98), VBA6 (Office 2000 to Office 2009) and VBA7 (Office 2010 and higher).

The script also accepts the following command-line options:

@@ -32,7 +32,7 @@ The script also accepts the following command-line options:

`-n`, `--norecurse` If a name specified on the command line is a directory, process only the files in this directory; do not process the files in its subdirectories.

`-d`, `--disasmonly` Only the p-code will be disassembled, without the parsed contents of the `dir` stream, the contents of the `PROJECT` stream, or the identifiers in the `_VBA_PROJECT` stream.
`-d`, `--disasmonly` Only the p-code will be disassembled, without the parsed contents of the `dir` stream or the identifiers in the `_VBA_PROJECT` stream.

`--verbose` The contents of the `dir` and `_VBA_PROJECT` streams are dumped in hex and ASCII form. In addition, the raw bytes of each VBA line compiled into p-code are also dumped in hex and ASCII.
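
For readers who want to see how options like these are typically declared, here is a sketch using Python's standard `argparse` module. It covers only the three options shown in this hunk, the help strings are paraphrased, and the function name `build_parser` is hypothetical; the script's own argument handling may differ.

```python
import argparse


def build_parser():
    """Build a parser for the three options described above (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Hypothetical front end for a p-code disassembler")
    parser.add_argument("names", nargs="+",
                        help="files or directories to inspect")
    parser.add_argument("-n", "--norecurse", action="store_true",
                        help="do not process subdirectories")
    parser.add_argument("-d", "--disasmonly", action="store_true",
                        help="output only the disassembled p-code")
    parser.add_argument("--verbose", action="store_true",
                        help="also dump streams and raw p-code bytes in hex and ASCII")
    return parser


# Example invocation with a hypothetical file name.
args = build_parser().parse_args(["-n", "--verbose", "sample.doc"])
print(args.norecurse, args.disasmonly, args.verbose, args.names)
```

Each flag is a simple boolean switch (`store_true`), so omitting it leaves the default behaviour described above in place.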

@@ -73,10 +73,10 @@ For reference, it is the result of compiling the following VBA code:

- While the script should support documents created by MacOffice, this has not been tested (and you know how well untested code usually works). This should be tested and any bugs related to it should be fixed.

- The 64-bit versions of Office use yet another VBA version - VBA7. It uses different p-code opcodes and the current version of the script will not be able to disassemble them correctly. I know how to do it but I need documents with macros created by such a version of Office for testing.

- I am not an experienced Python programmer and the code is ugly. Especially the humongous opcode tables make me want to barf every time I look at them. Somebody more familiar with Python than me should probably rewrite the script and make it look better.
- I am not an experienced Python programmer and the code is ugly. Somebody more familiar with Python than me should probably rewrite the script and make it look better.

## Change log

Version 1.00: Initial version.
Version 1.0.0: Initial version.

Version 1.1.0: Storing the opcodes in a more efficient manner. Implemented VBA7 support. Implemented support for documents created by the 64-bit version of Office.