multiple binary diffing up to 100 samples (fn_fuzzy is better for more samples)
- IDA 7.6 and BinDiff 6
- python packages: pefile macholib pyelftools python-idb prettytable
Before using it, you have to edit the paths for executables/scripts in bindiff.py.
# paths (should be edited) g_out_dir = r'Z:\haru\analysis\tics\bindiff_db' g_ida_dir = r'C:\work\tool\IDAx64' g_exp_path = r'Z:\cloud\gd\python\IDAPython\ida_haru\bindiff\bindiff_export.idc' g_differ_path = r"C:\Program Files\BinDiff\bin\bindiff.exe" #g_differ_path = r'C:\Program Files (x86)\zynamics\BinDiff 4.2\bin\differ64.exe' g_save_fname_path = r'Z:\cloud\gd\python\IDAPython\ida_haru\bindiff\save_func_names.py'
You can check the command line options by -h or –help.
Z:\cloud\gd\work\python\IDAPython\bindiff>python bindiff.py -h usage: bindiff.py [-h] [--out_dir OUT_DIR] [--ws_th WS_TH] [--fs_th FS_TH] [--ins_th INS_TH] [--bb_th BB_TH] [--size_th SIZE_TH] [--func_regex FUNC_REGEX] [--debug] [--clear] [--noidb] [--use_pyidb] primary {1,m} ... positional arguments: primary primary binary to compare {1,m} mode: 1, m 1 BinDiff 1 to 1 m BinDiff 1 to many optional arguments: -h, --help show this help message and exit --out_dir OUT_DIR, -o OUT_DIR output directory including .BinExport/.BinDiff (default: Z:\haru\analysis\tics\bindiff_db) --ws_th WS_TH, -w WS_TH whole binary similarity threshold (default: 0.2) --fs_th FS_TH, -f FS_TH function similarity threshold (default: 0.8) --ins_th INS_TH, -i INS_TH instruction threshold (default: 30) --bb_th BB_TH, -b BB_TH basic block threshold (default: 1) --size_th SIZE_TH, -s SIZE_TH file size threshold (MB) (default: 10) --func_regex FUNC_REGEX, -e FUNC_REGEX function name regex to reduce noise (default: sub_|fn_|chg_) --debug, -d print debug output (default: False) --clear, -c clear .BinExport, .BinDiff and function name cache (default: False) --noidb, -n skip a secondary binary without idb (default: False) --use_pyidb use python-idb (default: False)
There are 2 modes. One is “1 to 1” mode, the other is “1 to many” mode.
In “1 to 1” mode, we should specify executable file paths for primary and secondary targets.
Z:\cloud\gd\work\python\IDAPython\bindiff>python bindiff.py Z:\haru\analysis\tics\hoge\[redacted]_worker_fixed 1 Z:\haru\analysis\tics\hoge\samples\checked\[redacted]c2f05 --------------------------------------------- [*] BinDiff result [*] elapsed time = 0.390000104904 sec, number of diffing = 1 [*] primary binary: (([redacted]_worker_fixed)) ============== 1 high similar binaries (>0.2) ================ +----------------+--------------------------------------+ | similarity | secondary binary | +----------------+--------------------------------------+ | 0.211967127395 | [redacted]c2f05 | +----------------+--------------------------------------+ ---------------------------------------------
“high similar binaries” means some binaries are found with whole binary similarities. You can adjust the similarity by -w option.
In “1 to many” mode, we should specify an executable file path for a primary target and a folder path for secondary targets. We can specify to compare secondary binaries recursively (-r option).
Z:\cloud\gd\work\python\IDAPython\bindiff>python bindiff.py Z:\haru\analysis\tics\hoge\samples\attacker\[redacted]_worker_fixed m Z:\haru\analysis\tics\hoge\samples\tmp --------------------------------------------- [*] BinDiff result [*] elapsed time = 6.71900010109 sec, number of diffing = 3 [*] primary binary: (([redacted]_worker_fixed)) ============== 10 high similar functions (>0.8), except high similar binaries ================ +----------------+--------------+--------------------------------+----------------+----------------------------------+-----------------+ | similarity | primary addr | primary name | secondary addr | secondary name |secondary binary | +----------------+--------------+--------------------------------+----------------+----------------------------------+-----------------+ | 1.0 | 0x180067720 | Virt_sub_180067720 | 0x180004c30 | sub_180004c30 | [redacted]e6504 | | 1.0 | 0x1800674b0 | sub_1800674b0 | 0x180004930 | sub_180004930 | [redacted]e6504 | | 1.0 | 0x1800673a0 | chg_peparse_Virt_sub_1800673A0 | 0x180004820 | sub_180004820 | [redacted]e6504 | | 1.0 | 0x1800672b0 | Virt_sub_1800672B0 | 0x180004730 | sub_180004730 | [redacted]e6504 | | 1.0 | 0x18005fd84 | sub_18005fd84 | 0x13f69af94 | sub_13f69af94 | [redacted]fb841 | | 1.0 | 0x18005fd84 | sub_18005fd84 | 0x180012648 | __crtMessageBoxW | [redacted]e6504 | | 1.0 | 0x180050f30 | sub_180050f30 | 0x1800019f0 | ?erase@?$basic_string@DU?$char_t | [redacted]e6504 | | 0.98987073046 | 0x1800677e0 | chg_peparse_Virt_sub_1800677E0 | 0x180004cf0 | sub_180004cf0 | [redacted]e6504 | | 0.963708558784 | 0x180067560 | sub_180067560 | 0x1800049e0 | sub_1800049e0 | [redacted]e6504 | | 0.946399194338 | 0x180018780 | chg_rotate_sub_180018780 | 0x140004360 | sub_140004360 | [redacted]92023 | +----------------+--------------+--------------------------------+----------------+----------------------------------+-----------------+ ---------------------------------------------
“high similar functions” means some functions are found with function similarities though they have lower whole binary similarities than the threshold. You can ajust the similarity by -f option.
The function similarity result is very noisy so library/thunk functions are filtered out by the script. Additionally, we can specify the number of instructions/basic blocks, file size, and so on to reduce the noise.
And by default, the script newly creates idbs for the target binaries if not found. If you want to only compare existing idbs, please specify -n.
- If you can’t get the function similarities correctly, adjust the function similarity threshold (–fs_th), instruction threshold (–ins_th), basic block threshold (–bb_th) and function name filter rule (–func_regex) options. The script excludes the matches of small codes because function similarity results of multiple binaries are noisy.
- BinDiff 5.0 and later contains a bug that we can’t load existing .BinDiff files and import symbols/comments due to missing .BinExport files. I hope it will be fixed someday.
- python-idb doesn’t work for IDA 7.6 IDBs. So by default it’s not used (enable –use_pyidb option if needed).