Use PDB files in rewriting process #74

avncharlie · 2024-03-12T19:00:49Z

I'm not sure where the best place to put this is between gtirb-pprinter, gtirb-rewriting and here, so please let me know and I can reopen this in the best repo.

While developing instrumentation using gtirb-rewriting, I would like to do this:

1. Use ddisasm on a PE and its PDB symbol file to generate a GTIRB IR file with symbol info
2. Instrument this IR with gtirb-rewriting (maybe even by using symbol info, e.g., only instrumenting a function with a specific name)
3. Use gtirb-pprinter to output a PE binary and corresponding PDB symbol file from the instrumented IR
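
For concreteness, the three steps above might be driven roughly as follows. This is only a sketch under assumptions: `instrument.py` is a hypothetical user-written gtirb-rewriting script taking an input and an output IR path, and the intermediate file names are made up for illustration. It only builds the command lines, and it covers neither reading the PDB nor writing a new one.

```python
from pathlib import Path


def plan_pipeline(pe_path, script="instrument.py"):
    """Build the command lines for disassemble -> instrument -> reassemble.

    `script` is a hypothetical gtirb-rewriting script; nothing here handles
    the PDB on either end of the pipeline.
    """
    pe = Path(pe_path)
    ir = pe.with_suffix(".gtirb")
    rewritten_ir = pe.with_name(pe.stem + ".instrumented.gtirb")
    out = pe.with_name(pe.stem + ".instrumented" + pe.suffix)
    return [
        # Step 1: disassemble the PE into a GTIRB IR file.
        ["ddisasm", str(pe), "--ir", str(ir)],
        # Step 2: run the (hypothetical) instrumentation script on the IR.
        ["python", script, str(ir), str(rewritten_ir)],
        # Step 3: print the instrumented IR back out as a binary.
        ["gtirb-pprinter", str(rewritten_ir), "--binary", str(out)],
    ]


if __name__ == "__main__":
    for command in plan_pipeline("app.exe"):
        print(" ".join(command))
```

Each returned list could be passed to `subprocess.run(command, check=True)` in order once the tools are installed.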

I haven't found any way to do this (run instrumentation that preserves symbol information in the output PE binary). Is this possible?

aeflores · 2024-03-13T14:29:13Z

Hi @avncharlie, this is an interesting idea!

Our tooling is missing a few pieces for this to be possible. You could run ddisasm on a PE binary and create a gtirb, but we don't have any utilities to parse PDBs and use their information. This could be done (1) as a post-processing step where you annotate the gtirb with information from the PDB, or (2) by having ddisasm parse the PDB so it can use it for better disassembly. Option 1 would probably be simpler to implement, but ddisasm would not benefit from the PDB information. Option 2 would probably require more work but could potentially get you better results.
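
A rough sketch of the matching step in option 1, assuming the PDB has already been parsed elsewhere into `(name, rva)` pairs and that block addresses in the gtirb are virtual addresses (image base plus RVA). Both are assumptions, and `resolve_pdb_symbols` is a made-up helper, not part of any existing tool. Each matched pair could then be turned into a symbol attached to the block at that address.

```python
def resolve_pdb_symbols(pdb_symbols, block_starts, image_base):
    """Match PDB-derived names to block start addresses.

    pdb_symbols:  iterable of (name, rva) pairs from some PDB parser
    block_starts: iterable of block start addresses taken from the IR
    image_base:   preferred load address of the PE (assumed to be the
                  base used for addresses in the IR)

    Returns (matched, unmatched): a list of (name, address) pairs that
    land exactly on a block start, and a list of names that do not.
    """
    starts = set(block_starts)
    matched, unmatched = [], []
    for name, rva in pdb_symbols:
        address = image_base + rva
        if address in starts:
            matched.append((name, address))
        else:
            # No block starts here; the disassembly and the PDB disagree.
            unmatched.append(name)
    return matched, unmatched


if __name__ == "__main__":
    matched, unmatched = resolve_pdb_symbols(
        [("main", 0x1000), ("helper", 0x1040), ("stale", 0x2000)],
        [0x140001000, 0x140001040],
        0x140000000,
    )
    print(matched, unmatched)
```

Names that end up in `unmatched` are the cases where option 2 would help, since ddisasm itself never saw the PDB when choosing block boundaries.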

Once you have a gtirb annotated with symbols, I think you should be able to use gtirb-rewriting to instrument it and gtirb-pprinter to generate a new PE. However, gtirb-pprinter cannot generate PDB files; that would be the second missing piece.
I am not sure how much effort this would be. I know llvm's support for PDB files (e.g. https://llvm.org/docs/CommandGuide/llvm-pdbutil.html) has been getting better, so using some of that might make things easier.

XVilka · 2024-03-13T15:37:39Z

You could use the Rizin library for parsing both PDB and DWARF (and maybe some other debugging information in the future):

It is a C library and definitely smaller than LLVM, so using it is much easier.

aeflores · 2024-08-06T00:41:14Z

It looks like the latest version of LIEF (https://lief.re/doc/stable/changelog.html#july-23th-2024) has some support for parsing PDBs and DWARF sections. Once we update Ddisasm to the latest LIEF, using that information during disassembly should be much easier.
