-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial commit for dissect.executable.pe #10
base: main
Are you sure you want to change the base?
Conversation
First version for the dissect PE format parser. Currently the parser supports standard PE files, as well as some patching of executables and building an executable from scratch. It is expected there will be some issues that will be found during usage and it will need quite some refactoring. The API as it currently exists might need some revising too :)
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #10 +/- ##
==========================================
+ Coverage 0.00% 88.60% +88.60%
==========================================
Files 5 17 +12
Lines 306 1439 +1133
==========================================
+ Hits 0 1275 +1275
+ Misses 306 164 -142
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done an initial review of the code, mostly just commented on some typehint weirdness and the like. I'll go deeper into it at a later time.
self.patched_pe = BytesIO() | ||
self.functions = [] | ||
|
||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This being a property/attribute doesn't make much sense. I wouldn't expect assigning the attribute to some other variable would build the whole patched binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's just a design preference, happy to discuss other options for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depends on what you expect of a property, most properies just return a value of some internal state. This goes a bit beyond that IMHO, where it produces the whole binary. Feels like it should be the responsibility of a method instead.
self.raw_resources.append( | ||
{ | ||
"offset": entry.OffsetToDirectory, | ||
"entry": data_entry, | ||
"data": data, | ||
"data_offset": raw_offset, | ||
"resource": rsrc, | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having an additional raw_resources
list, why not add these offset and information to Resource
itself? Where you can have the same functionality as a dict
if you add def raw()
to Resource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kept for patching resources, adding them as a dict
like proposed would need some more extensive refactoring of the code. We might opt for that indeed as it would be cleaner
for rsrc_entry in sorted(self.pe.raw_resources, key=lambda rsrc: rsrc["data_offset"]): | ||
entry_offset = rsrc_entry["offset"] | ||
entry = rsrc_entry["entry"] | ||
|
||
if entry._type.name == "IMAGE_RESOURCE_DATA_ENTRY": | ||
rsrc_obj = rsrc_entry["resource"] | ||
data_offset = rsrc_entry["data_offset"] | ||
|
||
# Normally the data is separated by a null byte, increment the new offset by 1 | ||
new_data_offset = prev_offset + prev_size | ||
# if new_data_offset and (new_data_offset > data_offset or new_data_offset < data_offset): | ||
if new_data_offset and new_data_offset != data_offset: | ||
data_offset = new_data_offset | ||
rsrc_entry["data_offset"] = data_offset | ||
rsrc_obj.offset = self.section.virtual_address + data_offset | ||
|
||
data = rsrc_obj.data | ||
|
||
# Write the resource entry data into the section | ||
section_data.seek(data_offset) | ||
section_data.write(data) | ||
|
||
# Take note of the offset and size so we can update any of these values after changing the data within | ||
# the resource | ||
prev_offset = data_offset | ||
prev_size = rsrc_obj.size | ||
|
||
# Write the resource entry into the section | ||
section_data.seek(entry_offset) | ||
section_data.write(entry.dumps()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels out of place in the Resource
class. I wouldn't imagine that everything gets immediately changed when assigning data to a resource. I would expect either the manager to take care of that (which holds the raw resources) or a seperate utility that does so during patching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to move those things around if it doesn't make sense in the place where it is at right now!
from dissect.executable.pe.pe import PE | ||
|
||
|
||
class PESection: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also a random thought, in line with the other managers and reduce a bit of the complexity of this class.
Is a SectionManager an idea that does the patching and keeping account of the patched and unpatched sections?
aka, something like:
class PESectionManager(Manager):
def __init__(self, pe, section, ...):
self.patched_items = {}
def patch(self, name, data):
# get data from self.items
# Copy data
# insert it in self.patched_items
def __getitem__(self, name: str):
if section := self.patched_items.get(name):
return section
else:
return super().__getitem__(name)
def parse():
... # Do something
this is just a rough draft of such a manager class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I like the idea, initially that is kind of what I had in mind as well but I lacked (still do) some understanding of the language to do it in a nice manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
want me to look into that a bit?
for name, section in self.pe.patched_sections.items(): | ||
if section.virtual_address == prev_va: | ||
continue | ||
|
||
pointer_to_raw_data = utils.align_int(integer=prev_ptr + prev_size, blocksize=self.pe.file_alignment) | ||
virtual_address = utils.align_int(integer=prev_va + prev_vsize, blocksize=self.pe.section_alignment) | ||
|
||
if section.virtual_address < virtual_address: | ||
"""Set the virtual address and raw pointer of the section to the new values, but only do so if the | ||
section virtual address is lower than the previous section. We want to prevent messing up RVA's as | ||
much as possible, this could lead to binaries that are a bit larger than they need to be but that | ||
doesn't really matter.""" | ||
self.pe.patched_sections[name].virtual_address = virtual_address | ||
self.pe.patched_sections[name].pointer_to_raw_data = pointer_to_raw_data | ||
|
||
prev_ptr = pointer_to_raw_data | ||
prev_size = section.size_of_raw_data | ||
prev_va = virtual_address | ||
prev_vsize = section.virtual_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as the Resource
class, it doesn't feel like this class should be responsible for patching all the other sections.
to help with a lot of these style issues, you could check out the new linting/formatting we plan to use. It will probably also warn you about stuff I missed: fox-it/dissect.util#52 |
* Refactored based on comments given * Ran `ruff` for formatting * Applied style and formatting on `dissect.executable.elf` as well (hope you don't mind)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I answered your comments a bit, if you need help with any of the changes I requested do not hesitate to ask!
First version for the dissect PE format parser. Currently the parser supports standard PE files, as well as some patching of executables and building an executable from scratch. It is expected there will be some issues that will be found during usage and it will need quite some refactoring. The API as it currently exists might need some revising too :)