[BINARY] clang cannot compile those assembly, neither from gcc nor msvc program #34

Open
swang206 opened this issue Aug 25, 2021 · 32 comments
Labels: binary fails (DDisasm fails to correctly disassemble a binary)

Comments

@swang206

cons.asm:25375:13: error: unknown use of instruction mnemonic without a size suffix
jmp $L_14001135e
^
cons.asm:25378:13: error: unknown use of instruction mnemonic without a size suffix
mov R8,R15
^
cons.asm:25379:27: error: unexpected token in argument list
lea RCX,QWORD PTR [RSP+72]
^
cons.asm:25380:13: error: unknown use of instruction mnemonic without a size suffix
mov RDX,RDI
^
cons.asm:25383:13: error: unknown use of instruction mnemonic without a size suffix
cmp EAX,-1
^
cons.asm:25384:16: error: invalid operand for instruction
je $L_14001139b
^~~~~~~~~~~~

@swang206 swang206 added the binary fails DDisasm fails to correctly disassemble a binary label Aug 25, 2021
@kwarrick
Contributor

kwarrick commented Sep 1, 2021

Hello,

Please provide as much of the following information as possible:

  • Please attach the binary to this issue.
  • What is the version of ddisasm?
  • How can we reproduce? Please paste the command line used to invoke ddisasm and clang.

@swang206
Author

swang206 commented Sep 1, 2021

master.

It does not work for clang

@kwarrick
Contributor

kwarrick commented Sep 1, 2021

You are not able to disassemble clang itself, or you are not able to reassemble a binary with clang?

@swang206
Author

swang206 commented Sep 1, 2021

reassemble a binary with clang

@kwarrick
Contributor

kwarrick commented Sep 1, 2021

PE or ELF, 32-bit or 64-bit?

@swang206
Author

swang206 commented Sep 1, 2021

PE or ELF, 32-bit or 64-bit?

Both PE and ELF 64 bit.

I did not try 32-bit.

@kwarrick
Contributor

kwarrick commented Sep 1, 2021

Currently, for Windows binaries only the MASM assembly syntax is supported by ddisasm/gtirb-pprinter. You will have to use a MASM-compatible assembler such as ML64 or UASM.

For example,

$ cd ddisasm/examples/ex1
$ cl ex.c
$ ddisasm --asm out.asm ex.exe
$ ml64 out.asm /link /subsystem:console /entry:__EntryPoint /machine:x64

@swang206
Author

swang206 commented Sep 1, 2021

But why does the Linux executable not work with clang?

@aeflores
Collaborator

aeflores commented Sep 1, 2021

It is hard to reproduce the problem if we don't have access to the binary. By the look of those error messages, it could be that clang expects AT&T syntax instead of INTEL syntax, which is the default.
I would try specifying the AT&T syntax in the gtirb-pprinter and see if that works better, e.g.:

ddisasm example --ir example.gtirb
gtirb-pprinter example.gtirb --syntax att --asm example.asm
clang example.asm -o example_rewritten

Let us know if that helps
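
The errors quoted at the top of this issue are consistent with Intel/MASM-style operands (`QWORD PTR`, registers without a `%` prefix) reaching an assembler that expects AT&T syntax. A minimal sketch, assuming a purely heuristic check, of telling the two styles apart before choosing an assembler; `guess_syntax` and both regular expressions are hypothetical helpers, not part of ddisasm or gtirb-pprinter:

```python
import re

# AT&T syntax prefixes registers with '%'.
ATT_REG = re.compile(r"%[a-z][a-z0-9]*", re.IGNORECASE)

# Intel/MASM hints: a size directive such as "QWORD PTR", or a common
# x86-64 register name that is NOT preceded by '%'. This list is an
# illustrative assumption and far from exhaustive.
INTEL_HINT = re.compile(
    r"\b(?:BYTE|WORD|DWORD|QWORD)\s+PTR\b"
    r"|(?<!%)\b(?:[re]?[abcd]x|[re]?[sd]i|[re]?[sb]p|r(?:[89]|1[0-5])[dwb]?)\b",
    re.IGNORECASE,
)


def guess_syntax(lines):
    """Return 'att', 'intel', or 'unknown' by counting lines with each hint."""
    att = sum(1 for line in lines if ATT_REG.search(line))
    intel = sum(1 for line in lines if INTEL_HINT.search(line))
    if att == intel:
        return "unknown"
    return "att" if att > intel else "intel"


if __name__ == "__main__":
    sample = ["mov R8,R15", "lea RCX,QWORD PTR [RSP+72]", "cmp EAX,-1"]
    print(guess_syntax(sample))
```

A result of `intel` on a listing would suggest either re-printing it with `--syntax att` as described above or handing it to an assembler that accepts that dialect; the heuristic can be fooled by comments or string data.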

@kwarrick
Contributor

kwarrick commented Sep 1, 2021

We use gcc for reassembly. For the few examples I just tried, clang works. I believe there are some subtle differences in the syntax clang supports, but gcc and clang should mostly be compatible. There are changes in the works that will allow gtirb-pprinter to target multiple assemblers, but for now clang support must be reconciled by the user.

@swang206
Author

swang206 commented Sep 2, 2021

Hi,
can you help me with this issue? I am trying to reassemble notepad++.exe and the linker cannot find symbols:
https://github.com/swang206/npp-gtirb-fail

The repository contains the IR, the asm, and the original notepad++.exe file.

I tried the binary downloaded from the official site and it fails. I then compiled notepad++ myself and it still fails for the same reason: the linker cannot find the __imp_COMCTL32@17 symbol, even though the extern function is shown in the assembly.
image

@kwarrick
Contributor

kwarrick commented Sep 6, 2021

You should be able to generate a .LIB file to satisfy the linker with one additional command-line argument:

$ ddisasm --asm npp.asm --generate-import-libs notepad++.exe

This requires that LIB.exe be on the PATH, but that should already be true if you can use ML64.exe. You will see that it generates a COMCTL32.lib file in the local directory. The MSVC linker will find these automatically, and you should be able to reassemble correctly.

@swang206
Author

swang206 commented Sep 7, 2021

I do not see the COMCTL.lib file in the local directory after running ddisasm with flag --generate-import-libs

image

image

no comctl32.lib

@kwarrick
Contributor

kwarrick commented Sep 7, 2021

I do not see the COMCTL.lib file in the local directory after running ddisasm with flag --generate-import-libs

Oh! This is an actual bug I can fix. I've just checked and the .LIB files are only generated when you specify the --asm argument. In your screenshot you only use --ir. Well, I'm not sure this is a bug per se, but it is tricky. At the very least, it should be a warning.

Add --asm npp.asm and it will generate the COMCTL32.lib.
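
The pitfall described here (import libraries are only written when `--asm` is also given) can be guarded against in a driver script. A minimal sketch, assuming the behavior is exactly as described in this comment; `build_ddisasm_cmd` is a hypothetical helper, and only the flags already shown in this thread (`--asm`, `--ir`, `--generate-import-libs`) are used:

```python
def build_ddisasm_cmd(binary, asm=None, ir=None, import_libs=False):
    """Build a ddisasm argument list, refusing the combination that
    silently produces no .LIB files (per the comment above)."""
    if import_libs and asm is None:
        raise ValueError(
            "--generate-import-libs only produces .LIB files together with --asm"
        )
    cmd = ["ddisasm"]
    if asm is not None:
        cmd += ["--asm", asm]
    if ir is not None:
        cmd += ["--ir", ir]
    if import_libs:
        cmd.append("--generate-import-libs")
    cmd.append(binary)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_ddisasm_cmd("notepad++.exe", asm="npp.asm", import_libs=True)))
```

The returned list can be passed to `subprocess.run`; raising early turns the silent no-op into the kind of warning suggested above.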

@swang206
Author

swang206 commented Sep 7, 2021

Is it possible to generate the .lib files directly on Linux instead of on Windows? ddisasm runs extremely slowly on Windows.

@kwarrick
Contributor

kwarrick commented Sep 7, 2021

We have code in review that will provide alternatives on Linux. Hopefully that will merge soon, but until then you can use LLVM lld-link as an alternative to LIB.exe.

For example, on Ubuntu:

sudo apt install lld

Then create a simple wrapper script for lib.exe:

cat <<EOF > /tmp/lib.exe
#!/bin/bash
LINK="$(llvm-config --bindir)/lld-link"
\$LINK "\$@"
EOF
sudo mv /tmp/lib.exe /usr/local/bin/lib.exe
sudo chmod +x /usr/local/bin/lib.exe

Now, ddisasm --generate-import-libs should work on Linux using lld-link.
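
Since import-library generation depends on a `lib.exe` being resolvable on the PATH, it can help to verify that before a long disassembly run. A minimal sketch using only the Python standard library; `find_lib_tool` is a hypothetical helper name, and the check says nothing about whether the tool found actually behaves like LIB.exe:

```python
import shutil


def find_lib_tool(name="lib.exe", path=None):
    """Return the full path of an executable called `name` found on `path`
    (defaults to the PATH environment variable), or None if there is none."""
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_lib_tool()
    if location is None:
        print("lib.exe not found on PATH; import libraries may not be generated")
    else:
        print("using", location)
```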


ddisasm runs extremely slow on windows.

Make sure to use a RelWithDebInfo build. A debug build with souffle is too slow.

@swang206
Author

swang206 commented Sep 7, 2021

great. i have my lld installed

@swang206
Author

swang206 commented Sep 7, 2021

npp.zip
the exe does not run at all

@swang206
Author

swang206 commented Sep 7, 2021

image
I tried notepad3. It does not work either.

@swang206
Author

swang206 commented Sep 7, 2021

Hello. I tried 4 different Windows programs and none of them works.

Can you tell me a Windows program that works, so that I can use it for my work?

@swang206
Author

swang206 commented Sep 7, 2021

Even helloworld does not work

image

What's wrong here?

@kwarrick
Contributor

kwarrick commented Sep 7, 2021

Assuming helloworld.asm is the output of ddisasm, you will have to use /entry:__EntryPoint. By default, MSVC will statically link the C runtime, which means main is most likely not the entry point of the PE you are disassembling. For this reason, ddisasm actually creates the __EntryPoint label for you.

@swang206
Author

swang206 commented Sep 7, 2021

/entry:__EntryPoint

what if i am using /MD??

@kwarrick
Contributor

kwarrick commented Sep 7, 2021

You can still use __EntryPoint, or you can insert the following line into helloworld.asm so that /entry:main will work:

PUBLIC main

@swang206
Author

swang206 commented Sep 7, 2021

How do I deal with those syntax errors, conflicts, etc.?

@kwarrick
Contributor

kwarrick commented Sep 7, 2021

From the screenshot of notepad3.asm? Those are real ddisasm disassembly errors. I will take a look.

@swang206
Author

From the screenshot of notepad3.asm? Those are real ddisasm disassembly errors. I will take a look.

Are there any guidelines on how to build gtirb targeting Windows? Do you use cross-compilation?

@swang206
Author

From the screenshot of notepad3.asm? Those are real ddisasm disassembly errors. I will take a look.

Hi.

There are some issues with the generated assembly when using ml64:

  1. ml64 does not support the int1, int3, and ud1 instructions. I do not know whether they can be replaced with ud2.
  2. Instructions such as rcl BYTE PTR [RSI+81949] are rejected as illegal.

I think generating GNU assembly would still be useful even for Windows (PE) executables, since it does not have as many disassembly issues as Microsoft's.
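
To see how much of a listing is affected by the first point, the generated assembly can be scanned for those mnemonics before invoking the assembler. A minimal sketch, assuming MASM-style `;` comments and optional leading labels ending in `:`; `find_reported_mnemonics` is a hypothetical helper, and the mnemonic set simply mirrors the list reported above rather than any documented ml64 restriction:

```python
# Mnemonics reported in this thread as rejected by ml64 (unverified).
REPORTED_UNSUPPORTED = {"int1", "int3", "ud1"}


def find_reported_mnemonics(asm_text, mnemonics=REPORTED_UNSUPPORTED):
    """Return (line_number, mnemonic) pairs for lines whose instruction
    mnemonic is in `mnemonics`. Line numbers are 1-based."""
    hits = []
    for lineno, line in enumerate(asm_text.splitlines(), start=1):
        code = line.split(";", 1)[0]  # drop a trailing ';' comment
        tokens = code.split()
        # Skip leading labels such as "$L_14001135e:".
        while tokens and tokens[0].endswith(":"):
            tokens.pop(0)
        if tokens and tokens[0].lower() in mnemonics:
            hits.append((lineno, tokens[0].lower()))
    return hits


if __name__ == "__main__":
    listing = "mov RDX,RDI\nint3\n$L_1: ud1 EAX,EBX ; trap\n"
    for lineno, mnemonic in find_reported_mnemonics(listing):
        print(f"line {lineno}: {mnemonic}")
```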

@kwarrick
Contributor

I think generating gnu assembly is still useful even for windows (PE) executable since it does not have so many disassembly issues like microsoft's ones.

I agree. It is on my list.

how does ddisasm work with windows executable with resource file?

We actually just merged changes last week to improve this. In short, you can now pass the --generate-resources argument to ddisasm to create a .RES file in your local directory, which can be passed to the linker.

See examples/ex_rsrc for a simple test:

$ ddisasm --generate-resources --asm out.asm ex.exe
$ ml64 out.asm /link /subsystem:console /entry:__EntryPoint /machine:x64 ex.res
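
The reassembly command recurs throughout this thread with small variations (entry point, extra linker inputs such as the .RES file). A minimal sketch of assembling it programmatically; `build_ml64_cmd` and its parameter names are hypothetical, and the defaults simply reproduce the options shown in the commands above:

```python
def build_ml64_cmd(asm, entry="__EntryPoint", subsystem="console",
                   machine="x64", link_inputs=()):
    """Build an ml64 argument list; everything after '/link' is passed
    through to the linker, followed by any extra inputs (e.g. 'ex.res')."""
    cmd = [
        "ml64", asm, "/link",
        f"/subsystem:{subsystem}",
        f"/entry:{entry}",
        f"/machine:{machine}",
    ]
    cmd.extend(link_inputs)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_ml64_cmd("out.asm", link_inputs=["ex.res"])))
```

Passing `entry="main"` corresponds to the `/entry:main` variant mentioned earlier, which only works once `main` has been made public in the listing.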

@swang206
Author

swang206 commented Sep 18, 2021

Hi kwarrick. I tried to disassemble 7zip. It works, but the window does not pop up at all. Why?
7zip_2.zip

Here are the IR file, the assembly, and the executables.

What I found is that a lot of Windows GUI executables just flash and exit after disassembly and reassembly. Can you have a look at it?

@kwarrick
Contributor

kwarrick commented Sep 19, 2021

From the screenshot of notepad3.asm? Those are real ddisasm disassembly errors. I will take a look.

After looking at the assembly output from Notepad3.exe, I have determined that this is a binary that has had the .rdata section merged with the .text section.

I am not entirely sure of the motivation, but it appears that a lot of PE32 (32-bit) binaries have merged data and code sections. The MSVC linker provides an option to do this:

$ cl ex.c /link /merge:.rdata=.text

When you look at the beginning of the .text section you will see a huge list of addresses, followed by string constants, PE data directory structures, and lots of other data regions.

As ddisasm was originally developed against ELF binaries, the only data-in-code analysis required thus far has been for jump tables within the code section. To correctly disassemble binaries with merged data sections, I have been working on a branch that introduces more complex data analysis logic. I will update this issue when we merge that work.
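
One cheap way to spot candidates for this situation is to list a PE file's section names and check whether a separate `.rdata` section exists. A minimal sketch that reads the section table with only the standard library; `pe_section_names` and `maybe_merged_rdata` are hypothetical helpers, and the absence of `.rdata` is only a hint (sections can be renamed, or merged in other ways), not proof of `/merge:.rdata=.text`:

```python
import struct


def pe_section_names(data):
    """Return the section names from the section table of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header follows the 4-byte signature.
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    table = pe_off + 24 + opt_size  # section table follows the optional header
    names = []
    for i in range(num_sections):
        raw = data[table + 40 * i: table + 40 * i + 8]  # 40-byte entries
        names.append(raw.rstrip(b"\0").decode("ascii", errors="replace"))
    return names


def maybe_merged_rdata(names):
    """Heuristic only: code section present, but no separate .rdata."""
    return ".text" in names and ".rdata" not in names


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        sections = pe_section_names(f.read())
    print(sections, "possibly merged:", maybe_merged_rdata(sections))
```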

5 participants
@eschulte @kwarrick @aeflores @swang206 and others