Fuzzing tool for WASM #48

Open
12 tasks
Jacarte opened this issue Jun 2, 2020 · 20 comments · Fixed by #59

@Jacarte
Collaborator

Jacarte commented Jun 2, 2020

Use SWAM as the core to create a full-fledged fuzzer for WASM. As a big picture, here are the milestones to achieve it:

Depends on the WASM coverage tool, see #54

Reference implementations:

Medium priority todos:

  • add support for more than the 4 primitive data types, see Fuzzing tool for WASM #48 (comment)
  • activate CI for wasm-fuzzer, add docker build in CI and make sure that it fails if gcc fails
  • Add support for textual report of crashing inputs created by AFL
  • Write a shell script to run SWAM on a specific input file created by AFL (a test harness)
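
As a rough illustration of the data-type todo above, here is a minimal sketch of how a raw AFL input could be split into typed arguments for the four primitive WASM types. The little-endian layout, the zero-padding of short inputs, and the names `FORMATS` and `decode_args` are assumptions made for this example; they are not the byte format the SWAM server actually uses.

```python
import struct

# Assumed little-endian struct formats for the four primitive WASM value types.
FORMATS = {"i32": "<i", "i64": "<q", "f32": "<f", "f64": "<d"}


def decode_args(data, arg_types):
    """Split a raw fuzzer input into one value per declared parameter type."""
    values = []
    offset = 0
    for wasm_type in arg_types:
        fmt = FORMATS[wasm_type]
        size = struct.calcsize(fmt)
        # Zero-pad short inputs so every mutation still yields a full argument list.
        chunk = data[offset:offset + size].ljust(size, b"\x00")
        values.append(struct.unpack(fmt, chunk)[0])
        offset += size
    return values
```

Supporting a further type would then mean adding a decoder for it next to `FORMATS`; variable-length types such as strings would need an extra convention (for example a length prefix), which this sketch does not cover.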

Low priority todos:

  • Use the server as a top level project in Slumps, use SWAM as a dependency (instead of a fork). Use sbt, so Scala/Java dependencies can be saved in image (not in volume like in current implementation with Mill)
  • Use proper logging, to avoid redundant prints, introduce logging macro
  • Add support for seeding AFL with several input sets (was Make it possible for WASM_ARG_LIST > WASM_ARG_TYPES_LIST, so that AFL has multiple possible starting inputs)
  • Use fs2-io for the Socket server instead of standard Socket classes (nio). Very difficult imo - also potentially not necessary if SWAM will only be a dependency.
  • Analyze AFL output -> Write code that generates optimal test cases for WASM in JavaScript
  • support DWARF, see "add support for DWARF debugging symbols" (satabin/swam#94)
  • add a proper protocol between SWAM and AFL (Protocol Buffers, MessagePack)
  • use the literals available in the WASM binary as seed in AFL
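
One possible way to approach the last todo, sketched under assumptions: the binary is first converted to its text format (for example with `wasm2wat`), integer `const` literals are collected with a regular expression, and each one is packed into a little-endian byte string that could serve as an AFL seed. The function name `literal_seeds`, the restriction to `i32`/`i64`, and the seed layout are hypothetical; the layout would have to match whatever the harness decodes.

```python
import re
import struct

# Matches e.g. "i32.const 42", "i64.const -1", "i32.const 0x10" in WAT text.
CONST_RE = re.compile(r"\b(i32|i64)\.const\s+(-?(?:0x[0-9a-fA-F]+|\d+))")
WIDTH = {"i32": 32, "i64": 64}
PACK = {"i32": "<I", "i64": "<Q"}


def literal_seeds(wat_text):
    """Return de-duplicated little-endian byte strings, one per integer literal."""
    seeds = []
    for wasm_type, text in CONST_RE.findall(wat_text):
        base = 16 if "0x" in text.lower() else 10
        # Wrap negative literals to their unsigned two's-complement value.
        value = int(text, base) % (1 << WIDTH[wasm_type])
        seed = struct.pack(PACK[wasm_type], value)
        if seed not in seeds:
            seeds.append(seed)
    return seeds
```

Each returned byte string would then be written as its own file in the directory passed to `afl-fuzz -i`.
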
@tareq97-zz

Hi @Jacarte ,

tareq97-zz/swam@0e62af4

This is the commit for the coverage tool, with two test cases for the coverage method.

@Jacarte
Collaborator Author

Jacarte commented Jun 11, 2020

Hi @tareq97 and @olapiv

Your code is now in the branch feature/opt-in https://github.com/KTH/swam/tree/feature/opt-in, so let's work on this branch. I will make the changes that I discussed with Lucas.

The new PR to be merged into master is KTH/swam#6.

@Jacarte
Collaborator Author

Jacarte commented Jun 15, 2020

@monperrus
Collaborator

monperrus commented Jul 15, 2020

Potential benchmark for evaluating the fuzzer: the 26 WASM binaries (98,924 functions) of https://www.unibw.de/patch/papers/usenixsecurity20-wasm.pdf

@olapiv
Collaborator

olapiv commented Jul 21, 2020

Here are the technical details of how AFL works: https://github.com/google/AFL/blob/master/docs/technical_details.txt

It's very well explained - I especially recommend reading part 1 ("Coverage measurements").
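
For readers who want the gist of that section, the edge-coverage bookkeeping it describes can be simulated in a few lines. This is only an illustrative replay in Python, not AFL's injected instrumentation: `record_trace` is a made-up name, and the block IDs stand in for the compile-time random constants AFL assigns to each basic block.

```python
MAP_SIZE = 1 << 16  # AFL's default coverage map size (64 kB)


def record_trace(block_ids):
    """Replay a sequence of basic-block IDs into an AFL-style hit-count map."""
    shared_mem = bytearray(MAP_SIZE)
    prev_location = 0
    for cur_location in block_ids:
        # Each (previous block, current block) pair maps to one cell of the map.
        index = (cur_location ^ prev_location) % MAP_SIZE
        shared_mem[index] = (shared_mem[index] + 1) & 0xFF  # 8-bit counter wraps
        # The shift keeps A->B distinct from B->A, and A->A distinct from B->B.
        prev_location = cur_location >> 1
    return shared_mem
```

Because the index depends on both the previous and the current block, the traces `[A, B]` and `[B, A]` touch different cells, which is what lets AFL distinguish execution orders rather than just the set of visited blocks.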

@olapiv
Collaborator

olapiv commented Jul 29, 2020

Done:

  • add in documentation the link to the spec of the byte array that AFL expects
  • Git cleanup
    • @olapiv rename and move branch: Swam server in feature/swam-server on KTH/SWAM (branch to be created)
    • @olapiv update the git submodule in Slumps
  • @Jacarte modify the Swam server to return the actual path coverage in feature/swam-server on KTH/SWAM, see WAKOKO: coverage tool for WASM #54
  • Update README for Fuzzer (switch from docker-compose to single Docker container + new folder structure)
  • Build mechanism of running multiple processes within one Docker image, so that signals are handled correctly
  • Implement Slave-Master logic for multiple AFL instances / Docker containers. Info: parallel fuzzing
  • Return the actual branch coverage
  • remove the mandatory WAFL environment config specifying the signature (input params, types), replace it with automated inference of signatures from the WASM binary
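
To illustrate what inferring signatures from the WASM binary involves, here is a minimal, self-contained sketch that reads the type, import, function and export sections of a WASM 1.0 module and returns the parameter and result types of each exported function. It is not the code used in the fuzzer; all function names are hypothetical, and it performs no validation beyond the header check.

```python
VALTYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def read_u32(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def read_name(buf, pos):
    """Decode a length-prefixed UTF-8 name; return (name, next position)."""
    length, pos = read_u32(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def skip_limits(buf, pos):
    """Skip a limits entry: flag byte, minimum, and optional maximum."""
    flag = buf[pos]
    _, pos = read_u32(buf, pos + 1)
    if flag & 1:
        _, pos = read_u32(buf, pos)
    return pos


def exported_signatures(wasm):
    """Map each exported function name to its (params, results) type lists."""
    if wasm[:8] != b"\x00asm\x01\x00\x00\x00":
        raise ValueError("not a WASM version 1 binary")
    types, func_type_idx, exports = [], [], {}
    pos = 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, pos = read_u32(wasm, pos + 1)
        end = pos + size
        if section_id in (1, 2, 3, 7):
            count, p = read_u32(wasm, pos)
            for _ in range(count):
                if section_id == 1:  # type section: function signatures
                    p += 1  # skip the 0x60 functype marker
                    signature = []
                    for _ in range(2):  # parameter types, then result types
                        n, p = read_u32(wasm, p)
                        signature.append([VALTYPES.get(b, hex(b)) for b in wasm[p:p + n]])
                        p += n
                    types.append(tuple(signature))
                elif section_id == 2:  # imports come first in the function index space
                    _, p = read_name(wasm, p)  # module name
                    _, p = read_name(wasm, p)  # field name
                    kind, p = wasm[p], p + 1
                    if kind == 0:  # function import: type index
                        type_index, p = read_u32(wasm, p)
                        func_type_idx.append(type_index)
                    elif kind == 1:  # table import: reference type, then limits
                        p = skip_limits(wasm, p + 1)
                    elif kind == 2:  # memory import: limits
                        p = skip_limits(wasm, p)
                    else:  # global import: value type and mutability flag
                        p += 2
                elif section_id == 3:  # function section: one type index per function
                    type_index, p = read_u32(wasm, p)
                    func_type_idx.append(type_index)
                else:  # export section: name, kind, index
                    name, p = read_name(wasm, p)
                    kind, p = wasm[p], p + 1
                    index, p = read_u32(wasm, p)
                    if kind == 0:
                        exports[name] = index
        pos = end  # other sections are skipped using their declared size
    return {name: types[func_type_idx[i]] for name, i in exports.items()}
```

The result for a given export can be fed directly to whatever decodes the AFL input into arguments, so no manual configuration of input types is needed.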

Moving todos at the point

@olapiv
Collaborator

olapiv commented Jul 29, 2020

Current work is here:
https://github.com/KTH/slumps/tree/wasm-fuzzer/wasm-fuzzer

Pull request is here:
#53

@olapiv
Collaborator

olapiv commented Aug 4, 2020

Here are a couple of thoughts that I am currently having regarding next steps. Some of it may not make sense, some of it may be obvious.

  • The current implementation does not support using strings as input parameters.
  • SWAM works with exporting strings (see SWAM examples) - can you import strings as well @Jacarte?
  • The WASM API for working with strings may change quite a lot in the future with interfaces.
  • Here’s a simple example of using v8 to execute JavaScript that runs a .wasm file.
  • Using this setup with v8 would make us independent of fast-changing WASM APIs - v8 is most likely to implement them before SWAM.
  • JavaScript could be used to load strings into WASM memory. For now like this.
  • v8 is written in C++ - the language that AFL is meant for.
  • Emscripten can compile C++ with instrumentation that emits debug information in DWARF format. DWARF is also how you would debug WASM in Google Chrome.
  • Look into: what coverage instrumentation data does AFL read? DWARF format by any chance?
  • kcov: A library that converts DWARF format to coverage. An example.
  • AFL requires path coverage - possible to convert data generated by kcov into path coverage?
  • Using either AFL <- v8 <- JavaScript <- WASM
 or AFL <- kcov <- v8 <- JavaScript <- WASM
 would make sockets unnecessary, would keep us independent of any WASM API changes, and could possibly let us use the standard AFL pipeline (not sure though). WASM is also most likely executed in a JS environment, so the fuzzer may also be better at catching relevant errors.

@monperrus
Collaborator

Ack, thanks for the update. Is there a need to revise the todo list accordingly?

@olapiv
Collaborator

olapiv commented Aug 5, 2020

Just did. It's all still a very vague idea though, so it's a bit difficult to pinpoint the exact next steps. I'm just researching for now, so I'm (more) sure that whatever we do next is viable.

Would be nice to hear what you guys think about it though! For all I know, the concept could also be an entire waste of time.

@monperrus
Collaborator

monperrus commented Aug 5, 2020

I'm not sure I see the underlying concept behind the bullets. Do you mean "using v8"?

@olapiv
Collaborator

olapiv commented Aug 5, 2020

Yes, exactly

@monperrus
Collaborator

The question of using v8 versus using Swam is hard. There are pros and cons in both cases, and we made a strategic decision some time ago.

Now, for the fuzzer, we may use v8 again in the future. But in the timeframe of your internship, and given that only a few weeks remain, I would suggest consolidating the Swam solution as much as possible, with as much as possible in Swam's master and with top code and documentation merged here in Slumps (and adding DWARF support in SWAM?).

@olapiv
Collaborator

olapiv commented Aug 17, 2020

Using non-number types with SWAM

  • Option A: Implement WASM Interface Types into SWAM

    • Excellent article on WASM Interface Types by Lin Clark
    • Interface Types are currently still a Phase 1 Proposal
    • Interface Types Proposal
    • Compiling with Interface Types seems only to work with wasm-bindgen (Rust) and not Emscripten/C++ yet.
    • Executing interface types with wasmtime is also still somewhat unstable and under development, see here.
    • --> Seems unrealistic for now
  • Option B: Writing objects into WASM memory

    • Requires generating Scala "glue code", analogous to Emscripten's or wasm-bindgen's JS glue code.
    • -->Too complicated? Looking into this now.
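
To make Option B more concrete, here is a small model of what such glue code has to do for a string: copy its bytes into linear memory and hand the resulting offset to the WASM function as an `i32` pointer. Everything in it is an assumption for illustration: `LinearMemory` is a plain Python stand-in rather than SWAM's memory API, the NUL-terminated UTF-8 encoding follows the C convention, and the bump allocator with a `heap_base` of 1024 is invented. Real glue code would have to use the module's own allocator (for example an exported `malloc`) so it does not overwrite data the module owns.

```python
PAGE_SIZE = 65536  # size of one WASM linear-memory page in bytes


class LinearMemory:
    """Stand-in for a WASM linear memory with a naive bump allocator."""

    def __init__(self, pages=1, heap_base=1024):
        self.data = bytearray(pages * PAGE_SIZE)
        self.next_free = heap_base  # hypothetical start of unused memory

    def write_string(self, text):
        """Copy text as NUL-terminated UTF-8 and return its pointer (offset)."""
        encoded = text.encode("utf-8") + b"\x00"
        pointer = self.next_free
        if pointer + len(encoded) > len(self.data):
            raise MemoryError("string does not fit in linear memory")
        self.data[pointer:pointer + len(encoded)] = encoded
        self.next_free += len(encoded)
        return pointer

    def read_string(self, pointer):
        """Read a NUL-terminated UTF-8 string starting at pointer."""
        end = self.data.index(0, pointer)
        return self.data[pointer:end].decode("utf-8")
```

With this model, a string parameter reduces to the number types that are already supported: the fuzzer generates the bytes, the glue code writes them, and the function under test receives the pointer.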

@monperrus
Collaborator

monperrus commented Aug 18, 2020

FYI, the latest coverage code is in the branch feature/path_coverage: https://github.com/KTH/swam/tree/feature/path_coverage

@monperrus
Collaborator

The fuzzing code depends, through a git submodule, on a commit in olapiv's fork: https://github.com/olapiv/swam/tree/759e41a9cd778981c2009764a2236b22c2975646

@Jacarte
Collaborator Author

Jacarte commented Aug 18, 2020

Examples of tools that modify AFL's coverage info: AFLFast, AFLGo and AFLSmart

@monperrus monperrus linked a pull request Aug 29, 2020 that will close this issue
@monperrus
Collaborator

Per our discussion with @olapiv today, I added one todo at the top: "use the literals available in the WASM binary as seed in AFL"

@Jacarte
Collaborator Author

Jacarte commented Dec 29, 2020

In order to implement the socket protocol from AFL as a websocket protocol:
A curated list of WebSockets related principles and technologies
