Dearbitrary #187

LeoDog896 · 2024-08-04T04:39:21Z

Resolves #44.

I would greatly appreciate feedback on naming the structs & associated methods with UnstructuredBuilder.

dearbitrary intentionally builds Unstructured in reverse to counter the problems encountered in WIP: Add dearbitrary functionality to turn an instance into its arbitrary byte sequence #94 (the slice length optimizations) by ensuring that size is built in reverse.
kani is used to fastly assure primitive dearbitary/rearbitrary cases. I am still proof-checking the other primitives and will write fuzzing tests for the non-verifiable implementations.

Todo:

Write rest of kani proofs
Set up fuzzing on non-verifiable methods (or takes too long)
UnstructuredBuilder Documentation
Dearbitrary Documentation
Decomment u16 range test - currently takes 10 minutes to verify, so I've commented it for the time being to save on my testing time.
Derive macro

Notes:

Should dearbitrary be hidden behind a feature?
Int is currently a breaking API change. Can this be fixed without manually implementing from/to le bytes?
While writing this, I noticed that the f32/f64 representation isn't representable cross-platform if MIPS is used as the architecture set.

fitzgen · 2024-08-09T18:37:49Z

Apologies for how long its taken me to respond to this PR.

I'm concerned about the complexity that this feature brings. I'm not convinced that it is worth it, just to be able to seed a corpus.

If seeding a corpus is an important use case, it can be done today by changing how the structure-aware fuzzing is done from a generative paradigm to a mutation paradigm:

take bytes as the input to the fuzz target
try to deserialize the bytes into your input type with bincode::deserialize::<MyInputType>(fuzzer_input) or similar (protobuf, etc...)
- if the deserialization fails, continue to the next test case
- if it succeeds, then run the structured input through your oracles, same as you otherwise would with the arbitrary crate

additionally define a fuzz_mutator! that does something like

let mut input = bincode::deserialize::<MyInputType>(input_bytes)
    .unwrap_or_else(|| MyInputType::default());

// Mutate a test case. Not any more difficult to implement
// than a by-hand `Arbitrary` implementation.
input.mutate(seed);

return bincode::serialize(input);

now you can seed your corpus by manually building a bunch of MyInputTypes and then bincode::serialize them into the corpus directory

(But backing up: It should also be noted that randomly generating a corpus before you start fuzzing isn't going to be expected to do any better than incrementally building the corpus from nothing. Seeding the corpus is only useful if you already have inputs that you know are interesting for other reasons, for example they've triggered bugs/crashes before or are Real World snippets of your programming language or etc...)

fitzgen · 2024-08-09T18:42:08Z

In general, the mutation paradigm has another benefit over the generative paradigm as well: if you add knobs to only do shrinking/simplifying mutations, then it is trivial to build a test case minimizer on top of that which performs better than cargo fuzz tmin. And for things like programming languages, if you add a semantics-preserving mutation mode, then you can do differential fuzzing where you check the correctness of your various compiler passes and optimizations like

let input = the_fuzzer_input();
let result = run(&input);

let alt_input = input.mutate_preserving_semantics();
let alt_result = run(&alt_input);

assert_eq!(
    result,
    alt_result,
    "running two semantically-equivalent programs should produce the same results",
);

fitzgen · 2024-08-09T18:43:03Z

(we should probably have a crate that is the mutation-paradigm-equivalent of the arbitrary crate in the rust-fuzz org)

LeoDog896 · 2024-08-10T15:48:20Z

By the way, my use of seeding was incorrect. I meant more for large arbitrary-derived structures, as developing an initial corpus is painstaking. Still, without it, the initial fuzzing warmup takes longer. However, the inconvenience also means that some libraries go without a corpus for those types of structures.

While I do have a limited understanding of fuzzers, aren't most fuzzers mutative instead of generative already? Wouldn't methods like mutate or mutate_preserving_semantics have to explore much more slowly, given that their inputs depend on the structure of input?

LeoDog896 added 7 commits July 25, 2024 23:51

feat: initial dearbitrary trait

4058174

feat(dearbitrary): implement base len support

37a8054

test: setup kani

55f058c

test: kani

e3209bb

test(kani): lossless, nonzero

257bddf

docs: document UnstructuredBuilder functions

3da01a7

chore: add kani to allowed cfgs

63fb598

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dearbitrary #187

Dearbitrary #187

LeoDog896 commented Aug 4, 2024 •

edited

Loading

fitzgen commented Aug 9, 2024

fitzgen commented Aug 9, 2024

fitzgen commented Aug 9, 2024

LeoDog896 commented Aug 10, 2024 •

edited

Loading

Dearbitrary #187

Are you sure you want to change the base?

Dearbitrary #187

Conversation

LeoDog896 commented Aug 4, 2024 • edited Loading

fitzgen commented Aug 9, 2024

fitzgen commented Aug 9, 2024

fitzgen commented Aug 9, 2024

LeoDog896 commented Aug 10, 2024 • edited Loading

LeoDog896 commented Aug 4, 2024 •

edited

Loading

LeoDog896 commented Aug 10, 2024 •

edited

Loading