Merge pull request #187 from wader/present-bts2022
doc: Add fq bts2022 presentation
Showing 21 changed files with 3,037 additions and 6 deletions.
@@ -0,0 +1,21 @@
fq presentation from Binary Tools Summit 2022 https://binary-tools.net/

[fq-bts2022-v1.pdf](fq-bts2022-v1.pdf)

Will update with a link to the recording when available.

Was done at the time of ~fq 0.0.5, things might have changed since.

How to build:

```
go install golang.org/x/tools/cmd/present
present -notes -content doc/presentations/bts2022 -base ~/go/pkg/mod/golang.org/x/[email protected]/cmd/present
```

```
./usage.sh | ansisvg > usage.svg
```

Export to PDF via browser.
@@ -0,0 +1,15 @@
fq is inspired by the well known jq tool and language and allows you
to work with binary formats the same way you would using jq. In
addition it can also present data similar to a hex viewer, transform,
slice and concatenate binary data, supports nested formats and has an
interactive REPL with auto-completion.

It was originally designed to query, inspect and debug codecs and
metadata in media files and containers like mp4, flac, mp3, jpeg, but
has since been extended to support a variety of formats like
executables, packet captures including TCP reassembly and
serialization formats like ASN1 BER, Avro, CBOR, protobuf and a lot
more.

In summary it aims to be something like jq, hexdump, dd and gdb
combined into one.
@@ -0,0 +1,18 @@
func avcHdrParameters(d *decode.D) {
	cpbCnt := d.FieldUFn("cpb_cnt", uEV, scalar.UAdd(1))
	d.FieldU4("bit_rate_scale")
	d.FieldU4("cpb_size_scale")
	d.FieldArray("sched_sels", func(d *decode.D) {
		for i := uint64(0); i < cpbCnt; i++ {
			d.FieldStruct("sched_sel", func(d *decode.D) {
				d.FieldUFn("bit_rate_value", uEV, scalar.UAdd(1))
				d.FieldUFn("cpb_size_value", uEV, scalar.UAdd(1))
				d.FieldBool("cbr_flag")
			})
		}
	})
	d.FieldU5("initial_cpb_removal_delay_length", scalar.UAdd(1))
	d.FieldU5("cpb_removal_delay_length", scalar.UAdd(1))
	d.FieldU5("dpb_output_delay_length", scalar.UAdd(1))
	d.FieldU5("time_offset_length")
}
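Several fields in the decoder above are read with `uEV`, which corresponds to the unsigned Exp-Golomb code, ue(v), that H.264 uses for its HRD parameters. The following is a minimal standalone sketch of that coding, not fq's code: `bitReader`, `readBit` and `readUEV` are hypothetical names, and the `+1` in `main` only mirrors what `scalar.UAdd(1)` presumably does for fields the specification stores as "minus 1" values.

```go
package main

import (
	"errors"
	"fmt"
)

// bitReader is a hypothetical MSB-first bit reader, not fq's decode.D.
type bitReader struct {
	data []byte
	pos  int // current position in bits
}

// readBit returns the next bit, or an error at the end of the data.
func (r *bitReader) readBit() (uint64, error) {
	if r.pos >= len(r.data)*8 {
		return 0, errors.New("unexpected end of data")
	}
	b := (r.data[r.pos/8] >> (7 - uint(r.pos%8))) & 1
	r.pos++
	return uint64(b), nil
}

// readUEV decodes one ue(v) value: n leading zero bits, a one bit,
// then n suffix bits. The decoded value is 2^n - 1 + suffix.
func (r *bitReader) readUEV() (uint64, error) {
	n := 0
	for {
		b, err := r.readBit()
		if err != nil {
			return 0, err
		}
		if b == 1 {
			break
		}
		n++
		if n > 32 {
			return 0, errors.New("exp-golomb prefix too long")
		}
	}
	var suffix uint64
	for i := 0; i < n; i++ {
		b, err := r.readBit()
		if err != nil {
			return 0, err
		}
		suffix = suffix<<1 | b
	}
	return uint64(1)<<uint(n) - 1 + suffix, nil
}

func main() {
	// Bit string 1 010 011 00100 (plus zero padding) encodes 0, 1, 2, 3.
	r := &bitReader{data: []byte{0xa6, 0x40}}
	for i := 0; i < 4; i++ {
		v, err := r.readUEV()
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		// Print the raw value and the value after adding one.
		fmt.Println(v, v+1)
	}
}
```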
Binary file not shown.
@@ -0,0 +1,238 @@
# fq

jq for binary formats

Mattias Wadman
[email protected]
https://github.com/wader/fq
@mwader

## Background

.html style.html

- Use various tools to extract data
  - ffprobe, gm identify, mp4dump, mediainfo, wireshark, one off programs, ...
- Convert to usable format and do queries
  - jq, grep, sqlite, sort, awk, sed, one off programs, ...
- Digging into and slicing binaries
  - Hexfiend, hexdump, dd, cat, one off programs, ...


## Wishlist

"Want to see everything about this picture except the picture"

- A very verbose version of file(1)
- gdb for files
- Select and query things using a language
- Make all parts of a file symbolically addressable
- Nested formats and binaries
- Convenient bit-oriented decoder DSL


## Experiments and prototypes

- Decoder DSL
  - TCL, lisp, tengo, Starlark, JavaScript, Go
- Query language
  - JSONPath, SQL, jq, JavaScript
- How to use
  - IR-JSON: `fq file | jq ... | fq`
  - Extend existing project
  - Decode and query in same tool


## Result

Go

- Tests showed fast enough to decode big files
- Found gojq
- Previous good experience
- Good tooling


## jq

"The JSON indenter"

- JSON in/out
- Syntax kind of a superset of JSON with same types
- Functional language based on generators and backtracking
  - Expressions can return or "output" zero, one or more values
  - No more outputs backtracks
- Implicit input and output similar to shell pipes
- Extraordinary iteration and combinatorial abilities
- Great for traversing tree structures


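The generator idea in the jq slide above, where an expression can output zero, one or more values per input and a pipe feeds every output into the next expression, can be modelled with callbacks. This is only an illustrative Go sketch, not how jq or gojq is implemented: `Gen`, `Pipe`, `Iterate`, `Select` and `Collect` are hypothetical names, and backtracking is reduced to simply returning from the callback when there is nothing more to output.

```go
package main

import "fmt"

// Gen is a hypothetical model of a jq expression: for one input it may
// call yield zero, one or many times.
type Gen func(in any, yield func(any))

// Pipe models `a | b`: every output of a becomes an input of b.
func Pipe(a, b Gen) Gen {
	return func(in any, yield func(any)) {
		a(in, func(v any) { b(v, yield) })
	}
}

// Iterate models `.[]`: it outputs each element of an array input.
func Iterate(in any, yield func(any)) {
	if vs, ok := in.([]any); ok {
		for _, v := range vs {
			yield(v)
		}
	}
}

// Select models `select(f)`: it outputs the input unchanged when pred
// holds and outputs nothing otherwise.
func Select(pred func(any) bool) Gen {
	return func(in any, yield func(any)) {
		if pred(in) {
			yield(in)
		}
	}
}

// Collect models `[expr]`: it gathers all outputs into a slice.
func Collect(g Gen, in any) []any {
	var out []any
	g(in, func(v any) { out = append(out, v) })
	return out
}

func main() {
	// Roughly `[.[] | select(. > 1)]` applied to [1, 2, 3].
	greaterThanOne := func(v any) bool {
		n, ok := v.(int)
		return ok && n > 1
	}
	expr := Pipe(Iterate, Select(greaterThanOne))
	fmt.Println(Collect(expr, []any{1, 2, 3})) // prints [2 3]
}
```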
## Examples

.code jq1

## Examples

.code jq2

## Examples

.code jq3


## Examples

.code jq4


## fq

"The binary indenter"

- Superset of jq
- Re-implements most of jq's CLI interface
- 83 input formats, 22 support probing
- Additional standard library functions
- Additional types that act as standard jq types but have special abilities
  - _Decode value_ has bit range, actual and symbolic value, description, ...
  - _Binary_ has a unit size, bit or bytes, and can be sliced
- Output fancy hexdump, JSON and binary
- Interactive REPL with completion and sub-REPL support


##

.image formats.svg _ 1024

## Usage

- Basic usage
  - `fq . file`, `cat file | fq`
- Multiple input files
  - `fq 'grep_by(format == "exif")' *.png *.jpeg`
- Hexdump, JSON and binary output
  - `fq '.frames[10] | d' file.mp3`
  - `fq '[grep_by(format == "dns").questions[].name.value]' file.pcap`
  - `fq 'first(grep_by(format == "jpeg")) | tobytes' file > file.jpeg`
- Interactive REPL
  - `fq -i . *.png`


##

.background %3D
.image usage.svg _ 900


## fq specific functions

- Standard library
  - `streaks`, `count`, `delta`, `chunk`, `diff`, `grep`, `grep_by`, ...
  - `toradix`, `fromradix`, `hex`, `base64`, ...
- Decode value
  - `display` (alias `d`, `dv`, `da` ...)
  - `parent`, `format`, ...
  - `tobytes`, `tovalue`, `toactual`, ...
  - `torepr`, ...
- Binary
  - Regexp functions `test`, `match`, ...
  - Decode functions `probe`, `mp3_frame`, ...


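The names `toradix` and `fromradix` in the list above suggest conversion between numbers and digit strings in a given base. The Go sketch below shows that idea under that assumption only. `toRadix` and `fromRadix` are hypothetical helpers built on the standard `strconv` package, not fq's implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// toRadix formats n as a digit string in the given base (2 to 36).
// Hypothetical helper, named after the fq function for illustration.
func toRadix(n int64, base int) string {
	return strconv.FormatInt(n, base)
}

// fromRadix parses a digit string in the given base back into a number.
func fromRadix(s string, base int) (int64, error) {
	return strconv.ParseInt(s, base, 64)
}

func main() {
	fmt.Println(toRadix(255, 2)) // prints 11111111
	n, err := fromRadix("ff", 16)
	fmt.Println(n, err) // prints 255 <nil>
}
```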
## Binary and binary array

- A binary is created using `tobits`, `tobytes`, `tobitsrange` or `tobytesrange`.
  - From decode value `.frames[1] | tobytes`
  - String or number `"hello" | tobits`
  - Binary array `[0xab, ["hello", .name]] | tobytes`
- Can be sliced using normal jq slice syntax.
  - `"hello" | tobits[8:8+16]` are the bits for `"el"`
- Can be decoded
  - `[tobytes[-10:], 0, 0, 0, 0] | flac_frame`


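The bit slice example in the list above, where bits 8 to 24 of `"hello"` are the bits for `"el"`, can be reproduced outside fq. This is a rough Go sketch of slicing at bit granularity. `sliceBits` is a hypothetical helper, and padding a partial last byte with zero bits on the right is an assumption made here, not a statement about how fq represents such a binary.

```go
package main

import "fmt"

// sliceBits returns bits [start, end) of data, MSB first, packed into
// bytes. A partial last byte is zero padded on the right (an assumption
// of this sketch). It returns nil for an invalid range.
func sliceBits(data []byte, start, end int) []byte {
	if start < 0 || end > len(data)*8 || start > end {
		return nil
	}
	out := make([]byte, (end-start+7)/8)
	for i := 0; i < end-start; i++ {
		src := start + i
		bit := (data[src/8] >> (7 - uint(src%8))) & 1
		out[i/8] |= bit << (7 - uint(i%8))
	}
	return out
}

func main() {
	// Same range as `"hello" | tobits[8:8+16]` in the slide.
	fmt.Printf("%q\n", sliceBits([]byte("hello"), 8, 8+16)) // prints "el"
}
```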
## Example queries

- Slice and decode
  - `tobits[8:8+8000] | mp3_frame | d`
  - `match([0xff,0xd8]) as $m | tobytes[$m.offset:] | jpeg`
- ASN1 BER, CBOR, msgpack, BSON, ... have `torepr` support
  - `fq -d cbor torepr file.cbor`
  - `fq -d msgpack '[torepr.items[].name]' file.msgpack`
- PCAP with TCP reassembly, look for GET requests
  - `fq 'grep("GET .*")' file.pcap`
- Parent of scalar value that includes bit 100
  - `grep_by(scalars and in_bits_range(100)) | parent`


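The `match([0xff,0xd8])` query above finds a byte sequence (the two bytes a JPEG file starts with) and slices the input from that offset before decoding. The search and slice step can be sketched in plain Go as follows. `fromMarker` is a hypothetical helper, and the actual JPEG decoding that fq performs afterwards is not modelled.

```go
package main

import (
	"bytes"
	"fmt"
)

// fromMarker returns the part of data starting at the first occurrence
// of marker. The boolean is false when the marker is not present.
func fromMarker(data, marker []byte) ([]byte, bool) {
	i := bytes.Index(data, marker)
	if i < 0 {
		return nil, false
	}
	return data[i:], true
}

func main() {
	// Two junk bytes followed by made-up data starting with ff d8.
	data := []byte{0x00, 0x01, 0xff, 0xd8, 0xff, 0xe0}
	tail, ok := fromMarker(data, []byte{0xff, 0xd8})
	fmt.Printf("% x %v\n", tail, ok) // prints ff d8 ff e0 true
}
```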
## Use as script interpreter

.code fqscript


## Use as script interpreter

.code fqscriptout


## Implementation

- Library of jq functions implemented in Go
  - Decoders, decode value, binary, bit reader, IO, tty, ...
- CLI and REPL are mostly written in jq
  ```
  ( open
  | decode
  | if $repl then repeat(read as $expr | eval($expr) | print)
    else eval($arg) | print
    end
  )
  ```
- All current decoders in Go
- Uses a forked version of gojq
  - Helped add native functions and iterators support
  - JQValue interface, bin/hex/oct literals, reflection, query AST functions, ...

## Decode API

SPS HRD parameters from ITU-T H.264 specification

.code avc_sps_hdr_params.go

## Decode API

.image avc_sps_hdr_params.png _ 900


## Decode API

Formats can use other formats. Simplified version of mp3 decoder:

.code mp3.go


## Future

- Declarative decoding like kaitai struct, decoder in jq
- Nicer way to handle checksums, encoding, validation etc
- Schemas for ASN1, protobuf, ...
- Better support for modifying data
- More formats like tls, http, http2, grpc, filesystems, ...
- Encoders
- More efficient, lazy decoding, smarter representation
- GUI
- Streaming input, read network traffic `tap("eth0") | select(...)`?
- Hope for more contributors


## Thanks and useful tools

- @itchyny for gojq
- Stephen Dolan and others for jq
- HexFiend
- GNU poke
- Kaitai struct
- Wireshark
- [vscode-jq](https://github.com/wader/vscode-jq)
- [jq-lsp](https://github.com/wader/jq-lsp)
@@ -0,0 +1,22 @@
#!/usr/bin/env fq -d mp4 -f

( first(.boxes[] | select(.type == "moov")?)
| first(.boxes[] | select(.type == "mvhd")?) as $mvhd
| { time_scale: $mvhd.time_scale,
    duration: ($mvhd.duration / $mvhd.time_scale),
    tracks:
      [ .boxes[]
      | select(.type == "trak")
      | [("mdhd", "stsd", "elst") as $t | first(grep_by(.type == $t))] as [$mdhd, $stsd, $elst]
      | { data_format: $stsd.boxes[0].type,
          media_scale: $mdhd.time_scale,
          edit_list:
            [ $elst.entries[]
            | { track_duration: (.segment_duration / $mvhd.time_scale),
                media_time: (.media_time / $mdhd.time_scale)
              }
            ]
        }
      ]
  }
)
@@ -0,0 +1,23 @@
$ ./editlist file.mp4
{
  "duration": 60.095,
  "time_scale": 600,
  "tracks": [
    {
      "data_format": "mp4a",
      "edit_list": [
        {
          "media_time": 0,
          "track_duration": 60.095
        }
      ],
      "media_scale": 22050
    },
    {
      "data_format": "avc1",
      "edit_list": [
        {
          "media_time": 0,
          "track_duration": 60.095
        }
...
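The script and its output above turn tick counts into seconds by dividing by a time scale: movie and edit durations use the `mvhd` time scale and media times use the track's `mdhd` time scale. Here is a small Go sketch of that arithmetic. `seconds` is a hypothetical helper, and the tick count 36057 is not in the output; it is only inferred from the printed duration 60.095 and time scale 600.

```go
package main

import "fmt"

// seconds converts a tick count to seconds for a given time scale
// (ticks per second). It returns 0 for a zero time scale to avoid
// dividing by zero, a choice made for this sketch.
func seconds(ticks, timeScale uint64) float64 {
	if timeScale == 0 {
		return 0
	}
	return float64(ticks) / float64(timeScale)
}

func main() {
	// 36057 ticks is an inferred example value, see the note above.
	fmt.Println(seconds(36057, 600)) // prints 60.095
	// A media time of 0 is 0 seconds in any time scale.
	fmt.Println(seconds(0, 22050)) // prints 0
}
```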