Merge pull request #187 from wader/present-bts2022
doc: Add fq bts2022 presentation
wader authored Mar 8, 2022
2 parents 0ed6b25 + b97776c commit c298ed7
Showing 21 changed files with 3,037 additions and 6 deletions.
11 changes: 5 additions & 6 deletions .golangci.yml
@@ -23,6 +23,11 @@ linters:
    - predeclared
    - tagliatelle

run:
  skip-dirs:
    - dev
    - doc

linters-settings:
  misspell:
    ignore-words:
@@ -37,9 +42,3 @@ linters-settings:
      # allow md5
      - G401
      - G501
issues:
  exclude-rules:
    - path: dev/.*\.go
      linters:
        # ignore main re-declared errors
        - typecheck
5 changes: 5 additions & 0 deletions README.md
@@ -130,6 +130,11 @@ Basic usage is `fq . file`.

For details see [usage.md](doc/usage.md)

## Presentations

- "fq - jq for binary formats" at [Binary Tools Summit 2022](https://binary-tools.net/summit.html) - (recording soon) - [slides](doc/presentations/bts2022/fq-bts2022-v1.pdf)


## Install

Use one of the methods listed below or download [release](https://github.com/wader/fq/releases) for your platform. Unarchive it and move the executable to `PATH` etc.
21 changes: 21 additions & 0 deletions doc/presentations/bts2022/README.md
@@ -0,0 +1,21 @@
fq presentation from Binary Tools Summit 2022 https://binary-tools.net/

[fq-bts2022-v1.pdf](fq-bts2022-v1.pdf)

Will update with a link to the recording when available.

Made around the time of fq 0.0.5; things might have changed since.

How to build:

```
go install golang.org/x/tools/cmd/present
present -notes -content doc/presentations/bts2022 -base ~/go/pkg/mod/golang.org/x/tools@<version>/cmd/present
```

```
./usage.sh | ansisvg > usage.svg
```

Export to PDF via browser.

15 changes: 15 additions & 0 deletions doc/presentations/bts2022/abstract.txt
@@ -0,0 +1,15 @@
fq is inspired by the well-known jq tool and language and allows you
to work with binary formats the same way you would using jq. In
addition it can present data similar to a hex viewer, can transform,
slice and concatenate binary data, supports nested formats and has an
interactive REPL with auto-completion.

It was originally designed to query, inspect and debug codecs and
metadata in media files and containers like mp4, flac, mp3 and jpeg,
but has since been extended to support a variety of formats like
executables, packet captures including TCP reassembly and
serialization formats like ASN1 BER, Avro, CBOR, protobuf and a lot
more.

In summary it aims to be something like jq, hexdump, dd and gdb
combined into one.
18 changes: 18 additions & 0 deletions doc/presentations/bts2022/avc_sps_hdr_params.go
@@ -0,0 +1,18 @@
func avcHdrParameters(d *decode.D) {
	cpbCnt := d.FieldUFn("cpb_cnt", uEV, scalar.UAdd(1))
	d.FieldU4("bit_rate_scale")
	d.FieldU4("cpb_size_scale")
	d.FieldArray("sched_sels", func(d *decode.D) {
		for i := uint64(0); i < cpbCnt; i++ {
			d.FieldStruct("sched_sel", func(d *decode.D) {
				d.FieldUFn("bit_rate_value", uEV, scalar.UAdd(1))
				d.FieldUFn("cpb_size_value", uEV, scalar.UAdd(1))
				d.FieldBool("cbr_flag")
			})
		}
	})
	d.FieldU5("initial_cpb_removal_delay_length", scalar.UAdd(1))
	d.FieldU5("cpb_removal_delay_length", scalar.UAdd(1))
	d.FieldU5("dpb_output_delay_length", scalar.UAdd(1))
	d.FieldU5("time_offset_length")
}
Binary file added doc/presentations/bts2022/avc_sps_hdr_params.png
Binary file added doc/presentations/bts2022/file.mp3
Binary file not shown.
Binary file added doc/presentations/bts2022/file.png
1,621 changes: 1,621 additions & 0 deletions doc/presentations/bts2022/formats.svg
Binary file added doc/presentations/bts2022/fq-bts2022-v1.pdf
Binary file not shown.
238 changes: 238 additions & 0 deletions doc/presentations/bts2022/fq.slide
@@ -0,0 +1,238 @@
# fq

jq for binary formats

Mattias Wadman
[email protected]
https://github.com/wader/fq
@mwader

## Background

.html style.html

- Use various tools to extract data
  - ffprobe, gm identify, mp4dump, mediainfo, wireshark, one off programs, ...
- Convert to usable format and do queries
  - jq, grep, sqlite, sort, awk, sed, one off programs, ...
- Digging into and slicing binaries
  - Hexfiend, hexdump, dd, cat, one off programs, ...


## Wishlist

"Want to see everything about this picture except the picture"

- A very verbose version of file(1)
- gdb for files
- Select and query things using a language
- Make all parts of a file symbolically addressable
- Nested formats and binaries
- Convenient bit-oriented decoder DSL


## Experiments and prototypes

- Decoder DSL
  - TCL, lisp, tengo, Starlark, JavaScript, Go
- Query language
  - JSONPath, SQL, jq, JavaScript
- How to use
  - IR-JSON: `fq file | jq ... | fq`
  - Extend existing project
  - Decode and query in same tool


## Result

Go

- Tests showed fast enough to decode big files
- Found gojq
- Previous good experience
- Good tooling


## jq

"The JSON indenter"

- JSON in/out
- Syntax kind of a superset of JSON with same types
- Functional language based on generators and backtracking
- Expressions can return or "output" zero, one or more values
- Backtracks when an expression has no more outputs
- Implicit input and output similar to shell pipes
- Extraordinary iteration and combinatorial abilities
- Great for traversing tree structures


## Examples

.code jq1

## Examples

.code jq2

## Examples

.code jq3


## Examples

.code jq4


## fq

"The binary indenter"

- Superset of jq
- Re-implements most of jq's CLI interface
- 83 input formats, 22 of which support probing
- Additional standard library functions
- Additional types that act as standard jq types but have special abilities
  - _Decode value_ has bit range, actual and symbolic value, description, ...
  - _Binary_ has a unit size, bit or bytes, and can be sliced
- Output fancy hexdump, JSON and binary
- Interactive REPL with completion and sub-REPL support


##

.image formats.svg _ 1024

## Usage

- Basic usage
  - `fq . file`, `cat file | fq`
- Multiple input files
  - `fq 'grep_by(format == "exif")' *.png *.jpeg`
- Hexdump, JSON and binary output
  - `fq '.frames[10] | d' file.mp3`
  - `fq '[grep_by(format == "dns").questions[].name.value]' file.pcap`
  - `fq 'first(grep_by(format == "jpeg")) | tobytes' file > file.jpeg`
- Interactive REPL
  - `fq -i . *.png`


##

.background data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAIAAAACCAYAAABytg0kAAAAAXNSR0IArs4c6QAAAAlwSFlzAAAWJQAAFiUBSVIk8AAAABNJREFUCB1jZGBg+A/EDEwgAgQADigBA//q6GsAAAAASUVORK5CYII%3D
.image usage.svg _ 900


## fq specific functions

- Standard library
  - `streaks`, `count`, `delta`, `chunk`, `diff`, `grep`, `grep_by`, ...
  - `toradix`, `fromradix`, `hex`, `base64`, ...
- Decode value
  - `display` (alias `d`, `dv`, `da` ...)
  - `parent`, `format`, ...
  - `tobytes`, `tovalue`, `toactual`, ...
  - `torepr`, ...
- Binary
  - Regexp functions `test`, `match`, ...
  - Decode functions `probe`, `mp3_frame`, ...


## Binary and binary array

- A binary is created using `tobits`, `tobytes`, `tobitsrange` or `tobytesrange`.
  - From decode value `.frames[1] | tobytes`
  - String or number `"hello" | tobits`
  - Binary array `[0xab, ["hello", .name]] | tobytes`
- Can be sliced using normal jq slice syntax.
  - `"hello" | tobits[8:8+16]` are the bits for `"el"`
- Can be decoded
  - `[tobytes[-10:], 0, 0, 0, 0] | flac_frame`


## Example queries

- Slice and decode
  - `tobits[8:8+8000] | mp3_frame | d`
  - `match([0xff,0xd8]) as $m | tobytes[$m.offset:] | jpeg`
- ASN1 BER, CBOR, msgpack, BSON, ... have `torepr` support
  - `fq -d cbor torepr file.cbor`
  - `fq -d msgpack '[torepr.items[].name]' file.msgpack`
- PCAP with TCP reassembly, look for GET requests
  - `fq 'grep("GET .*")' file.pcap`
- Parent of scalar value that includes bit 100
  - `grep_by(scalars and in_bits_range(100)) | parent`


## Use as script interpreter

.code fqscript


## Use as script interpreter

.code fqscriptout


## Implementation

- Library of jq functions implemented in Go
  - Decoders, decode value, binary, bit reader, IO, tty, ...
- CLI and REPL are mostly written in jq
```
( open
| decode
| if $repl then repeat(read as $expr | eval($expr) | print)
  else eval($arg) | print
  end
)
```
- All current decoders in Go
- Uses a forked version of gojq
  - Helped add native functions and iterators support
  - JQValue interface, bin/hex/oct literals, reflection, query AST functions, ...

## Decode API

SPS HRD parameters from ITU-T H.264 specification

.code avc_sps_hdr_params.go

## Decode API

.image avc_sps_hdr_params.png _ 900


## Decode API

Formats can use other formats. Simplified version of mp3 decoder:

.code mp3.go


## Future

- Declarative decoding like kaitai struct, decoder in jq
- Nicer way to handle checksums, encoding, validation etc
- Schemas for ASN1, protobuf, ...
- Better support for modifying data
- More formats like tls, http, http2, grpc, filesystems, ...
- Encoders
- More efficient, lazy decoding, smarter representation
- GUI
- Streaming input, read network traffic `tap("eth0") | select(...)`?
- Hope for more contributors


## Thanks and useful tools

- @itchyny for gojq
- Stephen Dolan and others for jq
- HexFiend
- GNU poke
- Kaitai struct
- Wireshark
- [vscode-jq](https://github.com/wader/vscode-jq)
- [jq-lsp](https://github.com/wader/jq-lsp)

22 changes: 22 additions & 0 deletions doc/presentations/bts2022/fqscript
@@ -0,0 +1,22 @@
#!/usr/bin/env fq -d mp4 -f

( first(.boxes[] | select(.type == "moov")?)
| first(.boxes[] | select(.type == "mvhd")?) as $mvhd
| { time_scale: $mvhd.time_scale,
    duration: ($mvhd.duration / $mvhd.time_scale),
    tracks:
      [ .boxes[]
      | select(.type == "trak")
      | [("mdhd", "stsd", "elst") as $t | first(grep_by(.type == $t))] as [$mdhd, $stsd, $elst]
      | { data_format: $stsd.boxes[0].type,
          media_scale: $mdhd.time_scale,
          edit_list:
            [ $elst.entries[]
            | { track_duration: (.segment_duration / $mvhd.time_scale),
                media_time: (.media_time / $mdhd.time_scale)
              }
            ]
        }
      ]
  }
)
23 changes: 23 additions & 0 deletions doc/presentations/bts2022/fqscriptout
@@ -0,0 +1,23 @@
$ ./editlist file.mp4
{
  "duration": 60.095,
  "time_scale": 600,
  "tracks": [
    {
      "data_format": "mp4a",
      "edit_list": [
        {
          "media_time": 0,
          "track_duration": 60.095
        }
      ],
      "media_scale": 22050
    },
    {
      "data_format": "avc1",
      "edit_list": [
        {
          "media_time": 0,
          "track_duration": 60.095
        }
...