Commit 63b0cc7

doc(dwarfs-format): clarify docs based on #263
1 parent 7508b1d commit 63b0cc7

1 file changed: +36 -10 lines changed

doc/dwarfs-format.md

Lines changed: 36 additions & 10 deletions
@@ -7,8 +7,10 @@ This document describes the DwarFS file system format, version 2.5.
 ## FILE STRUCTURE
 
 A DwarFS file system image is just a sequence of blocks, optionally
-prefixed by a "header", which is typically some sort of shell script.
-Each block has the following format:
+prefixed by a "header", which is typically some sort of shell script
+or other executable that intends to use the "bundled" DwarFS image.
+
+Each block in the DwarFS image has the following format:
 
      ┌───┬───┬───┬───┬───┬───┬───┬───┐
 0x00 │'D'│'W'│'A'│'R'│'F'│'S'│MAJ│MIN│ MAJ=0x02, MIN=0x05 for v2.5
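The block prefix in the first hunk can be illustrated with a minimal Python sketch. It checks only the 8 bytes shown in the diagram: the ASCII magic `DWARFS` followed by one major and one minor version byte (`MAJ=0x02, MIN=0x05` for v2.5). The remaining header fields fall outside this hunk and are not parsed.

Both function names are hypothetical. `find_first_block` is a naive scan added purely as an assumption for illustration. A real reader should follow the document's "Header Detection" rules instead, since a prefixed shell script could itself contain the string `DWARFS`.

```python
MAGIC = b"DWARFS"  # 6 ASCII bytes at offset 0x00 of every block


def parse_block_magic(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (major, minor) for a block starting at `offset`.

    Only the 8-byte prefix from the diagram is checked; the rest of the
    block header is not interpreted here.
    """
    prefix = data[offset:offset + 8]
    if len(prefix) < 8 or prefix[:6] != MAGIC:
        raise ValueError(f"no DwarFS block magic at offset {offset:#x}")
    return prefix[6], prefix[7]  # indexing bytes yields ints


def find_first_block(data: bytes) -> int:
    """Hypothetical naive scan past an optional header (e.g. a shell script).

    Assumption: the first occurrence of the magic starts the first block,
    which the real Header Detection logic may handle differently.
    """
    pos = data.find(MAGIC)
    if pos < 0:
        raise ValueError("no DwarFS block found")
    return pos
```

For example, an image prefixed with `#!/bin/sh\n` would have its first block at offset 10, and `parse_block_magic` would report `(2, 5)` for a v2.5 block.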
@@ -61,8 +63,9 @@ A couple of notes:
 
 - A minor version number change will be backwards compatible, i.e. an
 old program will refuse to read a file system with a minor version
-larger than the one it supports. However, a new program will still
-read all file systems with a smaller minor version number.
+larger than the one it supports. However, a new program can still
+read all file systems with a smaller minor version number, although
+very old versions may at some point no longer be supported.
 
 ### Header Detection
 
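The version rule in the second hunk can be written as a small predicate. This is a sketch under two assumptions. First, a major version mismatch is never readable, since the text only promises compatibility across minor version changes. Second, a reader may define a lowest minor version it still accepts, as one way to model "very old versions may at some point no longer be supported". The parameter `oldest_minor` is hypothetical.

```python
def can_read(image_version: tuple[int, int],
             supported: tuple[int, int] = (2, 5),
             oldest_minor: int = 0) -> bool:
    """Decide whether a reader supporting `supported` can open an image.

    Assumptions: major versions must match exactly, and `oldest_minor`
    (hypothetical) is the lowest minor version the reader still accepts.
    """
    img_major, img_minor = image_version
    sup_major, sup_minor = supported
    if img_major != sup_major:
        return False
    # An old reader refuses images with a larger minor version...
    if img_minor > sup_minor:
        return False
    # ...while a newer reader accepts smaller minor versions, unless
    # support for very old ones has been dropped.
    return img_minor >= oldest_minor
```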
@@ -81,21 +84,32 @@ without any problems.
 
 ### Section Types
 
-There are currently 4 different section types.
+Currently, the following different section types are defined:
 
 - `BLOCK` (0):
 A block of data. This is where all file data is stored. There can be
-an arbitrary number of blocks of this type.
+an arbitrary number of blocks of this type. The file data can only be
+interpreted using the metadata blocks. The metadata contains a list
+of chunks for each file, each of which references a small part of the
+data in a single `BLOCK`.
 
 - `METADATA_V2_SCHEMA` (7):
-The schema used to layout the `METADATA_V2` block contents. This is
-stored in "compact" thrift encoding.
+The [schema](https://github.com/facebook/fbthrift/blob/main/thrift/lib/thrift/frozen.thrift)
+used to layout the `METADATA_V2` block contents. This is stored in
+"compact" thrift encoding. The metadata cannot be read without the
+schema, as it defines the exact bit widths used to store each metadata
+field.
 
 - `METADATA_V2` (8):
 This section contains the bulk of the metadata. It's essentially just
 a collection of bit-packed arrays and structures. The exact layout of
 each list and structure depends on the actual data and is stored
-separately in `METADATA_V2_SCHEMA`.
+separately in `METADATA_V2_SCHEMA`. The metadata format is defined in
+[metadata.thrift](../thrift/metadata.thrift) and the binary format that
+derives from that definition uses
+[Frozen2](https://github.com/facebook/fbthrift/blob/main/thrift/lib/cpp2/frozen/Frozen.h).
+Frozen2 is not only extremely space efficient, it also allows accessing
+huge data structures directly through memory-mapping.
 
 - `SECTION_INDEX` (9):
 The section index is, well, an index of all sections in the file
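The expanded `BLOCK` description says file data can only be interpreted through the metadata. For each file, the metadata lists chunks, and each chunk references a small part of a single `BLOCK`.

The minimal sketch below reassembles a file from such chunks. It assumes the blocks are already decompressed and that each chunk carries a block index, an offset and a size. The `Chunk` type and its field names are illustrative and are not copied from `metadata.thrift`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Illustrative chunk record (field names are assumptions)."""
    block: int   # index of the (already decompressed) BLOCK section
    offset: int  # byte offset within that block
    size: int    # number of bytes to take from the block


def read_file(chunks: list[Chunk], blocks: list[bytes]) -> bytes:
    """Concatenate the byte ranges referenced by a file's chunks, in order."""
    parts = []
    for c in chunks:
        data = blocks[c.block]
        if c.offset + c.size > len(data):
            raise ValueError(f"chunk {c} exceeds block {c.block}")
        parts.append(data[c.offset:c.offset + c.size])
    return b"".join(parts)
```

A chunk never spans more than one block. A file whose data is spread over several blocks therefore simply has several chunks, and an empty file has none.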
@@ -117,7 +131,19 @@ There are currently 4 different section types.
 - `HISTORY` (10):
 File system history information as defined `thrift/history.thrift`.
 This is stored in "compact" thrift encoding. Zero or more history
-sections are supported.
+sections are supported. This section type is purely informational
+and not needed to read the DwarFS image.
+
+### Compression Algorithms
+
+DwarFS supports a wide range of block compression algorithms, some of
+which require additional metadata. The full list of supported algorithms
+is defined in [`dwarfs/compression.h`](../include/dwarfs/compression.h).
+
+For compression algorithms with metadata, the metadata is defined in
+[`thrift/compression.thrift`](../thrift/compression.thrift). The metadata
+is stored in "compact" thrift encoding at the beginning of the block, just
+after the header.
 
 ## METADATA FORMAT
 
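The section type numbers in this diff can be collected into an enumeration. The notes added by the commit also imply which sections a reader actually needs: `METADATA_V2` cannot be read without `METADATA_V2_SCHEMA`, while `HISTORY` is purely informational.

The sketch below encodes that reading. `SectionType`, `missing_required_sections` and `sections_to_read` are hypothetical names. Only the schema and metadata sections are treated as required, which is an assumption, because the diff does not say whether `SECTION_INDEX` is mandatory. Zero `BLOCK` sections is allowed, since "an arbitrary number" of them may appear.

```python
from enum import IntEnum


class SectionType(IntEnum):
    # Numeric values as listed in doc/dwarfs-format.md
    BLOCK = 0
    METADATA_V2_SCHEMA = 7
    METADATA_V2 = 8
    SECTION_INDEX = 9
    HISTORY = 10


# Assumption: the metadata and its schema are the only mandatory sections.
REQUIRED = {SectionType.METADATA_V2_SCHEMA, SectionType.METADATA_V2}


def missing_required_sections(types: list[int]) -> set[SectionType]:
    """Return the required section types absent from an image's section list."""
    present = set(types)
    # IntEnum members hash and compare like their integer values.
    return {t for t in REQUIRED if t not in present}


def sections_to_read(types: list[int]) -> list[int]:
    """Drop HISTORY sections, which are not needed to read the image."""
    return [t for t in types if t != SectionType.HISTORY]
```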