Commit 9314a0b

Incorporate major PDB table row fix
1 parent 38f9601 commit 9314a0b

3 files changed: +33 -57 lines changed

CHANGELOG.md

Lines changed: 8 additions & 3 deletions
@@ -6,10 +6,15 @@ This change log follows the conventions of
 
 ## [Unreleased][unreleased]
 
-### Changed
+### Fixed
 
-- Reworked the database export Kaitai Struct definition to incorporate some important discoveries by the Mixxx and rekordcrate developers (thanks once again [@Swiftb0y](https://github.com/Swiftb0y)): all tables that use string offsets have subtypes which control whether those offsets are eight or sixteen bits.
+- Reworked the database export Kaitai Struct definition to incorporate some important discoveries by the [Mixxx](https://mixxx.org) and [rekordcrate](https://github.com/Holzhaus/rekordcrate) developers (thanks once again [@Swiftb0y](https://github.com/Swiftb0y)): all tables that use string offsets have subtypes which control whether those offsets are eight or sixteen bits.
   We had previously only noted that for the Artists table, but the Album, Tag, and Tag Track tables behave this way as well, and in fact even the Track table follows this pattern, but its offsets always use the sixteen bit variant because the rows are so big.
+- The understanding of row counts in DeviceSQL data pages has been broken since the very beginning (though we had some clumsy workarounds that were good enough for reading).
+  Thanks to [Robin McCorkell](https://github.com/RobinMcCorkell) we now can properly interpret these non-byte-aligned numbers.
+
+### Changed
+
 - Updated the analysis file Kaitai Struct definition to cope with the fact that rekordbox sometimes puts truly bizarre values (we have seen `f3` and `f9`) in the track bank byte of song structure tags.
   Previously this caused construction of the entire tag object to fail because no matching enumeration value could be found.
   Now we have a separate `raw_bank` field that holds the numeric value, and `bank` is a value instance that can be `null` when `raw_bank` is not recognizable.
@@ -36,7 +41,7 @@ May the Fourth be with you.
 - Upgraded Kaitai Struct to version 0.10, which includes a number of
   fixes and adds linting of mapped values.
   > :wrench: This is a backwards-incompatible change.
-- Since we are already backwards incompatible with pervious releases,
+- Since we are already backwards incompatible with previous releases,
   changed some mapped value names to correspond to
   the KSY style guide and fix linter errors reported by KSC 0.10:
   1. In `rekordbox_pdb.ksy` renamed `num_groups` to `num_row_groups`.

doc/modules/ROOT/pages/exports.adoc

Lines changed: 8 additions & 18 deletions
@@ -151,14 +151,13 @@ These each have the size specified by __len_page__ in the above diagram, and the
 (draw-box (text "next_page" :math) {:span 4})
 (draw-box (text "version" :math) {:span 4})
 (draw-box (text "unknown" :math [:sub "2"]) {:span 4})
-(draw-box (text "n" :math [:sub "rs"]))
-(draw-box (text "num" :math [:sub "rv"]) {:span 2})
+(draw-box (text "row counts") {:span 3})
 (draw-box (text "p" :math [:sub "f"]))
 (draw-box (text "free" :math [:sub "s"]) {:span 2})
 (draw-box (text "used" :math [:sub "s"]) {:span 2})
 
 (draw-box (text "u" :math [:sub "5"]) {:span 2})
-(draw-box (text "num" :math [:sub "rl"]) {:span 2})
+(draw-box (text "unk" :math [:sub "rows"]) {:span 2})
 (draw-box (text "u" :math [:sub "6"]) {:span 2})
 (draw-box (text "u" :math [:sub "7"]) {:span 2})
 (draw-gap "heap")
@@ -196,20 +195,11 @@ This might be used by the history feature, or might be some kind of data integri
 
 We don't know what _unknown~2~_ contains, though it is often/always zero. It may be an extension of _version_ to 64 bits?
 
-Next, __num_rows_small__ (abbreviated _n~rs~_ in the byte field diagram above) holds the number of row offsets (valid or not) that are present in the page, unless __num_rows_large__ (below) holds a value that is larger than it (but not equal to `1fff`). To find the actual rows, you need to scan all 16 entries of each of the row groups present in the page, ignoring any whose <<#row-presence-bits,row presence bit>> is zero.
-This seems like a strange mechanism for dealing with the fact that some tables (like playlist entries) have a lot of very small rows, too many to count with a single byte.
-But then why not just always use __num_rows_large__?
-
-__num_rows_valid__ (abbreviated _num~rv~_) holds the number of valid rows multiplied by 0x20 (i.e. the first 5 bits are not part of the counter). This count does reflect invalidated rows unlike __num_rows_small__.
-
-.Example row counts
-[example]
-* Fresh page containing 1 row: _num_rows_small_=1, _num_rows_large_=0, _num_rows_valid_=32
-* Page containing 2 rows: _num_rows_small_=2, _num_rows_large_=1, _num_rows_valid_=64
-* Page containing 2 valid rows and 1 deleted row: _num_rows_small_=3, _num_rows_large_=1, _num_rows_valid_=64
-* Page containing 6 valid rows and 2 deleted rows: _num_rows_small_=8, _num_rows_large_=6, _num_rows_valid_=192
-* Page containing 8 valid rows: _num_rows_small_=8, _num_rows_large_=7, _num_rows_valid=256
-* Page containing 7 valid rows and some deleted rows: _num_rows_small_=10, _num_rows_large_=0x1fff, _num_rows_valid_=224
+The three bytes{nbsp}``18``–``1a``, labeled “row counts”, actually contain two non-byte-aligned numbers, packed with little-endian bit order.
+The first (lowest) 13 bits, __num_row_offsets__, keep track of how many row offsets have ever been allocated in the page, even if some of them no longer point to valid rows.
+This number only ever increases over time, and can be used to calculate how many row groups are present in the page.
+The final (highest) 11 bits, __num_rows__, report the number of valid rows that are currently present in the page.footnote:[It is unclear why these two values are packed into three bytes like this.
+We did not understand this structure, and had to rely on clumsy workarounds, until https://github.com/RobinMcCorkell[Robin McCorkell] figured this out in December 2025.]
 
 Byte{nbsp}``1b`` is called __page_flags__ (abbreviated _p~f~_ in the diagram).
 According to Mr. Lesniak, “strange” (non-data) pages will have the value `44` or `64`, and other pages have had the values `24` or `34`.
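As a sanity check on the new reading of the row counts, the Python sketch below splits the three row-count bytes (offsets `18` to `1a`, the three bytes just before __page_flags__ at `1b`) into the two bit fields. It assumes, as `bit-endian: le` in the Kaitai Struct definition indicates, that the bytes form one little-endian 24-bit value whose low 13 bits are `num_row_offsets` and whose high 11 bits are `num_rows`. The test inputs are rebuilt from the "Example row counts" list that this commit removes, taking the old `num_rows_small` as byte `18` and the old `num_rows_valid` as a little-endian u2 at `19` to `1a`. `decode_row_counts` is a hypothetical helper name.

```python
from typing import Tuple


def decode_row_counts(raw: bytes) -> Tuple[int, int]:
    """Split the 3-byte row-count field into (num_row_offsets, num_rows).

    Assumes little-endian bit packing: the 13-bit num_row_offsets field
    occupies the low bits of the 24-bit little-endian value, and the
    11-bit num_rows field occupies the high bits.
    """
    if len(raw) != 3:
        raise ValueError("row-count field must be exactly 3 bytes")
    value = int.from_bytes(raw, "little")
    num_row_offsets = value & 0x1FFF  # low 13 bits
    num_rows = value >> 13            # remaining high 11 bits
    return num_row_offsets, num_rows


# Byte triples rebuilt from the removed example list:
# (old num_rows_small, then old num_rows_valid as a little-endian u2).
examples = {
    "fresh page, 1 row": bytes([0x01, 0x20, 0x00]),
    "2 valid, 1 deleted": bytes([0x03, 0x40, 0x00]),
    "8 valid": bytes([0x08, 0x00, 0x01]),
    "7 valid, some deleted": bytes([0x0A, 0xE0, 0x00]),
}
for label, raw in examples.items():
    print(label, decode_row_counts(raw))
```

For these inputs the function returns (1, 1), (3, 2), (8, 8), and (10, 7): the first number is the old "small" count, which includes deleted rows, and the second is the old valid-row count, so the old observations are consistent with the two packed fields.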
@@ -219,7 +209,7 @@ Bytes{nbsp}``1c``-`1d` are called __free_size__ (abbreviated _free~s~_ in the di
 
 Bytes{nbsp}``20``-`21`, _u~5~_, are of unclear purpose. Mr. Lesniak labeled them “(0→1: 2).”
 
-Bytes{nbsp}``22``-`23`, __num_rows_large__ (abbreviated _num~rl~_ in the diagram) hold the number of entries in the row index at the end of the page when that value is too large to fit into __num_rows_small__ (as mentioned above), and that situation seems to be indicated when this value is larger than __num_rows_small__, but not equal to `1fff`.
+Bytes{nbsp}``22``-`23`, labeled __unk~rows~__, hold a value that seems related to the number of rows in the table in an unclear way, but sometimes instead equals `1fff`.
 
 _u~6~_ at bytes{nbsp}``24``-`25` seems to have the value `1004` for strange pages, and `0000` for data pages.
 And Mr. Lesniak describes _u~7~_ at bytes{nbsp}``26``-`27` as “always 0 except 1 for history pages, num entries for strange pages?”
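The byte offsets given on this page lend themselves to a short Python sketch that reads the part of the page header after the row counts and applies the `is_data_page` test from the Kaitai Struct definition (`page_flags & 0x40 == 0`). Several things here are assumptions: the two-byte fields are taken to be little-endian, __used_size__ is placed at bytes `1e` to `1f` only because it follows __free_size__ in the diagram, and the names `HeaderTail` and `parse_header_tail` are hypothetical.

```python
import struct
from typing import NamedTuple


class HeaderTail(NamedTuple):
    page_flags: int  # byte 0x1b
    free_size: int   # bytes 0x1c-0x1d
    used_size: int   # bytes 0x1e-0x1f (assumed from diagram order)
    u5: int          # bytes 0x20-0x21, purpose unclear
    unk_rows: int    # bytes 0x22-0x23, sometimes 0x1fff
    u6: int          # bytes 0x24-0x25
    u7: int          # bytes 0x26-0x27

    @property
    def is_data_page(self) -> bool:
        # Same test as the ksy instance: bit 0x40 clear means a data page.
        return (self.page_flags & 0x40) == 0


def parse_header_tail(page: bytes) -> HeaderTail:
    """Unpack bytes 0x1b-0x27 of a page header, assuming little-endian u2 fields."""
    if len(page) < 0x28:
        raise ValueError("page is too short to contain a full header")
    # "<" selects little-endian with no padding: one u1 followed by six u2.
    return HeaderTail(*struct.unpack_from("<B6H", page, 0x1B))
```

With the flag values Mr. Lesniak reports, `24` and `34` come out as data pages, while `44` and `64` do not.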

src/main/kaitai/rekordbox_pdb.ksy

Lines changed: 17 additions & 36 deletions
@@ -163,6 +163,8 @@ types:
       heap in which row data is found. At the end of the page there is
       an index which locates all rows present in the heap via their
       offsets past the end of the page header.
+    meta:
+      bit-endian: le
     seq:
       - id: gap
         contents: [0, 0, 0, 0]
@@ -194,22 +196,14 @@ types:
         doc: |
           @flesniak said: "sequence number (0->1: 8->13, 1->2: 22, 2->3: 27)"
       - size: 4
-      - id: num_rows_small
-        type: u1
-        doc: |
-          Holds the value used for `num_rows` (see below) unless
-          `num_rows_large` is larger (but not equal to `0x1fff`). This
-          seems like some strange mechanism to deal with the fact that
-          lots of tiny entries, such as are found in the
-          `playlist_entries` table, are too big to count with a single
-          byte. But why not just always use `num_rows_large`, then?
-      - type: u1
-        doc: |
-          @flesniak said: "a bitmask (1st track: 32)"
-      - type: u1
-        doc: |
-          @flesniak said: "often 0, sometimes larger, esp. for pages
-          with high real_entry_count (e.g. 12 for 101 entries)"
+      - id: num_row_offsets
+        type: b13
+        doc: |
+          Seems to hold the number of row offsets that have ever been
+          allocated, including those that are no longer valid.
+      - id: num_rows
+        type: b11
+        doc: The number of valid rows currently present in the page.
       - id: page_flags
         type: u1
         doc: |
@@ -226,17 +220,13 @@ types:
       - type: u2
         doc: |
           @flesniak said: "(0->1: 2)"
-      - id: num_rows_large
+      - id: unk_rows
         type: u2
         doc: |
-          Holds the value used for `num_rows` (as described above)
-          when that is too large to fit into `num_rows_small`, and
-          that situation seems to be indicated when this value is
-          larger than `num_rows_small`, but not equal to `0x1fff`.
-          This seems like some strange mechanism to deal with the fact
-          that lots of tiny entries, such as are found in the
-          `playlist_entries` table, are too big to count with a single
-          byte. But why not just always use this value, then?
+          This was previously believed to take the place of a one-byte `num_rows`
+          field, when that got too large to fit, but that was based on an incorrect
+          understanding of the `num_row_offsets` and `num_rows` bit fields. We do
+          not yet have a new explanation for the purpose of this value.
       - type: u2
         doc: |
           @flesniak said: "1004 for strange blocks, 0 otherwise"
@@ -246,24 +236,15 @@ types:
           entries for strange pages?"
       - id: heap
         size-eos: true
-        if: false # never true, but stores pos
+        if: 'false' # never true, but stores pos
     instances:
       is_data_page:
         value: page_flags & 0x40 == 0
         -webide-parse-mode: eager
       heap_pos:
         value: _io.pos
-      num_rows:
-        value: |
-          (num_rows_large > num_rows_small) and (num_rows_large != 0x1fff) ? num_rows_large : num_rows_small
-        doc: |
-          The number of rows that have ever been allocated on this
-          page (controls the number of row groups there are, but some
-          entries in each group may not be marked as present in the
-          table due to deletion or updates).
-        -webide-parse-mode: eager
       num_row_groups:
-        value: '(num_rows - 1) / 16 + 1'
+        value: '(num_row_offsets - 1) / 16 + 1'
         doc: |
           The number of row groups that are present in the index. Each
           group can hold up to sixteen rows, but `row_present_flags`
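Basing `num_row_groups` on `num_row_offsets` rather than on a count of valid rows matters because deleted rows still occupy slots in the row index. The Python sketch below (with the hypothetical helper name `num_row_groups`) computes the group count with ceiling division, which agrees with `(num_row_offsets - 1) / 16 + 1` whenever there is at least one offset. Returning 0 for a page with no offsets is an assumption of this sketch, not something the definition states.

```python
ROWS_PER_GROUP = 16  # each row group indexes up to sixteen rows


def num_row_groups(num_row_offsets: int) -> int:
    """Number of row groups needed to cover every row offset ever allocated."""
    if num_row_offsets < 0:
        raise ValueError("num_row_offsets cannot be negative")
    # Ceiling division; equals (n - 1) // 16 + 1 for n >= 1, and 0 for n == 0.
    return (num_row_offsets + ROWS_PER_GROUP - 1) // ROWS_PER_GROUP


# A hypothetical page where 17 offsets were allocated but only 15 rows are
# still valid: sizing the index from the valid count would miss a group.
print(num_row_groups(17))  # 2
print(num_row_groups(15))  # 1
```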
