When incrementally constructing a compact/ordinary index and the max chunk size is exceeded, we currently might split the output into multiple chunks if the serialised form of the index has a size that is a multiple of the max chunk size. It might be worth just returning a larger chunk in that case (maybe still a multiple of the max chunk size), since we don't really rely on chunks being a specific size anywhere in the RunBuilder/RunAcc code.
jorisdral
changed the title
Do not split up incremental serialisation output into multiple Chunks
Allow Chunks that are large than the max chunk size in index construction
Jul 21, 2024
jeltsch
changed the title
Allow Chunks that are large than the max chunk size in index construction
Allow chunks that are larger than the maximum chunk size in index construction
Jul 30, 2024
As we agreed in our project meeting, this indeed seems to be the way to go. Concretely, we concluded in the meeting that, whenever the size of the buffered serialized data exceeds a certain threshold, all available data should be output in the form of a single chunk. In particular, this means the following:
Chunk sizes don’t have to be multiples of the threshold.
The threshold is not a maximum chunk size anymore but rather a minimum chunk size.
There is no maximum chunk size.
The last point is justified because, in practice, chunks will still not become so large that the work of writing the index is no longer appropriately spread over time, particularly since serialized keys aren’t expected to be large.
With this new approach, the output of appending to an index shouldn’t have type [Chunk] anymore but rather type Maybe Chunk, as there can be at most one chunk.
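A minimal sketch of what this could look like, not the actual implementation: the names `Buffer`, `emptyBuffer`, `feed`, and `flush` are hypothetical, `Chunk` is a stand-in for the real chunk type, and I am assuming that reaching the threshold exactly already triggers output (which fits the reading of the threshold as a minimum chunk size).

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- | Hypothetical stand-in for the real chunk type.
newtype Chunk = Chunk ByteString
  deriving (Eq, Show)

-- | Hypothetical accumulator: the minimum chunk size (the threshold)
-- together with the serialized bytes buffered so far.
data Buffer = Buffer
  { minChunkSize :: !Int
  , buffered     :: !ByteString
  }

-- | An empty buffer with the given minimum chunk size.
emptyBuffer :: Int -> Buffer
emptyBuffer n = Buffer n BS.empty

-- | Append serialized bytes to the buffer. Once the buffered data reaches
-- the threshold (assumption: '>=' rather than '>'), all of it is emitted as
-- a single chunk, so the result is 'Maybe Chunk' rather than '[Chunk]'.
feed :: ByteString -> Buffer -> (Maybe Chunk, Buffer)
feed bytes (Buffer n buf)
  | BS.length buf' >= n = (Just (Chunk buf'), Buffer n BS.empty)
  | otherwise           = (Nothing, Buffer n buf')
  where
    buf' = buf <> bytes

-- | Emit whatever is left at the end of construction. This final chunk may
-- be smaller than the threshold.
flush :: Buffer -> Maybe Chunk
flush (Buffer _ buf)
  | BS.null buf = Nothing
  | otherwise   = Just (Chunk buf)
```

With a threshold of 4, feeding 3 bytes yields `Nothing`, and feeding 6 more bytes yields a single 9-byte chunk instead of two 4-byte chunks plus a remainder.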
I will implement this new approach of chunk generation already as part of #296 and #299. For the compact index, it remains to be implemented (potentially by using the general-purpose chunk handling to be added by #296).