[Parquet][C++] PageIndex is useless with current API #45284

Open
mpoeter opened this issue Jan 16, 2025 · 1 comment

mpoeter commented Jan 16, 2025

Describe the enhancement requested

The ParquetFileReader provides a PageIndexReader through which we can eventually get to a ColumnIndex and an OffsetIndex. So far so good. Those indexes provide page-based information, but in virtually all APIs the concept of pages is completely abstracted away. For higher-level APIs that makes sense, but even at the level of the PageReader we can only read all pages serially, one after the other. The only way I found to skip pages is the PageReader's data page filter, but that operates only on the page's own metadata and does not use the index. I did not find a way to load a specific page (e.g., via index or file offset), so I don't see how one can make use of the PageIndex with the current API. Did I miss anything?
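
To make the gap concrete, here is a rough sketch of the page selection the index would allow if pages could be loaded by file offset. It does not use the Arrow/Parquet API: `PageStats`, `PageLocation`, `ByteRange` and `SelectPages` are hypothetical stand-ins, with fields modeled on the per-page min/max of a column index and the per-page location entries of an offset index, and min/max assumed to be already decoded to `int64_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one page's entry in a column index
// (min/max assumed to be already decoded to int64).
struct PageStats {
  int64_t min_value;
  int64_t max_value;
};

// Hypothetical stand-in for one page's entry in an offset index.
struct PageLocation {
  int64_t offset;                // file offset of the page
  int32_t compressed_page_size;  // bytes to read for this page
  int64_t first_row_index;       // first row of the page within the row group
};

struct ByteRange {
  int64_t offset;
  int64_t length;
};

// Returns the byte ranges of all pages whose [min, max] interval overlaps
// the query interval [lo, hi]; pages that cannot contain a match are skipped.
std::vector<ByteRange> SelectPages(const std::vector<PageStats>& stats,
                                   const std::vector<PageLocation>& locations,
                                   int64_t lo, int64_t hi) {
  assert(stats.size() == locations.size());
  std::vector<ByteRange> ranges;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (stats[i].max_value < lo || stats[i].min_value > hi) continue;
    ranges.push_back({locations[i].offset, locations[i].compressed_page_size});
  }
  return ranges;
}
```

The resulting byte ranges are exactly what a reader would need to fetch, which is the step that seems to have no entry point today.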

Component(s)

C++, Parquet

wgtmac (Member) commented Jan 16, 2025

You're right. There has been a discussion about introducing a RowRanges API that leverages the page index to skip pages: https://docs.google.com/document/d/1SeVcYudu6uD9rb9zRAnlLGgdauutaNZlAaS0gVzjkgM. There is a stale PR, but its author is no longer working on it. I recently started picking it up by implementing the RowRanges API, which will take some time to finish: #45234
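
As a rough illustration of the idea only (not the design in the linked document or PR), mapping a set of row ranges onto pages could look like the sketch below. `RowRange` and `PagesForRowRanges` are hypothetical names; the only input assumed from the offset index is each page's first row index within the row group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical inclusive range of row indices within a row group.
struct RowRange {
  int64_t first;
  int64_t last;
};

// Returns the indices of all pages that contain at least one requested row.
// first_row_index[i] is the first row of page i; num_rows is the row count
// of the row group, which bounds the last page.
std::vector<std::size_t> PagesForRowRanges(
    const std::vector<int64_t>& first_row_index, int64_t num_rows,
    const std::vector<RowRange>& ranges) {
  std::vector<std::size_t> pages;
  for (std::size_t i = 0; i < first_row_index.size(); ++i) {
    const int64_t page_first = first_row_index[i];
    const int64_t next_first =
        (i + 1 < first_row_index.size()) ? first_row_index[i + 1] : num_rows;
    const int64_t page_last = next_first - 1;
    assert(page_first <= page_last);
    for (const RowRange& r : ranges) {
      if (r.first <= page_last && r.last >= page_first) {
        pages.push_back(i);
        break;  // one overlapping range is enough to keep the page
      }
    }
  }
  return pages;
}
```

Every page not returned could be skipped without decompressing it.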
