[Parquet] Support page level cache for reading #8246

@123789456ye

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Previously, the Parquet reader had to read a whole RowGroup into memory and then extract the data it needed, which wastes I/O and memory.
My idea is therefore to read only the pages we need and to cache those pages for future reads.
The first part is solved by #7850, so I am starting work on the caching part following that PR's release.

Describe the solution you'd like
I am thinking of adding a cache to `decode_page` in `impl RowGroupReader for SerializedRowGroupReader`. This would avoid repeated decode and decompress costs for pages that are read more than once.
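
To make the idea concrete, here is a minimal sketch of what such a cache could look like. It is not the existing parquet crate API: `PageKey`, `PageCache`, and `get_or_decode` are hypothetical names, the key layout (row group, column, page offset) and the byte-bounded LRU eviction are assumptions, and the `decode` closure stands in for the real decompress-and-decode step.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Hypothetical cache key identifying one page within a file.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PageKey {
    pub row_group: usize,
    pub column: usize,
    /// Byte offset of the page within the file.
    pub offset: u64,
}

/// Hypothetical byte-bounded LRU cache of decoded page buffers.
pub struct PageCache {
    capacity_bytes: usize,
    used_bytes: usize,
    pages: HashMap<PageKey, Arc<Vec<u8>>>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<PageKey>,
    pub hits: u64,
    pub misses: u64,
}

impl PageCache {
    pub fn new(capacity_bytes: usize) -> Self {
        Self {
            capacity_bytes,
            used_bytes: 0,
            pages: HashMap::new(),
            order: VecDeque::new(),
            hits: 0,
            misses: 0,
        }
    }

    /// Move `key` to the most-recently-used end of the queue.
    fn touch(&mut self, key: PageKey) {
        self.order.retain(|k| *k != key);
        self.order.push_back(key);
    }

    /// Return the cached page for `key`, or run `decode` (a stand-in for
    /// decompressing and decoding the page) and cache its result.
    pub fn get_or_decode<F>(&mut self, key: PageKey, decode: F) -> Arc<Vec<u8>>
    where
        F: FnOnce() -> Vec<u8>,
    {
        if let Some(page) = self.pages.get(&key).cloned() {
            self.hits += 1;
            self.touch(key);
            return page;
        }
        self.misses += 1;
        let page = Arc::new(decode());
        // A page larger than the whole budget is returned but not cached.
        if page.len() <= self.capacity_bytes {
            // Evict least recently used pages until the new page fits.
            while self.used_bytes + page.len() > self.capacity_bytes {
                match self.order.pop_front() {
                    Some(old) => {
                        if let Some(evicted) = self.pages.remove(&old) {
                            self.used_bytes -= evicted.len();
                        }
                    }
                    None => break,
                }
            }
            self.used_bytes += page.len();
            self.pages.insert(key, Arc::clone(&page));
            self.order.push_back(key);
        }
        page
    }
}
```

In this sketch the reader would call `get_or_decode` with the page's key and a closure that does the actual work, so the decompress and decode cost is paid only on a miss. A real implementation would likely need to return a `Result` from the closure and to decide whether the cache is per reader or shared across readers.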

Describe alternatives you've considered
I considered also adding a cache to the filter stage, but that part is already implemented.
I also considered page-level prefetching, but I think it may not be profitable enough.

Metadata

    Labels

    enhancement: Any new improvement worthy of an entry in the changelog