optimize parquet footer reader #24007

Open: wants to merge 3 commits into base: master

jinyangli34
Contributor

Description

Improve the efficiency of reading Parquet footers by:

  1. Loading only the row groups that overlap the requested offset/length range (applies only if file_offset is set in the Parquet file).
  2. Loading only the referenced columns.
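
A minimal sketch of the two pruning steps above, for illustration only. This is not the code in this PR: `FooterPruningSketch`, `RowGroupMeta`, `overlaps`, and `prune` are hypothetical names, and the row-group extent is assumed to be `fileOffset + totalCompressedSize`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FooterPruningSketch {
    // Hypothetical stand-in for one row group's footer entry:
    // where it starts, how many bytes it spans, and the offset of each column chunk.
    public record RowGroupMeta(long fileOffset, long totalCompressedSize, Map<String, Long> columnChunkOffsets) {}

    // Assumption: a row group occupies [fileOffset, fileOffset + totalCompressedSize).
    // It is kept when that range intersects the split range [splitOffset, splitOffset + splitLength).
    public static boolean overlaps(RowGroupMeta rowGroup, long splitOffset, long splitLength) {
        long rowGroupEnd = rowGroup.fileOffset() + rowGroup.totalCompressedSize();
        long splitEnd = splitOffset + splitLength;
        return rowGroup.fileOffset() < splitEnd && rowGroupEnd > splitOffset;
    }

    // Step 1: skip row groups outside the split.
    // Step 2: within the remaining row groups, keep only the referenced columns.
    public static List<RowGroupMeta> prune(
            List<RowGroupMeta> rowGroups,
            long splitOffset,
            long splitLength,
            Set<String> referencedColumns) {
        List<RowGroupMeta> result = new ArrayList<>();
        for (RowGroupMeta rowGroup : rowGroups) {
            if (!overlaps(rowGroup, splitOffset, splitLength)) {
                continue;
            }
            Map<String, Long> kept = new LinkedHashMap<>();
            for (Map.Entry<String, Long> entry : rowGroup.columnChunkOffsets().entrySet()) {
                if (referencedColumns.contains(entry.getKey())) {
                    kept.put(entry.getKey(), entry.getValue());
                }
            }
            result.add(new RowGroupMeta(rowGroup.fileOffset(), rowGroup.totalCompressedSize(), kept));
        }
        return result;
    }
}
```

For a split that covers only part of a wide file, a filter of this shape would leave most row-group and column metadata unmaterialized. Whether the actual reader tests overlap this way, or matches on the row group's start offset instead, is not stated in the description.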

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

cla-bot bot added the cla-signed label on Nov 2, 2024
github-actions bot added the hudi, iceberg, delta-lake, and hive labels on Nov 2, 2024
jinyangli34 force-pushed the jinyang-optimize_parquet_footer_reader branch from d6fbf35 to 89d8941 on November 4, 2024 18:41
Labels
cla-signed, delta-lake, hive, hudi, iceberg

1 participant