ArnavBalyan
Member

@ArnavBalyan ArnavBalyan commented Aug 25, 2025

  • Since Parquet 1.12, encryption has been a first-class feature, with support for footer-level and column-level encryption.
  • However, users have no straightforward way to inspect a file's encryption metadata or mode, or to check whether the footer or the file is encrypted.
  • This PR adds a simple, dedicated CLI command: parquet-cli encryption-info <file>
  • The command reports the following:
    • File-level encryption type: PLAINTEXT_FOOTER or ENCRYPTED_FOOTER.
    • A summary of column encryption, plus per-column details including each column's encryption status.
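
To make the shape of that report concrete, here is a minimal, self-contained sketch of how such output could be assembled. It is only an illustration: the `EncryptionInfoSketch` class, its `FooterMode` enum, the `render` helper, and the output layout are all hypothetical stand-ins, not the PR's code or the parquet-java API. The only things taken from the description are the two footer mode names and the idea of a summary followed by per-column status.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class EncryptionInfoSketch {
  // Hypothetical stand-in for the two footer modes named in the description;
  // this is NOT the parquet-java type.
  enum FooterMode { PLAINTEXT_FOOTER, ENCRYPTED_FOOTER }

  // Renders the file-level mode, a summary count, and one line per column.
  // The map goes from column path to "is this column encrypted?".
  static String render(FooterMode mode, Map<String, Boolean> columnEncrypted) {
    long encrypted = columnEncrypted.values().stream().filter(Boolean::booleanValue).count();
    StringBuilder out = new StringBuilder();
    out.append("File encryption type: ").append(mode).append('\n');
    out.append("Encrypted columns: ").append(encrypted)
        .append(" of ").append(columnEncrypted.size()).append('\n');
    for (Map.Entry<String, Boolean> e : columnEncrypted.entrySet()) {
      out.append("  ").append(e.getKey()).append(": ")
          .append(e.getValue() ? "ENCRYPTED" : "PLAINTEXT").append('\n');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Illustrative column names only.
    Map<String, Boolean> columns = new LinkedHashMap<>();
    columns.put("id", false);
    columns.put("ssn", true);
    System.out.print(render(FooterMode.PLAINTEXT_FOOTER, columns));
  }
}
```

In the real command, the mode and the per-column flags would come from the file footer rather than from a hand-built map.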

@ArnavBalyan ArnavBalyan changed the title Add encryption-info CLI support for Parquet file encryption metadata GH-3282: Add encryption info CLI support for Parquet file encryption metadata Aug 25, 2025
@ArnavBalyan
Member Author

cc @shangxinli @gszadovszky, could you please take a look? Thanks!

ParquetMetadata footer =
    ParquetFileReader.readFooter(getConf(), qualifiedPath(source), ParquetMetadataConverter.NO_FILTER);

FileMetaData meta = footer.getFileMetaData();
Contributor

I think it would be nice to also print out details about the encryption algorithm, wouldn't it?

Member Author

Yeah, that's a great point. I'll add support for that.

@wgtmac
Member

wgtmac commented Aug 28, 2025

cc @ggershinsky @shangxinli for experts on encryption

@ggershinsky
Contributor

Some other details worth printing:

  • Whether each column is encrypted with the footer key or with a column-specific key.
  • If all columns are encrypted with the footer key, the file is in "uniform encryption" mode. The command can print this, so the user knows that a single key is used in the file and can open every column.
  • Explicit information on the footer encryption mode: encrypted or plaintext.
  • Optionally (via a flag), the key metadata of the footer key and, if available, of the column keys. This can be useful for debugging key retrieval. The key metadata is binary, but something similar to "hexdump -C" could be used, making some effort to find and print ASCII text chunks (key metadata often has text/JSON parts).
  • For advanced debugging, the AAD-related fields.
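
As a rough illustration of two of these suggestions, the sketch below shows a "uniform encryption" check and a dump loosely modeled on "hexdump -C" that keeps printable ASCII visible next to the hex bytes. Everything here is an assumption made for the example: the `KeyMetadataSketch` class, the `KeySource` enum, and the `isUniform` and `hexDump` helpers are hypothetical names, not part of parquet-java or of this PR, and the sample key metadata is invented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class KeyMetadataSketch {
  // Hypothetical classification of which key protects a column.
  enum KeySource { FOOTER_KEY, COLUMN_KEY, NONE }

  // "Uniform encryption": every column is encrypted with the footer key.
  // An empty column list is treated as not uniform (a choice made for this sketch).
  static boolean isUniform(List<KeySource> columns) {
    return !columns.isEmpty() && columns.stream().allMatch(k -> k == KeySource.FOOTER_KEY);
  }

  // Prints 16 bytes per line: offset, hex bytes, then printable ASCII ('.' otherwise).
  static String hexDump(byte[] data) {
    StringBuilder out = new StringBuilder();
    for (int offset = 0; offset < data.length; offset += 16) {
      int end = Math.min(offset + 16, data.length);
      StringBuilder hex = new StringBuilder();
      StringBuilder ascii = new StringBuilder();
      for (int i = offset; i < offset + 16; i++) {
        if (i < end) {
          int b = data[i] & 0xFF;
          hex.append(String.format("%02x ", b));
          ascii.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
        } else {
          hex.append("   "); // pad a short final line so the ASCII column lines up
        }
        if (i - offset == 7) {
          hex.append(' '); // extra gap between the two groups of eight bytes
        }
      }
      out.append(String.format("%08x  %s |%s|%n", offset, hex, ascii));
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Invented sample: a JSON-like prefix followed by two binary bytes.
    byte[] sample = "{\"keyId\":\"k1\"}\u0000\u0001".getBytes(StandardCharsets.ISO_8859_1);
    System.out.print(hexDump(sample));
    System.out.println(isUniform(Arrays.asList(KeySource.FOOTER_KEY, KeySource.FOOTER_KEY)));
  }
}
```

Showing the ASCII column alongside the hex is what makes embedded text or JSON fragments in otherwise binary key metadata easy to spot.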

@ArnavBalyan
Member Author

Thanks, this is great feedback. I'll iterate and update this shortly.
