Spark 3.5: Implement RewriteTablePath #11555

Open: wants to merge 3 commits into base: main

szehon-ho
Collaborator

This is the implementation for #10920 (an action that prepares the metadata of an Iceberg table for a DR copy).

This has been used in production for a while in our setup, although support for rewriting V2 position deletes is new. I performed the following cleanups while contributing it:

  • Made the RewriteTableSparkAction code more functional (it no longer uses member variables on the action to track state).
  • Moved some RewriteTableSparkAction code to core Util classes, so that some classes no longer have to be made public as was previously done.

* Path to a comma-separated list of source and target paths for all files added to the table
* between startVersion and endVersion, including original data files and metadata files
* rewritten to staging.
* Result file list location. This file contains a 'copy plan', a comma-separated list of all
Member

This still feels a little ambiguous.

Maybe?

A file containing a listing of both original file names and file names under the new prefix, comma separated.
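
To make the wording concrete, here is a minimal sketch of how a consumer might read such a listing. It assumes one `source,target` pair per line and that paths contain no commas; neither detail is confirmed by the PR. The class name `CopyPlanSketch`, the `parse` method, and the sample paths are hypothetical and are not part of the Iceberg API.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Iceberg: parses copy-plan lines of the
// assumed form "originalPath,pathUnderNewPrefix".
public class CopyPlanSketch {

  // Returns (source, target) pairs; key = original path, value = new path.
  public static List<Map.Entry<String, String>> parse(List<String> lines) {
    List<Map.Entry<String, String>> pairs = new ArrayList<>();
    for (String line : lines) {
      if (line.trim().isEmpty()) {
        continue; // skip blank lines
      }
      // Assumption: paths contain no commas, so the first comma is the separator.
      int comma = line.indexOf(',');
      if (comma < 0) {
        throw new IllegalArgumentException("Expected 'source,target' but got: " + line);
      }
      String source = line.substring(0, comma).trim();
      String target = line.substring(comma + 1).trim();
      pairs.add(new SimpleImmutableEntry<>(source, target));
    }
    return pairs;
  }

  public static void main(String[] args) {
    // Made-up sample paths for illustration only.
    List<String> lines = Arrays.asList(
        "s3://source-bucket/tbl/data/f1.parquet,s3://target-bucket/tbl/data/f1.parquet",
        "s3://source-bucket/tbl/metadata/v2.metadata.json,s3://target-bucket/tbl/metadata/v2.metadata.json");
    for (Map.Entry<String, String> pair : parse(lines)) {
      System.out.println(pair.getKey() + " -> " + pair.getValue());
    }
  }
}
```

If the Javadoc spelled out the line format this explicitly (which column is the original path and which is the path under the new prefix), the ambiguity noted above would likely go away.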
