feat: added schema evolution to the merge statement #3135

Closed

@JustinRush80 JustinRush80 commented Jan 16, 2025

Description

Add schema evolution (merge mode only) to the MERGE statement. New columns are added based on the column predicates in the MERGE operations (e.g. target.id = source.id). Using when_not_matched_insert_all and when_matched_update_all adds any new source column to the target schema.
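
To make the intended behaviour concrete, here is a minimal, library-free sketch of what "merge" schema evolution means for an upsert. It is only an illustration under stated assumptions, not the PR's Rust implementation: `evolve_schema`, `upsert_all`, and the sample rows are hypothetical names and data invented here, and rows are modelled as plain dictionaries rather than Delta table files.

```python
from typing import Any

Row = dict[str, Any]


def evolve_schema(target_columns: list[str], source_columns: list[str]) -> list[str]:
    """Return the target columns followed by any columns found only in the source."""
    new_columns = [c for c in source_columns if c not in target_columns]
    return target_columns + new_columns


def upsert_all(target: list[Row], source: list[Row], key: str) -> tuple[list[str], list[Row]]:
    """Toy MERGE on target.key = source.key with update-all and insert-all clauses."""
    target_columns = list(target[0].keys()) if target else []
    source_columns = list(source[0].keys()) if source else []
    columns = evolve_schema(target_columns, source_columns)

    by_key = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched rows are updated in place; unmatched rows are inserted.
        by_key.setdefault(row[key], {}).update(row)

    # Rows that never received a value for a new column are filled with None.
    merged = [{c: row.get(c) for c in columns} for row in by_key.values()]
    return columns, merged


target = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
source = [
    {"id": 2, "name": "B", "country": "NL"},
    {"id": 3, "name": "c", "country": "US"},
]
columns, rows = upsert_all(target, source, key="id")
print(columns)
print(rows)
```

In this sketch the `country` column exists only in the source, so it is appended to the target schema, and the untouched target row (`id` 1) carries a null for it.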

Related Issue(s)

Documentation

@github-actions github-actions bot added the binding/python (Issues for the Python package) and binding/rust (Issues for the Rust crate) labels Jan 16, 2025

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@JustinRush80 JustinRush80 changed the title from "feat: Added Schema Evolution to the Merge Statement" to "feat: added schema evolution to the merge statement" Jan 16, 2025
Rush and others added 26 commits January 16, 2025 00:23
Fixes a check so readerFeatures is enabled on version 3 or higher

Signed-off-by: Russell Jancewicz <[email protected]>
Signed-off-by: Rush <[email protected]>
Updates the requirements on [which](https://github.com/harryfei/which-rs) to permit the latest version.
- [Release notes](https://github.com/harryfei/which-rs/releases)
- [Changelog](https://github.com/harryfei/which-rs/blob/master/CHANGELOG.md)
- [Commits](harryfei/which-rs@6.0.0...7.0.0)

---
updated-dependencies:
- dependency-name: which
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Rush <[email protected]>
…y partitions

(cherry picked from commit af17bb2)
Signed-off-by: Alex Wilcoxson <[email protected]>

chore: fmt
Signed-off-by: Rush <[email protected]>
Updates the requirements on [thiserror](https://github.com/dtolnay/thiserror) to permit the latest version.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](dtolnay/thiserror@1.0.0...1.0.69)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Rush <[email protected]>
(cherry picked from commit 12abf00)
Signed-off-by: Alex Wilcoxson <[email protected]>
Signed-off-by: Rush <[email protected]>
small correction to z_order columns argument.

Signed-off-by: Rush <[email protected]>
Signed-off-by: Thomas Frederik Hoeck <[email protected]>
Signed-off-by: Rush <[email protected]>
`object_store` invokes `get_credential` on _every_ invocation of a
get/list/put/etc. The provider invocation for environment based
credentials is practically zero-cost, so this has no/low overhead.

In the case of the AssumeRoleProvider or any provider which has _some_
cost, such as an invocation of the AWS STS APIs, this can result in
rate-limiting or service quota exhaustion.

To prevent this, the credentials are cached for as long as they
have not expired; the expiry is defined in the
`aws_credential_types::Credential` struct.

Signed-off-by: R. Tyler Croy <[email protected]>
Sponsored-by: Scribd Inc
Signed-off-by: Rush <[email protected]>
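
The caching idea described in the commit message above can be sketched roughly as follows. This is a hypothetical Python illustration, not the Rust code from the commit: `Credentials`, `CachingProvider`, `fake_sts`, and the injected `clock` are names invented here, and the expiry is assumed to be a plain timestamp in seconds.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    access_key: str
    expires_at: Optional[float]  # seconds since epoch; None means "never expires"


class CachingProvider:
    """Wrap a costly credential fetch and reuse its result until it expires."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._cached: Optional[Credentials] = None

    def get_credential(self) -> Credentials:
        cached = self._cached
        if cached is not None and (
            cached.expires_at is None or cached.expires_at > self._clock()
        ):
            # Still valid: skip the expensive call (for example, to STS).
            return cached
        self._cached = self._fetch()
        return self._cached


# Hypothetical stand-in for an expensive provider, with a controllable clock.
state = {"now": 0.0, "calls": 0}


def fake_sts() -> Credentials:
    state["calls"] += 1
    return Credentials(f"key-{state['calls']}", state["now"] + 60.0)


provider = CachingProvider(fake_sts, clock=lambda: state["now"])
first = provider.get_credential()
second = provider.get_credential()  # served from the cache
state["now"] = 61.0                 # move past the 60-second expiry
third = provider.get_credential()   # triggers a second fetch
print(state["calls"], third.access_key)
```

Injecting the clock keeps the sketch deterministic; a production provider would also need to consider concurrent callers and refreshing slightly before expiry, which this sketch leaves out.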
This is a fix aimed to enable jsonwriter to checkpoint in accordance
with delta.checkpointInterval.  It changes the default commitbuilder to
set a post_commit_hook so that checkpointing will be done by default.
Potentially we could also expose CommitProperties as an argument to
flush_and_commit, but that would require a change to the function
signature and would be a breaking change.

Signed-off-by: Justin Jossick <[email protected]>
Signed-off-by: Rush <[email protected]>
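
As a rough picture of the default post-commit hook described in the commit message above, the sketch below checkpoints whenever the committed version is a positive multiple of the interval. It is an assumption-laden toy, not the delta-rs code: `ToyLog`, `PostCommitHook`, and the exact trigger condition are hypothetical and chosen only for illustration.

```python
from typing import Callable, Optional

PostCommitHook = Callable[[int], None]


class ToyLog:
    """Toy commit log that runs a hook after every commit."""

    def __init__(
        self,
        checkpoint_interval: int,
        post_commit_hook: Optional[PostCommitHook] = None,
    ) -> None:
        self.checkpoint_interval = checkpoint_interval
        self.version = -1
        self.checkpoints: list[int] = []
        # Default to the checkpointing hook so callers get it without opting in.
        self.post_commit_hook = post_commit_hook or self._maybe_checkpoint

    def _maybe_checkpoint(self, version: int) -> None:
        # Assumed rule: checkpoint on every positive multiple of the interval.
        if version > 0 and version % self.checkpoint_interval == 0:
            self.checkpoints.append(version)

    def commit(self) -> int:
        self.version += 1
        self.post_commit_hook(self.version)
        return self.version


log = ToyLog(checkpoint_interval=5)
for _ in range(11):  # commits versions 0 through 10
    log.commit()
print(log.checkpoints)
```

Making the hook a constructor default rather than a new argument on the commit call mirrors the trade-off mentioned in the message: existing callers keep the same signature and still get checkpoints.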
Signed-off-by: stretchadito <[email protected]>
Signed-off-by: Rush <[email protected]>
Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: Rush <[email protected]>
The release of pyo3 0.22.3 compels this since we cannot otherwise
compile. The choice is between pinning 0.22.2 and upgrading our ABI, and
I think it's better to upgrade the ABI.

Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: Rush <[email protected]>
Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: Rush <[email protected]>
Today the make_array function from DataFusion uses "item" as the list
element's field name. With recent changes in delta-kernel-rs we have
switched to calling it "element", which is more conventional given
how Apache Parquet handles things.

This change introduces a test which helps isolate the behavior seen in
Python tests within the core crate for easier regression testing

Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: Rush <[email protected]>
Abdullahsab3 and others added 21 commits January 16, 2025 00:25
Signed-off-by: Abdullah Sabaa Allil <[email protected]>
Signed-off-by: Rush <[email protected]>
The Snapshot.files() function is public but cannot possibly be used
because the trait it relies upon isn't public. Oops!

Signed-off-by: R. Tyler Croy <[email protected]>
Sponsored-by: Scribd Inc
Signed-off-by: Rush <[email protected]>
Signed-off-by: Ion Koutsouris <[email protected]>
Signed-off-by: Rush <[email protected]>
Signed-off-by: Ion Koutsouris <[email protected]>
Signed-off-by: Rush <[email protected]>
Signed-off-by: Ion Koutsouris <[email protected]>
Signed-off-by: Rush <[email protected]>
Signed-off-by: Ion Koutsouris <[email protected]>
Signed-off-by: Rush <[email protected]>
Signed-off-by: Ion Koutsouris <[email protected]>
Signed-off-by: Rush <[email protected]>
Signed-off-by: Ion Koutsouris <[email protected]>
Signed-off-by: Rush <[email protected]>
Signed-off-by: Francisco Garcia Florez <[email protected]>
Signed-off-by: Rush <[email protected]>
@JustinRush80 JustinRush80 force-pushed the feat/merge_schema_upsert branch from cd2e185 to d72d538 Compare January 16, 2025 05:29
@github-actions github-actions bot added the documentation (Improvements or additions to documentation) label Jan 16, 2025
Signed-off-by: Rush <[email protected]>
@github-actions github-actions bot removed the documentation (Improvements or additions to documentation) label Jan 16, 2025

codecov bot commented Jan 16, 2025

Codecov Report

Attention: Patch coverage is 92.62899% with 30 lines in your changes missing coverage. Please review.

Project coverage is 72.27%. Comparing base (af3102e) to head (49480f9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/core/src/operations/merge/mod.rs | 94.48% | 2 Missing and 20 partials ⚠️ |
| python/src/merge.rs | 0.00% | 6 Missing ⚠️ |
| python/src/lib.rs | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3135      +/-   ##
==========================================
+ Coverage   72.07%   72.27%   +0.20%     
==========================================
  Files         134      134              
  Lines       43362    43759     +397     
  Branches    43362    43759     +397     
==========================================
+ Hits        31252    31628     +376     
- Misses      10087    10099      +12     
- Partials     2023     2032       +9     


@JustinRush80 JustinRush80 deleted the feat/merge_schema_upsert branch January 16, 2025 06:28
@JustinRush80 JustinRush80 restored the feat/merge_schema_upsert branch January 16, 2025 12:54
@JustinRush80 JustinRush80 deleted the feat/merge_schema_upsert branch January 16, 2025 13:14
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate