Releases: delta-io/delta-rs
python-v0.19.0: complete CDF support, add column operation, faster MERGE
Breaking changes!
The default writer engine has changed to rust. Replace your partition_filters with a predicate (SQL) instead. The PyArrow engine is now deprecated and will be removed in v1.0.
Highlights
- CDF support in write_deltalake, delete, and merge operation
- Expired logs cleanup during post-commit. Can be disabled with `delta.enableExpiredLogCleanup = false`
- Improved MERGE performance by using predicate non-partition columns min/max for prefiltering
- `ADD column` operation
- Speed up log parsing
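A minimal sketch of opting out of the expired-log cleanup mentioned above. It assumes the property is supplied through the `configuration` argument of `write_deltalake` and that it takes effect when the table is first created; the function name `write_without_log_cleanup` is hypothetical.

```python
from typing import Any, Dict

# Table property named in the release notes; Delta table properties are string-valued.
NO_LOG_CLEANUP: Dict[str, str] = {"delta.enableExpiredLogCleanup": "false"}


def write_without_log_cleanup(table_uri: str, data: Any) -> None:
    """Write `data`, creating the table with expired-log cleanup disabled."""
    # Imported lazily so the sketch can be loaded without deltalake installed.
    from deltalake import write_deltalake

    # Assumption: `configuration` is applied when the table is first created.
    write_deltalake(table_uri, data, configuration=NO_LOG_CLEANUP)
```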
Performance improvements
- perf: apply projection when reading checkpoint parquet by @alexwilcoxson-rel in #2717
- perf: grab file size in rust by @ion-elgreco in #2734
- feat: improve merge performance by using predicate non-partition columns min/max for prefiltering by @JonasDev1 in #2513
- perf: early stop if all values in arr are null by @ion-elgreco in #2764
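The MERGE prefiltering item can be illustrated conceptually: if each data file records the min/max of a column, files whose range cannot overlap the source data's key range need not be scanned. The sketch below is only an illustration of that idea under simplified assumptions (a single integer column, hypothetical `FileStats` and `files_to_scan` names); it is not the delta-rs implementation from #2513.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    """Per-file min/max statistics for one (hypothetical) integer column."""
    path: str
    min_id: int
    max_id: int


def files_to_scan(files: List[FileStats], source_min: int, source_max: int) -> List[FileStats]:
    """Keep only files whose [min_id, max_id] range overlaps the source key range."""
    return [f for f in files if f.max_id >= source_min and f.min_id <= source_max]


files = [
    FileStats("a.parquet", 0, 99),
    FileStats("b.parquet", 100, 199),
    FileStats("c.parquet", 200, 299),
]

# The source data only contains keys between 150 and 210, so a.parquet is skipped.
candidates = files_to_scan(files, source_min=150, source_max=210)
print([f.path for f in candidates])  # ['b.parquet', 'c.parquet']
```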
New features
- feat(python, rust): cdc write-support for `delete` operation by @ion-elgreco in #2721
- feat(python, rust): cdc write-support for `overwrite` and `replacewhere` writes by @ion-elgreco in #2722
- feat: introduce CDC generation for merge operations by @rtyler in #2747
- feat: use logical plan in delete, delta planner refactoring by @ion-elgreco in #2725
- feat: use logical plan in update, refactor/simplify CDCTracker by @ion-elgreco in #2727
- feat(python, rust): arrow large/view types passthrough, rust default engine by @ion-elgreco in #2738
- feat(python, rust): cleanup expired logs post-commit hook by @ion-elgreco in #2459
- feat(python, rust): `add column` operation by @ion-elgreco in #2562
- feat(python): handle PyCapsule interface objects in write_deltalake by @kylebarron in #2534
- feat(rust): fix size_in_bytes in last_checkpoint_ to i64 by @sherlockbeard in #2649
- feat(rust,python): cast each parquet file to delta schema by @HawaiianSpork in #2615
- feat: support userMetadata in CommitInfo by @jkylling in #2670
- feat(python, rust): add projection in CDF reads by @ion-elgreco in #2704
- feat(python): add DeltaTable.is_deltatable static method (#2662) by @omkar-foss in #2715
- feat: improved test fixtures by @roeap in #2749
- feat: fail fast on forked process by @Tom-Newton in #2765
- feat: restore the TryFrom for DeltaTablePartition by @rtyler in #2767
- feat: more economic data skipping with datafusion by @roeap in #2772
Bug Fixes
- fix(rust): inconsistent order of partitioning columns (#2494) by @aditanase in #2614
- fix(rust,python): checkpoint with column nullable false by @sherlockbeard in #2680
- fix: update delta kernel version by @jeppe742 in #2685
- fix(python): empty dataset fix for "pyarrow" engine by @sherlockbeard in #2689
- fix: ensure DataFusion SessionState Parquet options are applied to DeltaScan by @alexwilcoxson-rel in #2702
- fix(python, rust): use url encoder when encoding partition values by @ion-elgreco in #2705
- fix(python, rust): use input schema to get correct schema in cdf reads by @ion-elgreco in #2723
- fix: change arrow map root name to follow with parquet root name by @sclmn in #2538
- fix: schema adapter doesn't map partial batches correctly by @alexwilcoxson-rel in #2735
- fix: optimize Spark written tables by @rtyler in #1650
- fix(python, rust): cdc in writer not creating inserts by @ion-elgreco in #2751
- fix(python, rust): don't flatten fields during cdf read by @ion-elgreco in #2763
- fix: column parsing to include nested columns and enclosing char by @gtrawinski in #2737
Other Changes
- chore: missed one macos runner reference in actions by @rtyler in #2645
- chore: add a reproduction case for merge failures with struct by @rtyler in #2644
- ci: update CODEOWNERS by @hntd187 in #2650
- chore: increase subcrate versions by @rtyler in #2648
- docs: fix bullets on hdfs docs by @Kimahriman in #2653
- docs: improve navigation fixes by @avriiil in #2660
- docs: add integration docs for s3 backend by @avriiil in #2658
- chore: bump ruff to 0.5.2 by @fpgmaas in #2673
- chore: enable `RUF` ruleset for `ruff` by @fpgmaas in #2677
- chore: pin `ruff` and `mypy` versions in the `lint` stage in the CI pipeline by @fpgmaas in #2679
- chore: update README.md by @veronewra in #2684
- chore: create separate action to setup python and rust in the cicd pipeline by @fpgmaas in #2687
- chore: add test coverage command to `Makefile` by @fpgmaas in #2688
- chore: improve contributing.md by @fpgmaas in #2672
- chore: remove stale code for conditional import of `Literal` by @fpgmaas in #2676
- chore: remove references to black from the project by @fpgmaas in #2674
- chore: refactor `write_deltalake` in `writer.py` by @fpgmaas in #2695
- chore: upgrade to datafusion 40 by @rtyler in #2661
- chore: prepare python release 0.18.3 by @ion-elgreco in #2707
- chore: enabling actions for merge groups by @rtyler in #2718
- chore(deps): update sqlparser requirement from 0.47 to 0.49 by @dependabot in #2714
- chore: try an alternative docker compose invocation syntax by @rtyler in #2724
- chore(deps): update which requirement from 4 to 6 by @dependabot in #2730
- chore: update changelog and versions for next release by @rtyler in #2740
- chore: add to code_owner crates by @ion-elgreco in #2741
- chore: update delta_kernel to 0.3.0 by @alexwilcoxson-rel in #2742
- docs: fix broken link in docs by @astrojuanlu in #2746
- chore: upgrade to datafusion 41 by @rtyler in #2761
- chore: prepare the next notable release of 0.19.0 by @rtyler in #2768
- chore: fix a bunch of clippy lints and re-enable tests by @rtyler in #2773
New Contributors
- @aditanase made their first contribution in #2614
- @fpgmaas made their first contribution in #2673
- @kylebarron made their first contribution in #2534
- @veronewra made their first contribution in #2684
- @jeppe742 made their first contribution in #2685
- @sclmn made their first contribution in #2538
- @astrojuanlu made their first contribution in #2746
- @gtrawinski made their first contribution in #2737
Full Changelog: python-v0.18.2...python-v0.19.0
python-v0.18.2: HDFS support
New features
- feat(#2597): allow pyarrow.dataset.Expression in filters kwarg by @giacomorebecchi in #2600
- feat(rust, python): add HDFS support via hdfs-native package by @Kimahriman in #2612
- feat: report DataFusion metrics for DeltaScan by @alexwilcoxson-rel in #2617
Bug Fixes
- fix: enable parquet pushdown for DeltaScan via TableProvider impl for DeltaTable (rebase) by @rtyler in #2637
- fix(rust, python): fix writing empty structs when creating checkpoint by @sherlockbeard in #2627
- fix(python): fixed large_dtype to schema convert by @sherlockbeard in #2635
- fix(rust, python): fix merge schema with overwrite by @sherlockbeard in #2623
- fix(python): constrain multipart upload size to fixed length by @abhiaagarwal in #2606
- fix: update changelog by @rtyler in #2599
Other Changes
- chore: migrate to pyo3 Bounds API by @abhiaagarwal in #2596
- chore(deps): update dashmap requirement from 5 to 6 by @dependabot in #2641
- chore: remove macos builders from pull request flow by @rtyler in #2638
- docs: add Daft writer by @avriiil in #2594
- chore: fix documentation generation with a pin of griffe by @rtyler in #2636
- chore: bump python 0.18.2 by @ion-elgreco in #2621
- chore: implement regression test for push down panic by @rtyler in #2604
- docs: fix typo by @avriiil in #2603
- test: reintroduce azurite SAS integration tests by @giacomorebecchi in #2598
New Contributors
- @giacomorebecchi made their first contribution in #2598
- @Kimahriman made their first contribution in #2612
- @sherlockbeard made their first contribution in #2623
Full Changelog: python-v0.18.1...python-v0.18.2
python-v0.18.1
New features
- feat: add custom dynamodb endpoint configuration by @hnaoto in #2575
- chore: bump to datafusion 39, arrow 52, pyo3 0.21 by @abhiaagarwal in #2581
Bug Fixes
- chore: bump macOS runners, maybe resolve import error by @ion-elgreco in #2588
Other Changes
- docs: improve S3 access docs by @avriiil in #2589
- chore: expose `files_by_partition` to public api by @edmondop in #2533
New Contributors
- @abhiaagarwal made their first contribution in #2581
- @edmondop made their first contribution in #2533
Full Changelog: python-v0.18.0...python-v0.18.1
python-v0.18.0: CDC for update operation, added `set table properties` operation
New features
- feat: adopt kernel schema types by @roeap in #2495
- feat: add stats to convert-to-delta operation by @gruuya in #2491
- feat(python, rust): add `set table properties` operation by @ion-elgreco in #2264
- feat: implement transaction identifiers - continued by @roeap in #2539
- feat: introduce CDC write-side support for the Update operations by @rtyler in #2486
Bug Fixes
- fix(rust, python): fixed differences in storage options between log and object stores by @mightyshazam in #2500
- fix: enable field_with_name to support nested fields with '.' delimiter by @alexwilcoxson-rel in #2519
- fix(python): release GIL on most operations by @adriangb in #2512
- fix: clippy warnings by @imor in #2548
- fix: remove deprecated overwrite_schema configuration which has incorrect behavior by @rtyler in #2554
- fix: update deltalake crate examples for crate layout and TimestampNtz by @jhoekx in #2559
- fix: consistently use raise_if_key_not_exists in CreateBuilder by @vegarsti in #2569
- fix: cast support fields nested in lists and maps by @HawaiianSpork in #2541
Other Changes
- docs: fix typo by @avriiil in #2508
- chore: tidying up builds without datafusion feature and clippy by @rtyler in #2516
- chore: fixing some clips by @rtyler in #2521
- fix: msrv in workspace by @roeap in #2524
- feat(rust): make PartitionWriter public by @adriangb in #2525
- docs: improve daft integration docs by @avriiil in #2496
- chore: bump python 0.17.5 by @ion-elgreco in #2531
- chore(deps): update itertools requirement from 0.12 to 0.13 by @dependabot in #2526
- docs: dask write syntax fix by @avriiil in #2543
- docs: pull delta from conda not pip by @avriiil in #2535
- docs: clarify locking mechanism requirement for S3 by @inigohidalgo in #2558
- chore(deps): update sqlparser requirement from 0.46 to 0.47 by @dependabot in #2563
- docs: dt.delete add context + api docs link by @avriiil in #2560
New Contributors
- @imor made their first contribution in #2548
- @inigohidalgo made their first contribution in #2558
- @vegarsti made their first contribution in #2565
- @HawaiianSpork made their first contribution in #2541
Full Changelog: python-v0.17.4...python-v0.18.0
python-v0.17.4: stats collection according to config
New features
- feat(python): add parameter to DeltaTable.to_pyarrow_dataset() by @adriangb in #2465
- feat(python, rust): respect column stats collection configurations by @ion-elgreco in #2428
Bug Fixes
- fix(rust): implement abort commit for S3DynamoDBLogStore by @PeterKeDer in #2452
- fix(python, rust): use new schema for stats parsing instead of old by @ion-elgreco in #2480
- fix: check to see if the file exists before attempting to rename by @rtyler in #2482
- fix(rust): unable to read delta table when table contains both null and non-null add stats by @yjshen in #2476
- fix(python, rust): region lookup wasn't working correctly for dynamo by @mightyshazam in #2488
- fix: return unsupported error for merging schemas in the presence of partition columns by @emcake in #2469
- fix(python): reuse state in `to_pyarrow_dataset` by @ion-elgreco in #2485
Other Changes
- chore(deps): update sqlparser requirement from 0.44 to 0.46 by @dependabot in #2483
- test: add test for concurrent checkpoint during table load by @alexwilcoxson-rel in #2151
Full Changelog: python-v0.17.3...python-v0.17.4
python-v0.17.3: CDF read support
New features
- feat(rust): advance state in post commit by @ion-elgreco in #2396
- feat: cdf reader for delta tables by @hntd187 in #2048
- feat(python, rust): add OBJECT_STORE_CONCURRENCY_LIMIT setting for ObjectStoreFactory by @zZKato in #2458
Bug Fixes
Other Changes
- chore(rust): bump arrow v51 and datafusion v37.1 by @lasantosr in #2395
New Contributors
Full Changelog: python-v0.17.2...python-v0.17.3
rust-v0.17.3
rust-v0.17.3 (2024-05-01)
Implemented enhancements:
- Limit concurrent ObjectStore access to avoid resource limitations in constrained environments #2457
- How to get a DataFrame in Rust? #2404
- Allow checkpoint creation when partition column is "timestampNtz" #2381
- is there a way to make writing timestamp_ntz optional #2339
- Update arrow dependency #2328
- Release GIL in deltalake.write_deltalake #2234
- Unable to retrieve custom metadata from tables in rust #2153
- Refactor commit interface to be a Builder #2131
Fixed bugs:
- Handle rate limiting during write contention #2451
- Regression: delta.logRetentionDuration doesn't seem to be respected #2447
- Issue writing to mounted storage in AKS using delta-rs library #2445
- TableMerger - when_matched_delete() fails when Column names contain special characters #2438
- Generic DeltaTable error: External error: Arrow error: Invalid argument error: arguments need to have the same data type - while merge data in to delta table #2423
- Merge on predicate throws error on date column: Unable to convert expression to string #2420
- Writing Tables with Append mode errors if the schema metadata is different #2419
- Logstore issues on AWS Lambda #2410
- Datafusion timestamp type doesn't respect delta lake schema #2408
- Compacting produces smaller row groups than expected #2386
- ValueError: Partition value cannot be parsed from string. #2380
- Very slow s3 connection after 0.16.1 #2377
- Merge update+insert truncates a delta table if the table is big enough #2362
- Do not add readerFeatures or writerFeatures keys under checkpoint files if minReaderVersion or minWriterVersion do not satisfy the requirements #2360
- Create empty table failed on rust engine #2354
- Getting error message when running in lambda: message: "Too many open files" #2353
- Temporary files filling up _delta_log folder - increasing table load time #2351
- compact fails with merged schemas #2347
- Cannot merge into table partitioned by date type column on 0.16.3 #2344
- Merge breaks using logical datatype decimal128 #2343
- Decimal types are not checked against max precision/scale at table creation #2331
- Merge update+insert truncates a delta table #2320
- Extract `add.stats_parsed` with wrong type #2312
- Process fails without error message when executing merge #2310
- delta_rs doesn't seem to respect the row group size #2309
- Auth error when running inside VS Code #2306
- Unable to read deltatables with binary columns: Binary is not supported by JSON #2302
- Schema evolution not coercing with Large arrow types #2298
- Panic in `deltalake_core::kernel::snapshot::log_segment::list_log_files_with_checkpoint::{{closure}}` #2290
- Checkpoint does not preserve reader and writer features for the table protocol. #2288
- Z-Order with larger dataset resulting in memory error #2284
- Successful writes return error when using concurrent writers #2279
- Rust writer should raise when decimal types are incompatible (currently writes and puts table in invalid state) #2275
- Generic DeltaTable error: Version mismatch with new schema merge functionality in AWS S3 #2262
- DeltaTable is not resilient to corrupted checkpoint state #2258
- Inconsistent units of time #2256
- Partition column comparison is an assertion rather than if block with raise exception #2242
- Unable to merge column names starting from numbers #2230
- Merging to a table with multiple distinct partitions in parallel fails #2227
- cleanup_metadata not respecting custom `logRetentionDuration` #2180
- Merge predicate fails with a field with a space #2167
- When_matched_update causes records to be lost with explicit predicate #2158
- Merge execution time grows exponentially with the number of columns #2107
- _internal.DeltaError when merging #2084
python-v0.17.2
What's Changed
- chore: introduce the Operation trait to enforce consistency between operations by @rtyler in #2435
- fix(python): reuse table state in write engine by @ion-elgreco in #2453
Full Changelog: python-v0.17.1...python-v0.17.2
python-v0.17.1
Bug Fixes
- fix(python, rust): use from_name during column projection creation by @ion-elgreco in #2441
- fix(python, rust): check timestamp_ntz in nested fields, add check_can_write in pyarrow writer by @ion-elgreco in #2443
- fix(python, rust): remove imds calls from profile auth and region by @mightyshazam in #2442
Full Changelog: python-v0.17.0...python-v0.17.1
python-v0.17.0: checkpoint hook
New features
- feat(rust): post commit hook (v2), create checkpoint hook by @ion-elgreco in #2391
- feat: added configuration variables to handle EC2 metadata service by @mightyshazam in #2385
- feat: lazy static runtime in python by @ion-elgreco in #2424
- feat: implement repartitioned for DeltaScan by @jkylling in #2421
Bug Fixes
- fix(python, rust): expr parsing date/timestamp by @ion-elgreco in #2357
- fix(rust): remove flush after writing every batch by @PeterKeDer in #2387
- fix: return error when checkpoints and metadata get out of sync by @esarili in #2406
- fix: time travel when checkpointed and logs removed by @ion-elgreco in #2389
- fix(rust): timestamp deserialization format, missing type by @ion-elgreco in #2383
- fix(rust): stats_parsed has different number of records with stats by @yjshen in #2405
- fix(python): load_as_version with datetime object with no timezone specified by @t1g0rz in #2429
- fix(python,rust): missing remove actions during create_or_replace specified by @ion-elgreco in #2437
Other Changes
- chore: bump chrono by @universalmind303 in #2372
- docs: document required aws permissions by @ale-rinaldi in #2393
- docs: add Daft integration by @avriiil in #2402
New Contributors
- @PeterKeDer made their first contribution in #2387
- @ale-rinaldi made their first contribution in #2393
- @esarili made their first contribution in #2406
- @jkylling made their first contribution in #2421
- @t1g0rz made their first contribution in #2429
Full Changelog: python-v0.16.4...python-v0.17.0