Releases: delta-io/delta-rs

python-v0.19.0: complete CDF support, add column operation, faster MERGE

14 Aug 22:14

Breaking changes!

The default writer engine is now rust. Replace partition_filters with a SQL predicate. The PyArrow engine is deprecated and will be removed in v1.0.
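As a rough migration aid, the sketch below turns a list of `(column, op, value)` filter tuples into a SQL predicate string. The helper names `filters_to_predicate` and `_sql_literal` are hypothetical and not part of deltalake, and the rendering of literals (strings quoted, numbers bare) is an assumption that may need adjusting to your column types.

```python
from typing import Any, List, Tuple


def _sql_literal(value: Any) -> str:
    """Render a Python value as a SQL literal (assumption: strings are single-quoted)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape embedded single quotes by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def filters_to_predicate(filters: List[Tuple[str, str, Any]]) -> str:
    """Join (column, op, value) tuples with AND into one SQL predicate string."""
    clauses = []
    for column, op, value in filters:
        op_norm = op.strip().lower()
        if op_norm in ("in", "not in"):
            items = ", ".join(_sql_literal(v) for v in value)
            clauses.append(f"{column} {op_norm.upper()} ({items})")
        elif op_norm in ("=", "!=", "<", "<=", ">", ">="):
            clauses.append(f"{column} {op_norm} {_sql_literal(value)}")
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return " AND ".join(clauses)


predicate = filters_to_predicate(
    [("year", "=", "2024"), ("country", "in", ["NL", "US"])]
)
print(predicate)

# Possible call site (requires the deltalake package; the path and data are placeholders):
# write_deltalake("path/to/table", data, mode="overwrite", predicate=predicate)
```

The resulting string would then be passed as `predicate` where `partition_filters` was used before.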

Highlights

  • CDF support in write_deltalake, delete, and merge operation
  • Expired logs cleanup during post-commit. Can be disabled with delta.enableExpiredLogCleanup = false
  • Improved MERGE performance by prefiltering on the min/max values of non-partition columns used in the predicate
  • ADD column operation
  • Speed up log parsing
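
The min/max prefiltering idea behind the MERGE improvement can be illustrated with a small, library-independent sketch. It is not the delta-rs implementation; `FileStats` and `files_that_may_match` are hypothetical names, and integer column values are assumed for simplicity. A file is skipped only when its recorded value range cannot overlap the range of keys in the source data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileStats:
    """Per-file column statistics, as a table log might record them (hypothetical shape)."""
    path: str
    min_values: Dict[str, int]
    max_values: Dict[str, int]


def files_that_may_match(
    files: List[FileStats], column: str, source_min: int, source_max: int
) -> List[str]:
    """Keep files whose [min, max] range for `column` overlaps the source range."""
    kept = []
    for f in files:
        lo = f.min_values.get(column)
        hi = f.max_values.get(column)
        # Missing stats: the file cannot be ruled out, so keep it.
        if lo is None or hi is None or (lo <= source_max and hi >= source_min):
            kept.append(f.path)
    return kept


files = [
    FileStats("part-0.parquet", {"id": 1}, {"id": 100}),
    FileStats("part-1.parquet", {"id": 101}, {"id": 200}),
    FileStats("part-2.parquet", {}, {}),  # no statistics recorded
]
candidates = files_that_may_match(files, "id", source_min=150, source_max=180)
print(candidates)
```

Only the files returned here would need to be scanned for matching rows; the rest can be left untouched.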

Performance improvements

New features

Bug Fixes

Other Changes

New Contributors

Full Changelog: python-v0.18.2...python-v0.19.0

python-v0.18.2: HDFS support

03 Jul 06:31

New features

Bug Fixes

Other Changes

New Contributors

Full Changelog: python-v0.18.1...python-v0.18.2

python-v0.18.1

12 Jun 16:16

New features

Bug Fixes

Other Changes

New Contributors

Full Changelog: python-v0.18.0...python-v0.18.1

python-v0.18.0: CDC for update operation, added `set table properties` operation

06 Jun 18:26
f23c92d

New features

  • feat: adopt kernel schema types by @roeap in #2495
  • feat: add stats to convert-to-delta operation by @gruuya in #2491
  • feat(python, rust): add set table properties operation by @ion-elgreco in #2264
  • feat: implement transaction identifiers - continued by @roeap in #2539
  • feat: introduce CDC write-side support for the Update operations by @rtyler in #2486

Bug Fixes

  • fix(rust, python): fixed differences in storage options between log and object stores by @mightyshazam in #2500
  • fix: enable field_with_name to support nested fields with '.' delimiter by @alexwilcoxson-rel in #2519
  • fix(python): release GIL on most operations by @adriangb in #2512
  • fix: clippy warnings by @imor in #2548
  • fix: remove deprecated overwrite_schema configuration which has incorrect behavior by @rtyler in #2554
  • fix: update deltalake crate examples for crate layout and TimestampNtz by @jhoekx in #2559
  • fix: consistently use raise_if_key_not_exists in CreateBuilder by @vegarsti in #2569
  • fix: cast support fields nested in lists and maps by @HawaiianSpork in #2541

Other Changes

New Contributors

Full Changelog: python-v0.17.4...python-v0.18.0

python-v0.17.4: stats collection according to config

11 May 11:43
353e08b

New features

  • feat(python): add parameter to DeltaTable.to_pyarrow_dataset() by @adriangb in #2465
  • feat(python, rust): respect column stats collection configurations by @ion-elgreco in #2428

Bug Fixes

  • fix(rust): implement abort commit for S3DynamoDBLogStore by @PeterKeDer in #2452
  • fix(python, rust): use new schema for stats parsing instead of old by @ion-elgreco in #2480
  • fix: check to see if the file exists before attempting to rename by @rtyler in #2482
  • fix(rust): unable to read delta table when table contains both null and non-null add stats by @yjshen in #2476
  • fix(python, rust): region lookup wasn't working correctly for dynamo by @mightyshazam in #2488
  • fix: return unsupported error for merging schemas in the presence of partition columns by @emcake in #2469
  • fix(python): reuse state in to_pyarrow_dataset by @ion-elgreco in #2485

Other Changes

Full Changelog: python-v0.17.3...python-v0.17.4

python-v0.17.3: CDF read support

02 May 06:40

New features

  • feat(rust): advance state in post commit by @ion-elgreco in #2396
  • feat: cdf reader for delta tables by @hntd187 in #2048
  • feat(python, rust): add OBJECT_STORE_CONCURRENCY_LIMIT setting for ObjectStoreFactory by @zZKato in #2458
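
The idea of capping concurrent object store access can be sketched with a semaphore. This is a generic illustration in Python, not how delta-rs implements OBJECT_STORE_CONCURRENCY_LIMIT; `fetch_all` is a hypothetical function and the sleep stands in for a real network request.

```python
import asyncio
from typing import List, Tuple


async def fetch_all(keys: List[str], limit: int) -> Tuple[List[str], int]:
    """Fetch every key while allowing at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fetch(key: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a network request
            in_flight -= 1
            return f"contents of {key}"

    results = await asyncio.gather(*(fetch(k) for k in keys))
    return list(results), peak


results, peak = asyncio.run(
    fetch_all([f"part-{i}.parquet" for i in range(10)], limit=3)
)
print(len(results), "objects fetched; peak concurrency:", peak)
```

All ten requests complete, but the peak number of simultaneous requests never exceeds the limit, which is what helps in constrained environments such as those with low open-file limits.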

Bug Fixes

Other Changes

New Contributors

  • @adriangb made their first contribution in #2454
  • @zZKato made their first contribution in #2458

Full Changelog: python-v0.17.2...python-v0.17.3

rust-v0.17.3

01 May 23:39

rust-v0.17.3 (2024-05-01)

Full Changelog

Implemented enhancements:

  • Limit concurrent ObjectStore access to avoid resource limitations in constrained environments #2457
  • How to get a DataFrame in Rust? #2404
  • Allow checkpoint creation when partition column is "timestampNtz" #2381
  • is there a way to make writing timestamp_ntz optional #2339
  • Update arrow dependency #2328
  • Release GIL in deltalake.write_deltalake #2234
  • Unable to retrieve custom metadata from tables in rust #2153
  • Refactor commit interface to be a Builder #2131

Fixed bugs:

  • Handle rate limiting during write contention #2451
  • Regression: delta.logRetentionDuration doesn't seem to be respected #2447
  • Issue writing to mounted storage in AKS using delta-rs library #2445
  • TableMerger - when_matched_delete() fails when Column names contain special characters #2438
  • Generic DeltaTable error: External error: Arrow error: Invalid argument error: arguments need to have the same data type - while merge data in to delta table #2423
  • Merge on predicate throws error on date column: Unable to convert expression to string #2420
  • Writing Tables with Append mode errors if the schema metadata is different #2419
  • Logstore issues on AWS Lambda #2410
  • Datafusion timestamp type doesn't respect delta lake schema #2408
  • Compacting produces smaller row groups than expected #2386
  • ValueError: Partition value cannot be parsed from string. #2380
  • Very slow s3 connection after 0.16.1 #2377
  • Merge update+insert truncates a delta table if the table is big enough #2362
  • Do not add readerFeatures or writerFeatures keys under checkpoint files if minReaderVersion or minWriterVersion do not satisfy the requirements #2360
  • Create empty table failed on rust engine #2354
  • Getting error message when running in lambda: message: "Too many open files" #2353
  • Temporary files filling up _delta_log folder - increasing table load time #2351
  • compact fails with merged schemas #2347
  • Cannot merge into table partitioned by date type column on 0.16.3 #2344
  • Merge breaks using logical datatype decimal128 #2343
  • Decimal types are not checked against max precision/scale at table creation #2331
  • Merge update+insert truncates a delta table #2320
  • Extract add.stats_parsed with wrong type #2312
  • Process fails without error message when executing merge #2310
  • delta-rs doesn't seem to respect the row group size #2309
  • Auth error when running inside VS Code #2306
  • Unable to read deltatables with binary columns: Binary is not supported by JSON #2302
  • Schema evolution not coercing with Large arrow types #2298
  • Panic in deltalake_core::kernel::snapshot::log_segment::list_log_files_with_checkpoint::{{closure}} #2290
  • Checkpoint does not preserve reader and writer features for the table protocol. #2288
  • Z-Order with larger dataset resulting in memory error #2284
  • Successful writes return error when using concurrent writers #2279
  • Rust writer should raise when decimal types are incompatible (currently writes and puts table in invalid state) #2275
  • Generic DeltaTable error: Version mismatch with new schema merge functionality in AWS S3 #2262
  • DeltaTable is not resilient to corrupted checkpoint state #2258
  • Inconsistent units of time #2256
  • Partition column comparison is an assertion rather than if block with raise exception #2242
  • Unable to merge column names starting from numbers #2230
  • Merging to a table with multiple distinct partitions in parallel fails #2227
  • cleanup_metadata not respecting custom logRetentionDuration #2180
  • Merge predicate fails with a field with a space #2167
  • When_matched_update causes records to be lost with explicit predicate #2158
  • Merge execution time grows exponentially with the number of columns #2107
  • _internal.DeltaError when merging #2084

python-v0.17.2

25 Apr 13:48
6a7c684

What's Changed

  • chore: introduce the Operation trait to enforce consistency between operations by @rtyler in #2435
  • fix(python): reuse table state in write engine by @ion-elgreco in #2453

Full Changelog: python-v0.17.1...python-v0.17.2

python-v0.17.1

23 Apr 17:20
dd358ef

Bug Fixes

  • fix(python, rust): use from_name during column projection creation by @ion-elgreco in #2441
  • fix(python, rust): check timestamp_ntz in nested fields, add check_can_write in pyarrow writer by @ion-elgreco in #2443
  • fix(python, rust): remove imds calls from profile auth and region by @mightyshazam in #2442

Full Changelog: python-v0.17.0...python-v0.17.1

python-v0.17.0: checkpoint hook

22 Apr 17:50
15abe44

New features

Bug Fixes

  • fix(python, rust): expr parsing date/timestamp by @ion-elgreco in #2357
  • fix(rust): remove flush after writing every batch by @PeterKeDer in #2387
  • fix: return error when checkpoints and metadata get out of sync by @esarili in #2406
  • fix: time travel when checkpointed and logs removed by @ion-elgreco in #2389
  • fix(rust): timestamp deserialization format, missing type by @ion-elgreco in #2383
  • fix(rust): stats_parsed has different number of records with stats by @yjshen in #2405
  • fix(python): load_as_version with datetime object with no timezone specified by @t1g0rz in #2429
  • fix(python,rust): missing remove actions during create_or_replace specified by @ion-elgreco in #2437

Other Changes

New Contributors

Full Changelog: python-v0.16.4...python-v0.17.0