
Conversation

chenjian2664 (Contributor) commented on Jan 9, 2026

Description

Closes #27805

Write a checkpoint for CREATE OR REPLACE.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Delta Lake
* Fix failure reading checkpoint created by `CREATE OR REPLACE` with different schema. ({issue}`issuenumber`)

cla-bot added the cla-signed label on Jan 9, 2026
github-actions added the delta-lake (Delta Lake connector) label on Jan 9, 2026
ebyhr (Member) commented on Jan 9, 2026

> Fix failure reading checkpoint created by `CREATE OR REPLACE` with different schema.

This is a bug fix to the write logic, right? The current release note entry looks misleading if so.

}

@Test
void testCreateReplaceReadingCheckpointWithDifferentSchema()
Member commented on this test:
This test passes even without this PR change. I think we should add a SELECT statement at L295-296.
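
To make the suggestion concrete, here is a rough sketch of reading the table after the replace so the query actually goes through the newly written checkpoint. This is not the PR's code: it assumes Trino's AbstractTestQueryFramework-style helpers (`assertUpdate`, `assertQueryReturnsEmptyResult`, `randomNameSuffix`) and an illustrative `checkpoint_interval`; the method and table names are hypothetical.

```java
// Sketch only: assumed to live in an existing Delta Lake connector test class
// extending AbstractTestQueryFramework.
@Test
void testCreateOrReplaceThenReadCheckpoint()
{
    String table = "test_cor_checkpoint_" + randomNameSuffix();
    // Low checkpoint interval so commits produce checkpoints quickly
    assertUpdate("CREATE TABLE " + table + " (x int, y int) WITH (checkpoint_interval = 1)");
    assertUpdate("INSERT INTO " + table + " VALUES (1, 1)", 1);

    // Replace with a different schema; with this fix the checkpoint written here should stay readable
    assertUpdate("CREATE OR REPLACE TABLE " + table + " (x int, y varchar)");

    // The missing piece pointed out above: read the table so the new checkpoint is actually exercised
    assertQueryReturnsEmptyResult("SELECT * FROM " + table);

    assertUpdate("DROP TABLE " + table);
}
```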

Praveen2112 (Member) left a comment:

Does Spark, any other engine, or the Delta Lake spec mention writing a checkpoint after a CREATE OR REPLACE operation?

findinpath (Contributor) commented:

> Does Spark, any other engine, or the Delta Lake spec mention writing a checkpoint after a CREATE OR REPLACE operation?

I tried on Databricks (Spark OSS does not support CREATE OR REPLACE) and saw similar behavior:

CREATE TABLE test_cort.t1 (x int, y int)  using delta TBLPROPERTIES ('delta.checkpointInterval' = '2');
insert into test_cort.t1 values (1, 1);
insert into test_cort.t1 values (2, 2);
select * from test_cort.t1;

CREATE OR REPLACE TABLE test_cort.t1 (x int, y string)  using delta TBLPROPERTIES ('delta.checkpointInterval' = '2');

S3 listing:

2026-01-09 08:02:13          0 
2026-01-09 08:02:14          0 .s3-optimization-0
2026-01-09 08:02:14          0 .s3-optimization-1
2026-01-09 08:02:14          0 .s3-optimization-2
2026-01-09 08:02:17       1979 00000000000000000000.crc
2026-01-09 08:02:14        986 00000000000000000000.json
2026-01-09 08:02:20       2430 00000000000000000001.crc
2026-01-09 08:02:19        977 00000000000000000001.json
2026-01-09 08:02:24      17420 00000000000000000002.checkpoint.parquet
2026-01-09 08:02:23       2880 00000000000000000002.crc
2026-01-09 08:02:21        977 00000000000000000002.json
2026-01-09 08:02:30      16285 00000000000000000003.checkpoint.parquet
2026-01-09 08:02:29       1964 00000000000000000003.crc
2026-01-09 08:02:27       1680 00000000000000000003.json
2026-01-09 08:02:30       5059 _last_checkpoint

FWIW, on Databricks, if the table schema is not changed, no new checkpoint is created.


if (replaceExistingTable) {
    writeCheckpointIfNeeded(session, schemaTableName, location, tableHandle.toCredentialsHandle(), tableHandle.getReadVersion(), checkpointInterval, commitVersion);
    writeCheckpoint(session, schemaTableName, location, tableHandle.toCredentialsHandle(), commitVersion);
Contributor commented on this change:

As pointed out in #27886 (comment), if the table schema does not change, DBX does not create a new checkpoint by default (unless the commit lines up with the checkpoint interval property). Consider whether it makes sense to follow the same pattern here as well.
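
For illustration only, a minimal sketch of the pattern being suggested, reusing the method names from the excerpt above; `existingSchema` and `newSchema` are hypothetical placeholders, and this is not the PR's actual change:

```java
// Hypothetical sketch, not the actual Trino code: on CREATE OR REPLACE, force a
// checkpoint only when the replacement changes the schema (matching the observed
// Databricks behavior); otherwise keep the interval-based logic.
if (replaceExistingTable) {
    // newSchema/existingSchema are assumed to be available at this point (hypothetical)
    boolean schemaChanged = !newSchema.equals(existingSchema);
    if (schemaChanged) {
        writeCheckpoint(session, schemaTableName, location, tableHandle.toCredentialsHandle(), commitVersion);
    }
    else {
        writeCheckpointIfNeeded(session, schemaTableName, location, tableHandle.toCredentialsHandle(), tableHandle.getReadVersion(), checkpointInterval, commitVersion);
    }
}
```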

Member commented:

Agreed. We should also think about handling schema evolution (we don't support it now). When we do support it, would we need to create a new checkpoint for each ALTER operation?

chenjian2664 (Contributor, Author) commented:

Just tested DBX: it adds a checkpoint only when a column type is changed.

chenjian2664 force-pushed the jack/fix-delta-corruption-checkpoint branch from e6cd65c to ce0932d on January 9, 2026 at 13:40

Labels

cla-signed, delta-lake (Delta Lake connector)

Development

Successfully merging this pull request may close these issues:

* Trino-created Delta Lake tables left in broken/unreadable state (#27805)

4 participants