safekeeper: fix atomicity of WAL truncation #9685

arssher · 2024-11-08T08:58:41Z

If WAL truncation fails midway, it might leave some data on disk above the write/flush LSN. In theory, concatenated with previous records, this data might form bogus WAL (though this is very unlikely in practice because the CRC would protect against it). To guard against this, set the pending_wal_truncation flag: it means that truncation must be retried until it succeeds before any WAL is written. We already did this on safekeeper restart; now extend the mechanism to failures without restart. Also, importantly, reset the LSNs at the beginning of the operation, not at the end, because once on-disk deletion starts, the previous pointers are wrong.
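The ordering described above can be illustrated with a minimal sketch. This is not the code from this PR: `WalStorage`, `truncate_disk`, `write_wal`, `WalError`, and the `fail_next_truncations` knob are hypothetical names, and the "disk" is an in-memory `Vec<u8>` so that a partial failure can be simulated. Only the `pending_wal_truncation` flag name and the ordering of the steps come from the description.

```rust
/// Hypothetical LSN type for this sketch: a byte offset into the WAL.
type Lsn = u64;

#[derive(Debug, PartialEq)]
enum WalError {
    /// Simulated I/O failure while removing data from disk.
    TruncateFailed,
}

/// Hypothetical in-memory stand-in for on-disk WAL storage.
struct WalStorage {
    /// Bytes currently "on disk".
    disk: Vec<u8>,
    write_lsn: Lsn,
    flush_lsn: Lsn,
    /// Set while a truncation has started but has not yet completed.
    pending_wal_truncation: bool,
    /// Simulation knob: how many upcoming truncation attempts fail midway.
    fail_next_truncations: u32,
}

impl WalStorage {
    fn new() -> Self {
        WalStorage {
            disk: Vec::new(),
            write_lsn: 0,
            flush_lsn: 0,
            pending_wal_truncation: false,
            fail_next_truncations: 0,
        }
    }

    /// Removes "on-disk" data above `end_pos`. When a failure is simulated,
    /// only half of the tail is removed, leaving garbage above `end_pos`.
    fn truncate_disk(&mut self, end_pos: Lsn) -> Result<(), WalError> {
        let end = end_pos as usize;
        if self.fail_next_truncations > 0 {
            self.fail_next_truncations -= 1;
            let keep = end + self.disk.len().saturating_sub(end) / 2;
            self.disk.truncate(keep);
            return Err(WalError::TruncateFailed);
        }
        self.disk.truncate(end);
        Ok(())
    }

    /// Truncates WAL to `end_pos`. The LSNs are reset and the flag is raised
    /// before any deletion starts, so a failure cannot leave stale pointers.
    fn truncate_wal(&mut self, end_pos: Lsn) -> Result<(), WalError> {
        self.write_lsn = end_pos;
        self.flush_lsn = end_pos;
        self.pending_wal_truncation = true;
        self.truncate_disk(end_pos)?;
        // Reached only if the deletion fully succeeded.
        self.pending_wal_truncation = false;
        Ok(())
    }

    /// Appends WAL. If an earlier truncation did not finish, it is retried
    /// first; the write is refused until the retry succeeds.
    fn write_wal(&mut self, data: &[u8]) -> Result<(), WalError> {
        if self.pending_wal_truncation {
            let end = self.write_lsn;
            self.truncate_wal(end)?;
        }
        self.disk.extend_from_slice(data);
        self.write_lsn += data.len() as Lsn;
        // The sketch treats every write as immediately flushed.
        self.flush_lsn = self.write_lsn;
        Ok(())
    }
}
```

In this sketch, a failed `truncate_wal(4)` on ten bytes of WAL leaves the flag set and both LSNs at 4, even though leftover bytes remain above that position. The next `write_wal` call completes the truncation before appending, so the leftover bytes never end up concatenated with new records.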

Most likely none of this has caused any problems in practice, because the CRC protects against the consequences.

Tests for this are hard; simulation infrastructure might be useful here in the future, but not yet.

github-actions · 2024-11-08T10:04:39Z

5337 tests run: 5115 passed, 0 failed, 222 skipped (full report)

Flaky tests (2)

Postgres 17

test_ondemand_wal_download_in_replication_slot_funcs: release-x86-64
test_deletion_queue_recovery[no-validate-keep]: release-x86-64

Code coverage* (full report)

functions: 31.7% (7868 of 24802 functions)
lines: 49.4% (62234 of 125967 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results: 7f4a949 at 2024-11-08T10:04:38.941Z

arssher requested a review from a team as a code owner November 8, 2024 08:58

arssher requested review from skyzh and VladLazar November 8, 2024 08:58

safekeeper: fix atomicity of WAL truncation

7f4a949

arssher force-pushed the sk-atomic-wal-truncation branch from e1df3c4 to 7f4a949 Compare November 8, 2024 09:02
