Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

safekeeper: fix atomicity of WAL truncation #9685

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

arssher
Copy link
Contributor

@arssher arssher commented Nov 8, 2024

If WAL truncation fails in the middle it might leave some data on disk above the write/flush LSN. In theory, concatenated with previous records it might form bogus WAL (though very unlikely in practice because CRC would protect from that). To protect from that, set pending_wal_truncation flag: means before any WAL writes truncation must be retried until it succeeds. We already did that in case of safekeeper restart, now extend this mechanism for failures without restart. Also, importantly, reset LSNs in the beginning of the operation, not in the end, because once on disk deletion starts previous pointers are wrong.

All this most likely haven't created any problems in practice because CRC protects from the consequences.

Tests for this are hard; simulation infrastructure might be useful here in the future, but not yet.

@arssher arssher requested a review from a team as a code owner November 8, 2024 08:58
Copy link

github-actions bot commented Nov 8, 2024

5337 tests run: 5115 passed, 0 failed, 222 skipped (full report)


Flaky tests (2)

Postgres 17

Code coverage* (full report)

  • functions: 31.7% (7868 of 24802 functions)
  • lines: 49.4% (62234 of 125967 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
7f4a949 at 2024-11-08T10:04:38.941Z :recycle:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant