8.0 - Fix log delete bug that occurs when there's low disk space #4891

morgando · 2024-12-10T14:22:27Z

If nothing can be deleted globally when log deletion is attempted, a node still calculates the minimum log file number it is willing to delete locally, saves it in memory, and sends it to the other cluster nodes here.

The changes in this PR expose and fix a buggy edge case that occurs when nothing can be deleted globally and a node is low on disk space. In this case, the node retries log deletion until it hits a retry limit, after which the log delete function exits here. That exit bypasses the code that saves and sends out the minimum file number the node is willing to delete locally. As a result, the global low file number may never advance while disk space stays low enough.
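A minimal C sketch of the control flow described above, to make the early exit concrete. Everything in it (the `struct node` fields, `try_global_delete`, `save_and_broadcast`, `MAX_RETRIES`) is a hypothetical stand-in, not the actual log-delete code: the global delete always fails to model "nothing can be deleted globally", and `log_delete_buggy` returns at the retry limit before publishing. `log_delete_fixed` assumes the direction of the later commit "Remove low headroom logic in log delete function": no immediate retry, always publish, and leave further attempts to the timer-driven log-delete thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the PR's control flow; not the real implementation. */
#define MAX_RETRIES 3

struct node {
    int local_min_filenum;     /* lowest log file this node will delete locally */
    int published_min_filenum; /* value last saved and sent to the cluster */
    int broadcasts;            /* number of times the value was sent out */
};

/* Always fails, modeling "nothing can be deleted globally". */
static bool try_global_delete(void) { return false; }

/* Models saving the local minimum and sending it to peers
 * (here it only records the value and counts the send). */
static void save_and_broadcast(struct node *n)
{
    /* The local minimum is assumed never to move backwards. */
    assert(n->local_min_filenum >= n->published_min_filenum);
    n->published_min_filenum = n->local_min_filenum;
    n->broadcasts++;
}

/* Buggy shape: with low headroom, retry immediately; at the retry limit,
 * return early and skip save_and_broadcast(). */
static void log_delete_buggy(struct node *n, bool low_headroom)
{
    int retries = 0;
    while (!try_global_delete()) {
        if (!low_headroom)
            break;          /* normal path: fall through and publish */
        if (++retries >= MAX_RETRIES)
            return;         /* BUG: exits before publishing */
    }
    save_and_broadcast(n);
}

/* Fixed shape: attempt once, then always publish the local minimum.
 * In the real system, a later timer tick would attempt deletion again. */
static void log_delete_fixed(struct node *n)
{
    (void)try_global_delete();
    save_and_broadcast(n);
}
```

With low headroom, the buggy version leaves `published_min_filenum` untouched once the retries run out, while the fixed version publishes on every call, so the global low file number can keep advancing.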

roborivers

Coding style check: Error. ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: 21/564 tests failed ⚠.

The first 10 failing tests are:
sc_upsert [setup failure]
sc_tableversion_logicalsc_generated [setup failure]
writes_remsql_names [setup failure]
phys_rep_perf
simple_timepart_reptimeout_generated
simple_timepart
commit_delay_on_copy
simple_ssl
random_osql_replay
yast_stat4scan_generated

akshatsikarwar · 2024-12-11T15:13:37Z

Does it make sense to keep the special case for tracking low headroom? It retries immediately, and it's unlikely that anything has changed in the meantime to make the retry succeed. We already have a dedicated log-delete thread that runs on a timer.

akshatsikarwar · 2024-12-11T15:16:50Z

I would also love to see the actual log file names in all the traces this function prints. For example:
Instead of: Can't delete log, age %ld not older than log delete age
Would like: Can't delete log.0000000592, age %ld not older than log delete age
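A small C sketch of the tracing change requested above, assuming log files follow the naming in the example (`log.` plus a zero-padded 10-digit file number, as in Berkeley DB-style logs). `log_file_name` and `format_age_trace` are hypothetical helpers; the real code may derive the name differently and emit the message through its own logging call rather than into a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: build "log.NNNNNNNNNN" from a log file number,
 * matching the 10-digit zero-padded shape of "log.0000000592". */
static int log_file_name(char *buf, size_t len, unsigned int filenum)
{
    return snprintf(buf, len, "log.%010u", filenum);
}

/* Hypothetical: format the "too young to delete" trace with the file name. */
static int format_age_trace(char *buf, size_t len, unsigned int filenum, long age)
{
    char name[32];
    log_file_name(name, sizeof(name), filenum);
    return snprintf(buf, len,
                    "Can't delete %s, age %ld not older than log delete age",
                    name, age);
}
```

Centralizing the name formatting in one helper keeps every trace in the function consistent with the on-disk file names.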

roborivers

Coding style check: Error. ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: 5/563 tests failed ⚠.

The first 10 failing tests are:
oflowaddrem [setup failure]
phys_rep_perf
rem
lostwrite
truncatesc_offline_generated

roborivers

Coding style check: Error. ⚠.
Smoke testing: Success ✓.
Cbuild submission: Error ⚠.
Regression testing: 4/564 tests failed ⚠.

The first 10 failing tests are:
sc_inserts_logicalsc_generated [setup failure]
basic_snapshot_generated [setup failure]
phys_rep_perf
truncatesc_offline_generated

Signed-off-by: Morgan Douglas <[email protected]>

roborivers

Coding style check: Error. ⚠.
Smoke testing: Error ⚠.
Cbuild submission: Success ✓.
Regression testing: 9/564 tests failed ⚠.

The first 10 failing tests are:
sc_inserts [setup failure]
sc_remsql_fdbpush_generated [setup failure]
fdb_push_rte_connect_generated [setup failure]
logfill_logput_window_generated
phys_rep_perf
lostwrite
online_compaction
sc_resume2
truncatesc_offline_generated

morgando requested a review from akshatsikarwar December 10, 2024 14:23

morgando marked this pull request as ready for review December 10, 2024 14:23

morgando changed the title ~~Logdel headroom bug~~ Fix log delete bug that occurs when there's low disk space Dec 10, 2024

mponomar previously approved these changes Dec 10, 2024

roborivers suggested changes Dec 10, 2024

morgando added the R8 Affects version R8 label Dec 11, 2024

morgando changed the title ~~Fix log delete bug that occurs when there's low disk space~~ 8.0 - Fix log delete bug that occurs when there's low disk space Dec 11, 2024

morgando marked this pull request as draft December 11, 2024 15:36

morgando dismissed mponomar’s stale review via bf8a938 December 13, 2024 21:23

morgando force-pushed the logdel_headroom_bug branch from c60f6d1 to bf8a938 Compare December 13, 2024 21:23

morgando marked this pull request as ready for review December 13, 2024 21:25

morgando marked this pull request as draft December 13, 2024 21:26

morgando marked this pull request as ready for review December 13, 2024 21:27

akshatsikarwar approved these changes Dec 19, 2024

roborivers suggested changes Dec 19, 2024

mponomar approved these changes Dec 20, 2024

roborivers suggested changes Dec 23, 2024

Remove low headroom logic in log delete function

ecc9aab

Signed-off-by: Morgan Douglas <[email protected]>

morgando force-pushed the logdel_headroom_bug branch from bf8a938 to ecc9aab Compare December 24, 2024 16:09

roborivers suggested changes Dec 24, 2024
