Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8.0 - Fix log delete bug that occurs when there's low disk space #4891

Open
wants to merge 1 commit into
base: 8.0
Choose a base branch
from

Conversation

morgando
Copy link
Contributor

@morgando morgando commented Dec 10, 2024

If nothing can be deleted globally when log deletion is attempted, a node will still calculate what it's willing to delete locally, save it in its memory, and send that out to other cluster nodes here.

The changes in this PR expose and fix a buggy edge case that occurs when there's nothing that can be deleted globally and a node is low on disk space. In this case, the node will retry log deletion until hitting a retry limit. After hitting the retry limit, the log delete function will exit here. This exit bypasses the code that saves and sends out the minimum file number that the node is willing to delete locally. Therefore the global low file number may never progress if there's sufficiently low disk space.

@morgando morgando marked this pull request as ready for review December 10, 2024 14:23
@morgando morgando changed the title Logdel headroom bug Fix log delete bug that occurs when there's low disk space Dec 10, 2024
mponomar
mponomar previously approved these changes Dec 10, 2024
Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding style check: Error. ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: 21/564 tests failed ⚠.

The first 10 failing tests are:
sc_upsert [setup failure]
sc_tableversion_logicalsc_generated [setup failure]
writes_remsql_names [setup failure]
phys_rep_perf
simple_timepart_reptimeout_generated
simple_timepart
commit_delay_on_copy
simple_ssl
random_osql_replay
yast_stat4scan_generated

@akshatsikarwar
Copy link
Contributor

akshatsikarwar commented Dec 11, 2024

Does it make sense to keep the special case for tracking low headroom? It retries immediately and unlikely anything has changed to make it succeed. We have a dedicated log-delete thread which runs on a timer already.

@akshatsikarwar
Copy link
Contributor

akshatsikarwar commented Dec 11, 2024

I would also love to see the actual log file names in all the trace that this function prints. For e.g.:
Instead of : Can't delete log, age %ld not older than log delete age
Would like: Can't delete log.0000000592 , age %ld not older than log delete age

@morgando morgando added the R8 Affects version R8 label Dec 11, 2024
@morgando morgando changed the title Fix log delete bug that occurs when there's low disk space 8.0 - Fix log delete bug that occurs when there's low disk space Dec 11, 2024
@morgando morgando marked this pull request as draft December 11, 2024 15:36
@morgando morgando force-pushed the logdel_headroom_bug branch from c60f6d1 to bf8a938 Compare December 13, 2024 21:23
@morgando morgando marked this pull request as ready for review December 13, 2024 21:25
@morgando morgando marked this pull request as draft December 13, 2024 21:26
@morgando morgando marked this pull request as ready for review December 13, 2024 21:27
Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding style check: Error. ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: 5/563 tests failed ⚠.

The first 10 failing tests are:
oflowaddrem [setup failure]
phys_rep_perf
rem
lostwrite
truncatesc_offline_generated

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding style check: Error. ⚠.
Smoke testing: Success ✓.
Cbuild submission: Error ⚠.
Regression testing: 4/564 tests failed ⚠.

The first 10 failing tests are:
sc_inserts_logicalsc_generated [setup failure]
basic_snapshot_generated [setup failure]
phys_rep_perf
truncatesc_offline_generated

@morgando morgando force-pushed the logdel_headroom_bug branch from bf8a938 to ecc9aab Compare December 24, 2024 16:09
Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding style check: Error. ⚠.
Smoke testing: Error ⚠.
Cbuild submission: Success ✓.
Regression testing: 9/564 tests failed ⚠.

The first 10 failing tests are:
sc_inserts [setup failure]
sc_remsql_fdbpush_generated [setup failure]
fdb_push_rte_connect_generated [setup failure]
logfill_logput_window_generated
phys_rep_perf
lostwrite
online_compaction
sc_resume2
truncatesc_offline_generated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
R8 Affects version R8
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants