8.0 - Fix log delete bug that occurs when there's low disk space #4891
base: 8.0
Coding style check: Error. ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: 21/564 tests failed ⚠.
The first 10 failing tests are:
sc_upsert [setup failure]
sc_tableversion_logicalsc_generated [setup failure]
writes_remsql_names [setup failure]
phys_rep_perf
simple_timepart_reptimeout_generated
simple_timepart
commit_delay_on_copy
simple_ssl
random_osql_replay
yast_stat4scan_generated
Does it make sense to keep the special case for tracking low headroom? It retries immediately, and it is unlikely that anything has changed to make it succeed. We already have a dedicated log-delete thread that runs on a timer.
I would also love to see the actual log file names in all the trace output that this function prints. For example:
c60f6d1 to bf8a938
Coding style check: Error. ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: 5/563 tests failed ⚠.
The first 10 failing tests are:
oflowaddrem [setup failure]
phys_rep_perf
rem
lostwrite
truncatesc_offline_generated
Coding style check: Error. ⚠.
Smoke testing: Success ✓.
Cbuild submission: Error ⚠.
Regression testing: 4/564 tests failed ⚠.
The first 10 failing tests are:
sc_inserts_logicalsc_generated [setup failure]
basic_snapshot_generated [setup failure]
phys_rep_perf
truncatesc_offline_generated
Signed-off-by: Morgan Douglas <[email protected]>
bf8a938 to ecc9aab
Coding style check: Error. ⚠.
Smoke testing: Error ⚠.
Cbuild submission: Success ✓.
Regression testing: 9/564 tests failed ⚠.
The first 10 failing tests are:
sc_inserts [setup failure]
sc_remsql_fdbpush_generated [setup failure]
fdb_push_rte_connect_generated [setup failure]
logfill_logput_window_generated
phys_rep_perf
lostwrite
online_compaction
sc_resume2
truncatesc_offline_generated
If nothing can be deleted globally when log deletion is attempted, a node still calculates what it is willing to delete locally, saves that value in memory, and sends it to the other cluster nodes.
The changes in this PR expose and fix a buggy edge case that occurs when nothing can be deleted globally and a node is low on disk space. In this case, the node retries log deletion until it hits a retry limit, and then the log delete function exits. This exit bypasses the code that saves and sends out the minimum file number that the node is willing to delete locally. As a result, the global low file number may never advance while disk space stays sufficiently low.
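For reviewers who want the control flow in one place, here is a minimal sketch of the edge case described above. It is not the actual code from this PR: `Node`, `delete_logs`, `MAX_RETRIES`, and every field name are hypothetical, and the deletion logic is reduced to simple arithmetic on file numbers. The only point it illustrates is that returning at the retry limit skips the step that publishes the node's local low file number, whereas falling through to that step does not.

```python
from dataclasses import dataclass, field

MAX_RETRIES = 3  # hypothetical retry limit, chosen only for this sketch


@dataclass
class Node:
    local_low_file: int       # lowest file number this node is willing to delete locally
    global_low_file: int      # cluster-wide floor; files below it may be deleted
    first_file_on_disk: int   # oldest log file still present on disk
    low_disk_space: bool = False
    advertised: list = field(default_factory=list)  # values sent to other nodes

    def _delete_pass(self) -> int:
        """Delete files below the global low file; return how many were deleted."""
        deleted = max(0, self.global_low_file - self.first_file_on_disk)
        self.first_file_on_disk += deleted
        return deleted

    def _save_and_send_local_low(self) -> None:
        """Stand-in for saving the local low file number and sending it to peers."""
        self.advertised.append(self.local_low_file)

    def delete_logs(self, fixed: bool) -> None:
        retries = 0
        while True:
            deleted = self._delete_pass()
            # Retry only when nothing was deleted and the node is low on disk space.
            if deleted > 0 or not self.low_disk_space:
                break
            retries += 1
            if retries >= MAX_RETRIES:
                if not fixed:
                    return  # buggy path: exits before publishing the local low file
                break       # fixed path: fall through to the publish step
        self._save_and_send_local_low()


# Nothing is deletable globally (global low == oldest file) and disk space is low.
buggy = Node(local_low_file=7, global_low_file=3, first_file_on_disk=3, low_disk_space=True)
buggy.delete_logs(fixed=False)

patched = Node(local_low_file=7, global_low_file=3, first_file_on_disk=3, low_disk_space=True)
patched.delete_logs(fixed=True)
```

In this sketch the buggy node never adds anything to `advertised`, so its peers would have no new local value from which to advance the global low file number. The patched node still publishes its value after exhausting its retries.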