fix(manual_compact): fix replica lose manual compact finished status after replica migrate bug #1961
base: master
@@ -1758,14 +1758,22 @@ dsn::error_code pegasus_server_impl::start(int argc, char **argv)
dsn::ERR_LOCAL_APP_FAILURE,
"open app failed, unsupported data version {}",
_pegasus_data_version);
// update last manual compact finish timestamp
uint64_t last_manual_compact_used_time = 0;
LOG_AND_RETURN_NOT_OK(
When upgrading from an old version, ERR_OBJECT_NOT_FOUND will be returned, right?
return last_manual_compact_finish_time;
}

void pegasus_server_impl::after_manual_compact(std::uint64_t starttime, uint64_t endtime)
{
// store last manual compact used time to meta store for learn situation
If the replica server shuts down before the replicas complete the manual compaction, can this patch resolve that issue?
I think this patch will still work. If the replica server shuts down before the replicas complete the manual compaction, last_manual_compact_finish_time will not be updated, so the compaction will start again.
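A minimal sketch of that reasoning, assuming a compaction request stays pending while its trigger time is newer than the last recorded finish time. The names `toy_compact_state`, `needs_compaction` and `run_compaction` are hypothetical and are not taken from the Pegasus code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-replica state; the field name mirrors the comment above.
struct toy_compact_state
{
    // Updated only when a compaction runs to completion.
    std::uint64_t last_finish_time = 0;
};

// Assumed rule: a request is still pending while its trigger time is newer
// than the last recorded finish time.
bool needs_compaction(const toy_compact_state &s, std::uint64_t trigger_time)
{
    return trigger_time > s.last_finish_time;
}

// Runs one compaction attempt. `interrupted` models the server shutting down
// before the compaction completes.
void run_compaction(toy_compact_state &s, std::uint64_t now, bool interrupted)
{
    assert(now >= s.last_finish_time);
    if (interrupted) {
        return; // finish time untouched, so the request stays pending
    }
    s.last_finish_time = now;
}
```

With this rule, an interrupted attempt leaves `needs_compaction` returning true for the same trigger time, and only a completed attempt clears it.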
It's OK with me to remove the periodic manual compaction function. It can be replaced by third-party tools, so feel free to remove it.
Co-authored-by: Yingchun Lai <[email protected]>
I think it's reasonable to keep this periodic manual compaction function, because many users now use Spark to set the manual compaction time. If we remove this function in a new version, we will have to change users' habits.
What is "using spark to set manual compaction time"? Is it a Pegasus binding method? I don't think it blocks removing the periodic manual compaction. This is not a core task of Pegasus, and it makes the code a bit smelly. At the least, this patch needs to resolve the issues mentioned above.
In newer versions of PEGASUS-SPARK, a compaction request is sent to the cluster after the user calls
Thanks for updating the pegasus-spark project! It has not been well maintained for about 4 years. Do you think it's confusing that the manual compact status depends on If the parameters are set in a previous round, and the compaction has not completed and set the variables in the current round before the server restarts, will it be considered finished? Could you add some tests to verify that the bug has been fixed?
First, I updated PEGASUS-SPARK to the newest version and committed a merge request.
What problem does this PR solve?
#1665
Because of a mistaken operation on PR #1666, I closed it and did a forced pull, so I have to submit the change again in another pull request.
What is changed and how does it work?
I add another string in meta_store (meta_store is a column family that persists pegasus_last_manual_compact_finish_time and so on), so that we can read the value from meta_store when we recover a replica after a replica migration.
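The idea can be sketched as follows, assuming the value is written next to the replica's data and read back when the replica is rebuilt elsewhere. `toy_meta_store`, `toy_replica`, `migrate` and the key string are hypothetical stand-ins, not the real Pegasus types. Only `after_manual_compact` and its two parameters follow the diff above, and treating the stored value as `endtime - starttime` is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the meta_store column family: a small key-value
// map that is copied together with the replica's data.
class toy_meta_store
{
public:
    void set_u64(const std::string &key, std::uint64_t value) { _kv[key] = value; }

    // Returns false and leaves *value unchanged when the key was never written.
    bool get_u64(const std::string &key, std::uint64_t *value) const
    {
        auto it = _kv.find(key);
        if (it == _kv.end()) {
            return false;
        }
        *value = it->second;
        return true;
    }

private:
    std::map<std::string, std::uint64_t> _kv;
};

// Hypothetical key string; the real key used by the patch is not shown here.
const std::string USED_TIME_KEY = "toy_last_manual_compact_used_time";

struct toy_replica
{
    toy_meta_store meta;                   // persisted, survives a migrate
    std::uint64_t used_time_in_memory = 0; // plain member, lost on a migrate

    // Assumption: the stored "used time" is the duration of the compaction.
    void after_manual_compact(std::uint64_t starttime, std::uint64_t endtime)
    {
        assert(endtime >= starttime);
        used_time_in_memory = endtime - starttime;
        meta.set_u64(USED_TIME_KEY, used_time_in_memory);
    }
};

// Simulates a replica migrate: only the persisted store is carried over, and
// the in-memory value is rebuilt from it (zero if the key is absent).
toy_replica migrate(const toy_replica &src)
{
    toy_replica dst;
    dst.meta = src.meta;
    std::uint64_t value = 0;
    if (dst.meta.get_u64(USED_TIME_KEY, &value)) {
        dst.used_time_in_memory = value;
    }
    return dst;
}
```

In this sketch a replica that finished a compaction and was then migrated still reports the same used time, because the value comes from the copied store rather than from a member that starts at zero.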
Checklist
Tests
1. Create a table and put some data in it.
2. Let the table begin manual compaction.
3. Finish the manual compaction and view the replica relationship of this table.
4. Stop a node and check the progress of the compaction.
5. Wait a few minutes and check whether any replica still fails to show a Finish status.
Test Result
Code changes
LAST_MANUAL_COMPACT_USED_TIME is read from meta_store when the DB exists.
LAST_MANUAL_COMPACT_USED_TIME is set to zero when the DB does not exist.
LAST_MANUAL_COMPACT_USED_TIME is set to the time the replica used for its last do_manual_compact.
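These three rules can be sketched as below. This is an illustration under stated assumptions, not the code in the patch: `toy_error`, `toy_meta`, `load_used_time`, `store_used_time` and the key string are hypothetical. Returning a not-found status when the DB exists but the key is missing (for example, data written before the key was introduced) is an assumption prompted by the review question above; how the real patch handles that case is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical status codes for this sketch only.
enum class toy_error { OK, OBJECT_NOT_FOUND };

// Hypothetical stand-in for the meta_store column family.
using toy_meta = std::map<std::string, std::uint64_t>;

// Hypothetical key string; the real key used by the patch is not shown here.
const char *const USED_TIME_KEY = "toy_last_manual_compact_used_time";

// Rules 1 and 2: pick the initial value when a replica opens its DB.
toy_error load_used_time(bool db_exists, const toy_meta &meta, std::uint64_t &used_time)
{
    used_time = 0;
    if (!db_exists) {
        return toy_error::OK; // new DB: start from zero
    }
    auto it = meta.find(USED_TIME_KEY);
    if (it == meta.end()) {
        // Assumed behaviour: report the missing key and let the caller decide.
        return toy_error::OBJECT_NOT_FOUND;
    }
    used_time = it->second;
    return toy_error::OK;
}

// Rule 3: record how long the last manual compaction took.
void store_used_time(toy_meta &meta, std::uint64_t starttime, std::uint64_t endtime)
{
    assert(endtime >= starttime);
    meta[USED_TIME_KEY] = endtime - starttime;
}
```

A caller that prefers to treat a missing key as "never compacted" can keep the zero that `load_used_time` leaves in `used_time` and ignore the not-found status.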