test_tablets / test_tablets_migration take 20 minutes in debug mode #21296

Open
avikivity opened this issue Oct 27, 2024 · 2 comments
Labels
area/tablets area/test Issues related to the testing system code and environment

Comments

@avikivity
Member

1210.32444691658|debug|test_tablets
1199.00373029709|debug|test_tablets_migration
@raphaelsc
Member

test_tablets_migration is essentially about bringing nodes down at different stages of a migration, to simulate failures and possibly revert the migration depending on the stage. I wonder if we can reduce the test time by tweaking the failure detector settings.

@raphaelsc
Member

It turns out test_tablets_migration already tried to adjust the failure detector, but it was limited by the echo interval. The test runtime can be reduced by 10% with the following changes:

diff --git a/test/topology_custom/test_tablets_migration.py b/test/topology_custom/test_tablets_migration.py
index 678668f59f..2e1e3e3a91 100644
--- a/test/topology_custom/test_tablets_migration.py
+++ b/test/topology_custom/test_tablets_migration.py
@@ -103,7 +103,8 @@ async def test_node_failure_during_tablet_migration(manager: ManagerClient, fail
         pytest.skip('Failing source during target cleanup is pointless')
 
     logger.info("Bootstrapping cluster")
-    cfg = {'enable_user_defined_functions': False, 'enable_tablets': True, 'failure_detector_timeout_in_ms': 2000}
+    cfg = {'enable_user_defined_functions': False, 'enable_tablets': True, 'failure_detector_timeout_in_ms': 200,
+           'error_injections_at_startup': ['gossiper_reduced_echo_interval']}
     host_ids = []
     servers = []
 
diff --git a/gms/gossiper.cc b/gms/gossiper.cc
index 68dc609c69..bac0a0ec4d 100644
--- a/gms/gossiper.cc
+++ b/gms/gossiper.cc
@@ -979,7 +979,7 @@ future<std::set<inet_address>> gossiper::get_unreachable_members_synchronized()
 future<> gossiper::failure_detector_loop_for_node(gms::inet_address node, generation_type gossip_generation, uint64_t live_endpoints_version) {
     auto last = gossiper::clk::now();
     auto diff = gossiper::clk::duration(0);
-    auto echo_interval = std::chrono::milliseconds(2000);
+    auto echo_interval = utils::get_local_injector().enter("gossiper_reduced_echo_interval") ? std::chrono::milliseconds(200) : std::chrono::milliseconds(2000);
     auto max_duration = echo_interval + std::chrono::milliseconds(_gcfg.failure_detector_timeout_ms());
     while (is_enabled()) {
         bool failed = false;
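
To make the "limited by the echo interval" point concrete, here is a rough sketch of the arithmetic behind the `max_duration` expression in the gossiper diff above. It models only that one expression (`echo_interval + failure_detector_timeout_ms`), not the rest of the failure detector loop, and the class and variable names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorSettings:
    """Hypothetical container for the two values that appear in the diff."""
    echo_interval_ms: int
    failure_detector_timeout_ms: int

    @property
    def max_duration_ms(self) -> int:
        # Mirrors: max_duration = echo_interval + failure_detector_timeout_ms
        return self.echo_interval_ms + self.failure_detector_timeout_ms


# Values before the patch: 2000 ms echo interval, 2000 ms timeout from the test config.
before = DetectorSettings(echo_interval_ms=2000, failure_detector_timeout_ms=2000)

# Lowering only the timeout leaves the hardcoded echo interval dominating the sum.
timeout_only = DetectorSettings(echo_interval_ms=2000, failure_detector_timeout_ms=200)

# Values after the patch: the injection shortens the echo interval as well.
after = DetectorSettings(echo_interval_ms=200, failure_detector_timeout_ms=200)

for label, s in [("before", before), ("timeout only", timeout_only), ("after", after)]:
    print(f"{label}: max_duration = {s.max_duration_ms} ms")
```

Under these assumptions, dropping only `failure_detector_timeout_in_ms` cannot bring the sum below the 2000 ms echo interval, which is why the patch adds the `gossiper_reduced_echo_interval` injection alongside the config change.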
