MgmtRestore nemesis is failing with Data verification stress command, triggered by the 'mgmt_restore' nemesis, has failed
assertion
#4092
Labels
qa
should be used for qa team testing tasks
Starting from 2024.1.12 patch release the longevity-50gb-3days-test SCT test started to fail during
disrupt_mgmt_restore
disruption with the error:The sequence of nemeses in this test is that
disrupt_mgmt_restore
disruption is performed 2 times one by one, where the 1st one succeeds and the 2nd is failing, e.g.:but up until 2024.1.12 release there was no problem with this sequence.
As per SCT logs the 2nd restore task is finished successfully, but then the
confirmation
stress command is failing:As per the loader nodes logs seems like stress command was not finished/stuck on one of loader nodes,which probably caused the fail of verification stress command. E.g. in build #31 of the test, the loader node-1 cassandra-stress-read-l1-c0-k1-e37ba171-92fa-4c59-be6f-17c4e3e07e60.log shows that the stress command execution was not finished (below is the very end of log):
Packages
Scylla version:
2024.1.12-20241023.6140bb5b2d0a
with build-id8c924fad33d4abafe401b24355fd2ad7458df89b
Kernel Version:
5.15.0-1071-aws
Installation details
Cluster size: 6 nodes (i4i.4xlarge)
Scylla Nodes used in this run:
OS / Image:
ami-08b1ade2fc79cebbf
(aws: undefined_region)Test:
longevity-50gb-3days-test
Test id:
7abc7f75-4694-44d1-b32e-a45482a28938
Test name:
enterprise-2024.1/longevity/longevity-50gb-3days-test
Test method:
longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs and commands
$ hydra investigate show-monitor 7abc7f75-4694-44d1-b32e-a45482a28938
$ hydra investigate show-logs 7abc7f75-4694-44d1-b32e-a45482a28938
Logs:
Jenkins job URL
Argus
The text was updated successfully, but these errors were encountered: