DAOS-18387 test: recovery/ddb.py test_recovery_ddb_ls MD-on-SSD Support #17332
shimizukko wants to merge 21 commits into master
To support MD-on-SSD for ddb, we need to support two commands: ddb prov_mem and ddb ls with --db_path.

Update ddb_utils.py to support the new commands. Add check_ram_used in recovery_utils.py to detect whether the system is MD-on-SSD. Update test_recovery_ddb_ls to support MD-on-SSD with the new ddb commands.

We need to update the test yaml to run on MD-on-SSD/HW Medium, but that would break the other tests in ddb.py because they don't support MD-on-SSD yet. Keep the original tests as ddb_pmem.py and ddb_pmem.yaml and keep running them on VM (except test_recovery_ddb_ls, which is updated in this PR).

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_recovery_ddb_ls DdbPMEMTest

Signed-off-by: Makito Kano <makito.kano@hpe.com>
Ticket title is 'CR Test Update - recovery/ddb.py test_recovery_ddb_ls MD-on-SSD Support'
Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17332/1/display/redirect
Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17332/5/execution/node/857/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17332/5/execution/node/898/log
@phender @dinghwah I have two questions:
Thanks.
src/tests/ftest/recovery/ddb_pmem.py
    Args:
        remote_file_path (str): File path to copy to local.
        test_dir (str): Test directory. Usually self.test_dir.
        remote (str): Remote hostname to copy file from.
get_clush_command requires a NodeSet.
Suggested change:
    -   remote (str): Remote hostname to copy file from.
    +   remote (NodeSet): Remote hostname to copy file from.
Did the change not get committed?
I forgot to add this file. Fixed.
Also use self.server_managers[0].manager.job.yaml.metadata_params.path.value to get control_metadata_path. Update the self.fail log message to include the failed hosts. Update comment. Include up-to-date sample output. Update the test yaml timeout value.
src/tests/ftest/recovery/ddb.py
    vos_paths = self.server_managers[0].get_vos_files(pool)
    if not vos_paths:
        self.fail(
            f"vos file wasn't found in {self.server_managers[0].get_vos_paths(pool)[0]}")
Why would we call get_vos_files() a second time for this error message, if it returned an empty list the first time we called it?
I don't remember why I wrote this. Fixed.
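The fix discussed in this thread can be sketched roughly as below. This is a minimal illustration, not the code in the PR: require_vos_files and its pool_id argument are hypothetical names, and AssertionError stands in for the test's self.fail. The only point is that the failure message is built from values the caller already has, with no second lookup.

```python
def require_vos_files(vos_files, pool_id):
    """Return vos_files unchanged, or raise if the earlier lookup found nothing.

    Hypothetical helper: vos_files is the list the caller already fetched,
    and pool_id is any label that identifies the pool in the message.
    """
    if not vos_files:
        # Build the message from what the caller already has; do not query again.
        raise AssertionError(f"No vos file was found for pool {pool_id}")
    return vos_files
```

A caller would fetch the list once, pass it in, and use the returned list afterwards.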
src/tests/ftest/recovery/ddb.yaml
      test_clients: 1

    - timeout: 1800
    + timeout: 30M
Based upon https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17332/12/testReport/FTEST_recovery/DdbTest/ should this be 6 minutes?
Suggested change:
    - timeout: 30M
    + timeout: 360
On my FTC node it takes much longer than 6 minutes, so I put 30 minutes. Maybe it's slow when the node doesn't have PMEM? I adjusted it to a reasonable value for CI.
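To compare the values mentioned in this thread (1800, 30M, 360, and the later 7M), here is a small sketch that converts them to seconds. The helper name timeout_to_seconds is hypothetical, and the suffix handling (S, M, H, with a bare number meaning seconds) is an assumption based on 1800 being replaced by 30M in the diff; it is not the test harness's own parser.

```python
def timeout_to_seconds(value):
    """Convert a timeout such as 1800, "360", or "30M" to seconds.

    Assumption: a trailing S, M, or H means seconds, minutes, or hours,
    and a value without a suffix is already in seconds.
    """
    units = {"S": 1, "M": 60, "H": 3600}
    text = str(value).strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


original = timeout_to_seconds(1800)    # value before the change
proposed = timeout_to_seconds("30M")   # same duration, written in minutes
suggested = timeout_to_seconds(360)    # reviewer's 6-minute suggestion
final = timeout_to_seconds("7M")       # value a later commit settles on
```

Under these assumptions 30M and 1800 describe the same limit, while 7M (420 seconds) leaves one minute of headroom over the suggested 360.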
      test_servers: 1
      test_clients: 1

      timeout: 1800
Based upon https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17332/12/testReport/FTEST_recovery/DdbPMEMTest/ should this be 6 minutes?
Suggested change:
    - timeout: 1800
    + timeout: 360
src/tests/ftest/recovery/ddb_pmem.py
    @@ -0,0 +1,468 @@
    """
      (C) Copyright 2022-2024 Intel Corporation.
Seems this copyright is wrong / copy-pasted?
I thought we leave the Intel Copyright statement and add HPE below it.
But this is a new file, right? So there was no work done on it at Intel.
Oh, I see in your other comment that you moved it: ddb_pmem.py used to be ddb.py. That wasn't obvious to me from GitHub's diff.
Yes, I mentioned that in the comment above, but didn't include you in the at-mention.
        self.random_akey = get_random_string(10)
        self.random_data = get_random_string(10)

    def test_recovery_ddb_rm(self):
In general we should use log_step in these new tests.
ddb_pmem.py used to be ddb.py. I added MD-on-SSD support for test_recovery_ddb_ls, but then I also had to update the test yaml to support MD-on-SSD, so I renamed the original test to ddb_pmem.py and moved test_recovery_ddb_ls to ddb.py, which now supports both modes. I'll move the remaining tests in ddb_pmem.py to ddb.py and add MD-on-SSD support. At that time, I want to do a bunch of refactoring, including log_step.
You're just moving these temporarily to ddb_pmem.py and then plan to move them back? Is it too much work to make them all work now?
I refactored ddb_pmem.py to include log_step, etc.
Remove get_vos_paths() from self.fail. Reduce timeout to 7M. Use single quotes to surround double quotes. Move DdbCommand instantiation outside of the if-else.
src/tests/ftest/util/ddb_utils.py
        Before calling this method, "" (two double quotes) needs to be set to
        self.vos_path.
I don't understand this comment. Why can't this function handle that?
This is my opinion. If we do self.vos_path.value = '""' here, vos_path = '""' in ddb.py (line 138) would be unnecessary, so we should remove that. In that case, what do we set vos_path to when we instantiate DdbCommand? Setting None and letting prov_mem() update it to '""' later seems odd. I meant the comment as an instruction rather than a requirement.
Why does it need to be set to '""' to begin with? Does the command itself require an empty path? That seems odd.
Yes, the command itself requires an empty path. I agree it's odd.
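A short sketch of why the value has to be the two-character string '""' rather than an empty Python string, assuming the command line is joined into one string and then split by a POSIX shell. The "ddb <vos_path> <subcommand>" layout and the build_argv helper are assumptions for illustration only; shlex.split stands in for the shell's word splitting.

```python
import shlex


def build_argv(vos_path, subcommand):
    """Join a ddb-style command line and split it the way a POSIX shell would.

    The "ddb <vos_path> <subcommand>" layout is an assumption for illustration.
    """
    return shlex.split(f"ddb {vos_path} {subcommand}")


# '""' is a two-character Python string holding two double quotes,
# so the shell still sees a (deliberately empty) path argument.
with_placeholder = build_argv('""', "prov_mem")

# An empty Python string leaves no token behind, so the argument vanishes.
without_placeholder = build_argv("", "prov_mem")
```

Here with_placeholder is ['ddb', '', 'prov_mem'] and without_placeholder is ['ddb', 'prov_mem']: only the quoted form keeps the empty path in its position.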
@daos-stack/daos-gatekeeper I updated the commit message, but if we simply merge it, I believe the old message will be used. Please use the one shown above when merging.
To support MD-on-SSD for ddb, we need to support two commands. ddb prov_mem and ddb ls with --db_path.
Update ddb_utils.py to support the new commands.
Update test_recovery_ddb_ls to support MD-on-SSD with the new ddb commands.
We need to update the test yaml to run on MD-on-SSD/HW Medium, but that will break other tests in ddb.py because they don't support MD-on-SSD yet. Keep the original tests as ddb_pmem.py and ddb_pmem.yaml and keep running them on VM (except test_recovery_ddb_ls because that's updated in this PR).
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_recovery_ddb_ls DdbPMEMTest