[e2e][backend] Add test cases for restore backup with snapshot created #1384
Conversation
```python
api_client.volumes.delete(vol_name)

@pytest.mark.dependency(depends=["TestBackupRestoreWithSnapshot::tests_backup_vm"], param=True)
def test_with_snapshot_restore_replace_retain_vols(
```
Can we parameterize `delete_volumes`, so we can test Delete Previous Volumes for both `Delete` and `Retain`?
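For example, a minimal sketch of that parametrization as a method inside the test class, assuming the `api_client`, `base_vm_with_data`, and `RestoreSpec` usage already present in this file (the test name and the VM-name lookup are illustrative only):

```python
import pytest


@pytest.mark.parametrize("delete_volumes", [True, False], ids=["Delete", "Retain"])
@pytest.mark.dependency(depends=["TestBackupRestoreWithSnapshot::tests_backup_vm"], param=True)
def test_with_snapshot_restore_replace(
    self, api_client, backup_config, base_vm_with_data, delete_volumes
):
    # Restore the backup over the existing VM, either deleting or retaining
    # the previous volumes depending on the parametrized value.
    unique_vm_name = base_vm_with_data['name']  # assumption: fixture exposes the VM name
    spec = api_client.backups.RestoreSpec.for_existing(delete_volumes=delete_volumes)
    code, data = api_client.backups.restore(unique_vm_name, spec)
    assert 201 == code, f"Failed to restore backup (delete_volumes={delete_volumes}): {data}"
```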
It's a good idea to use a parametrized config to cover both the `Delete` and `Retain` test cases.
However, according to test issue #1045, we only select "Restore backup to replace existing (retain volume)".
The reason is that when the VM also has a snapshot created, even if we shut down the VM, the backend checks and prevents restoring to replace the existing VM with delete volumes.
Thus here we test replace existing with retain volumes only.
The snapshot function is based on the volume, so "delete volume" not working with a snapshot should be the expected behavior. Or maybe we can discuss the case in the sync-up meeting to double-confirm it.
Thanks for the suggestion.
After moving the snapshot restore test cases into the original `TestBackupRestore` class, all the test-generated volumes are automatically cleaned up when the tests complete successfully.
Force-pushed from a8fd100 to 71c5a97
@TachunLin Please help check the error and make sure the tests run successfully in Jenkins.
Thanks for the reminder. I checked the test report of [...]. The reason is we do not yet have the backup bucket created in our ECM lab MinIO artifact endpoint. Then I set the same [...]. The result is, when I execute [...]. Next I triggered a new test [...].
I will continue to investigate what causes these tests to fail on the Jenkins run-tests jobs while they work fine when triggered from local.
Hi @TachunLin,
If they PASS when triggered from local like you mentioned, then we should suspect it's an environment issue and investigate the ECM lab. CC @lanfon72
Those test cases are a known issue, see harvester/harvester#4640.
Hi @TachunLin, just sent a quick fix PR, please refer to [...].
Thank you @albinsun for finding the root cause and creating PR #1419 to fix the flaky case of [...]. I also added the [...] and triggered a new test on the main Jenkins VM against the [...]. It fixes the [...] well, and I also triggered the entire [...].
Please move those test cases into `TestBackupRestore` (at the end) to reduce test time. As both depend on the same `tests_backup_vm`, I feel it is not necessary to create another class for it.
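In other words, roughly the following shape (a sketch only; the method name and dependency marker are taken from the diff in this PR, the rest is placeholder):

```python
class TestBackupRestore:
    # ... existing test_connection, tests_backup_vm, and restore test cases ...

    # New snapshot-restore cases appended at the end of the same class, reusing
    # the backup created by the existing tests_backup_vm instead of taking a new one.
    @pytest.mark.dependency(depends=["TestBackupRestore::tests_backup_vm"], param=True)
    def test_with_snapshot_restore_replace_retain_vols(
        self, api_client, wait_timeout, backup_config, base_vm_with_data
    ):
        ...
```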
```python
class TestBackupRestoreWithSnapshot:

    @pytest.mark.dependency()
    def test_connection(self, api_client, backup_config, config_backup_target):
```
We already tested the connection in the previous class; we don't need to duplicate the test case.
Thanks for the suggestion.
The reason for running the connection test again is to keep the backup and restore test cases fully functional inside the new test class `TestBackupRestoreWithSnapshot`.
According to the test result in the follow-up comment #1384 (comment), and considering the trade-off of pros and cons, may I keep these tests in the new test class? Many thanks.
```python
assert 200 == code, f'Failed to test backup target connection: {data}'

@pytest.mark.dependency(depends=["TestBackupRestoreWithSnapshot::test_connection"], param=True)
def tests_backup_vm(self, api_client, wait_timeout, backup_config, base_vm_with_data):
```
Same as above; it looks like you are going to reuse the test case?
Thanks for the suggestion.
The reason for running this test again is to provide a new backup for all the test cases inside the new test class `TestBackupRestoreWithSnapshot`.
According to the test result in the follow-up comment #1384 (comment), and considering the trade-off of pros and cons, may I keep these tests in the new test class? Many thanks.
If you want to decouple the new test cases without reusing the existing ones, you can just add another fixture to spin up the VM and do the backup (as it is the prerequisite of the new test cases).
The problem here is that if you add those two duplicated test cases (identical to the previous class), the total number of test cases in the report does not reflect what we actually implemented and covered.
Adding them after the existing test cases would be easier than creating another fixture; that's why I suggested it.
@TachunLin, any update or comment?
The current point is that we should not have two identical tests:
- `TestBackupRestore::tests_backup_vm`
- `TestBackupRestoreWithSnapshot::tests_backup_vm`

I think one way is to make `TestBackupRestoreWithSnapshot::tests_backup_vm` a class fixture, as sketched below.
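A rough sketch of that idea, assuming the fixtures already used in this file; the fixture name is hypothetical and the backup steps are left as a placeholder rather than a real API call:

```python
import pytest


@pytest.fixture(scope="class")
def class_vm_backup(api_client, wait_timeout, backup_config, base_vm_with_data):
    """Create the backup once per class instead of duplicating tests_backup_vm.

    The snapshot-restore tests would then depend on this fixture rather than
    on a second copy of the backup test, so the report only counts the tests
    that are actually new.
    """
    unique_vm_name = base_vm_with_data['name']  # assumption: fixture exposes the VM name
    # Placeholder: reuse the same backup-and-wait steps currently implemented
    # in TestBackupRestore::tests_backup_vm here.
    ...
    yield unique_vm_name
```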
@TachunLin,
As discussed, letting you know that /issues/1462 has been fixed, so there should be no more connection failures.
You can continue looking into the refactor for this task item, thanks.
Thanks @albinsun for fixing the connection failure issue.
According to the discussion and based on the review comments, I think it would be better to move the new snapshot restore test cases right after the existing test cases in the same `TestBackupRestore` class. It would address the following concerns:
- No reuse or duplication of `test_connection` and `tests_backup_vm`
- Saves time by not running the take-backup action twice

I have already made the fix and tested it working from local against the remote ECM lab machine.
I will try again on the main Jenkins jobs to double-confirm.
Tested PASS on the main Jenkins Harvester run-tests job #102.
All the new snapshot restore tests execute well inside the `TestBackupRestore` test class.
http://172.19.99.74/job/harvester-runtests/102/TestReport/
The S3-related tests are blocked from proceeding because the environment was broken by a VPN disconnection yesterday.
Force-pushed from 71c5a97 to 8c0aef0
Force-pushed from a77925b to cfdde7e
Force-pushed from cfdde7e to 2d28bc4
```diff
@@ -605,9 +605,227 @@ def test_restore_replace_with_vm_shutdown_command(
        spec = api_client.backups.RestoreSpec.for_existing(delete_volumes=True)
        code, data = api_client.backups.restore(unique_vm_name, spec)
        assert 201 == code, f'Failed to restore backup with current VM replaced, {data}'
        vm_getable, (code, data) = vm_checker.wait_getable(unique_vm_name)
```
Why remove `wait_getable`?
Wouldn't it be possible to hit #1419?
Thanks for the check.
That was probably because I missed adding this check back during the rebase.
I have just added the `wait_getable` action back in the original places.
```python
code, data = api_client.vm_snapshots.get(vm_snapshot_name)
if data.get("status", {}).get("readyToUse"):
    break
print(f"waiting for {vm_snapshot_name} to be ready")
```
Drop temporary code.
Thanks for the check.
The purpose of adding the line to check that the VM snapshot is ready was to ensure the snapshot is actually created before proceeding to the next step.
```python
    print(f"waiting for {vm_snapshot_name} to be ready")
    sleep(3)
else:
    raise AssertionError(f"timed out waiting for {vm_snapshot_name} to be ready")
```
Please include `code` and `data` in the error message, like the other asserts.
Thanks for the suggestion.
Added `code` and `data` to the error message like the others in the code.
```python
    raise AssertionError(f"timed out waiting for {vm_snapshot_name} to be ready")

assert 200 == code
assert data.get("status", {}).get("readyToUse") is True
```
Should these 2 lines be in the while loop?
Thanks for the suggestion.
Indeed these 2 lines can be folded into the previous while loop, since a redundant check exists on `data.get("status", {}).get("readyToUse")`. Updated the code accordingly.
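Put together, the readiness wait might look roughly like this (a sketch only; the `endtime`/`wait_timeout` pattern is assumed to match the surrounding tests):

```python
from datetime import datetime, timedelta
from time import sleep

endtime = datetime.now() + timedelta(seconds=wait_timeout)
while endtime > datetime.now():
    code, data = api_client.vm_snapshots.get(vm_snapshot_name)
    # Break out only when the API reports the snapshot as ready to use.
    if 200 == code and data.get("status", {}).get("readyToUse"):
        break
    sleep(3)
else:
    # Include the last API response in the failure message, like the other asserts.
    raise AssertionError(
        f"timed out waiting for {vm_snapshot_name} to be ready: {code}, {data}"
    )
```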
```python
# Check VM Started then get IPs (vm and host)
vm_got_ips, (code, data) = vm_checker.wait_ip_addresses(unique_vm_name, ['default'])
assert vm_got_ips, (
    f"Failed to Start VM({unique_vm_name}) with errors:\n"
```
Do we have any restore or restart here?
If you want to check that the VM is still Running and accessible, then perhaps alter the error message.
Thanks for the check.
The VM is not restarted before this; it just makes sure the VM is running after taking the snapshot.
I have updated the assert error message accordingly.
```python
code, data = api_client.vm_snapshots.get(vm_snapshot_name)
if data.get("status", {}).get("readyToUse"):
    break
print(f"waiting for {vm_snapshot_name} to be ready")
```
same as previous.
Thanks for the suggestion.
Added `code` and `data` to the error message like the others in the code.
```python
    raise AssertionError(f"timed out waiting for {vm_snapshot_name} to be ready")

assert 200 == code
assert data.get("status", {}).get("readyToUse") is True
```
same as previous.
Thanks for the suggestion.
Indeed these 2 lines can be folded into the previous while loop, since a redundant check exists on `data.get("status", {}).get("readyToUse")`. Updated the code accordingly.
```python
vm_got_ips, (code, data) = vm_checker.wait_ip_addresses(unique_vm_name, ['default'])
assert vm_got_ips, (
    f"Failed to Start VM({unique_vm_name}) with errors:\n"
    f"Status: {data.get('status')}\n"
```
same as previous.
Thanks for the check.
The VM is not restarted before this; it just makes sure the VM is running after taking the snapshot.
I have updated the assert error message accordingly.
```python
) as sh:
    out, err = sh.exec_command(f"echo {pub_key!r} > {base_vm_with_data['data']['path']}")
    assert not err, (out, err)
    sh.exec_command('sync')
```
Is the data-messing operation also needed in restore-new?
As it's restore-new, the messed data will not get back onto the base VM.
Force-pushed from 2d28bc4 to 435a78b
Force-pushed from 435a78b to 534f329
Force-pushed from faa5615 to 534f329
Which issue(s) this PR fixes:
Issue #1045

What this PR does / why we need it:
According to issue harvester/harvester#4954, we need to add a backend e2e test to the `vm_backup_restore` integration test to cover the case where a VM has both a backup and a snapshot created on it and we restore this VM from the backup. The restore should complete successfully.

Added the following test script:

Special notes for your reviewer:
Test result (triggered locally, executing the tests on a remote ECM lab machine):
- Tests PASS all test cases in the `TestBackupRestoreWithSnapshot` class
- Tests PASS most of the test cases in the `test_4_vm_backup_restore.py` file (`TestBackupRestore`, `TestBackupRestoreOnMigration`, and `TestMultipleBackupRestore`)

Considering stability and future test scalability, I created a separate class for the new test scenario.
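For reference, a rough outline of the restore-with-snapshot flow the new cases exercise. This is a sketch only: the snapshot-creation and shutdown steps are left as comments, the VM-name lookup is assumed, and `delete_volumes=False` for the retain case is inferred from the existing `RestoreSpec.for_existing(delete_volumes=True)` usage.

```python
def test_with_snapshot_restore_replace_retain_vols(
    self, api_client, vm_checker, wait_timeout, backup_config, base_vm_with_data
):
    unique_vm_name = base_vm_with_data['name']  # assumption: fixture exposes the VM name

    # 1. Take a VM snapshot on the already-backed-up VM and wait until
    #    api_client.vm_snapshots.get(...) reports readyToUse.
    # 2. Shut down the VM so the backup can be restored onto it.
    # 3. Restore the backup to replace the existing VM while retaining the
    #    previous volumes (the backend rejects delete-volumes when a snapshot exists).
    spec = api_client.backups.RestoreSpec.for_existing(delete_volumes=False)
    code, data = api_client.backups.restore(unique_vm_name, spec)
    assert 201 == code, f'Failed to restore backup with current VM replaced, {data}'

    # 4. Wait for the VM to come back and verify the restored data.
    vm_getable, (code, data) = vm_checker.wait_getable(unique_vm_name)
    assert vm_getable, (code, data)
```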