Improve starting ipmi_sim program #9378

maximenoel8 · 2024-10-17T00:15:44Z

Context

If one of the IPMI power management features fails, it leaves the ipmi_sim process running. The next feature then tries to start ipmi_sim again. In this situation, the command ipmi_sim -n < /dev/null > /dev/null & is executed and waits for the previous process to stop (nothing will stop it). The command never returns an exit code and the ssh connection dies. The ssh timeout does not detect this, so the testsuite stays stuck at this stage forever.

What does this PR change?

Use the nohup command to start the service. It returns a result immediately, even if the process is already running.
Improve the IPMI mocking cleanup. In my tests, pkill fake_ipmi_host.sh || : did not successfully kill fake_ipmi_host.sh (detected by rerunning the same features several times).
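
One general reason a backgrounded command can keep its caller waiting is that it still holds the caller's output stream open. Whether that is what happens over ssh in this testsuite is an assumption; the sketch below only reproduces the effect locally with Ruby backticks, which read the command's stdout until it is closed. The helper name seconds_to_return is hypothetical, and sleep stands in for ipmi_sim.

```ruby
# Measures how long the caller waits for a shell command to hand back control.
# Backticks read the command's stdout until every holder of that pipe closes it.
def seconds_to_return(command)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  `#{command}`
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# The background sleep inherits stdout, so the caller waits until it exits.
held = seconds_to_return('sleep 2 &')

# All three streams are redirected, so the caller gets control back right away.
detached = seconds_to_return('sleep 2 > /dev/null 2>&1 < /dev/null &')

puts format('inherited stdout: %.1fs, redirected streams: %.1fs', held, detached)
```

The second form is the local analogue of redirecting both stdout and stderr when starting ipmi_sim in the background.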

GUI diff

No difference.

DONE

Documentation

No documentation needed: only internal and user invisible changes
DONE

Test coverage

ℹ️ If a major new functionality is added, it is strongly recommended that tests for the new functionality are added to the Cucumber test suite

No tests: already covered
DONE

Links

Port:
Issue:

DONE

Changelogs

Make sure the changelogs entries you are adding are compliant with https://github.com/uyuni-project/uyuni/wiki/Contributing#changelogs and https://github.com/uyuni-project/uyuni/wiki/Contributing#uyuni-projectuyuni-repository

If you don't need a changelog check, please mark this checkbox:

No changelog needed

If you uncheck the checkbox after the PR is created, you will need to re-run changelog_test (see below)

Re-run a test

If you need to re-run a test, please mark the related checkbox, it will be unchecked automatically once it has re-run:

Re-run test "changelog_test"
Re-run test "backend_unittests_pgsql"
Re-run test "java_lint_checkstyle"
Re-run test "java_pgsql_tests"
Re-run test "ruby_rubocop"
Re-run test "schema_migration_test_pgsql"
Re-run test "susemanager_unittests"
Re-run test "javascript_lint"
Re-run test "spacecmd_unittests"

Before you merge

Check How to branch and merge properly!

Improve killing fake_ipmi_host.sh

github-actions · 2024-10-17T00:15:57Z

👋 Hello! Thanks for contributing to our project.
Acceptance tests will take some time (approx. 1h), please be patient ☕
You can see the progress at the end of this page and at https://github.com/uyuni-project/uyuni/pull/9378/checks
Once tests finish, if they fail, you can check 👀 the cucumber report. See the link at the output of the action.
You can also check the artifacts section, which contains the logs at https://github.com/uyuni-project/uyuni/pull/9378/checks.

If you are unsure the failing tests are related to your code, you can check the "reference jobs". These are jobs that run on a scheduled time with code from master. If they fail for the same reason as your build, it means the tests or the infrastructure are broken. If they do not fail, but yours do, it means it is related to your code.

Reference tests:

KNOWN ISSUES

Sometimes the build can fail when pulling new jar files from download.opensuse.org . This is a known limitation. Given this happens rarely, when it does, all you need to do is rerun the test. Sorry for the inconvenience.

For more tips on troubleshooting, see the troubleshooting guide.

Happy hacking!
⚠️ You should not merge if acceptance tests fail to pass. ⚠️

srbarrios · 2024-10-17T06:53:42Z

testsuite/features/step_definitions/command_steps.rb

 end

 When(/^the server stops mocking an IPMI host$/) do
  get_target('server').run('pkill ipmi_sim')
-  get_target('server').run('pkill fake_ipmi_host.sh || :')
+  get_target('server').run("ps aux | grep [f]ake_ipmi_host.sh | awk '{print $2}' | xargs kill")


Let's add verbose: true here, so we can keep track of duplicated processes.

srbarrios · 2024-10-17T06:56:21Z

testsuite/features/step_definitions/command_steps.rb

@@ -580,12 +580,12 @@
    raise ScriptError, 'File injection failed' unless success
  end
  server.run('chmod +x /etc/ipmi/fake_ipmi_host.sh', verbose: true, check_errors: true)
-  server.run('ipmi_sim -n < /dev/null > /dev/null &', verbose: true, check_errors: true)
+  server.run('nohup ipmi_sim -n > /var/log/ipmi_sim.log 2>&1 &', verbose: true, check_errors: true)


Maybe before starting a new one, we could add an assertion or precondition that ensures no other process is already running.
If one is, we might want to skip the start with a warning message, or just fail the step.
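
A minimal sketch of such a precondition, assuming the target object responds to run(cmd, **opts) and returns [output, exit_code]. The helper start_ipmi_sim_once and the StubTarget class are hypothetical and exist only so the sketch runs on its own. pgrep -x exits with 0 when a process with exactly that name exists.

```ruby
# Hypothetical helper: start ipmi_sim only when no instance is already running.
# `target` is assumed to respond to run(cmd, **opts) and return [output, exit_code].
def start_ipmi_sim_once(target)
  # pgrep -x matches the exact process name and exits 0 when a match exists.
  _out, code = target.run('pgrep -x ipmi_sim', check_errors: false)
  if code.zero?
    warn 'ipmi_sim is already running, skipping start'
    return :already_running
  end
  target.run('nohup ipmi_sim -n > /var/log/ipmi_sim.log 2>&1 &', verbose: true, check_errors: true)
  :started
end

# Stand-in for a remote target, used only to exercise the helper locally.
class StubTarget
  attr_reader :commands

  def initialize(pgrep_code)
    @pgrep_code = pgrep_code
    @commands = []
  end

  # Records the command and fakes an exit code; nothing is executed.
  def run(cmd, **_opts)
    @commands << cmd
    cmd.start_with?('pgrep') ? ['', @pgrep_code] : ['', 0]
  end
end

p start_ipmi_sim_once(StubTarget.new(0)) # an instance exists, so the start is skipped
p start_ipmi_sim_once(StubTarget.new(1)) # no instance, so ipmi_sim is started
```

To fail the step instead of skipping, the warn and early return could be replaced with a raise.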

szachovy · 2024-10-17T07:35:06Z

testsuite/features/step_definitions/command_steps.rb

 end

 When(/^the server stops mocking an IPMI host$/) do
  get_target('server').run('pkill ipmi_sim')
-  get_target('server').run('pkill fake_ipmi_host.sh || :')
+  get_target('server').run("ps aux | grep [f]ake_ipmi_host.sh | awk '{print $2}' | xargs kill")


I would stick with the previous version. There are a few issues with the new one:

Certain implementations of ps might produce different output, so it may require extra handling in the future.

|| : is a kind of safety mechanism; if grep does not find the process, the new pipeline may lead to errors.

It involves more moving components: ps, grep, awk, xargs, and kill, whereas before you just had pkill.

I agree with your points, @szachovy, but @maximenoel8 said that pkill was not killing the process successfully, and we need to address that somehow.

If you look at the uyuni podman server, you will see that pkill is not working:

uyuni-master-podman-srv:~ # ps aux | grep fake_ipmi_host.sh
root 10582 0.0 0.0 4236 3120 ? S Oct16 0:03 /bin/bash /etc/ipmi/fake_ipmi_host.sh
root 11787 0.0 0.0 9156 2336 pts/0 S+ 09:48 0:00 grep --color=auto fake_ipmi_host.sh
root 13152 0.0 0.0 4236 3104 ? S Oct16 0:03 /bin/bash /etc/ipmi/fake_ipmi_host.sh

--full is missing:

uyuni-master-podman-srv:~ # ps aux | grep fake_ipmi_host.sh
root 10582 0.0 0.0 4236 3120 ? S Oct16 0:03 /bin/bash /etc/ipmi/fake_ipmi_host.sh
root 13152 0.0 0.0 4236 3104 ? S Oct16 0:03 /bin/bash /etc/ipmi/fake_ipmi_host.sh
root 13807 0.0 0.0 8780 2328 pts/1 S+ 10:03 0:00 grep --color=auto fake_ipmi_host.sh
uyuni-master-podman-srv:~ # pkill --full fake_ipmi_host.sh
uyuni-master-podman-srv:~ # ps aux | grep fake_ipmi_host.sh
root 13877 0.0 0.0 8780 816 pts/1 S+ 10:03 0:00 grep --color=auto fake_ipmi_host.sh

Nice, that does seem to be the proper fix for the kill.
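
Why --full helps can be shown with a simplified model of pkill's matching: by default the pattern is compared with the process name only, and with --full it is compared with the whole command line. The sketch below is not pkill's implementation. The PIDs and command lines come from the ps output above, while the comm values are assumptions (either the interpreter name, or the script name truncated to the kernel's 15-character limit).

```ruby
# Simplified model of how pkill selects processes (not the real implementation).
ProcessInfo = Struct.new(:pid, :comm, :cmdline)

# Returns the PIDs whose process name (default) or full command line
# (full: true) matches the pattern, mirroring pkill with and without --full.
def pkill_candidates(processes, pattern, full: false)
  regex = Regexp.new(pattern)
  processes.select { |process| regex.match?(full ? process.cmdline : process.comm) }.map(&:pid)
end

# The comm values are assumed; neither can contain the full 17-character script name.
processes = [
  ProcessInfo.new(10582, 'bash', '/bin/bash /etc/ipmi/fake_ipmi_host.sh'),
  ProcessInfo.new(13152, 'fake_ipmi_host.', '/bin/bash /etc/ipmi/fake_ipmi_host.sh')
]

p pkill_candidates(processes, 'fake_ipmi_host.sh')             # => []
p pkill_candidates(processes, 'fake_ipmi_host.sh', full: true) # => [10582, 13152]
```

Under either assumption about the process name, the plain pattern finds nothing, which is consistent with the leftover processes in the listing.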

Improve starting ipmi_sim program

8959512

Improve killing fake_ipmi_host.sh

maximenoel8 requested a review from a team as a code owner October 17, 2024 00:15

maximenoel8 self-assigned this Oct 17, 2024

github-actions bot added testing ruby_rubocop test-framework labels Oct 17, 2024

srbarrios reviewed Oct 17, 2024

szachovy reviewed Oct 17, 2024

Improve pkill for fark_ipmi and check service before restarting one

423018f
