I understand that AWX is open source software provided for free and that I might not receive a timely response.
I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)
Bug Summary
Sometimes jobs fail with the following error:
<10.193.67.10> ESTABLISH SSH CONNECTION FOR USER: root
<10.193.67.10> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o Port=60813 -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o 'ControlPath="/runner/cp/e9c44f7b2f"' 10.193.67.10 '/bin/sh -c '"'"'echo ~root && sleep 0'"'"''
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 465, in send_callback
method(*new_args, **kwargs)
File "/runner/artifacts/2975782/callback/awx_display.py", line 630, in v2_playbook_on_stats
File "/usr/lib64/python3.11/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/runner/artifacts/2975782/callback/awx_display.py", line 374, in capture_event_data
File "/runner/artifacts/2975782/callback/awx_display.py", line 239, in dump_begin
File "/runner/artifacts/2975782/callback/awx_display.py", line 111, in set
FileNotFoundError: [Errno 2] No such file or directory: '/runner/artifacts/2975782/job_events'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/ansible/cli/__init__.py", line 659, in cli_executor
exit_code = cli.run()
^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/ansible/cli/playbook.py", line 156, in run
results = pbex.run()
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 246, in run
self._tqm.send_callback('v2_playbook_on_stats', self._tqm._stats)
File "/usr/local/lib/python3.11/site-packages/ansible/utils/lock.py", line 41, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 468, in send_callback
display.warning(u"Failure using method (%s) in callback plugin (%s): %s" % (to_text(method_name), to_text(callback_plugin), to_text(e)))
File "/runner/artifacts/2975782/callback/awx_display.py", line 256, in wrapper
File "/usr/local/lib/python3.11/site-packages/ansible/utils/display.py", line 134, in proxyit
return method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/ansible/utils/display.py", line 539, in warning
self.display(new_msg, color=C.COLOR_WARN, stderr=True)
File "/runner/artifacts/2975782/callback/awx_display.py", line 304, in wrapper
File "/runner/artifacts/2975782/callback/awx_display.py", line 239, in dump_begin
File "/runner/artifacts/2975782/callback/awx_display.py", line 111, in set
FileNotFoundError: [Errno 2] No such file or directory: '/runner/artifacts/2975782/job_events'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/ansible-playbook", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.11/site-packages/ansible/cli/playbook.py", line 240, in main
PlaybookCLI.cli_executor(args)
File "/usr/local/lib/python3.11/site-packages/ansible/cli/__init__.py", line 687, in cli_executor
display.error("Unexpected Exception, this is probably a bug: %s" % to_text(e), wrap_text=False)
File "/runner/artifacts/2975782/callback/awx_display.py", line 256, in wrapper
File "/usr/local/lib/python3.11/site-packages/ansible/utils/display.py", line 594, in error
self.display(new_msg, color=C.COLOR_ERROR, stderr=True)
File "/runner/artifacts/2975782/callback/awx_display.py", line 304, in wrapper
File "/runner/artifacts/2975782/callback/awx_display.py", line 239, in dump_begin
File "/runner/artifacts/2975782/callback/awx_display.py", line 111, in set
FileNotFoundError: [Errno 2] No such file or directory: '/runner/artifacts/2975782/job_events'
At other times the job just hangs (no progress), and the private-data-dir on the execution node is missing. The /runner directory that is mapped into the podman container is empty. In these cases ansible-playbook processes are still running:
If a timeout is configured on the AWX job, then it fails with "Failed to JSON parse a line from worker stream".
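One way to narrow down when the directory disappears would be to poll it on the execution node and record the moment it goes missing, so the timestamp can be compared against the receptor and ansible-runner logs. The following is only a minimal sketch: the function name `wait_until_missing`, the polling interval, and the placeholder path are assumptions for illustration and are not part of AWX or ansible-runner.

```python
import os
import time


def wait_until_missing(path, interval=5.0, timeout=None):
    """Poll `path` until it no longer exists.

    Returns the number of seconds waited, or None if `timeout` elapsed first.
    """
    start = time.monotonic()
    while os.path.exists(path):
        if timeout is not None and time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical placeholder: substitute the private-data-dir of the running job.
    watched = "/path/to/private-data-dir/artifacts"
    waited = wait_until_missing(watched, interval=5.0)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"{stamp} {watched} missing after {waited:.0f}s")
```

Polling is coarse (the removal time is only known to within one interval), but it needs nothing beyond the standard library and can run next to a long job.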
AWX version
24.6.1
Select the relevant components
UI
UI (tech preview)
API
Docs
Collection
CLI
Other
Installation method
docker development environment
Modifications
no
Ansible version
9
Operating system
Debian 12
Web browser
Firefox
Steps to reproduce
I don't know. The error seems to appear more often on longer-running jobs. Sometimes it fails after 15 minutes, sometimes after 150 minutes. The number of slices and forks makes no difference: even with 1 fork and 1 slice it might fail. But it seems that the playbook must do some file lookups or delegate_to localhost to read some files.
Expected results
Job should succeed.
Actual results
Job fails with a "file not found" error.
Additional information
I first suspected an issue with ansible-runner, because it is executed with the --delete flag and might for some reason be cleaning up the private-data-dir. So I tried it with ansible-runner=2.3.6 on the exec nodes, but that didn't change anything.
Then I downgraded receptor to 1.4.1 on the exec nodes, but that made no difference either.
The "file not found" error and the defunct ansible-playbook process are the only hints I have so far. I don't see anything else in the logs.
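To check for defunct ansible-playbook processes on an execution node without relying on the exact `ps` output format, the process table can be read from `/proc` directly. This is a rough sketch under some assumptions: it is Linux only, the helper names `parse_stat` and `find_defunct` are made up for illustration, and it assumes the process's `comm` field is the script name truncated to 15 characters (so `ansible-playbook` would show up as `ansible-playboo`).

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def find_defunct(name, proc_root="/proc"):
    """List PIDs of zombie (state 'Z') processes whose comm matches `name`."""
    wanted = name[:15]  # assumption: comm is truncated to 15 characters
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable while scanning
        if state == "Z" and comm == wanted:
            pids.append(pid)
    return pids


if __name__ == "__main__":
    print(find_defunct("ansible-playbook"))
```

The `proc_root` parameter exists only so the function can be pointed at a test directory; on a real node the default `/proc` is used.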