Describe the bug
On the waitress project we use coverage along with pytest-cov to compute coverage on all runs. We recently received a new contribution that kicked off CI across the test matrix, and the runs hung in tests/test_functional.py. These tests spin up a server (with threads) using multiprocessing.
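For context, the shape of those functional tests is roughly as follows. This is a simplified sketch, not the actual waitress test code; the WSGI app, the fixed port, and the process handling are placeholders:

import multiprocessing

from waitress import serve  # waitress's public entry point

def hello(environ, start_response):
    # Trivial WSGI app standing in for the app under test.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

def start_server():
    # serve() builds the listening socket plus a thread pool, then blocks
    # in the server's main loop until the process is told to exit.
    serve(hello, host="127.0.0.1", port=61523)

if __name__ == "__main__":
    proc = multiprocessing.Process(target=start_server)
    proc.start()
    # ... the test talks to 127.0.0.1:61523 over a plain socket here ...
    proc.terminate()  # SIGTERM; coverage in the child still has to flush its data
    proc.join()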
The developer who was adding the changes caught the issue and provided a stack trace when they hit Ctrl+C after the test suite hung: Pylons/waitress#446 (comment). Copied in its entirety here:
platform linux -- Python 3.12.7, pytest-8.3.3, pluggy-1.5.0
rootdir: .../projects/waitress
configfile: setup.cfg
testpaths: tests
plugins: cov-5.0.0
collected 796 items
tests/test_adjustments.py ................................................. [ 6%]
tests/test_buffers.py .................................................... [ 12%]
tests/test_channel.py ......................................................................................................................... [ 27%]
tests/test_functional.py ...................................................................................^CTraceback (most recent call last):
File ".../projects/waitress/src/waitress/server.py", line 325, in run
self.asyncore.loop(
File ".../projects/waitress/src/waitress/wasyncore.py", line 245, in loop
poll_fun(timeout, map)
File ".../projects/waitress/src/waitress/wasyncore.py", line 183, in poll
read(obj)
File ".../projects/waitress/src/waitress/wasyncore.py", line 104, in read
obj.handle_read_event()
File ".../projects/waitress/src/waitress/wasyncore.py", line 466, in handle_read_event
self.handle_read()
File ".../projects/waitress/src/waitress/channel.py", line 156, in handle_read
data = self.recv(self.adj.recv_bytes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../projects/waitress/src/waitress/wasyncore.py", line 409, in recv
def recv(self, buffer_size):
File ".../projects/waitress/.venv/lib/python3.12/site-packages/coverage/collector.py", line 252, in lock_data
self.data_lock.acquire()
File ".../projects/waitress/tests/test_functional.py", line 43, in sigterm
sys.exit(0)
SystemExit: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File ".../projects/waitress/tests/test_functional.py", line 33, in start_server
svr(app, queue, **kwargs).run()
File ".../projects/waitress/src/waitress/server.py", line 331, in run
self.task_dispatcher.shutdown()
File ".../projects/waitress/src/waitress/task.py", line 118, in shutdown
def shutdown(self, cancel_pending=True, timeout=5):
File ".../projects/waitress/.venv/lib/python3.12/site-packages/coverage/collector.py", line 252, in lock_data
self.data_lock.acquire()
KeyboardInterrupt
This is how it looks in CI, until it times out:
I myself develop on macOS (M1 MacBook Pro) and have not been able to reproduce the issue locally at all. Turning coverage off in the CI runs made the issue go away, so I did some testing:
I started by downgrading to 7.5.4 - it hung
Downgraded to 7.4.4 - it did not hang
Then slowly worked my way back up; the newest version that works is 7.5.3
Pylons/waitress#454 shows the various MRs and contains the action runs so you can view them.
To Reproduce
How can we reproduce the problem? Please be specific. Don't link to a failing CI job. Answer the questions below:
What version of Python are you using?
Python 3.9
Python 3.10
Python 3.11
Python 3.12
Python 3.13
What version of coverage.py shows the problem? The output of coverage debug sys is helpful.
7.6.5
7.5.5
What versions of what packages do you have installed? The output of pip freeze is helpful.
coverage==7.6.5
iniconfig==2.0.0
packaging==24.2
pip==24.3.1
pluggy==1.5.0
pytest==8.3.3
pytest-cov==6.0.0
What code shows the problem? Give us a specific commit of a specific repo that we can check out. If you've already worked around the problem, please provide a commit before that fix.
main on https://github.com/Pylons/waitress
What commands should we run to reproduce the problem? Be specific. Include everything, even git clone, pip install, and so on. Explain like we're five!
This is a race condition, so it may or may not happen. I have been unable to reproduce it outside of CI/CD. It seems to happen fairly often; rerunning the jobs will usually allow them to succeed.
Expected behavior
No deadlock/hang while running the test suite with newer versions of coverage.
Additional context
This is a race condition. I'm sorry, I haven't been able to reproduce it at all locally, so I can't provide any more data or debug information.
I think this may be related to a workaround that we had in the tests to make sure coverage would write output:
def try_register_coverage():  # pragma: no cover
    # Hack around multiprocessing exiting early and not triggering coverage's
    # atexit handler by always registering a signal handler
    if "COVERAGE_PROCESS_START" in os.environ:
        def sigterm(*args):
            sys.exit(0)
        signal.signal(signal.SIGTERM, sigterm)
This was originally added for coverage version 5.x.
Removing this fixes the hang. My guess is that the signal handlers are run in a random order, hence the difficulty of easily reproducing this issue.
While that solves my issue, and I would thus be fine with this being closed: if someone does register a signal handler, wouldn't this race condition still potentially exist, with coverage's attempt to take a lock hanging the process when it receives a SIGTERM?
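To illustrate the concern: in the trace above, the SIGTERM handler raises SystemExit while execution is inside coverage's lock_data, and the shutdown path then blocks trying to take the same lock again. The following is a minimal, self-contained sketch of that general failure mode, not coverage.py's actual locking code; running it hangs on purpose:

import os
import signal
import sys
import threading

data_lock = threading.Lock()  # stand-in for an internal, non-reentrant lock

def sigterm(*args):
    sys.exit(0)  # raises SystemExit in whatever frame happens to be running

signal.signal(signal.SIGTERM, sigterm)

def shutdown():
    with data_lock:  # blocks forever if the lock was left held
        pass

try:
    data_lock.acquire()
    # Suppose SIGTERM arrives right after the acquire succeeds:
    os.kill(os.getpid(), signal.SIGTERM)
finally:
    shutdown()  # deadlock: this thread already holds the non-reentrant lock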