Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CI with postgres #954

Closed
tcompa opened this issue Nov 10, 2023 · 12 comments · Fixed by #1012
Closed

Fix CI with postgres #954

tcompa opened this issue Nov 10, 2023 · 12 comments · Fixed by #1012
Labels
1.4.0 bug Something isn't working High Priority Current Priorities & Blocking Issues testing testing

Comments

@tcompa
Copy link
Collaborator

tcompa commented Nov 10, 2023

See failures in #953.

For the record, as of #949 we are using sqlalchemy 1.4.50 in the CI (while we were using 1.4.41 before that PR).

Ref:

@tcompa tcompa added bug Something isn't working testing testing labels Nov 10, 2023
@tcompa tcompa linked a pull request Nov 10, 2023 that will close this issue
1 task
@tcompa
Copy link
Collaborator Author

tcompa commented Nov 10, 2023

A sample traceback from https://pipelinesghubeus12.actions.githubusercontent.com/IMqMjycCLBE0AibOdlE9KqIzP2XzCrYHTi7wHEWdbQzDFyFwBI/_apis/pipelines/1/runs/4819/signedlogcontent/2?urlExpires=2023-11-10T14%3A04%3A11.1235309Z&urlSigningMethod=HMACV1&urlSignature=lTGd5iZHnmQYw11bV03I7lHRqKeYHvOM49MaFl6XwbE%3D


2023-11-10T13:51:23.2861997Z tests/test_alive.py .                                                    [  0%]
2023-11-10T13:51:27.9952067Z tests/test_backend_local.py .........                                    [  3%]
2023-11-10T13:51:30.4547390Z tests/test_backend_local_custom_task.py .                                [  3%]
2023-11-10T13:52:48.4764275Z tests/test_backend_runners.py ..                                         [  4%]
2023-11-10T13:53:19.0700901Z tests/test_backend_slurm.py .........                                    [  7%]
2023-11-10T13:53:30.7329445Z tests/test_backend_slurm_shutdown.py ...                                 [  8%]
2023-11-10T13:53:31.0413328Z tests/test_background_tasks.py ..                                        [  8%]
2023-11-10T13:54:12.3007006Z tests/test_commands.py ..........                                        [ 11%]
2023-11-10T13:54:32.5687820Z tests/test_dataset_api.py ...........                                    [ 15%]
2023-11-10T13:54:32.9849319Z tests/test_db.py ...                                                     [ 16%]
2023-11-10T13:54:34.1934238Z tests/test_db_project.py .                                               [ 16%]
2023-11-10T13:54:34.2185183Z tests/test_fixtures_settings_factory.py ..                               [ 17%]
2023-11-10T13:56:00.0667358Z tests/test_full_workflow.py .............                                [ 21%]
2023-11-10T13:56:11.3182802Z tests/test_history.py .....                                              [ 23%]
2023-11-10T13:56:25.1764134Z tests/test_job_api.py .....                                              [ 24%]
2023-11-10T13:56:26.7560823Z tests/test_link_project_user.py .                                        [ 25%]
2023-11-10T13:56:46.3710485Z tests/test_project_api.py ...........                                    [ 28%]
2023-11-10T13:57:06.9872547Z tests/test_project_apply_api.py .....                                    [ 30%]
2023-11-10T13:57:19.7657612Z tests/test_security.py ...............                                   [ 34%]
2023-11-10T13:57:33.8949644Z tests/test_task_api.py .............F                                    [ 39%]
2023-11-10T13:57:34.2997104Z tests/test_task_api.py:318 test_delete_task - pytest.PytestUnraisabl…
2023-11-10T13:57:34.2998460Z tests/test_task_api.py .                                                 [ 39%]
2023-11-10T13:58:51.4910273Z tests/test_task_collection_api.py .............                          [ 43%]
2023-11-10T13:59:26.9185982Z tests/test_task_collection_pypi.py ............                          [ 47%]
2023-11-10T13:59:34.5638360Z tests/test_unit_aux_functions.py .......                                 [ 50%]
2023-11-10T13:59:34.8118913Z tests/test_unit_config.py .................                              [ 55%]
2023-11-10T13:59:34.9210815Z tests/test_unit_coroutine_never_awaited.py .xx                           [ 56%]
2023-11-10T13:59:35.0621833Z tests/test_unit_custom_subprocess_module.py ..                           [ 57%]
2023-11-10T13:59:44.0211257Z tests/test_unit_db_models.py ........                                    [ 59%]
2023-11-10T13:59:44.5211876Z tests/test_unit_dummytasks.py .....                                      [ 61%]
2023-11-10T13:59:44.6294480Z tests/test_unit_fractal_thread_pool_executor.py ................         [ 66%]
2023-11-10T13:59:44.6555695Z tests/test_unit_injection.py ...                                         [ 67%]
2023-11-10T13:59:44.6587692Z tests/test_unit_json_schemas.py .                                        [ 67%]
2023-11-10T13:59:44.7827526Z tests/test_unit_local_config.py ..                                       [ 68%]
2023-11-10T13:59:45.7918136Z tests/test_unit_schemas.py .......................                       [ 75%]
2023-11-10T13:59:45.8420132Z tests/test_unit_set_start_and_last_task_index.py .......                 [ 77%]
2023-11-10T13:59:46.2536560Z tests/test_unit_slurm_batching.py ................................       [ 88%]
2023-11-10T13:59:46.8052709Z tests/test_unit_slurm_config.py ......                                   [ 90%]
2023-11-10T13:59:49.4175545Z tests/test_unit_submit_workflow.py ...                                   [ 91%]
2023-11-10T13:59:49.4712381Z tests/test_unit_utils.py ...                                             [ 91%]
2023-11-10T14:01:24.2363055Z tests/test_workflow_api.py .........................                     [100%]
2023-11-10T14:01:24.2363889Z 
2023-11-10T14:01:24.2364211Z =================================== FAILURES ===================================
2023-11-10T14:01:24.2365172Z _______________________________ test_delete_task _______________________________
2023-11-10T14:01:24.2366201Z 
2023-11-10T14:01:24.2366729Z cls = <class '_pytest.runner.CallInfo'>
2023-11-10T14:01:24.2367679Z func = <function call_runtest_hook.<locals>.<lambda> at 0x7f77dc144ca0>
2023-11-10T14:01:24.2368619Z when = 'call'
2023-11-10T14:01:24.2369410Z reraise = (<class '_pytest.outcomes.Exit'>, <class 'KeyboardInterrupt'>)
2023-11-10T14:01:24.2370131Z 
2023-11-10T14:01:24.2370300Z     @classmethod
2023-11-10T14:01:24.2370730Z     def from_call(
2023-11-10T14:01:24.2371166Z         cls,
2023-11-10T14:01:24.2371616Z         func: "Callable[[], TResult]",
2023-11-10T14:01:24.2372532Z         when: "Literal['collect', 'setup', 'call', 'teardown']",
2023-11-10T14:01:24.2373353Z         reraise: Optional[
2023-11-10T14:01:24.2374099Z             Union[Type[BaseException], Tuple[Type[BaseException], ...]]
2023-11-10T14:01:24.2374900Z         ] = None,
2023-11-10T14:01:24.2375453Z     ) -> "CallInfo[TResult]":
2023-11-10T14:01:24.2376107Z         """Call func, wrapping the result in a CallInfo.
2023-11-10T14:01:24.2376798Z     
2023-11-10T14:01:24.2377193Z         :param func:
2023-11-10T14:01:24.2378087Z             The function to call. Called without arguments.
2023-11-10T14:01:24.2378860Z         :param when:
2023-11-10T14:01:24.2379450Z             The phase in which the function is called.
2023-11-10T14:01:24.2380165Z         :param reraise:
2023-11-10T14:01:24.2380892Z             Exception or exceptions that shall propagate if raised by the
2023-11-10T14:01:24.2381855Z             function, instead of being wrapped in the CallInfo.
2023-11-10T14:01:24.2382574Z         """
2023-11-10T14:01:24.2384550Z         excinfo = None
2023-11-10T14:01:24.2385058Z         start = timing.time()
2023-11-10T14:01:24.2385680Z         precise_start = timing.perf_counter()
2023-11-10T14:01:24.2386325Z         try:
2023-11-10T14:01:24.2386812Z >           result: Optional[TResult] = func()
2023-11-10T14:01:24.2387596Z 
2023-11-10T14:01:24.2388749Z ../../../.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/_pytest/runner.py:341: 
2023-11-10T14:01:24.2390269Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2023-11-10T14:01:24.2392109Z ../../../.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/_pytest/runner.py:262: in <lambda>
2023-11-10T14:01:24.2393752Z     lambda: ihook(item=item, **kwds), when=when, reraise=reraise
2023-11-10T14:01:24.2395543Z ../../../.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/pluggy/_hooks.py:433: in __call__
2023-11-10T14:01:24.2397269Z     return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
2023-11-10T14:01:24.2399203Z ../../../.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/pluggy/_manager.py:112: in _hookexec
2023-11-10T14:01:24.2400928Z     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
2023-11-10T14:01:24.2403061Z ../../../.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/_pytest/unraisableexception.py:88: in pytest_runtest_call
2023-11-10T14:01:24.2404824Z     yield from unraisable_exception_runtest_hook()
2023-11-10T14:01:24.2405700Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2023-11-10T14:01:24.2406337Z 
2023-11-10T14:01:24.2406922Z     def unraisable_exception_runtest_hook() -> Generator[None, None, None]:
2023-11-10T14:01:24.2407886Z         with catch_unraisable_exception() as cm:
2023-11-10T14:01:24.2408540Z             yield
2023-11-10T14:01:24.2408984Z             if cm.unraisable:
2023-11-10T14:01:24.2409618Z                 if cm.unraisable.err_msg is not None:
2023-11-10T14:01:24.2410367Z                     err_msg = cm.unraisable.err_msg
2023-11-10T14:01:24.2411029Z                 else:
2023-11-10T14:01:24.2411565Z                     err_msg = "Exception ignored in"
2023-11-10T14:01:24.2412426Z                 msg = f"{err_msg}: {cm.unraisable.object!r}\n\n"
2023-11-10T14:01:24.2413154Z                 msg += "".join(
2023-11-10T14:01:24.2413752Z                     traceback.format_exception(
2023-11-10T14:01:24.2414448Z                         cm.unraisable.exc_type,
2023-11-10T14:01:24.2415144Z                         cm.unraisable.exc_value,
2023-11-10T14:01:24.2415864Z                         cm.unraisable.exc_traceback,
2023-11-10T14:01:24.2416515Z                     )
2023-11-10T14:01:24.2416947Z                 )
2023-11-10T14:01:24.2417647Z >               warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))
2023-11-10T14:01:24.2419464Z E               pytest.PytestUnraisableExceptionWarning: Exception ignored in: <function Connection.__del__ at 0x7f77df2000d0>
2023-11-10T14:01:24.2420743Z E               
2023-11-10T14:01:24.2421255Z E               Traceback (most recent call last):
2023-11-10T14:01:24.2423278Z E                 File "/home/runner/.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/asyncpg/connection.py", line 132, in __del__
2023-11-10T14:01:24.2424980Z E                   self.terminate()
2023-11-10T14:01:24.2426899Z E                 File "/home/runner/.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/asyncpg/connection.py", line 1344, in terminate
2023-11-10T14:01:24.2428728Z E                   self._abort()
2023-11-10T14:01:24.2430588Z E                 File "/home/runner/.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/asyncpg/connection.py", line 1371, in _abort
2023-11-10T14:01:24.2432312Z E                   self._protocol.abort()
2023-11-10T14:01:24.2433808Z E                 File "asyncpg/protocol/protocol.pyx", line 571, in asyncpg.protocol.protocol.BaseProtocol.abort
2023-11-10T14:01:24.2435881Z E                 File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/selector_events.py", line 686, in abort
2023-11-10T14:01:24.2437177Z E                   self._force_close(None)
2023-11-10T14:01:24.2438521Z E                 File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/selector_events.py", line 737, in _force_close
2023-11-10T14:01:24.2440000Z E                   self._loop.call_soon(self._call_connection_lost, exc)
2023-11-10T14:01:24.2441486Z E                 File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/base_events.py", line 755, in call_soon
2023-11-10T14:01:24.2442756Z E                   self._check_thread()
2023-11-10T14:01:24.2444041Z E                 File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/base_events.py", line 792, in _check_thread
2023-11-10T14:01:24.2445366Z E                   raise RuntimeError(
2023-11-10T14:01:24.2446650Z E               RuntimeError: Non-thread-safe operation invoked on an event loop other than the current one
2023-11-10T14:01:24.2447575Z 
2023-11-10T14:01:24.2449139Z ../../../.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning
2023-11-10T14:01:24.2451204Z ----------------------------- Captured stderr call -----------------------------
2023-11-10T14:01:24.2459022Z WARNING:asyncio:Executing <Task pending name='Task-2513' coro=<_create_ssl_connection() running at /home/runner/.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/asyncpg/connect_utils.py:684> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()] created at /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/base_events.py:429> cb=[_release_waiter(<Future pendi...events.py:429>)() at /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/tasks.py:387] created at /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/tasks.py:636> took 0.171 seconds
2023-11-10T14:01:24.2465962Z ------------------------------ Captured log call -------------------------------
2023-11-10T14:01:24.2473748Z WARNING  asyncio:base_events.py:1904 Executing <Task pending name='Task-2513' coro=<_create_ssl_connection() running at /home/runner/.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/asyncpg/connect_utils.py:684> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()] created at /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/base_events.py:429> cb=[_release_waiter(<Future pendi...events.py:429>)() at /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/tasks.py:387] created at /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/asyncio/tasks.py:636> took 0.171 seconds
2023-11-10T14:01:24.2480631Z =============================== warnings summary ===============================
2023-11-10T14:01:24.2482205Z ../../../.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/passlib/pwd.py:16
2023-11-10T14:01:24.2485608Z   /home/runner/.cache/pypoetry/virtualenvs/fractal-server-21eqDTaF-py3.10/lib/python3.10/site-packages/passlib/pwd.py:16: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
2023-11-10T14:01:24.2488043Z     import pkg_resources
2023-11-10T14:01:24.2488371Z 
2023-11-10T14:01:24.2488803Z tests/test_unit_coroutine_never_awaited.py::test_harmless_warning
2023-11-10T14:01:24.2490832Z   /home/runner/work/fractal-server/fractal-server/tests/test_unit_coroutine_never_awaited.py:12: UserWarning: This is a dummy harmless warning, to be ignored
2023-11-10T14:01:24.2493071Z     warnings.warn("This is a dummy harmless warning, to be ignored")
2023-11-10T14:01:24.2493736Z 
2023-11-10T14:01:24.2494312Z -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
2023-11-10T14:01:24.2495587Z                               Summary of Failures                               
2023-11-10T14:01:24.2496765Z ┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
2023-11-10T14:01:24.2498051Z ┃               ┃               ┃  Function     ┃              ┃               ┃
2023-11-10T14:01:24.2499176Z ┃  File         ┃  Function     ┃  Line         ┃  Error Line  ┃  Error        ┃
2023-11-10T14:01:24.2500366Z ┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
2023-11-10T14:01:24.2501693Z │  tests/test…  │  test_delet…  │  319          │  341         │  PytestUnra…  │
2023-11-10T14:01:24.2502940Z └───────────────┴───────────────┴───────────────┴──────────────┴───────────────┘
2023-11-10T14:01:24.2503776Z Results (607.09s):
2023-11-10T14:01:24.2523323Z          1 failed
2023-11-10T14:01:24.2530786Z        309 passed
2023-11-10T14:01:24.2538503Z          2 xfailed
2023-11-10T14:01:24.2545379Z          2 warnings
2023-11-10T14:01:25.3288611Z ##[error]Process completed with exit code 1.

@tcompa
Copy link
Collaborator Author

tcompa commented Nov 14, 2023

Cannot reproduce any more.

@tcompa tcompa reopened this Nov 15, 2023
@tcompa tcompa added High Priority Current Priorities & Blocking Issues 1.4.0 labels Nov 15, 2023
@tcompa
Copy link
Collaborator Author

tcompa commented Nov 21, 2023

Perhaps fixed with 1008? To be verified.

EDIT: no

This was referenced Nov 21, 2023
@tcompa
Copy link
Collaborator Author

tcompa commented Nov 21, 2023

@tcompa
Copy link
Collaborator Author

tcompa commented Nov 23, 2023

Unexpectedly, this bug appeared again (ref https://github.com/fractal-analytics-platform/fractal-server/actions/runs/6966759281/job/18957429979), even after #1012.

One more reason to consider #996

@tcompa
Copy link
Collaborator Author

tcompa commented Nov 28, 2023

Possibly relevant entry in sqlalchemy 1.4.41 CHANGELOG (https://docs.sqlalchemy.org/en/20/changelog/changelog_14.html#change-1.4.41-asyncio):

asyncio

[asyncio] [bug]

Integrated support for asyncpg’s terminate() method call for cases where the connection pool is recycling a possibly timed-out connection, where a connection is being garbage collected that wasn’t gracefully closed, as well as when the connection has been invalidated. This allows asyncpg to abandon the connection without waiting for a response that may incur long timeouts.

References: [#8419](https://www.sqlalchemy.org/trac/ticket/8419)

@tcompa
Copy link
Collaborator Author

tcompa commented Nov 29, 2023

(with @mfranzon)

Ref:

where we upgraded asyncpg from 0.27 to 0.29. Possibly related (as of 0.29): MagicStack/asyncpg#1087 and the addition of .


Errors still appear when running the CI locally (see below), but they are not re-raised and the tests pass.


Note the difference between close and terminate methods, where the terminate is supposed to be more "forgiving" (log the error and move on) - I cannot find the original source, but it started from these:
sqlalchemy/sqlalchemy#8419
sqlalchemy/sqlalchemy#8418
sqlalchemy/sqlalchemy#8145

Another relevant bit of info is

not sure if you are doing this but you need to set pool_recycle to a value less than that timeout. so your application would not be exposed to connections that have been shut off from the server side.


poetry run pytest -s --log-cli-level info tests/test_*_api.py

...

Results (78.27s):
        95 passed
         1 warning
ERROR:asyncio:Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7fa737542080>
transport: <_SelectorSocketTransport closing fd=18>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 924, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/sslproto.py", line 690, in _process_write_backlog
    self._transport.write(chunk)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 930, in write
    self._fatal_error(exc, 'Fatal write error on socket transport')
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 725, in _fatal_error
    self._force_close(exc)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 737, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
ERROR:asyncio:Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7fa7378a51b0>
transport: <_SelectorSocketTransport closing fd=8>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 924, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/sslproto.py", line 690, in _process_write_backlog
    self._transport.write(chunk)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 930, in write
    self._fatal_error(exc, 'Fatal write error on socket transport')
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 725, in _fatal_error
    self._force_close(exc)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 737, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
ERROR:asyncio:Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7fa746081cf0>
transport: <_SelectorSocketTransport closing fd=9>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 924, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/sslproto.py", line 690, in _process_write_backlog
    self._transport.write(chunk)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 930, in write
    self._fatal_error(exc, 'Fatal write error on socket transport')
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 725, in _fatal_error
    self._force_close(exc)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 737, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
ERROR:asyncio:Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7fa737d1fe50>
transport: <_SelectorSocketTransport closing fd=10>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 924, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/sslproto.py", line 690, in _process_write_backlog
    self._transport.write(chunk)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 930, in write
    self._fatal_error(exc, 'Fatal write error on socket transport')
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 725, in _fatal_error
    self._force_close(exc)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 737, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
ERROR:asyncio:Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7fa737f59f60>
transport: <_SelectorSocketTransport closing fd=11>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 924, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/sslproto.py", line 690, in _process_write_backlog
    self._transport.write(chunk)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 930, in write
    self._fatal_error(exc, 'Fatal write error on socket transport')
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 725, in _fatal_error
    self._force_close(exc)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 737, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

@tcompa
Copy link
Collaborator Author

tcompa commented Nov 29, 2023

A very promising option to play with is pre_ping, as described in the "pessimistic" approach to handling disconnections in sqlalchemy: https://docs.sqlalchemy.org/en/14/core/pooling.html#disconnect-handling-pessimistic.

We ran the tests/test_*api.py subset, with these results

  1. asyncpg 0.27, sqlalchemy 1.4.50, pre_ping=False
    -> CI passes, with logs of RuntimeErrror (as in Fix CI with postgres #954 (comment))
  2. asyncpg 0.27, sqlalchemy 1.4.50, pre_ping=True
    -> CI passes, with no error logs
  3. asyncpg 0.29, sqlalchemy 1.4.50, pre_ping=False
    -> CI passes, with logs of RuntimeErrror (as in Fix CI with postgres #954 (comment))
  4. asyncpg 0.29, sqlalchemy 1.4.50, pre_ping=True
    -> CI passes, with no error logs

This suggests that pre_ping=True is a definitely valid option. We are still missing something about what is triggering the error.

@tcompa
Copy link
Collaborator Author

tcompa commented Nov 29, 2023

@tcompa
Copy link
Collaborator Author

tcompa commented Nov 30, 2023

As part of sqlalchemy 2.0.2, there is this change (https://docs.sqlalchemy.org/en/20/changelog/changelog_20.html#change-72bab6f6b9eb1662c0f5c5030a44f184):

Repaired a regression caused by the fix for #8419 which caused asyncpg connections to be reset (i.e. transaction rollback() called) and returned to the pool normally in the case that the connection were not explicitly returned to the connection pool and was instead being intercepted by Python garbage collection, which would fail if the garbage collection operation were being called outside of the asyncio event loop, leading to a large amount of stack trace activity dumped into logging and standard output.

The correct behavior is restored, which is that all asyncio connections that are garbage collected due to not being explicitly returned to the connection pool are detached from the pool and discarded, along with a warning, rather than being returned the pool, as they cannot be reliably reset. In the case of asyncpg connections, the asyncpg-specific terminate() method will be used to end the connection more gracefully within this process as opposed to just dropping it.

This change includes a small behavioral change that is hoped to be useful for debugging asyncio applications, where the warning that’s emitted in the case of asyncio connections being unexpectedly garbage collected has been made slightly more aggressive by moving it outside of a try/except block and into a finally: block, where it will emit unconditionally regardless of whether the detach/termination operation succeeded or not. It will also have the effect that applications or test suites which promote Python warnings to exceptions will see this as a full exception raise, whereas previously it was not possible for this warning to actually propagate as an exception. Applications and test suites which need to tolerate this warning in the interim should adjust the Python warnings filter to allow these warnings to not raise.

The behavior for traditional sync connections remains unchanged, that garbage collected connections continue to be returned to the pool normally without emitting a warning. This will likely be changed in a future major release to at least emit a similar warning as is emitted for asyncio drivers, as it is a usage error for pooled connections to be intercepted by garbage collection without being properly returned to the pool.

See also the corresponding issue/discussion/commit:

@tcompa
Copy link
Collaborator Author

tcompa commented Nov 30, 2023

In #1044, by setting event pool _debug to False, we avoid the actual error raised in the CI.

The background error logs (as shown in #954 (comment)) are still present, but they are less worrying.

For the record, even if they are not always reproducible, some tests suggest that:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1.4.0 bug Something isn't working High Priority Current Priorities & Blocking Issues testing testing
Projects
None yet
1 participant