Fix websocket connection leak #7978
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #7978 +/- ##
==========================================
+ Coverage 97.43% 97.44% +0.01%
==========================================
Files 107 107
Lines 32346 32370 +24
Branches 3748 3751 +3
==========================================
+ Hits 31516 31544 +28
+ Misses 629 624 -5
- Partials 201 202 +1
This reverts commit dd4c3b4.
Pushed to production a few hours ago. All working well. |
I think if the leak in that demo is gone, then let's merge this.
I am no longer able to reproduce it with the provided script after this change. I will deploy the change more widely on my production systems later today and merge if no regressions are observed. |
deployed widely. watching logs |
Logs look good. Home Assistant's object logger / leak checker doesn't see the leak anymore. |
Backport to 3.9: 💔 cherry-picking failed, conflicts found. ❌ Failed to cleanly apply 6f1c608 on top of patchback/backports/3.9/6f1c608fe756baa4b82ea8714373cebab0252265/pr-7978. Backporting merged PR #7978 into master
🤖 @patchback |
Backport to 3.10: 💔 cherry-picking failed, conflicts found. ❌ Failed to cleanly apply 6f1c608 on top of patchback/backports/3.10/6f1c608fe756baa4b82ea8714373cebab0252265/pr-7978. Backporting merged PR #7978 into master
🤖 @patchback |
(cherry picked from commit 6f1c608)
(cherry picked from commit 6f1c608)
@Dreamsorcerer any idea when 3.9.2 might be released.... presumably including this change? |
Will probably come round to it shortly, have been fully focused on a PR for aiohttp-admin for the last month. |
Howdy all and @bdraco! I'm working on a bug report for GNS3 that I think this PR lines up with. The GNS3 issue is that when a WebSocket client sends a close message, the server never closes the TCP socket. I tried the patch above, but it doesn't address the issue. At a high level, GNS3 calls `ws.receive()`, at which point aiohttp's `receive()` gets the close message. I believe the problem is that `ws.receive()` sets `self._closing = True` when it receives a close message. This then causes `self.close()` to return here, which prevents `if msg.type == WSMsgType.CLOSE:` from being reached. I'm almost thinking this should be removed, but I'm not sure what the intent of that is, so I'm unsure whether that is the proper fix. Should I open a new issue for this? |
Yes please. Thank you |
Will do, thanks for the quick reply. |
Hi, with the changes for this bug, the behavior of `asyncio.wait_for(ws.receive(), timeout=1)` has changed. Before, this could be used to poll for new messages, but now hitting the timeout causes the websocket to be closed. This causes problems in our code, so we are pinning to 3.9.1.
|
Use |
We should probably look at this more closely. The TimeoutError must come from our timeout() call, so that behaviour is probably correct. But, a CancelledError could come from anywhere and maybe shouldn't close the connection. It may be that we need to check if the cancellation came from outside the task ( More information about this at: |
I think the cancellation handling is fine here, but closing the connection when the receive times out/is cancelled is what was not desired. #8251 will restore the behavior to keep the connection open if receive times out or is cancelled |
That's perfect. Thanks! |
Also, worth pointing out that receive has a |
Yeah, I noticed that when debugging this. This code was written a while back, I think before that parameter existed. But changing to use the parameter didn't help with the close logic, so I just pinned the version for now. I'll try using the parameter when I do upgrade, once this patch is released. |
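A minimal sketch of the polling pattern discussed above, `asyncio.wait_for(ws.receive(), timeout=1)`, using only the standard library. `FakeWebSocket`, `feed`, `poll_once`, and `demo` are hypothetical names for illustration and are not aiohttp APIs; the stand-in only mimics a `receive()` coroutine that blocks until a message arrives. It shows the behavior the commenters want: a timed-out poll returns nothing and leaves the connection usable for the next poll.

```python
import asyncio


class FakeWebSocket:
    """Hypothetical stand-in for a WebSocket object; NOT aiohttp's class."""

    def __init__(self):
        self._queue = asyncio.Queue()
        self.closed = False  # a timed-out poll should leave this False

    def feed(self, msg):
        # Simulate a message arriving from the peer.
        self._queue.put_nowait(msg)

    async def receive(self):
        # Block until a message is available, like a real receive() would.
        return await self._queue.get()


async def poll_once(ws, timeout):
    """Return the next message, or None if nothing arrived within `timeout`."""
    try:
        return await asyncio.wait_for(ws.receive(), timeout=timeout)
    except asyncio.TimeoutError:
        # Timing out only means "no message yet"; the caller may poll again.
        return None


async def demo():
    ws = FakeWebSocket()
    first = await poll_once(ws, 0.01)   # nothing queued, so this times out
    ws.feed("hello")
    second = await poll_once(ws, 0.01)  # message is already waiting
    return [first, second, ws.closed]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a real aiohttp WebSocket the call shape is the same; whether the connection survives the timeout depends on the aiohttp version, as described in the comments above.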
fixes #6325
The existing implementation never closed the transport unless `heartbeat` was enabled (not the default), the pong was not received, and `.close()` had not been called before the pong went unreceived, since calling `.close()` cancelled the timer.

We now always close the transport after we get the `WSMsgType.CLOSE` message or an abnormal closure occurs, instead of hoping `asyncio` calls `connection_lost`, which may not happen in a reasonable time frame (or ever?) for a variety of reasons deeper in the stack (network, bad client implementations, etc.).

Any time we are closing or setting `self._close_code`, we must either call `self.close()` or close the transport; otherwise it's possible to leak the connection.

To facilitate that, and to reduce the chance of future refactoring re-introducing the problem, `_set_code_close_transport` is added, which sets the code and closes the transport.

All other places where `self._close_code` is set have been audited to ensure they also call `self.close()`.

There is one exception: if `self._autoclose` is set to `False` and the client closes the connection, then it's the responsibility of the caller to close the connection, as otherwise it can leak. This is now documented.
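The idea behind `_set_code_close_transport` can be sketched as below. This is an illustration under stated assumptions, not the code merged in this PR: `FakeTransport` and `ConnectionSketch` are hypothetical stand-ins, and only the helper's name and its two responsibilities (set the code, close the transport) come from the description above.

```python
class FakeTransport:
    """Hypothetical stand-in for an asyncio transport."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ConnectionSketch:
    """Hypothetical stand-in for a WebSocket connection object."""

    def __init__(self, transport):
        self._transport = transport
        self._close_code = None

    def _set_code_close_transport(self, code):
        # Record the close code and close the transport in one step, so no
        # code path can set the code and forget to release the connection.
        self._close_code = code
        if self._transport is not None:
            self._transport.close()


if __name__ == "__main__":
    conn = ConnectionSketch(FakeTransport())
    conn._set_code_close_transport(1006)  # 1006: abnormal closure (RFC 6455)
    print(conn._close_code, conn._transport.closed)
```

Bundling both steps in a single helper is what makes the invariant hard to break in later refactoring: any path that sets the close code through it cannot skip closing the transport.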