web_ws - Can't close tcp socket when receiving a close message from client. #8184
Not sure what I did to strike through the OS but that is all correct.
It does look a little odd, and it seems a little obvious, which makes me wonder why it hasn't been noticed before. Do you think you could write a test in a PR to reproduce it? Then we could also try your suggestion of removing that check and see if any other tests break, which would be a good indication of whether it's the wrong change to make.
I think the code is expecting that the client will close the connection, but we can't always rely on that; even if they say they're going to close it, they may not.
Yeah, I forked aiohttp last night. I'll work on a small replication. I did try removing the call; that results in the next read request in `close()` failing with a timeout error, and then the socket does close, but with a 1006 close code, I think. I also started looking at how to pass the message to `close()` so I could add a condition for the read statement. Didn't get very far, but it was kind of late.
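As an aside on the 1006 mentioned above: RFC 6455 section 7.4.1 defines it as "abnormal closure". It is never carried in an actual Close frame; a client library reports it locally when the TCP connection drops without completing the close handshake, which matches a server that never closes the transport. A small illustrative sketch (the `describe` helper is hypothetical, just for this example):

```python
# A few RFC 6455 section 7.4.1 close codes relevant to this thread.
# 1006 is reserved: it must never appear in a Close frame on the wire;
# a library reports it locally when the connection dies without a
# proper close handshake.
CLOSE_CODES = {
    1000: "normal closure",
    1001: "going away",
    1002: "protocol error",
    1006: "abnormal closure (connection dropped without a close frame)",
}

def describe(code: int) -> str:
    """Hypothetical helper: human-readable name for a close code."""
    return CLOSE_CODES.get(code, "unknown")
```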
https://datatracker.ietf.org/doc/html/rfc6455#section-7
So, this is what it is trying to achieve when close is initiated by us.
I think we just need to tweak that code so that the server does not wait to receive a close code (as it has already received one), but it should continue to close the transport. I.e., rather than removing that check, we want to add
So something like this instead of deleting? I looked through the code and it seems like `_close_code` is always set, but I'm not sure if it's safe to assume it is? The default is `None`.

```python
if self._closing:
    self._set_code_close_transport(self._close_code)
    return True
```
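To make the difference concrete, here is a toy model of the control flow being discussed. This is not aiohttp's actual code; `WSCloseModel`, `receive_close`, `close_buggy`, and `close_fixed` are hypothetical names for illustration only. Receiving a CLOSE frame sets `_closing`, and the question is whether `close()` then bails out early or still closes the transport.

```python
# Toy model of the reported control flow -- NOT aiohttp's real implementation.
class WSCloseModel:
    def __init__(self) -> None:
        self._closing = False          # set when a CLOSE frame is received
        self._close_code = None
        self.transport_closed = False  # did we actually close the TCP socket?

    def receive_close(self, code: int = 1000) -> None:
        # receive() sees a CLOSE message from the client
        self._closing = True
        self._close_code = code

    def close_buggy(self) -> bool:
        # Reported behavior: close() returns early and the TCP socket
        # is never closed.
        if self._closing:
            return True
        self.transport_closed = True
        return True

    def close_fixed(self) -> bool:
        # Suggested tweak: don't wait for a close code (we already have
        # one), but still close the transport.
        if self._closing:
            self.transport_closed = True
            return True
        self.transport_closed = True
        return True
```

With the early return, the socket stays open after the client's CLOSE frame; with the tweak, the transport is closed immediately using the code already received.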
Not sure, I'd have to look more closely. If we get the test up first, then we can see how it works.
Boiling this down has been harder than I expected. So far I'm not recreating the issue. More research and foul language will be required.
I'm pretty convinced there is a bug here where we don't close the transport if the client holds it open forever.
@spikefishjohn Can you give #8200 a shot? I'm running it on my production HA systems without any unexpected side effects. I'll come up with a test for it if it fixes your issue.
@bdraco Yeah, I'll give it a test. I've been trying to reproduce the issue with a smaller code base but haven't been able to so far, which has been pretty frustrating. I'll add this in today and see if it fixes the issue. If not, I'll add a pdb trace of the accept to show what is happening, if that helps.
Ok, that seems to fix the issue. Here is the client I'm using to talk to GNS3:

```python
import asyncio
import base64
import json
import logging

import websockets

logger = logging.getLogger('websockets')
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
log = logging.getLogger(__name__)

# GNS3 WebSocket server violates RFC 6455 so we have to be the active closer.
# Let's give websockets a chance to get data.
WS_CLOSE_TIMEOUT = 10
RECONNECT_TIMEOUT = 1.618

CONTROLLER_WS_API = '/v2/notifications/ws'
COMPUTE_WS = '/v2/notifications/ws'
SERVER = '127.0.0.1:3080'
USER = 'XXX'
PASS = 'XXX'
CREDS = f'{USER}:{PASS}'
ENCODED_CREDS = base64.b64encode(CREDS.encode()).decode()
CONTROLLER_URI = f'ws://{SERVER}{CONTROLLER_WS_API}'
COMPUTE_URI = f'ws://{SERVER}{COMPUTE_WS}'


async def main() -> None:
    async with asyncio.TaskGroup() as tasks:
        tasks.create_task(websocket_logger(CONTROLLER_URI))


async def websocket_logger(endpoint: str) -> None:
    headers = {
        'Authorization': f'Basic {ENCODED_CREDS}'
    }
    try:
        async with websockets.connect(endpoint,
                                      close_timeout=WS_CLOSE_TIMEOUT,
                                      extra_headers=headers) as websocket:
            print("Call close")
            await websocket.close()
            print("close complete")
    except ConnectionRefusedError:
        log.info(f'Connection to {endpoint!r} refused.')
        await asyncio.sleep(RECONNECT_TIMEOUT)


if __name__ == '__main__':
    asyncio.run(main())
```

This is what the client now reports.
I'll pass this on to the original poster of the bug and have them test as well.
Thanks. Please keep us updated.
FYI the bug reporter indicated they won't be able to test for a week or so. |
@spikefishjohn Was the reporter able to test the linked PR? Thanks
@bdraco I'm asking for an update.
Thanks! |
@bdraco The original poster of the GNS3 issue has indicated they will not be able to test this and asked that the original GNS3 bug be closed. Your patch fixed the client I made above for GNS3. I think that is as much of a reply as this will get. That being said, if you think there is something else I can do to help, by all means let me know.
Describe the bug
I'm working on GNS3/gns3-server#2320 for GNS3, which I think PR #7978 lines up with. The GNS3 issue is that when a WebSocket client sends a close message, the server never closes the TCP socket. I tried the patch above, but it doesn't address the issue.
At a high level, GNS3 is calling `ws.receive()`, at which point aiohttp's `receive()` gets the close message.
I believe the problem is that `ws.receive` is setting `self._closing = True` when it receives a close message. This then causes `self.close()` to return here, which prevents `if msg.type == WSMsgType.CLOSE:` from being reached.
I'm almost thinking this should be removed, but I'm not sure what the intent of that check is, so I'm unsure whether that is the proper fix.
To Reproduce
I don't have a great way to reproduce this. I'm currently just using a fully installed GNS3 instance. I can work on making a reproduction now that I understand the issue.
Expected behavior
aiohttp.web_ws should close the socket when it receives a close message.
Logs/tracebacks
Python Version
aiohttp Version
multidict Version
yarl Version
Related component
Server
Additional context
No response
Code of Conduct
EDIT: Update PR request in description.