Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uncaught socket exception during timeout handling. #210

Open
1 of 3 tasks
maximelb opened this issue Jul 3, 2019 · 9 comments Β· May be fixed by #310
Open
1 of 3 tasks

Uncaught socket exception during timeout handling. #210

maximelb opened this issue Jul 3, 2019 · 9 comments Β· May be fixed by #310
Labels
bug Something is broken help wanted Somebody help us, please! reproducer: present This PR or issue contains code, which reproduce the problem described or clearly understandable STR

Comments

@maximelb
Copy link

maximelb commented Jul 3, 2019

❓ I'm submitting a ...

  • 🐞 bug report
  • 🐣 feature request
  • ❓ question about the decisions made in the repository

🐞 Describe the bug. What is the current behavior?
If a client goes away while a request timeout is being processed, the socket.error exception produced in /cheroot/server.py _conditional_error() bubbles up uncaught all the way to the thread worker. This generates a stack trace regardless of logging config.

❓ What is the motivation / use case for changing the behavior?
This is producing STDERR errors in our container log, preventing us from handling errors properly. I also suspect given enough occurrence it may lead to thread pool starvation although I have not experimented with it.

πŸ’‘ To Reproduce
This is hard to reproduce, I suspect timing is heavily involved. We are seeing it occur in our environment when requests timeout. These requests are coming from a load balancer.

πŸ’‘ Expected behavior
The exception should be caught and treated as a simple failure and simply trigger completion and cleanup of the client's request (the client simply has gone away).

πŸ“‹ Details
Here is the traceback we're seeing:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/site-packages/cheroot/workers/threadpool.py", line 114, in run
    conn.communicate()
  File "/usr/local/lib/python2.7/site-packages/cheroot/server.py", line 1286, in communicate
    self._conditional_error(req, '408 Request Timeout')
  File "/usr/local/lib/python2.7/site-packages/cheroot/server.py", line 1334, in _conditional_error
    req.simple_response(response)
  File "/usr/local/lib/python2.7/site-packages/cheroot/server.py", line 1115, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "/usr/local/lib/python2.7/site-packages/cheroot/makefile.py", line 78, in write
    data_mv[bytes_sent:bytes_sent + SOCK_WRITE_BLOCKSIZE],
  File "/usr/local/lib/python2.7/site-packages/cheroot/ssl/pyopenssl.py", line 144, in send
    *args, **kwargs
  File "/usr/local/lib/python2.7/site-packages/cheroot/ssl/pyopenssl.py", line 91, in _safe_call
    raise socket.error(errnum)
error: -1

πŸ“‹ Environment

  • PyOpenSSL version: 19.0.0
  • Cheroot version: 6.5.4
  • CherryPy version: 17.4.2
  • Python version: 2.7.13
  • OS: Linux Alpine

πŸ“‹ Additional context
I believe a simple try/except/pass for the socket.error in the context of https://github.com/cherrypy/cheroot/blob/v6.5.5/cheroot/server.py#L1276 should fix the issue without any negative side-effects since the server is already dealing with a client that has gone away.

@webknjaz
Copy link
Member

webknjaz commented Jul 3, 2019

You say that you use Python 3.7.3, but your traceback mentions Python 2.7. Where's the truth?

@webknjaz webknjaz added the reproducer: missing This PR or issue lacks code, which reproduce the problem described or clearly understandable STR label Jul 3, 2019
@triage-new-issues triage-new-issues bot removed the triage label Jul 3, 2019
@maximelb
Copy link
Author

maximelb commented Jul 3, 2019

Ah yes my bad, I confused it with a different container, it is Python 2.7 latest version.

@webknjaz
Copy link
Member

webknjaz commented Jul 4, 2019

@maximelb this also means that you probably use CherryPy v17.

@webknjaz
Copy link
Member

webknjaz commented Jul 4, 2019

Please update the issue description with proper details.

@webknjaz webknjaz added bug Something is broken help wanted Somebody help us, please! labels Jul 4, 2019
@maximelb
Copy link
Author

maximelb commented Jul 4, 2019

I'm building straight from pip install, so it's going to be the latest version. Ultimately I think the version of cherrypy has little impact as the whole trace is within cheroot. Updating the original report.

@webknjaz
Copy link
Member

webknjaz commented Jul 4, 2019

Yes, you're probably right, but I don't like having misleading data in the report because it can confuse the future readers

@webknjaz
Copy link
Member

webknjaz commented Jul 4, 2019

I think we need to log the original error message + trace before https://github.com/cherrypy/cheroot/blob/v6.5.5/cheroot/ssl/pyopenssl.py#L91 to figure out what's going on and maybe process that one case only.

P.S. What version of pyOpenSSL do you use?

@maximelb
Copy link
Author

maximelb commented Jul 4, 2019

pyOpenSSL would also be latest pip, looks like 19.0.0.

@ajyoung
Copy link

ajyoung commented Aug 4, 2020

I've reproduced this issue with a simple test case on cheroot 8.4.2 with pyOpenSSL 19.1.0. (You will need to replace ssl-cert.pem and ssl-cert-pk.pem below with your own certs, which can be self-signed). Let me know if I should open a separate issue rather than bumping this one.

Simple server-side code:

[root@host-4 ~]vi sslperftest-simple-pyopenssl.py py

#!/usr/bin/python2
certfile='/tmp/ssl-cert.pem'
keyfile='/tmp/ssl-cert-pk.pem'
cafile=None
ciphers=None

host='host-4.example.com'
port=11827

def raw_wsgi_app(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type','text/plain')]
    start_response(status, response_headers)
    return ['Hello world!']

from cheroot.ssl.pyopenssl import pyOpenSSLAdapter
from cheroot import wsgi

bind_addr = (host, port)
server = wsgi.Server(bind_addr, raw_wsgi_app, request_queue_size=32)
server.ssl_adapter = pyOpenSSLAdapter(certfile, keyfile, certificate_chain=cafile, ciphers=ciphers)

try:
    server.start()
except KeyboardInterrupt:
    server.stop()

To reproduce:

  1. Open two terminals on the same machine. Run the server-side code above in the first terminal.
  2. Run the client-side steps below in the second terminal.

Client-side steps to demonstrate problem:

  1. Run netstat to determine the process running the server:
[root@host-4 ~]# netstat -pant | grep 11827
tcp        0      0 172.27.176.10:11827     0.0.0.0:*               LISTEN      3299896/python
  1. Verify the number of threads in the threadpool of the server process:
[root@host-4 ~]# ps -Lmf 3299896 | wc -l
13
  1. Make a TCP connection to the server with no content that lasts longer than the server timeout:
[root@host-4 ~]# timeout 11 telnet 172.27.176.10 11827
Trying 172.27.176.10...
Connected to 172.27.176.10.
Escape character is '^]'.
  1. Verify that the thread pool dropped the thread:
[root@host-4 ~]# ps -Lmf 3299896 | wc -l
12
  1. Also, verify on the server-side console output that an error trace was written for the dropped thread:
[root@host-4 ~]# /opt/cloudera/parcels/KEYTRUSTEE_SERVER/bin/python sslperftest-simple-pyopenssl.py
Exception in thread CP Server Thread-1:
Traceback (most recent call last):
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/workers/threadpool.py", line 125, in run
    keep_conn_open = conn.communicate()
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/server.py", line 1290, in communicate
    self._conditional_error(req, '408 Request Timeout')
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/server.py", line 1339, in _conditional_error
    req.simple_response(response)
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/server.py", line 1111, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/makefile.py", line 78, in write
    data_mv[bytes_sent:bytes_sent + SOCK_WRITE_BLOCKSIZE],
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/ssl/pyopenssl.py", line 163, in send
    *args, **kwargs
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/ssl/pyopenssl.py", line 110, in _safe_call
    raise socket.error(errnum)
error: -1
  1. Repeat steps as many times as you like. If you repeat until all of the threads in the thread pool are dropped, the server will become unresponsive.

@ajyoung ajyoung linked a pull request Aug 4, 2020 that will close this issue
15 tasks
@webknjaz webknjaz added reproducer: present This PR or issue contains code, which reproduce the problem described or clearly understandable STR and removed reproducer: missing This PR or issue lacks code, which reproduce the problem described or clearly understandable STR labels Aug 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is broken help wanted Somebody help us, please! reproducer: present This PR or issue contains code, which reproduce the problem described or clearly understandable STR
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants