This repository has been archived by the owner on Sep 14, 2020. It is now read-only.

Disable socket timeouts for internal requests, especially for watching #144

Merged: 2 commits, Jul 12, 2019


@nolar (Contributor) commented Jul 11, 2019

Fix an issue where the operator dies 10 seconds after starting, with socket.timeout and requests.exceptions.ConnectionError exceptions.

Issues: #110

Caused by pykube's default timeout of 10s for all requests, including the watch requests (which are also GET requests). See hjacobs/pykube#32
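
To illustrate why a per-read socket timeout is fatal for a watch request, here is a minimal sketch using only the standard library. It is not Kopf or pykube code: the `read_event` helper is hypothetical, a local socket pair stands in for the client and the API server, and the timeout is shortened to 0.2s instead of 10s so that it runs quickly.

```python
import socket


def read_event(sock, timeout):
    """Read once from the socket; return None on a read timeout.

    Hypothetical helper for illustration only.
    """
    sock.settimeout(timeout)  # None would mean: block with no timeout.
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None


# A connected pair of sockets stands in for the client and the API server.
client, server = socket.socketpair()

# The "server" stays silent, as a watch stream does while nothing changes,
# so the read times out even though the connection is perfectly healthy.
idle_result = read_event(client, timeout=0.2)

# Once the "server" sends an event, the same read succeeds.
server.sendall(b'{"type": "MODIFIED"}\n')
busy_result = read_event(client, timeout=0.2)

client.close()
server.close()
```

The timeout applies to each read rather than to the whole request, so a quiet watch stream hits it as soon as no event arrives within the window.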

Introduced to Kopf in #110, where the timeouts were discussed and changed from no timeouts to the default of 10s.

This PR reverts to no timeouts for all internal requests. This topic was also discussed in a few other Kopf PRs: it should work even on slow clusters and over slow network connections (e.g. in IDEs over the network).

In addition, in case the socket timeout is revised or made configurable in the future, this PR disables it explicitly for the watch requests and uses the server-side timeouts instead, via the query params, the same as the official k8s client library does. This requires hjacobs/pykube#33

The server-side timeouts (timeoutSeconds param) are part of the K8s API. E.g. for pod-watching: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#watch-64
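
As a rough sketch of the idea (not the actual Kopf or pykube change), the server-side timeout is passed as a query parameter on the watch URL, while the client-side socket timeout is left disabled. The `build_watch_url` helper and the `https://kubernetes.example` host are hypothetical; `watch` and `timeoutSeconds` are the query parameters of the K8s list/watch API.

```python
from urllib.parse import urlencode


def build_watch_url(server, path, timeout_seconds=None):
    """Build a watch URL with an optional server-side timeout.

    Hypothetical helper for illustration only.
    """
    params = {'watch': 'true'}
    if timeout_seconds is not None:
        # The API server closes the stream itself after this many seconds.
        params['timeoutSeconds'] = str(timeout_seconds)
    return f"{server.rstrip('/')}{path}?{urlencode(params)}"


url = build_watch_url('https://kubernetes.example', '/api/v1/pods', timeout_seconds=60)

# With the requests library, the client-side timeout would then be disabled
# explicitly (not executed here, since it needs a real cluster):
#     session.get(url, stream=True, timeout=None)
```

With this split, the server ends an idle stream cleanly instead of the client raising a read timeout in the middle of it.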


Stacktrace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/urllib3/response.py", line 397, in _error_catcher
    yield
  File "/usr/local/lib/python3.7/dist-packages/urllib3/response.py", line 704, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.7/dist-packages/urllib3/response.py", line 636, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1052, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 911, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/local/lib/python3.7/dist-packages/urllib3/response.py", line 527, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/local/lib/python3.7/dist-packages/urllib3/response.py", line 732, in read_chunked
    self._original_response.close()
  File "/usr/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/dist-packages/urllib3/response.py", line 402, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.3.0.1', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/kopf", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/kopf/cli.py", line 30, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/kopf/cli.py", line 61, in run
    peering_name=peering_name,
  File "/usr/local/lib/python3.7/dist-packages/kopf/reactor/queueing.py", line 271, in run
    task.result()  # can raise the regular (non-cancellation) exceptions.
  File "/usr/local/lib/python3.7/dist-packages/kopf/reactor/queueing.py", line 78, in watcher
    async for event in watching.infinite_watch(resource=resource, namespace=namespace):
  File "/usr/local/lib/python3.7/dist-packages/kopf/clients/watching.py", line 131, in infinite_watch
    async for event in streaming_watch(resource=resource, namespace=namespace):
  File "/usr/local/lib/python3.7/dist-packages/kopf/clients/watching.py", line 93, in streaming_watch
    async for event in streaming_aiter(stream, loop=loop):
  File "/usr/local/lib/python3.7/dist-packages/kopf/clients/watching.py", line 62, in streaming_aiter
    yield await loop.run_in_executor(executor, streaming_next, src)
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/dist-packages/kopf/clients/watching.py", line 50, in streaming_next
    return next(src)
  File "/usr/local/lib/python3.7/dist-packages/kopf/clients/fetching.py", line 82, in <genexpr>
    return iter({'type': event.type, 'object': event.object.obj} for event in src)
  File "/usr/local/lib/python3.7/dist-packages/pykube/query.py", line 178, in object_stream
    for line in r.iter_lines():
  File "/usr/local/lib/python3.7/dist-packages/requests/models.py", line 794, in iter_lines
    for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
  File "/usr/local/lib/python3.7/dist-packages/requests/models.py", line 757, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.3.0.1', port=443): Read timed out.

@nolar nolar added the bug Something isn't working label Jul 11, 2019
@nolar nolar requested a review from samurang87 as a code owner July 11, 2019 19:21
zincr bot commented Jul 11, 2019

🤖 zincr found 0 problems, 0 warnings

✅ Large Commits
✅ Approvals
✅ Specification
✅ Dependency Licensing

zincr bot commented Jul 11, 2019

🤖 zincr found 1 problem, 0 warnings

❌ Approvals
✅ Large Commits
✅ Specification
✅ Dependency Licensing

Details on how to resolve are provided below


Approvals

All proposed changes must be reviewed by project maintainers before they can be merged

Not enough people have approved this pull request. Please ensure that 1 additional user who has not contributed to this pull request approves the changes.

  • ✅ Approved by PR author @nolar
  • ❌ 1 additional approval needed
     

@nolar nolar merged commit 17c9871 into zalando-incubator:master Jul 12, 2019
@nolar nolar deleted the pykube-timeouts branch July 12, 2019 11:30