Add hard abort on resource closure if any part of the stream remains open #853

GrafBlutwurst · 2023-04-19T10:10:49Z

Background of this MR: we observed leaked sockets that eventually caused our application to crash because it exhausted its file handles. Granted, we use WebSockets somewhat unusually, keeping each one open only briefly rather than for a long time, which is probably why the problem surfaced for us in the first place. For more background information, see: https://discord.com/channels/632277896739946517/632286375311573032/1091349652579827783

Sadly, I was unable to produce a minimal example due to time constraints, but some Wireshark debugging showed the following.

This is a healthy packet exchange, where all sockets in both applications (server and client) are torn down and cleaned up properly:

This is an unhealthy one:

Note that in both cases, close frames are exchanged properly at the WebSocket protocol layer. However, in some circumstances (not yet identified; I suspect concurrency, as the only way I was able to reproduce this is by firing many requests concurrently), the server does not respond with a TCP FIN/ACK.

With a Blaze server, this also leaves sockets in CLOSE_WAIT on the server; Ember seems to handle this properly as of 0.29-RC3 with the EoS fix.

With both servers, the JDK client then remains in a half-open state, with the server side of the stream still open. This MR checks during resource teardown whether either side remains open and, if so, aborts the connection. It's a bit of a sledgehammer solution, but I couldn't figure out why the TCP frames are missing, and this is at least a band-aid.
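The teardown check described above can be sketched against the JDK java.net.http.WebSocket interface. This is a minimal sketch under assumptions, not the PR's actual code: SocketRelease.releaseOrAbort and StubWebSocket are hypothetical names, and in the PR the check lives in the resource's release step (a later commit, a009679, makes the abort unconditional).

```scala
import java.net.http.WebSocket
import java.nio.ByteBuffer
import java.util.concurrent.CompletableFuture

object SocketRelease {
  // Hypothetical release step: if either direction is still open, the
  // connection is (at least) half-open, so abort it.
  // WebSocket.abort() closes input and output abruptly; further calls have no effect.
  // Returns true if abort() was invoked.
  def releaseOrAbort(ws: WebSocket): Boolean = {
    val halfOpen = !ws.isInputClosed() || !ws.isOutputClosed()
    if (halfOpen) ws.abort()
    halfOpen
  }
}

// Illustrative stub only: records the two close flags and whether abort() ran.
final class StubWebSocket(var inputClosed: Boolean, var outputClosed: Boolean) extends WebSocket {
  var aborted: Boolean = false
  private def done: CompletableFuture[WebSocket] = CompletableFuture.completedFuture[WebSocket](this)
  override def sendText(data: CharSequence, last: Boolean): CompletableFuture[WebSocket] = done
  override def sendBinary(data: ByteBuffer, last: Boolean): CompletableFuture[WebSocket] = done
  override def sendPing(message: ByteBuffer): CompletableFuture[WebSocket] = done
  override def sendPong(message: ByteBuffer): CompletableFuture[WebSocket] = done
  override def sendClose(statusCode: Int, reason: String): CompletableFuture[WebSocket] = {
    outputClosed = true
    done
  }
  override def request(n: Long): Unit = ()
  override def getSubprotocol(): String = ""
  override def isOutputClosed(): Boolean = outputClosed
  override def isInputClosed(): Boolean = inputClosed
  override def abort(): Unit = {
    aborted = true
    inputClosed = true
    outputClosed = true
  }
}
```

With a client whose output is closed but whose input is still waiting for the server, releaseOrAbort aborts; with both sides closed it leaves the socket alone.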

The problem gets much worse when proxies are involved: the half-open socket also keeps the tunneling connection open, leading to an explosion in allocated sockets that the JDK's internal HTTPS pool handling cleans up only eventually, and only partly.

This is possibly related to http4s/http4s#4798, though I'm not at all sure.


core/src/main/scala/org/http4s/jdkhttpclient/JdkWSClient.scala

GrafBlutwurst · 2023-04-26T08:00:26Z

core/src/main/scala/org/http4s/jdkhttpclient/JdkWSClient.scala

+                closedDef.tryGet.flatMap {
+                  case Some(_) => F.unit
+                  case None =>
+                    if (step < 10) F.sleep(100.millis) *> awaitClose(step + 1) else F.unit


I wonder what a sensible amount of time to wait here is. Should this be configurable, e.g. via an overloaded apply with a duration parameter to retain backwards compatibility? In practice, this greatly reduces the need to actually call abort, which is nice.

I'm a bit confused by this: is this equivalent to closedDef.get.timeout(1.second)?

That's what I was looking for. For some reason it wasn't in my code completion 🤦. Thanks!

Ah yeah, you probably just needed import cats.effect.syntax.all._ to get that :)
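The two forms are not quite equivalent: timeout fails with a java.util.concurrent.TimeoutException once the deadline passes, whereas the polling loop in the diff simply stops waiting and continues with F.unit. A sketch of both variants for cats-effect 3, assuming a Deferred[F, Unit] like closedDef; AwaitClose and its method names are hypothetical.

```scala
import cats.effect.kernel.{Deferred, Temporal}
import scala.concurrent.duration.FiniteDuration

object AwaitClose {
  // Waits up to `limit` for the close signal, then carries on either way,
  // which matches the behaviour of the bounded polling loop in the diff.
  def awaitCloseOrContinue[F[_]](closed: Deferred[F, Unit], limit: FiniteDuration)(
      implicit F: Temporal[F]): F[Unit] =
    F.timeoutTo(closed.get, limit, F.unit)

  // The literal closedDef.get.timeout(limit) form: raises a
  // java.util.concurrent.TimeoutException if the signal does not arrive in time.
  def awaitCloseOrFail[F[_]](closed: Deferred[F, Unit], limit: FiniteDuration)(
      implicit F: Temporal[F]): F[Unit] =
    F.timeout(closed.get, limit)
}
```

If the teardown should proceed (and possibly abort) regardless, timeoutTo with a no-op fallback is the closer replacement for the loop.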

Sorry, another follow-up question on this: do we have to do this timeout at all? I am wondering whether the JDK WebSocket implementation has an internal timeout of its own, and we are just forgetting to listen for an error event or something. See the docs:

Unless the CompletableFuture returned from this method completes with IllegalArgumentException, or the method throws NullPointerException, the output will be closed.

If not already closed, the input remains open until a Close message received, or abort is invoked, or an error occurs.

https://docs.oracle.com/en/java/javase/17/docs/api/java.net.http/java/net/http/WebSocket.html#sendClose(int,java.lang.String)
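In the JDK API, the input side ends either with a Close message (WebSocket.Listener.onClose) or with an error (WebSocket.Listener.onError). A minimal sketch, assuming a hypothetical InputEndListener, that folds both callbacks into a single completion signal which teardown could wait on instead of, or before, a fixed timeout:

```scala
import java.net.http.WebSocket
import java.util.concurrent.{CompletableFuture, CompletionStage}

// Hypothetical listener: inputEnded completes once the input side is done,
// with Right(statusCode) for a received Close message or Left(error) for onError.
final class InputEndListener extends WebSocket.Listener {
  val inputEnded: CompletableFuture[Either[Throwable, Int]] =
    new CompletableFuture[Either[Throwable, Int]]()

  override def onClose(ws: WebSocket, statusCode: Int, reason: String): CompletionStage[_] = {
    inputEnded.complete(Right(statusCode))
    null // per the Listener docs, null means the WebSocket may be closed immediately
  }

  override def onError(ws: WebSocket, error: Throwable): Unit = {
    inputEnded.complete(Left(error))
    ()
  }
}
```

Whichever callback fires first wins; later completions are ignored because CompletableFuture.complete only succeeds once.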

It turns out that, in the presence of proxies, some race conditions cause the proxy tunnel to linger even when both flags indicate that the WebSocket is completely closed.

GrafBlutwurst · 2023-04-27T07:51:37Z

core/src/main/scala/org/http4s/jdkhttpclient/JdkWSClient.scala

+                  case WSFrame.Binary(data, last) => webSocket.sendBinary(data.toByteBuffer, last)
+                  case WSFrame.Ping(data) => webSocket.sendPing(data.toByteBuffer)
+                  case WSFrame.Pong(data) => webSocket.sendPong(data.toByteBuffer)
+                  case WSFrame.Close(statusCode, reason) => webSocket.sendClose(statusCode, reason)


One thing I have seen is this line failing because the output was already closed. We don't send close frames manually, so the only caller should be connectHighLevel, which mirrors the close frame. Maybe we should guard this with an if that checks whether the output is still open? I don't like that much, though, because low-level usage would then have implicit restrictions.
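The guard floated above could look roughly like this. It is a sketch under assumptions: GuardedClose.sendCloseIfOpen is a hypothetical helper, not part of JdkWSClient, and it embodies exactly the implicit restriction the comment worries about, since a low-level caller's explicit close would be silently skipped.

```scala
import java.net.http.WebSocket
import java.util.concurrent.CompletableFuture

object GuardedClose {
  // Hypothetical guard: call sendClose only while the output is still open;
  // otherwise return an already-completed future.
  // The check is racy: the output can close between isOutputClosed() and
  // sendClose(...), so this narrows the failure window rather than removing it.
  def sendCloseIfOpen(ws: WebSocket, statusCode: Int, reason: String): CompletableFuture[WebSocket] =
    if (ws.isOutputClosed()) CompletableFuture.completedFuture[WebSocket](ws)
    else ws.sendClose(statusCode, reason)
}
```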

Also, as a side note: mirroring the close frame in connectHighLevel may actually fail if the server emits a close code that a client is not allowed to send.
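RFC 6455 (section 7.4.1) reserves 1005, 1006 and 1015 as codes that must never be sent in a Close frame, and the JDK's WebSocket.sendClose documentation further restricts which codes a client may send, leaving some behaviour implementation-specific. One conservative option is to mirror the server's code only when it is clearly client-safe and otherwise fall back to 1000 (normal closure). The sketch assumes that 1000 plus the 3000-4999 range is the safe set; CloseCodes.mirrorCode is a hypothetical helper.

```scala
object CloseCodes {
  val NormalClosure: Int = 1000

  // Assumption: treat only 1000 and 3000-4999 (registered and private-use codes)
  // as safe for a client to send. Deliberately conservative.
  def isClientSafe(code: Int): Boolean =
    code == NormalClosure || (code >= 3000 && code <= 4999)

  // Mirror the server's code when it is client-safe, otherwise fall back to 1000.
  def mirrorCode(code: Int): Int =
    if (isClientSafe(code)) code else NormalClosure
}
```

The trade-off is that some legitimate codes (for example 1001) are downgraded to 1000, in exchange for never triggering a rejected sendClose.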

ybasket · 2024-02-13T09:27:26Z

@GrafBlutwurst Hey, sorry for the ping, but the problem this PR tackles came up again in #1015 (where I added a fix that is a simplified version of this PR). Are you still planning to get this change over the finish line? And if so, can I help somehow?

hamnis · 2024-11-07T17:05:42Z

I tried merging series/0.9 into this PR, but too many things have changed here.
Are you able to pick this up again @GrafBlutwurst ?

GrafBlutwurst · 2024-11-07T17:27:57Z

Sadly, I won't be able to get back to this PR in the foreseeable future.

Add hard abort on resource closure if any part of the stream remains open

a49143b

mergify bot added the core label Apr 19, 2023

armanbilge reviewed Apr 19, 2023

core/src/main/scala/org/http4s/jdkhttpclient/JdkWSClient.scala (outdated, resolved)

GrafBlutwurst added 2 commits April 19, 2023 17:23

change to guarantee for socket release

36c74b2

Wait for reception of close frame, don't kill dispatcher immediately

7731d2d

GrafBlutwurst commented Apr 26, 2023

core/src/main/scala/org/http4s/jdkhttpclient/JdkWSClient.scala (outdated, resolved)

GrafBlutwurst commented Apr 26, 2023

Cleanup and unconditional abort

a009679

Turns out in the presence of proxies some race conditions cause the proxy tunnel to linger even if both flags indicated complete closure of Websocket

GrafBlutwurst commented Apr 27, 2023

ybasket mentioned this pull request Feb 3, 2024

Add Resource-based simple constructors #1015

Merged

GrafBlutwurst commented Apr 19, 2023 (edited)

GrafBlutwurst Apr 26, 2023

armanbilge Apr 26, 2023

GrafBlutwurst Apr 26, 2023

armanbilge Apr 27, 2023

GrafBlutwurst Apr 27, 2023

ybasket commented Feb 13, 2024

hamnis commented Nov 7, 2024

GrafBlutwurst commented Nov 7, 2024
