[undertow-dev] Too many open files: Exception accepting request, closing server channel TCP server (NIO)

Jonas Zeiger jonas.zeiger at talpidae.net
Wed Mar 4 03:38:16 EST 2020


If this issue is similar to the one we encountered on our machines, the 
following script helps to reproduce it (HTTP2, TLSv1.3 is optional):

https://gist.github.com/lyind/ae076548cafb2cb0b46a0819b749d6f4#file-curl-ssltest-sh

The critical steps are:

1. Attach a debugger to your server process (normally via remote 
debugging)

2. On a client at least one layer 3 network segment away (behind one or 
more IP routers):
    start >= 6 requests in a loop (to increase the chance of "vulnerable" 
TCP connections at step 3)

3. Sever the connection WITHOUT one of the two (client, server) 
operating system kernels being able to update TCP state:
    pull the client's LAN cable or shut down the router on the client side

4. Repeat 2) and 3) until:
    1. $ netstat -tnp | grep java
       -> some connections are in CLOSE_WAIT (then look at how the server 
handles these, play with operating system tunables)
    2. (optional) XNIO threads stuck at 100% CPU
    3. (optional) unexpected exceptions occur
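
To make the check in step 4.1 easier to repeat, here is a minimal sketch 
that counts connections per TCP state. It assumes the usual Linux 
net-tools column layout of `netstat -tn` (state in the sixth column). 
TcpStateCounter and countStates are made-up names, not part of Undertow 
or XNIO.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only.
final class TcpStateCounter {

    // Counts rows per TCP state in `netstat -tn` style output.
    // Assumed columns: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
    static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip the header and anything that is not a tcp/tcp6 row.
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            counts.merge(cols[5], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Reads netstat output from standard input and prints "STATE count".
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = in.lines().collect(Collectors.toList());
        countStates(lines).forEach((state, n) -> System.out.println(state + " " + n));
    }
}
```

With Java 11 or later this can be fed directly from the command in 
step 4.1:

    $ netstat -tnp | grep java | java TcpStateCounter.java

A CLOSE_WAIT count that keeps growing between repetitions is the thing 
to look for.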


In theory one could also use some firewall or network emulator trickery 
to cut the connections, but that would be harder to get right and still 
be quite complex for a unit or integration test. Sounds more like a 
case for the physical network test lab.
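
A loopback test cannot sever a link the way step 3 does, but it can at 
least show the CLOSE_WAIT mechanics discussed in the quoted thread 
below: once the peer's FIN arrives, the socket stays in CLOSE_WAIT until 
the application itself calls close(). This is a sketch using plain JDK 
blocking sockets, not the XNIO or Undertow code path; CloseWaitDemo and 
serverClosesAfterPeerFin are made-up names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustration only: plain blocking sockets on the loopback interface.
final class CloseWaitDemo {

    // Returns true if the accepting side saw end-of-stream (the peer's FIN)
    // and then closed its own side of the connection.
    static boolean serverClosesAfterPeerFin() {
        try (ServerSocket listener =
                 new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(listener.getInetAddress(),
                                       listener.getLocalPort());
            Socket accepted = listener.accept();
            accepted.setSoTimeout(5000); // do not block forever if no FIN arrives

            // The client closes first, so its kernel sends a FIN. From here on
            // the accepted socket is in CLOSE_WAIT until it is closed below.
            client.close();

            // read() returning -1 is how the application learns about the FIN.
            boolean sawEof = accepted.getInputStream().read() == -1;

            // The step a leaking server forgets: without this close() the file
            // descriptor stays open and the socket remains in CLOSE_WAIT.
            accepted.close();
            return sawEof && accepted.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closed after peer FIN: " + serverClosesAfterPeerFin());
    }
}
```

Putting a breakpoint or a sleep just before accepted.close() and running 
netstat in another terminal should show the accepted socket sitting in 
CLOSE_WAIT for as long as the close is delayed.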

On 2020-03-04 06:24, Flavia Rainone wrote:
> Can someone provide a reproducer for this error?
> 
> As for the old version of XNIO, it will be upgraded in Undertow
> 2.1.0.Final.
> 
> On Mon, Mar 2, 2020 at 8:47 PM Stuart Douglas <sdouglas at redhat.com>
> wrote:
> 
>> Hmm, maybe this is a bug in the HTTP/2 close code then, and somehow
>> the connection is not being closed if the client hangs up abruptly.
>> I had a quick look at the code though and I think it looks ok, but
>> maybe some more investigation is needed.
>> 
>> Stuart
>> 
>> On Tue, 3 Mar 2020 at 03:41, Nishant Kumar
>> <nishantkumar35 at gmail.com> wrote:
>> 
>> Yes, I have no control over the client side. I am using HTTP2. I
>> have tried increasing the open file limit to 400k, but that consumes
>> all memory and the system hangs. I will probably try to put nginx in
>> front of Undertow and test.
>> 
>> setServerOption(UndertowOptions.ENABLE_HTTP2, true)
>> 
>> On Mon, Mar 2, 2020, 7:48 PM David Lloyd <david.lloyd at redhat.com>
>> wrote:
>> On Mon, Mar 2, 2020 at 7:56 AM Stan Rosenberg
>> <stan.rosenberg at acm.org> wrote:
>>> 
>>> Stuck in CLOSE_WAIT is a symptom of the client-side not properly
>>> shutting down [1].
>> 
>> I would partially disagree.  In the article you linked: "It all
>> starts with a listening application that leaks sockets and forgets
>> to call close(). This kind of bug does happen in complex
>> applications."  This seems to be essentially what's happening here:
>> the server isn't completing the connection (for some reason),
>> stranding the socket in `CLOSE_WAIT`.
>> 
>> We can't assume that the client is abandoning the connection after
>> `FIN_WAIT2` (the titular RFC violation); if the server stays in
>> `CLOSE_WAIT`, then even if the client dutifully stays in `FIN_WAIT2`
>> forever, the resolving condition still needs to be that the server
>> shuts down its side of the connection.
>> 
>> This diagram is a useful visual aid, mapping TCP states to the XNIO
>> API:
>> 
>> https://www.lucidchart.com/publicSegments/view/524ec20a-5c40-4fd0-8bde-0a1c0a0046e1/image.png
>> 
>> --
>> - DML
>  _______________________________________________
> undertow-dev mailing list
> undertow-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/undertow-dev
> 
> --
> 
>  Flavia Rainone
> 
>  Principal Software Engineer
> 
>  Red Hat [1]
> 
>  frainone at redhat.com
> 
> Links:
> ------
> [1] https://www.redhat.com


