[undertow-dev] Too many open files: Exception accepting request, closing server channel TCP server (NIO)

Jonas Zeiger jonas.zeiger at talpidae.net
Tue Mar 3 05:03:31 EST 2020


Hi,

I also encountered the "too many TCP connections in CLOSE_WAIT" issue 
with multiple web servers (including Undertow).

We see this happen a lot with mobile clients, as they often lose their 
connection at the most awkward moments (railway tunnel, person walks 
into a building, ...).

As far as I'm aware, this is an issue with TCP in general: it shows up 
when the remote client simply goes away (network route severed or power 
lost, rather than the process terminating cleanly).

Thus a workaround is best applied on the layer that actually handles the 
TCP connection state. That is OSI layer 4, which is managed by the 
operating system kernel's network stack.
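For concreteness: CLOSE_WAIT means the kernel has already received the 
peer's FIN and is only waiting for the local application to close() its 
end of the socket. A minimal sketch in plain java.net (not Undertow 
internals, just to show the mechanics) that reproduces the state:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of how CLOSE_WAIT arises: the client sends FIN, the server-side
// read() returns -1, and the server socket sits in CLOSE_WAIT until the
// server application calls close() itself.
public class CloseWaitDemo {

    // Accepts one local connection, lets the peer hang up, and returns the
    // end-of-stream marker (-1) seen by the server side.
    static int readAfterPeerFin() throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("localhost", server.getLocalPort());
            Socket accepted = server.accept();
            client.close(); // peer's FIN arrives; accepted goes to CLOSE_WAIT
            int eof = accepted.getInputStream().read(); // -1 = end of stream
            accepted.close(); // without this close() the fd leaks in CLOSE_WAIT
            return eof;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterPeerFin()); // prints -1
    }
}
```

If the application never issues that final close() (because the 
connection object was leaked, or the framework never noticed the 
hang-up), the socket stays in CLOSE_WAIT and the file descriptor is 
never reclaimed.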

On Linux I successfully use the following sysctl.conf entries to keep 
the CLOSE_WAIT connections in check and avoid netfilter issues:


# keep the number of TCP connections in CLOSE_WAIT low
# by killing CLOSE_WAIT sockets after some time (will look like RESET
# to the server processes)
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_ecn = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
# tune TCP keepalive to be a little more practical (2h -> 5 minute
# timeout, kill connection after 2 failed probes)
net.ipv4.tcp_keepalive_intvl = 2
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_time = 300

# more conntrack table entries (else we may get "upstream connection
# timed out" errors)
# only needed when using netfilter connection tracking (for example NAT)
# net.nf_conntrack_max = 6556000
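One caveat worth flagging: the tcp_keepalive_* timers above only apply 
to sockets that have opted in via SO_KEEPALIVE (in Undertow/XNIO that is 
org.xnio.Options.KEEP_ALIVE). A minimal plain-java.net sketch of the 
per-socket flag:

```java
import java.io.IOException;
import java.net.Socket;

// The kernel's tcp_keepalive_* settings only take effect on sockets
// that have SO_KEEPALIVE enabled; it is off by default in java.net.
public class KeepAliveDemo {

    // Enables TCP keepalive on a socket (may be called before connect()).
    static Socket enableKeepAlive(Socket s) throws IOException {
        s.setKeepAlive(true);
        return s;
    }

    public static void main(String[] args) throws IOException {
        Socket s = enableKeepAlive(new Socket());
        System.out.println(s.getKeepAlive()); // prints true
        s.close();
    }
}
```

So tuning the sysctls and setting the socket option go together: the 
kernel values decide *when* probes fire, the per-socket flag decides 
*whether* they fire at all.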


Undertow seems to work fine and behaves as an application on layer 7 
should. There are open issues when using TLSv1.3 (XNIO I/O threads 
spinning at 100% CPU), but that seems off-topic here.

Note: the inherent issues of TCP will likely go away once HTTP/3 is in 
wide use, as it replaces layer-4 TCP with QUIC (which runs over UDP and 
spans layers 4 and up).


On 2020-03-03 10:24, Nishant Kumar wrote:
> I tried disabling HTTP/2, but there are still too many CLOSE_WAIT
> connections. I also tried putting Nginx in front of the Java server,
> but it shows the same issue: Nginx just opens the same number of
> connections to the Java server.
> 
> server = Undertow.builder()
>         .addHttpListener(SERVER_LISTEN_PORT, SERVER_HOST)
>         .addHttpsListener(SERVER_SSL_LISTEN_PORT, SERVER_HOST, sslContext)
>         .setWorkerThreads(WORKER_THREAD)
>         .setServerOption(UndertowOptions.ENABLE_HTTP2, false)
>         .setServerOption(UndertowOptions.IDLE_TIMEOUT, 150000) // 150s
>         .setServerOption(UndertowOptions.NO_REQUEST_TIMEOUT, 150000) // 150s
>         .setServerOption(org.xnio.Options.SSL_SERVER_SESSION_CACHE_SIZE,
>                 1024 * 20) // 20480 sessions
>         .setServerOption(org.xnio.Options.SSL_SERVER_SESSION_TIMEOUT,
>                 1500) // 150s
>         .setIoThreads(IO_THREAD)
>         .setWorkerOption(org.xnio.Options.TCP_NODELAY, true)
>         .setSocketOption(org.xnio.Options.TCP_NODELAY, true)
>         .setSocketOption(org.xnio.Options.KEEP_ALIVE, true)
>         .setSocketOption(org.xnio.Options.REUSE_ADDRESSES, true)
>         .setSocketOption(org.xnio.Options.CONNECTION_HIGH_WATER, 100000)
>         .setSocketOption(org.xnio.Options.CONNECTION_LOW_WATER, 100000)
>         .setHandler(Handlers.routing().post("/",
>                 new RequestHandler(appContext)))
>         .build();
> 
> #  netstat -nalp  | grep -E ":80 |:443 " | awk '{split($4,a,":");print
> a[2] " " $6}'| sort | uniq -c
>   85918 443 CLOSE_WAIT
>   10279 443 ESTABLISHED
>      67 443 LAST_ACK
>     152 443 SYN_RECV
>     505 443 TIME_WAIT
>   31151 80 CLOSE_WAIT
>    3747 80 ESTABLISHED
>     108 80 LAST_ACK
>     146 80 SYN_RECV
>       2  LISTEN
> 
> On Tue, Mar 3, 2020 at 5:17 AM Stuart Douglas <sdouglas at redhat.com>
> wrote:
> 
>> Hmm, maybe this is a bug in the HTTP/2 close code then, and somehow
>> the connection is not being closed if the client hangs up abruptly.
>> I had a quick look at the code, though, and it looks OK to me, but
>> maybe some more investigation is needed.
>> 
>> Stuart
>> 
>> On Tue, 3 Mar 2020 at 03:41, Nishant Kumar
>> <nishantkumar35 at gmail.com> wrote:
>> 
>> Yes, I have no control over the client side. I am using HTTP/2. I
>> have tried increasing the open-file limit to 400k, but that consumes
>> all memory and the system hangs. I will probably try to put nginx in
>> front of Undertow and test.
>> 
>> setServerOption(UndertowOptions.ENABLE_HTTP2, true)
>> 
>> On Mon, Mar 2, 2020, 7:48 PM David Lloyd <david.lloyd at redhat.com>
>> wrote:
>> On Mon, Mar 2, 2020 at 7:56 AM Stan Rosenberg
>> <stan.rosenberg at acm.org> wrote:
>>> 
>>> Stuck in CLOSE_WAIT is a symptom of the client-side not properly
>> shutting down [1].
>> 
>> I would partially disagree.  In the article you linked: "It all
>> starts
>> with a listening application that leaks sockets and forgets to call
>> close(). This kind of bug does happen in complex applications."
>> This
>> seems to be essentially what's happening here: the server isn't
>> completing the connection (for some reason), stranding the socket in
>> `CLOSE_WAIT`.
>> 
>> We can't assume that the client is abandoning the connection after
>> `FIN_WAIT2` (the titular RFC violation); if the server stays in
>> `CLOSE_WAIT`, then even if the client dutifully stays in `FIN_WAIT2`
>> forever, the resolving condition still needs to be that the server
>> shuts down its side of the connection.
>> 
>> This diagram is a useful visual aid, mapping TCP states to the XNIO
>> API:
>> 
>> https://www.lucidchart.com/publicSegments/view/524ec20a-5c40-4fd0-8bde-0a1c0a0046e1/image.png
>> 
>> --
>> - DML
> 
> --
> 
> Nishant Kumar
> Bangalore, India
> Mob: +91 80088 42030
> Email: nishantkumar35 at gmail.com
> _______________________________________________
> undertow-dev mailing list
> undertow-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/undertow-dev


