I also encountered the "too many TCP connections in CLOSE_WAIT" issue
with multiple web servers (also Undertow).
We see this happening with mobile clients a lot, as they often lose the
connection at the most inconvenient moments (railway tunnel, person
walks into an elevator, and so on).
As far as I'm aware this is an issue with TCP in general, when the
remote client just goes away (network route severed/power lost, not
just a clean close).
Thus a workaround is best applied at the layer actually handling the
TCP connection state. This is OSI layer 4, handled by the operating
system kernel's network stack.
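To make the failure mode concrete, here is a minimal plain-Java sketch (standard library only; class and variable names are mine, not from any of the servers discussed) showing how a socket ends up in CLOSE_WAIT: the peer hangs up, the server sees end-of-stream, but the kernel keeps the connection in CLOSE_WAIT until the application itself calls close().

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(0)) { // ephemeral port
            Socket client = new Socket("127.0.0.1", listener.getLocalPort());
            Socket serverSide = listener.accept();

            // Client goes away: its kernel sends FIN, and the server-side
            // socket transitions to CLOSE_WAIT.
            client.close();

            // The server application observes end-of-stream...
            int eof = serverSide.getInputStream().read();
            System.out.println(eof); // prints -1

            // ...but until it calls close(), the kernel keeps the
            // connection in CLOSE_WAIT. Forgetting this close() is
            // exactly how CLOSE_WAIT sockets pile up.
            serverSide.close();
        }
    }
}
```

The sysctl workaround below puts an upper bound on how long such forgotten sockets linger, but the clean fix is always the application-level close().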
On Linux I successfully use the following sysctl.conf entries to keep
the CLOSE_WAIT connections in check and avoid netfilter issues:
# keep the number of TCP connections in CLOSE_WAIT low by killing
# CLOSE_WAIT sockets after some time (will look like a RESET to the
# server processes)
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_ecn = 1
net.ipv4.tcp_synack_retries = 2
# note: tcp_tw_recycle is problematic behind NAT and was removed in
# Linux 4.12
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

# tune TCP keepalive to be a little more practical (2h -> 5 minute
# timeout, kill connection after 2 failed probes)
net.ipv4.tcp_keepalive_intvl = 2
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_time = 300

# more conntrack table entries (else we may get "upstream connection"
# errors); only needed when using netfilter connection tracking (for
# example NAT)
# net.nf_conntrack_max = 6556000
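One caveat worth stating explicitly: the tcp_keepalive_* sysctls above only apply to sockets that have opted in via SO_KEEPALIVE, which is off by default. A minimal plain-Java sketch (class name is mine) of that opt-in:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(0);
             Socket s = new Socket("127.0.0.1", listener.getLocalPort())) {
            // SO_KEEPALIVE is off by default, so the tcp_keepalive_*
            // sysctls would not apply to this socket yet.
            System.out.println(s.getKeepAlive()); // prints false

            // Opt the socket in to kernel keepalive probing.
            s.setKeepAlive(true);
            System.out.println(s.getKeepAlive()); // prints true
        }
    }
}
```

If I read the XNIO API correctly, the equivalent for an Undertow listener would be requesting the socket option on the builder, along the lines of `Undertow.builder().setSocketOption(org.xnio.Options.KEEP_ALIVE, true)`.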
Undertow seems to work OK and behaves as an application on layer 7
should. There are open issues when using TLSv1.3 (100% CPU on the XNIO
I/O threads), but that seems off-topic here.
Note: the inherent issues of TCP will likely go away when HTTP/3 is in
wide use, as it replaces layer 4 TCP with layer 4+ QUIC.
On 2020-03-03 10:24, Nishant Kumar wrote:
I tried disabling HTTP/2 but still see too many CLOSE_WAIT
connections. I tried putting Nginx in front of the Java server, but it
seems to have the same issue: Nginx also ends up creating the same
number of connections to the Java server.
server = Undertow.builder()
    .setServerOption(UndertowOptions.IDLE_TIMEOUT, 150000)
    .setServerOption(org.xnio.Options.SSL_SERVER_SESSION_CACHE_SIZE, 1024 * 20) // 20000 sessions
    .setServerOption(org.xnio.Options.SSL_SERVER_SESSION_TIMEOUT, 1500)
# netstat -nalp | grep -E ":80 |:443 " | awk '{n=split($4,a,":"); print a[n] " " $6}' | sort | uniq -c
85918 443 CLOSE_WAIT
10279 443 ESTABLISHED
67 443 LAST_ACK
152 443 SYN_RECV
505 443 TIME_WAIT
31151 80 CLOSE_WAIT
3747 80 ESTABLISHED
108 80 LAST_ACK
146 80 SYN_RECV
On Tue, Mar 3, 2020 at 5:17 AM Stuart Douglas <sdouglas(a)redhat.com> wrote:
> Hmm, maybe this is a bug in the HTTP/2 close code then, and somehow
> the connection is not being closed if the client hangs up abruptly.
> I had a quick look at the code though and I think it looks ok, but
> maybe some more investigation is needed.
> On Tue, 3 Mar 2020 at 03:41, Nishant Kumar
> <nishantkumar35(a)gmail.com> wrote:
> Yes, I have no control over the client side. I am using HTTP/2. I
> have tried increasing the open file limit to 400k, but that consumes
> all memory and the system hangs. I will probably try to put Nginx in
> front of Undertow and test.
> setServerOption(UndertowOptions.ENABLE_HTTP2, true)
> On Mon, Mar 2, 2020, 7:48 PM David Lloyd <david.lloyd(a)redhat.com> wrote:
> On Mon, Mar 2, 2020 at 7:56 AM Stan Rosenberg
> <stan.rosenberg(a)acm.org> wrote:
>> Stuck in CLOSE_WAIT is a symptom of the client-side not properly
>> shutting down.
> I would partially disagree. In the article you linked: "It all starts
> with a listening application that leaks sockets and forgets to call
> close(). This kind of bug does happen in complex applications." That
> seems to be essentially what's happening here: the server isn't
> completing the connection (for some reason), stranding the socket in
> CLOSE_WAIT.
> We can't assume that the client is abandoning the connection after
> `FIN_WAIT2` (the titular RFC violation); if the server stays in
> `CLOSE_WAIT`, then even if the client dutifully stays in `FIN_WAIT2`
> forever, the resolving condition still needs to be that the server
> shuts down its side of the connection.
> This diagram is a useful visual aid, mapping TCP states to the XNIO
> - DML
undertow-dev mailing list