I also encountered the "too many TCP connections in CLOSE_WAIT" issue
with multiple web servers (also Undertow).
We see this happening with mobile clients a lot, as they often lose the
connection at the most inconvenient moments (railway tunnel, person
walks into an elevator, and so on).
As far as I'm aware this is an issue with TCP in general, when the
remote client just goes away (network route severed/power lost, not
just a clean close).
Thus a workaround is best applied at the layer actually handling the
TCP connection state. This is OSI layer 4, handled by the operating
system kernel's network stack.
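To make the failure mode concrete, here is a minimal plain-Java sketch (standard library only; class and variable names are mine, not from any of the servers discussed) showing how a socket ends up in CLOSE_WAIT: the peer hangs up, the server sees end-of-stream, but the kernel keeps the connection in CLOSE_WAIT until the application itself calls close().

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(0)) { // ephemeral port
            Socket client = new Socket("127.0.0.1", listener.getLocalPort());
            Socket serverSide = listener.accept();

            // Client goes away: its kernel sends FIN, and the server-side
            // socket transitions to CLOSE_WAIT.
            client.close();

            // The server application observes end-of-stream...
            int eof = serverSide.getInputStream().read();
            System.out.println(eof); // prints -1

            // ...but until it calls close(), the kernel keeps the
            // connection in CLOSE_WAIT. Forgetting this close() is
            // exactly how CLOSE_WAIT sockets pile up.
            serverSide.close();
        }
    }
}
```

The sysctl workaround below puts an upper bound on how long such forgotten sockets linger, but the clean fix is always the application-level close().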
On Linux I successfully use the following sysctl.conf entries to keep
the CLOSE_WAIT connections in check and avoid netfilter issues:
# keep the number of TCP connections in CLOSE_WAIT low by killing
# CLOSE_WAIT sockets after some time (will look like a RESET to the
# server processes)
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_ecn = 1
net.ipv4.tcp_synack_retries = 2
# note: tcp_tw_recycle is problematic behind NAT and was removed in
# Linux 4.12
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

# tune TCP keepalive to be a little more practical (2h -> 5 minute
# timeout, kill connection after 2 failed probes)
net.ipv4.tcp_keepalive_intvl = 2
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_time = 300

# more conntrack table entries (else we may get "upstream connection"
# errors); only needed when using netfilter connection tracking (for
# example NAT)
# net.nf_conntrack_max = 6556000
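One caveat worth stating explicitly: the tcp_keepalive_* sysctls above only apply to sockets that have opted in via SO_KEEPALIVE, which is off by default. A minimal plain-Java sketch (class name is mine) of that opt-in:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(0);
             Socket s = new Socket("127.0.0.1", listener.getLocalPort())) {
            // SO_KEEPALIVE is off by default, so the tcp_keepalive_*
            // sysctls would not apply to this socket yet.
            System.out.println(s.getKeepAlive()); // prints false

            // Opt the socket in to kernel keepalive probing.
            s.setKeepAlive(true);
            System.out.println(s.getKeepAlive()); // prints true
        }
    }
}
```

If I read the XNIO API correctly, the equivalent for an Undertow listener would be requesting the socket option on the builder, along the lines of `Undertow.builder().setSocketOption(org.xnio.Options.KEEP_ALIVE, true)`.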
Undertow seems to work OK and behaves as an application on layer 7
should. There are open issues when using TLSv1.3 (100% CPU on the XNIO
I/O threads), but that seems off-topic here.
Note: the inherent issues of TCP will likely go away when HTTP/3 is in
wide use, as it replaces layer 4 TCP with layer 4+ QUIC.
On 2020-03-03 10:24, Nishant Kumar wrote:
I tried disabling HTTP/2 but still see too many CLOSE_WAIT
connections. I tried putting Nginx in front of the Java server, but it
seems to have the same issue: Nginx also ends up creating the same
number of connections to the Java server.
server = Undertow.builder()
    .setServerOption(UndertowOptions.IDLE_TIMEOUT, 150000)
    .setServerOption(org.xnio.Options.SSL_SERVER_SESSION_CACHE_SIZE, 1024 * 20) // 20000 sessions
    .setServerOption(org.xnio.Options.SSL_SERVER_SESSION_TIMEOUT, 1500)
# netstat -nalp | grep -E ":80 |:443 " | awk '{n=split($4,a,":"); print a[n] " " $6}' | sort | uniq -c
85918 443 CLOSE_WAIT
10279 443 ESTABLISHED
67 443 LAST_ACK
152 443 SYN_RECV
505 443 TIME_WAIT
31151 80 CLOSE_WAIT
3747 80 ESTABLISHED
108 80 LAST_ACK
146 80 SYN_RECV
On Tue, Mar 3, 2020 at 5:17 AM Stuart Douglas <sdouglas(a)redhat.com> wrote:
> Hmm, maybe this is a bug in the HTTP/2 close code then, and somehow
> the connection is not being closed if the client hangs up abruptly.
> I had a quick look at the code though and I think it looks ok, but
> maybe some more investigation is needed.
> On Tue, 3 Mar 2020 at 03:41, Nishant Kumar
> <nishantkumar35(a)gmail.com> wrote:
> Yes, I have no control over the client side. I am using HTTP/2. I
> have tried increasing the open file limit to 400k, but that consumes
> all memory and the system hangs. I will probably try to put Nginx in
> front of Undertow and test.
> setServerOption(UndertowOptions.ENABLE_HTTP2, true)
> On Mon, Mar 2, 2020, 7:48 PM David Lloyd <david.lloyd(a)redhat.com> wrote:
> On Mon, Mar 2, 2020 at 7:56 AM Stan Rosenberg
> <stan.rosenberg(a)acm.org> wrote:
>> Stuck in CLOSE_WAIT is a symptom of the client-side not properly
>> shutting down.
> I would partially disagree. In the article you linked: "It all starts
> with a listening application that leaks sockets and forgets to call
> close(). This kind of bug does happen in complex applications." That
> seems to be essentially what's happening here: the server isn't
> completing the connection (for some reason), stranding the socket in
> CLOSE_WAIT.
> We can't assume that the client is abandoning the connection after
> `FIN_WAIT2` (the titular RFC violation); if the server stays in
> `CLOSE_WAIT`, then even if the client dutifully stays in `FIN_WAIT2`
> forever, the resolving condition still needs to be that the server
> shuts down its side of the connection.
> This diagram is a useful visual aid, mapping TCP states to the XNIO
> - DML
undertow-dev mailing list