New subject: Help, please: Observing low Undertow throughput under heavy loads

Friday, 16 January 2015

Hi Undertow Team,

We recently deployed a large platform for processing high-frequency HTTP
signals from around the Internet.  We are using Undertow as our embedded
HTTP server and are experiencing some serious throughput issues.  We are
hoping you can help us remedy them.  Here are our findings so far.

-When we dump thread stacks using jstack for a loaded server, we observe
that the I/O threads (1/core) are all blocking at
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method).
-At the same time we see large numbers of TCP Timeouts, TCP Listen Drops,
and TCP Overflows, which would seem to imply that we are not accepting
connections fast enough.
-There are large numbers of sockets in TIME_WAIT status.
-TaskWorker threads are underutilized and most are in WAITING state sitting
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186).
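
For anyone wanting to reproduce the thread-state observations above without
running jstack by hand, here is a minimal sketch that counts live JVM threads
by state using the JDK's ThreadMXBean.  It uses only the standard library and
is not part of Undertow; the class name ThreadStateSummary is hypothetical.
Note that a thread sitting in a native call such as epollWait is reported by
the JVM as RUNNABLE, so idle I/O threads will show up in that bucket rather
than under WAITING.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper (not an Undertow API): summarizes JVM thread states.
final class ThreadStateSummary {

    // Returns the number of live threads in each Thread.State.
    static Map<Thread.State, Integer> summarize() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip locked-monitor and synchronizer details to keep the dump cheap.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // thread ended while the dump was being taken
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per state that has at least one thread.
        summarize().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

Calling summarize() periodically under load would give a rough view of how
many worker threads are parked versus doing work; it does not replace a full
jstack dump, which also shows the stack frames.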

We've observed this situation even against a no-op endpoint which
basically dispatches a handler, so we've eliminated almost all of our code
from the equation.  We also removed HTTPS traffic to take SSL out of the
equation.  CPU utilization on the boxes is very low and memory is fine as
well.  Disk I/O is also not an issue... we don't write to disk when hitting
the no-op endpoint.

We're currently running on c2-xlarge EC2 instances (8 GB RAM/4 cores) in 7
Amazon regions.  We've tried tuning keepalive, I/O thread count (currently
set to 4) and core/max task worker count (40) to no avail.  We decided to
move our compute instances behind HAProxy, which has improved the TCP
failure rates, but we are still seeing very low throughput (roughly 200-300
requests/sec max).

We are using version 1.1.0-Final of Undertow.  We tried 1.2.0-Beta 6, but
after deploying, our servers froze after about 10 minutes, so we had to roll
back.

Do you have any tips on other things we can look at?

Thanks in advance,

Matt C.
