[undertow-dev] Help, please: Observing low Undertow throughput under heavy loads

Sat Jan 17 15:57:37 EST 2015

----- Original Message -----
> From: "Matt Clarkson" <mclarkson at eyeota.com>
> To: undertow-dev at lists.jboss.org
> Sent: Saturday, 17 January, 2015 3:42:34 PM
> Subject: [undertow-dev] Help, please: Observing low Undertow throughput under heavy loads
> 
> Hi Undertow Team,
> 
> We recently deployed a large platform for processing high-frequency http
> signals from around the Internet. We are using undertow as our embedded http
> server and are experiencing some serious throughput issues. Hoping you can
> help us to remedy them. Here are our findings so far.
> 
> -When we dump thread stacks using jstack for a loaded server, we observe that
> the I/O threads (1/core) are all blocking at
> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method).
> -At the same time we see large numbers of TCP Timeouts, TCP Listen Drops, and
> TCP Overflows, which would seem to imply that we are not processing
> connections fast enough
> -There are large numbers of sockets in TIME_WAIT status
> -TaskWorker threads are underutilized and most are in WAITING state sitting
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 
> We've observed this situation even against a no-op end point which basically
> dispatches a handler, so we've eliminated almost all of our code from the
> equation. We also removed HTTPS traffic to take SSL out of the equation. CPU
> utilization on the boxes is very low and memory is fine as well. Disk I/O is
> also not an issue... we don't write to disk when hitting the no-op endpoint
> 

What JVM and OS version are you using? This sounds like it might be an NIO issue, or some kind of NIO/TCP tuning issue. 
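
For reference, listen drops and overflows generally mean the kernel's accept queue is filling faster than connections are being accepted, so the listen backlog is one of the first TCP settings worth checking. Below is a minimal plain-NIO sketch of requesting a backlog explicitly. It is not Undertow code: the class name, port, and backlog value are placeholders chosen for illustration. In Undertow the equivalent would be a socket option passed to the builder, which should be XNIO's Options.BACKLOG, though that is worth verifying against the version in use.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

// Hypothetical helper, for illustration only (not part of Undertow or XNIO).
class BacklogSketch {

    // Opens a listening channel and asks the OS for the given accept backlog.
    // The OS treats the value as a hint and may silently cap it.
    static ServerSocketChannel listen(int port, int backlog) throws IOException {
        ServerSocketChannel ch = ServerSocketChannel.open();
        ch.setOption(StandardSocketOptions.SO_REUSEADDR, true);
        ch.bind(new InetSocketAddress(port), backlog);
        return ch;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 lets the OS pick a free port; 1024 is an arbitrary example value.
        try (ServerSocketChannel ch = listen(0, 1024)) {
            System.out.println("listening on " + ch.getLocalAddress());
        }
    }
}
```

On Linux the effective backlog is also capped by net.core.somaxconn, so raising the value in the application alone may have no effect.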

> We're currently running on c2-xlarge EC2 instances (8 GB RAM/4 cores) in 7
> amazon regions. We've tried tuning keepalive, IO thread count (currently set
> to 4) and core/max task worker count (40) to no avail. We decided to move
> our compute instances behind haproxy, which has improved the tcp failure
> rates but we are still seeing very low throughput (roughly 200-300
> request/sec max)

Is it this low even with the empty endpoint?

> 
> We are using 1.1.0-Final version of undertow. We tried 1.2.0-Beta 6 but after
> deploying, our servers froze after about 10 minutes, so we had to roll back.

Did you happen to get a thread dump or any info from 1.2.0.Beta6 when it locked up?
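
If running jstack against a frozen server is awkward, a dump can also be captured from inside the JVM. The following is a rough sketch using the standard java.lang.management API. The class and method names are made up for illustration, and how it would be triggered (a signal, an admin endpoint, a timer) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class ThreadDumpSketch {

    // Counts threads by state, e.g. RUNNABLE vs WAITING.
    static Map<Thread.State, Integer> countByState(ThreadInfo[] infos) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : infos) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    // Builds a text dump of all live threads plus a deadlock check.
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean syncs = mx.isSynchronizerUsageSupported();
        ThreadInfo[] infos = mx.dumpAllThreads(monitors, syncs);

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            // Note: ThreadInfo.toString() prints only the top frames of each stack.
            sb.append(info);
        }
        // findDeadlockedThreads() needs synchronizer support; fall back to
        // the monitor-only check otherwise. Both return null when none found.
        long[] deadlocked = syncs ? mx.findDeadlockedThreads()
                                  : mx.findMonitorDeadlockedThreads();
        sb.append("deadlocked threads: ")
          .append(deadlocked == null ? 0 : deadlocked.length).append('\n');
        sb.append("states: ").append(countByState(infos)).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

The per-state counts give a quick view of whether the I/O and worker threads are idle or stuck, and the deadlock check helps separate a real lockup from threads that are merely parked.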

Thanks,

Stuart

> 
> Do you have any tips on other things we can look at?
> 
> Thanks in advance,
> 
> Matt C.
> 