1 I/O thread per core is on the low side.
I would start with at least 2 threads per core.
Also keep in mind hyper-threading (HT) on those CPUs.
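
As a rough sketch of that sizing (not a tested recommendation): the helper class ThreadSizing below is made up for illustration and is not part of Undertow. The port, host and handler in the commented builder call are placeholders, and the worker count of 40 is just the value from your message. Runtime.availableProcessors() reports logical CPUs, so hyper-threads are already counted.

```java
// Hypothetical helper, not part of Undertow: derives an I/O thread count
// from the number of logical cores.
public final class ThreadSizing {
    private ThreadSizing() {}

    // I/O threads = logical cores * threads per core, never below 1.
    static int ioThreads(int logicalCores, int perCore) {
        return Math.max(1, logicalCores * perCore);
    }

    public static void main(String[] args) {
        // availableProcessors() counts logical CPUs, so hyper-threads are included.
        int cores = Runtime.getRuntime().availableProcessors();
        int io = ioThreads(cores, 2);
        System.out.println("logical cores=" + cores + ", io threads=" + io);

        // With Undertow on the classpath, the value would go to the builder:
        //   Undertow.builder()
        //           .addHttpListener(8080, "0.0.0.0") // placeholder port/host
        //           .setIoThreads(io)
        //           .setWorkerThreads(40)             // value from the original message
        //           .setHandler(handler)              // placeholder handler
        //           .build()
        //           .start();
    }
}
```

On a 4-core box this gives 8 I/O threads instead of the current 4. Treat the multiplier as a starting point to measure against, not a fixed rule.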
Sent from my Phone
-----Original Message-----
From: "Matt Clarkson" <mclarkson(a)eyeota.com>
Sent: 17.1.2015 5:43
To: "undertow-dev(a)lists.jboss.org" <undertow-dev(a)lists.jboss.org>
Subject: [undertow-dev] Help, please: Observing low Undertow throughput under heavy loads
Hi Undertow Team,
We recently deployed a large platform for processing high-frequency HTTP signals from
around the Internet. We are using Undertow as our embedded HTTP server and are
experiencing some serious throughput issues. Hoping you can help us to remedy them. Here
are our findings so far.
-When we dump thread stacks using jstack for a loaded server, we observe that the I/O
threads (1/core) are all blocking at sun.nio.ch.EPollArrayWrapper.epollWait(Native
Method).
-At the same time we see large numbers of TCP Timeouts, TCP Listen Drops, and TCP
Overflows, which would seem to imply that we are not processing connections fast enough.
-There are large numbers of sockets in TIME_WAIT status.
-TaskWorker threads are underutilized and most are in WAITING state sitting at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
We've observed this situation even against a no-op end point which basically
dispatches a handler, so we've eliminated almost all of our code from the equation.
We also removed HTTPS traffic to take SSL out of the equation. CPU utilization on the
boxes is very low and memory is fine as well. Disk I/O is also not an issue: we
don't write to disk when hitting the no-op endpoint.
We're currently running on c2-xlarge EC2 instances (8 GB RAM/4 cores) in 7 Amazon
regions. We've tried tuning keepalive, I/O thread count (currently set to 4) and
core/max task worker count (40) to no avail. We decided to move our compute instances
behind HAProxy, which has improved the TCP failure rates, but we are still seeing very low
throughput (roughly 200-300 requests/sec max).
We are using the 1.1.0-Final version of Undertow. We tried 1.2.0-Beta 6, but after deploying,
our servers froze after about 10 minutes, so we had to roll back.
Do you have any tips on other things we can look at?
Thanks in advance,
Matt C.