[undertow-dev] Help, please: Observing low Undertow throughput under heavy loads

Matt Clarkson mclarkson at eyeota.com
Sun Jan 18 08:41:21 EST 2015


Hi Stuart, thanks for your reply:

> We've observed this situation even against a no-op endpoint which basically
> dispatches a handler, so we've eliminated almost all of our code from the
> equation. We also removed HTTPS traffic to take SSL out of the equation. CPU
> utilization on the boxes is very low and memory is fine as well. Disk I/O is
> also not an issue... we don't write to disk when hitting the no-op endpoint.
>

What JVM and OS version are you using? This sounds like it might be an NIO
issue, or some kind of NIO/TCP tuning issue.

We're running JVM 1.7.0_45-b18 on Amazon Linux
(amzn-ami-hvm-2014.09.1.x86_64-ebs, ami-4b6f650e).

> We're currently running on c2-xlarge EC2 instances (8 GB RAM/4 cores) in 7
> Amazon regions. We've tried tuning keepalive, IO thread count (currently set
> to 4) and core/max task worker count (40) to no avail. We decided to move
> our compute instances behind HAProxy, which has improved the TCP failure
> rates, but we are still seeing very low throughput (roughly 200-300
> requests/sec max).

Is it this low even with the empty endpoint?

We took those measurements with our normal endpoints. We're in the process
of setting up some new tests against a more highly instrumented build to
get some fresh numbers.  Will post when we have them.
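
For collecting those numbers, the listen-queue symptoms from the original
report (TCP Listen Drops and TCP Overflows) can be sampled straight from the
kernel. The following is a minimal sketch, assuming a Linux host that exposes
/proc/net/netstat. The class name TcpExtCounters is hypothetical. It reads the
TcpExt counters ListenOverflows and ListenDrops, which are the values behind
the corresponding `netstat -s` lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: extracts the TcpExt counters from /proc/net/netstat.
class TcpExtCounters {

    // The file is organised as pairs of lines sharing a prefix:
    // the first line lists counter names, the second the matching values.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (int i = 0; i + 1 < lines.size(); i += 2) {
            String[] names = lines.get(i).trim().split("\\s+");
            String[] values = lines.get(i + 1).trim().split("\\s+");
            // Skip other groups such as IpExt.
            if (!names[0].equals("TcpExt:") || !values[0].equals("TcpExt:")) {
                continue;
            }
            int n = Math.min(names.length, values.length);
            for (int j = 1; j < n; j++) {
                counters.put(names[j], Long.parseLong(values[j]));
            }
        }
        return counters;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/proc/net/netstat");
        if (!Files.isReadable(path)) {
            System.out.println("/proc/net/netstat is not available on this host");
            return;
        }
        Map<String, Long> c = parse(Files.readAllLines(path, StandardCharsets.UTF_8));
        System.out.println("ListenOverflows=" + c.get("ListenOverflows"));
        System.out.println("ListenDrops=" + c.get("ListenDrops"));
    }
}
```

Sampling the two counters before and after a load run and taking the
difference shows whether the accept queue overflowed during that run.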

> We are using the 1.1.0-Final version of Undertow. We tried 1.2.0-Beta 6, but
> after deploying, our servers froze after about 10 minutes, so we had to
> roll back.

Did you happen to get a thread dump or any info from 1.2.0.Beta6 when it
locked up?

I did, but sadly I didn't keep it :-(. As I recall, though, it was similar
to the others: IO threads sitting on epoll and task workers parked waiting
for jobs.
I've upgraded one of our servers with Beta 6 tonight and am running it,
but so far it is performing normally. It's sitting behind HAProxy, which
seems to be smoothing out the traffic, so I may not be able to replicate the
issue until I can redeploy it without HAProxy in front. Will advise further
when I've done that.
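
Since the earlier dump was lost, a saved jstack output can be condensed into
per-pool state counts so that dumps are easy to compare. This is a minimal
sketch, assuming the usual jstack text layout (a quoted thread name line
followed by a "java.lang.Thread.State:" line). The class name
ThreadDumpSummary is hypothetical, and the thread names in the usage note are
assumed, not taken from an actual dump.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts thread states per thread-name group in a jstack dump.
class ThreadDumpSummary {

    private static final String STATE_PREFIX = "java.lang.Thread.State:";

    static Map<String, Integer> summarize(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        String currentGroup = null;
        for (String line : dumpLines) {
            if (line.startsWith("\"")) {
                // Thread header: take the quoted name and replace the trailing
                // index with "N" so threads of one pool fall into one group.
                int end = line.indexOf('"', 1);
                currentGroup = end > 0
                        ? line.substring(1, end).replaceAll("\\d+$", "N")
                        : null;
            } else if (currentGroup != null && line.trim().startsWith(STATE_PREFIX)) {
                String state = line.trim().substring(STATE_PREFIX.length()).trim();
                String key = currentGroup + " -> " + state;
                Integer old = counts.get(key);
                counts.put(key, old == null ? 1 : old + 1);
                currentGroup = null;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ThreadDumpSummary <jstack-output-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : summarize(lines).entrySet()) {
            System.out.println(e.getValue() + "  " + e.getKey());
        }
    }
}
```

With worker threads named along the lines of "XNIO-1 I/O-1" and
"XNIO-1 task-1", the output would contain one line per pool and state. Note
that a thread waiting inside the native epollWait call is reported by jstack
as RUNNABLE even though it is idle, so RUNNABLE I/O threads alone do not
indicate load.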

Many thanks,

Matt

On Sun, Jan 18, 2015 at 4:57 AM, Stuart Douglas <sdouglas at redhat.com> wrote:

>
>
> ----- Original Message -----
> > From: "Matt Clarkson" <mclarkson at eyeota.com>
> > To: undertow-dev at lists.jboss.org
> > Sent: Saturday, 17 January, 2015 3:42:34 PM
> > Subject: [undertow-dev] Help, please: Observing low Undertow throughput under heavy loads
> >
> > Hi Undertow Team,
> >
> > We recently deployed a large platform for processing high-frequency HTTP
> > signals from around the Internet. We are using Undertow as our embedded HTTP
> > server and are experiencing some serious throughput issues. Hoping you can
> > help us to remedy them. Here are our findings so far.
> >
> > - When we dump thread stacks using jstack for a loaded server, we observe that
> >   the I/O threads (1/core) are all blocking at
> >   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method).
> > - At the same time we see large numbers of TCP Timeouts, TCP Listen Drops, and
> >   TCP Overflows, which would seem to imply that we are not processing
> >   connections fast enough.
> > - There are large numbers of sockets in TIME_WAIT status.
> > - TaskWorker threads are underutilized and most are in WAITING state, sitting
> >   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186).
> >
> > We've observed this situation even against a no-op endpoint which basically
> > dispatches a handler, so we've eliminated almost all of our code from the
> > equation. We also removed HTTPS traffic to take SSL out of the equation. CPU
> > utilization on the boxes is very low and memory is fine as well. Disk I/O is
> > also not an issue... we don't write to disk when hitting the no-op endpoint.
> >
>
> What JVM and OS version are you using? This sounds like it might be an NIO
> issue, or some kind of NIO/TCP tuning issue.
>
> > We're currently running on c2-xlarge EC2 instances (8 GB RAM/4 cores) in 7
> > Amazon regions. We've tried tuning keepalive, IO thread count (currently set
> > to 4) and core/max task worker count (40) to no avail. We decided to move
> > our compute instances behind HAProxy, which has improved the TCP failure
> > rates, but we are still seeing very low throughput (roughly 200-300
> > requests/sec max).
>
> Is it this low even with the empty endpoint?
>
> >
> > We are using the 1.1.0-Final version of Undertow. We tried 1.2.0-Beta 6, but
> > after deploying, our servers froze after about 10 minutes, so we had to
> > roll back.
>
> Did you happen to get a thread dump or any info from 1.2.0.Beta6 when it
> locked up?
>
> Thanks,
>
> Stuart
>
> >
> > Do you have any tips on other things we can look at ?
> >
> > Thanks in advance,
> >
> > Matt C.
> >
> > _______________________________________________
> > undertow-dev mailing list
> > undertow-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/undertow-dev
>