[undertow-dev] Help, please: Observing low Undertow throughput under heavy loads

Matt Clarkson mclarkson at eyeota.com
Sun Jan 18 08:46:06 EST 2015


Hi Jason,

Thanks much for your great response... I've done some recoding to set the
BACKLOG parameter and will test it out tomorrow with some of your other
tuning recommendations.  In order to answer your other questions, I'll need
to get a node reconfigured out from behind the load balancer.  Will advise
when I've got some updates in the next 24 hours or so.

many thanks,

MTC

On Sun, Jan 18, 2015 at 12:01 AM, Jason Greene <jason.greene at redhat.com>
wrote:

>
> > On Jan 17, 2015, at 9:50 AM, jason.greene at redhat.com wrote:
> >
> > Hi Matt,
> >
> > Thank you for posting the problem you are running into.  We definitely
> want to help.
> >
> > A couple of questions, with just undertow in the picture (no haproxy):
> >
> > - Are you seeing a message like this in dmesg or /var/log/messages:
> >
> > "possible SYN flooding on port 80. Sending cookies."
> >
> > - Can you paste the output of netstat -s when things are going wrong?
> >
> > If you are seeing listen drops, then the first thing to do would be to
> raise the Options.BACKLOG setting to a high value (e.g. 16384), so that if
> the I/O threads aren’t accepting as fast as the connections come in they
> queue instead of drop. Can you give us an approximation of how many
> connections a node is typically handling? If you are in the 100k+
> connection count range, have you done any TCP tuning? (e.g. tuning or
> removing netfilter conntrack, setting net.core.netdev_max_backlog), as
> that can also lead to TCP timeouts/drops/delays.
> >
> > In any case just start with setting Options.BACKLOG, and seeing if
> failures decrease.
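[Editor's note: for readers following the thread, the BACKLOG change Jason suggests would look roughly like this when building the server. This is a minimal sketch; the port, host, and no-op handler are placeholders, and 16384 is his suggested starting point rather than a measured optimum.]

```java
import io.undertow.Undertow;
import io.undertow.util.Headers;
import org.xnio.Options;

public class BacklogExample {
    public static void main(String[] args) {
        Undertow server = Undertow.builder()
                .addHttpListener(8080, "0.0.0.0")
                // Raise the accept queue so bursts of incoming connections
                // are queued by the kernel instead of dropped while the
                // I/O threads catch up on accepting.
                .setSocketOption(Options.BACKLOG, 16384)
                .setHandler(exchange -> {
                    exchange.getResponseHeaders().put(Headers.CONTENT_TYPE, "text/plain");
                    exchange.getResponseSender().send("ok");
                })
                .build();
        server.start();
    }
}
```

Note that the kernel caps the effective backlog at net.core.somaxconn, so that sysctl usually needs raising in tandem with Options.BACKLOG.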
> >
> > Haproxy might set a higher backlog by default, explaining the difference
> in failure rates. It could also act as a throttle, by purposefully limiting
> how much it proxies to undertow.
>
> Just to clarify, what I mean is that with haproxy in the picture, it
> probably needs to be tuned to pass on more load.
>
> >
> > If I understand correctly your no-op endpoint is using a dispatch, so
> utilizing the worker pool, which I imagine models your app? If so your
> worker pool will need to be sized to account for the wait time it spends
> not using CPU cycles, but waiting for something like a database, or the
> file system.  If your use case has lots of wait/blocking like this, then a
> very large worker pool would improve throughput (64+ threads on a 4-core
> machine).
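[Editor's note: as a back-of-envelope check on the sizing advice above: if each request spends far more time blocked (database, filesystem) than on CPU, the pool needs roughly cores × (1 + wait/compute) threads to keep the cores busy. The sketch below works through that arithmetic; the 10 ms compute / 150 ms wait figures are invented for illustration, not measurements from Matt's system.]

```java
public class WorkerPoolSizing {
    /**
     * Rough sizing rule for a blocking workload:
     * threads ≈ cores * (1 + waitTime / computeTime).
     * When wait dominates, the pool grows well past the core count.
     */
    static int suggestedWorkerThreads(int cores, double computeMs, double waitMs) {
        return (int) Math.ceil(cores * (1.0 + waitMs / computeMs));
    }

    public static void main(String[] args) {
        // Hypothetical figures: 4 cores, 10 ms of CPU work, 150 ms blocked.
        int threads = suggestedWorkerThreads(4, 10.0, 150.0);
        System.out.println(threads); // prints 64, matching the "64+ for 4 cores" guideline
    }
}
```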
> >
> > Thanks!
> >
> > On Jan 16, 2015, at 10:43 PM, Matt Clarkson <mclarkson at eyeota.com>
> wrote:
> >
> >> Hi Undertow Team,
> >>
> >> We recently deployed a large platform for processing high-frequency
> http signals from around the Internet.  We are using undertow as our
> embedded http server and are experiencing some serious throughput issues.
> Hoping you can help us to remedy them.  Here are our findings so far.
> >>
> >> -When we dump thread stacks using jstack for a loaded server, we
> observe that the I/O threads (1/core) are all blocking at
> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method).
> >> -At the same time we see large numbers of TCP timeouts, TCP listen
> drops, and TCP overflows, which would seem to imply that we are not
> processing connections fast enough
> >> -There are large numbers of sockets in TIME_WAIT status
> >> -TaskWorker threads are underutilized and most are in WAITING state,
> sitting at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> >>
> >> We've observed this situation even against a no-op endpoint which
> basically dispatches a handler, so we've eliminated almost all of our code
> from the equation.  We also removed HTTPS traffic to take SSL out of the
> equation.  CPU utilization on the boxes is very low and memory is fine as
> well.  Disk I/O is also not an issue... we don't write to disk when hitting
> the no-op endpoint.
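[Editor's note: for reference, the dispatch-to-worker pattern described here usually looks like the following in Undertow (a minimal sketch, not Matt's actual endpoint).]

```java
import io.undertow.server.HttpHandler;
import io.undertow.server.HttpServerExchange;

public class NoOpDispatchHandler implements HttpHandler {
    @Override
    public void handleRequest(HttpServerExchange exchange) throws Exception {
        if (exchange.isInIoThread()) {
            // Re-queue this handler onto the worker (task) pool; the I/O
            // thread returns immediately to its epoll loop.
            exchange.dispatch(this);
            return;
        }
        // Now on a worker thread: blocking work would go here.
        exchange.getResponseSender().send("ok");
    }
}
```

After the dispatch the I/O thread goes straight back to waiting for socket events, which is exactly the epollWait state the jstack dumps show.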
> >>
> >> We're currently running on c2-xlarge EC2 instances (8 GB RAM/4 cores)
> in 7 Amazon regions.  We've tried tuning keepalive, IO thread count
> (currently set to 4) and core/max task worker count (40) to no avail.  We
> decided to move our compute instances behind haproxy, which has improved
> the tcp failure rates, but we are still seeing very low throughput (roughly
> 200-300 requests/sec max)
> >>
> >> We are using version 1.1.0.Final of Undertow.  We tried 1.2.0.Beta6,
> but after deploying it our servers froze after about 10 minutes, so we had
> to roll back.
> >>
> >> Do you have any tips on other things we can look at?
> >>
> >> Thanks in advance,
> >>
> >> Matt C.
> >> _______________________________________________
> >> undertow-dev mailing list
> >> undertow-dev at lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/undertow-dev
> >
> >
> > --
> > Jason T. Greene
> > WildFly Lead / JBoss EAP Platform Architect
> > JBoss, a division of Red Hat
> >
>
> --
> Jason T. Greene
> WildFly Lead / JBoss EAP Platform Architect
> JBoss, a division of Red Hat
>
>