Hi Stuart, thanks for your reply. Answers are inline:

> We've observed this situation even against a no-op end point which basically
> dispatches a handler, so we've eliminated almost all of our code from the
> equation. We also removed HTTPS traffic to take SSL out of the equation. CPU
> utilization on the boxes is very low and memory is fine as well. Disk I/O is
> also not an issue... we don't write to disk when hitting the no-op endpoint
>

> What JVM and OS version are you using? This sounds like it might be an NIO issue, or some kind of NIO/TCP tuning issue.

We're running Java 1.7.0_45-b18 on Amazon Linux (amzn-ami-hvm-2014.09.1.x86_64-ebs, ami-4b6f650e).

> We're currently running on c2-xlarge EC2 instances (8 GB RAM/4 cores) in 7
> Amazon regions. We've tried tuning keepalive, IO thread count (currently set
> to 4) and core/max task worker count (40) to no avail. We decided to move
> our compute instances behind haproxy, which has improved the TCP failure
> rates, but we are still seeing very low throughput (roughly 200-300
> requests/sec max).

> Is it this low even with the empty endpoint?

We took those measurements with our normal endpoints. We're in the process of setting up some new tests against a more highly instrumented build to get some fresh numbers. Will post when we have them.
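
For reference, the listen-queue counters behind the "TCP Listen Drops" and "TCP Overflows" mentioned in the original report can be sampled alongside request rates during such tests. The following is a minimal sketch, assuming the Linux /proc/net/netstat layout (a "TcpExt:" line of field names followed by a "TcpExt:" line of values); the class name ListenQueueCounters is hypothetical and not part of Undertow.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Undertow): reads kernel TCP counters.
public class ListenQueueCounters {

    /** Parses the TcpExt name/value line pair into a name -> value map. */
    public static Map<String, Long> parseTcpExt(String netstat) {
        Map<String, Long> out = new HashMap<>();
        String[] header = null;
        for (String raw : netstat.split("\n")) {
            String line = raw.trim();
            if (!line.startsWith("TcpExt:")) {
                continue;
            }
            String[] cols = line.split("\\s+");
            if (header == null) {
                header = cols; // first TcpExt line holds the field names
            } else {
                // second TcpExt line holds the values, in the same order
                int n = Math.min(header.length, cols.length);
                for (int i = 1; i < n; i++) {
                    out.put(header[i], Long.parseLong(cols[i]));
                }
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: running on Linux, where this file exists.
        String text = new String(
                Files.readAllBytes(Paths.get("/proc/net/netstat")),
                StandardCharsets.UTF_8);
        Map<String, Long> counters = parseTcpExt(text);
        System.out.println("ListenOverflows=" + counters.get("ListenOverflows")
                + " ListenDrops=" + counters.get("ListenDrops"));
    }
}
```

The counters are cumulative since boot, so the useful figure is the difference between two samples taken before and after a load run.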

> We are using 1.1.0-Final version of undertow. We tried 1.2.0-Beta 6 but after
> deploying our servers froze after about 10 minutes so we had to roll back.

> Did you happen to get a thread dump or any info from 1.2.0.Beta6 when it locked up?

I did, but sadly I didn't keep it :-(. As I recall, though, it was similar to the others: I/O threads sitting in epoll and task workers parked waiting for jobs.
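
If a dump is captured next time, a tally of thread states by top stack frame makes the "I/O threads in epoll, workers parked" pattern easy to compare across dumps. The following is a minimal sketch, assuming the usual HotSpot jstack text layout (a "java.lang.Thread.State:" line followed by "at ..." frames, with a blank line between threads); the class name ThreadDumpTally is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: summarizes jstack output piped in on stdin.
public class ThreadDumpTally {

    private static final String STATE_PREFIX = "java.lang.Thread.State:";

    /** Counts threads per "STATE topFrame" key in jstack-style text. */
    public static Map<String, Integer> tally(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        String state = null;
        for (String raw : dump.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                state = null; // blank line ends the current thread's block
            } else if (line.startsWith(STATE_PREFIX)) {
                // "java.lang.Thread.State: WAITING (parking)" -> "WAITING"
                String rest = line.substring(STATE_PREFIX.length()).trim();
                state = rest.split("\\s+")[0];
            } else if (state != null && line.startsWith("at ")) {
                // The first "at" line after the state is the top frame.
                String frame = line.substring(3);
                int paren = frame.indexOf('(');
                if (paren >= 0) {
                    frame = frame.substring(0, paren);
                }
                String key = state + " " + frame;
                Integer old = counts.get(key);
                counts.put(key, old == null ? 1 : old + 1);
                state = null; // skip the remaining frames of this thread
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        for (Map.Entry<String, Integer> e : tally(sb.toString()).entrySet()) {
            System.out.println(e.getValue() + "  " + e.getKey());
        }
    }
}
```

Usage would be along the lines of piping jstack output for the server's process ID into `java ThreadDumpTally`. Threads listed without stack frames are ignored.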

I've upgraded one of our servers to Beta 6 tonight and am running it, but so far it is performing normally. It's sitting behind HAProxy, which seems to be smoothing out the traffic, so I may not be able to replicate the issue until I can redeploy it without HAProxy in front. Will advise further when I've done that.

Many thanks,

Matt

On Sun, Jan 18, 2015 at 4:57 AM, Stuart Douglas <sdouglas@redhat.com> wrote:

----- Original Message -----
> From: "Matt Clarkson" <mclarkson@eyeota.com>
> To: undertow-dev@lists.jboss.org
> Sent: Saturday, 17 January, 2015 3:42:34 PM
> Subject: [undertow-dev] Help, please: Observing low Undertow throughput under heavy loads
>
> Hi Undertow Team,
>
> We recently deployed a large platform for processing high-frequency http
> signals from around the Internet. We are using undertow as our embedded http
> server and are experiencing some serious throughput issues. Hoping you can
> help us to remedy them. Here are our findings so far.
>
> - When we dump thread stacks using jstack for a loaded server, we observe that
>   the I/O threads (1/core) are all blocking at
>   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method).
> - At the same time we see large numbers of TCP Timeouts, TCP Listen Drops, and
>   TCP Overflows, which would seem to imply that we are not processing
>   connections fast enough.
> - There are large numbers of sockets in TIME_WAIT status.
> - TaskWorker threads are underutilized and most are in WAITING state sitting
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186).
>
> We've observed this situation even against a no-op end point which basically
> dispatches a handler, so we've eliminated almost all of our code from the
> equation. We also removed HTTPS traffic to take SSL out of the equation. CPU
> utilization on the boxes is very low and memory is fine as well. Disk I/O is
> also not an issue... we don't write to disk when hitting the no-op endpoint
>

What JVM and OS version are you using? This sounds like it might be an NIO issue, or some kind of NIO/TCP tuning issue.

> We're currently running on c2-xlarge EC2 instances (8 GB RAM/4 cores) in 7
> Amazon regions. We've tried tuning keepalive, IO thread count (currently set
> to 4) and core/max task worker count (40) to no avail. We decided to move
> our compute instances behind haproxy, which has improved the TCP failure
> rates, but we are still seeing very low throughput (roughly 200-300
> requests/sec max).

Is it this low even with the empty endpoint?

>
> We are using 1.1.0-Final version of undertow. We tried 1.2.0-Beta 6 but after
> deploying our servers froze after about 10 minutes so we had to roll back.

Did you happen to get a thread dump or any info from 1.2.0.Beta6 when it locked up?

Thanks,

Stuart

>
> Do you have any tips on other things we can look at ?
>
> Thanks in advance,
>
> Matt C.
>