<div dir="ltr">Hi Stuart, thanks for your reply:<div><br></div><div>>> We've observed this situation even against a no-op endpoint which basically<br>>> dispatches a handler, so we've eliminated almost all of our code from the<br>>> equation. We also removed HTTPS traffic to take SSL out of the equation. CPU<br>>> utilization on the boxes is very low and memory is fine as well. Disk I/O is<br>>> also not an issue... we don't write to disk when hitting the no-op endpoint.<br><br>> What JVM and OS version are you using? This sounds like it might be an NIO issue, or some kind of NIO/TCP tuning issue.<br></div><div><br></div><div>We're running 1.7.0_45-b18 on Amazon Linux (amzn-ami-hvm-2014.09.1.x86_64-ebs (ami-4b6f650e)).</div><div><br></div><div>>> We're currently running on c2-xlarge EC2 instances (8 GB RAM / 4 cores) in 7<br>>> Amazon regions. We've tried tuning keepalive, IO thread count (currently set<br>>> to 4), and core/max task worker count (40) to no avail. We decided to move<br>>> our compute instances behind HAProxy, which has improved the TCP failure<br>>> rates, but we are still seeing very low throughput (roughly 200-300<br>>> requests/sec max).<br><br>> Is it this low even with the empty endpoint?<br></div><div><br></div><div>We took those measurements with our normal endpoints. We're in the process of setting up some new tests against a more heavily instrumented build to get fresh numbers. I'll post when we have them.</div><div><br></div><div>>> We are using the 1.1.0.Final version of Undertow. We tried 1.2.0.Beta6, but after<br>>> deploying it our servers froze after about 10 minutes, so we had to roll back.<br><br>> Did you happen to get a thread dump or any info from 1.2.0.Beta6 when it locked up?<br></div><div><br></div><div>I did, but sadly I didn't keep it :-(. As I recall, though, it was similar to the others: IO threads sitting on epoll and task workers parked waiting for jobs. I've upgraded one of our servers to Beta6 tonight and am running it, but so far it is performing normally. It's sitting behind HAProxy, which seems to be smoothing out the traffic, so I may not be able to replicate the issue until I can redeploy it outside of HAProxy. Will advise further when I've done that.</div><div><br></div><div>Many thanks,</div><div><br></div><div>Matt</div>
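<div><br></div><div>P.S. For anyone following along, the "no-op endpoint which dispatches a handler" we're load-testing looks roughly like the sketch below. This is not our exact code; the port, host, backlog value, and thread counts are illustrative (the thread counts match the settings mentioned above):</div>

```java
import io.undertow.Undertow;
import io.undertow.server.HttpHandler;
import io.undertow.server.HttpServerExchange;
import org.xnio.Options;

public class NoOpServer {
    public static void main(String[] args) {
        HttpHandler noOp = new HttpHandler() {
            @Override
            public void handleRequest(HttpServerExchange exchange) throws Exception {
                if (exchange.isInIoThread()) {
                    // Hand the request off from the IO thread to a task worker,
                    // the same way our real handlers do.
                    exchange.dispatch(this);
                    return;
                }
                // No-op: just complete the exchange with a 200.
                exchange.setResponseCode(200);
                exchange.endExchange();
            }
        };

        Undertow server = Undertow.builder()
                .addHttpListener(8080, "0.0.0.0")
                .setIoThreads(4)        // one per core, as discussed above
                .setWorkerThreads(40)   // task worker pool size
                // TCP listen drops can also come from a short accept backlog;
                // 1000 here is illustrative, and the kernel caps it at
                // net.core.somaxconn.
                .setSocketOption(Options.BACKLOG, 1000)
                .setHandler(noOp)
                .build();
        server.start();
    }
}
```

<div>The dispatch() call is what produces the thread-dump picture described earlier: IO threads back in epollWait while task workers sit parked until work arrives, so those states alone aren't evidence of a hang.</div>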
</div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Jan 18, 2015 at 4:57 AM, Stuart Douglas <span dir="ltr"><<a href="mailto:sdouglas@redhat.com" target="_blank">sdouglas@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
----- Original Message -----<br>
> From: "Matt Clarkson" <<a href="mailto:mclarkson@eyeota.com">mclarkson@eyeota.com</a>><br>
> To: <a href="mailto:undertow-dev@lists.jboss.org">undertow-dev@lists.jboss.org</a><br>
> Sent: Saturday, 17 January, 2015 3:42:34 PM<br>
> Subject: [undertow-dev] Help, please: Observing low Undertow throughput under heavy loads<br>
><br>
> Hi Undertow Team,<br>
><br>
> We recently deployed a large platform for processing high-frequency http<br>
> signals from around the Internet. We are using undertow as our embedded http<br>
> server and are experiencing some serious throughput issues. Hoping you can<br>
> help us to remedy them. Here are our findings so far.<br>
><br>
> -When we dump thread stacks using jstack for a loaded server, we observe that<br>
> the I/O threads (1/core) are all blocking at<br>
> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method).<br>
> -At the same time we see large numbers of TCP Timeouts, TCP Listen Drops, and<br>
> TCP Overflows, which would seem to imply that we are not processing<br>
> connections fast enough<br>
> -There are large numbers of sockets in TIME_WAIT status<br>
> -TaskWorker threads are underutilized and most are in WAITING state sitting<br>
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)<br>
><br>
> We've observed this situation even against a no-op end point which basically<br>
> dispatches a handler, so we've eliminated almost all of our code from the<br>
> equation. We also removed HTTPS traffic to take SSL out of the equation. CPU<br>
> utilization on the boxes is very low and memory is fine as well. Disk I/O is<br>
> also not an issue... we don't write to disk when hitting the no-op endpoint<br>
><br>
<br>
What JVM and OS version are you using? This sounds like it might be an NIO issue, or some kind of NIO/TCP tuning issue.<br>
<br>
> We're currently running on c2-xlarge EC2 instances (8 gb ram/4 cores) in 7<br>
> amazon regions. We've tried tuning keepalive, IO thread count (currently set<br>
> to 4) and core/max task worker count (40) to no avail. We decided to move<br>
> our compute instances behind haproxy, which has improved the tcp failure<br>
> rates but we are still seeing very low throughput (roughly 200-300<br>
> requests/sec max)<br>
<br>
Is it this low even with the empty endpoint?<br>
<br>
><br>
> We are using the 1.1.0.Final version of Undertow. We tried 1.2.0.Beta6 but after<br>
> deploying our servers froze after about 10 minutes so we had to roll back.<br>
<br>
Did you happen to get a thread dump or any info from 1.2.0.Beta6 when it locked up?<br>
<br>
Thanks,<br>
<br>
Stuart<br>
<br>
><br>
> Do you have any tips on other things we can look at ?<br>
><br>
> Thanks in advance,<br>
><br>
> Matt C.<br>
><br>
> _______________________________________________<br>
> undertow-dev mailing list<br>
> <a href="mailto:undertow-dev@lists.jboss.org">undertow-dev@lists.jboss.org</a><br>
> <a href="https://lists.jboss.org/mailman/listinfo/undertow-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/undertow-dev</a><br>
</blockquote></div><br></div>