Netty HTTP Proxy Performance Issue

Baq Haidri baqhaidri at gmail.com
Wed Aug 10 23:38:03 EDT 2011


Hi Trustin,

Thank you so much for your quick response!  I'll try replacing OMATPE with
another executor and see how the perf run goes.
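
Concretely, I was thinking of a swap along these lines (just a sketch, not
tested yet; it assumes FSThreadFactory from the code below implements
java.util.concurrent.ThreadFactory, and it gives up OMATPE's ordering and
memory-awareness guarantees):

    // Before: OMATPE with per-channel/total memory limits
    // final ExecutionHandler embedRequestor = new ExecutionHandler(
    //         new OrderedMemoryAwareThreadPoolExecutor(numCores, memory, memory,
    //                 60L, TimeUnit.MINUTES, new FSThreadFactory("embed-requestor")));

    // After: a plain cached thread pool behind the same ExecutionHandler
    final ExecutionHandler embedRequestor = new ExecutionHandler(
            Executors.newCachedThreadPool(new FSThreadFactory("embed-requestor")));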

Also, the reason I used "new ThreadPoolExecutor" directly is that I wanted more
control over how long idle threads are kept around before being reaped.
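
For reference, the only real difference from Executors.newCachedThreadPool()
is the keep-alive; a quick sketch (the 5-minute value is just illustrative):

    // Same shape as Executors.newCachedThreadPool(), but with an explicit
    // keep-alive so idle threads are reaped on our own schedule.
    Executor custom = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                                             5L, TimeUnit.MINUTES,
                                             new SynchronousQueue<Runnable>());

    // newCachedThreadPool() is the same construction with a fixed 60-second
    // keep-alive:
    //   new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
    //                          new SynchronousQueue<Runnable>())
    Executor cached = Executors.newCachedThreadPool();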

---
baq

On Wed, Aug 10, 2011 at 5:28 PM, 이희승 (Trustin Lee) <trustin at gmail.com> wrote:

> Hi Baq,
>
> Specify Executors.newCachedThreadPool() instead of new
> ThreadPoolExecutor(…).
>
> OrderedMemoryAwareThreadPoolExecutor is known to perform poorly.  If your
> server does something that takes a long time, you'd better use another
> thread pool implementation until OMATPE's perf issue is resolved (or it's
> replaced with something better).
>
> HTH
>
> --
> Trustin Lee <http://gleamynode.net/>
>
> On Thursday, August 11, 2011 at 9:14 AM, Baq Haidri wrote:
>
> Hi,
>
> I'm prototyping an HTTP service in Netty which accepts a request, proxies it
> to several web applications, and pulls their responses together into a single
> HTTP response containing an HTML payload.
>
> The server channel factory uses the following thread pools:
>
>     int numCores = 4 * Runtime.getRuntime().availableProcessors();
>
>     Executor bossExecutor = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
>                                                    60L, TimeUnit.MINUTES,
>                                                    new SynchronousQueue<Runnable>());
>
>     Executor workerExecutor = Executors.newFixedThreadPool(numCores);
>
>     bootstrap.setFactory(new NioServerSocketChannelFactory(bossExecutor,
>                                                             workerExecutor,
>                                                             numCores));
>
>
> The server pipeline contains an ExecutionHandler which has an
> OrderedMemoryAwareThreadPoolExecutor (the thread factory only changes the
> thread name):
>
>     final ExecutionHandler embedRequestor = new ExecutionHandler(
>             new OrderedMemoryAwareThreadPoolExecutor(numCores,
>                                                      memory,  // this is 500 KB
>                                                      memory,
>                                                      60L, TimeUnit.MINUTES,
>                                                      new FSThreadFactory("embed-requestor")));
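>
> (For context, the pipeline wiring is roughly like this; it is a sketch, and
> ProxyRequestHandler is a placeholder name for our business handler, not the
> real class:)
>
>     bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
>         public ChannelPipeline getPipeline() {
>             ChannelPipeline pipeline = Channels.pipeline();
>             pipeline.addLast("decoder", new HttpRequestDecoder());
>             pipeline.addLast("aggregator", new HttpChunkAggregator(1048576));
>             pipeline.addLast("encoder", new HttpResponseEncoder());
>             // hand requests off to the OMATPE-backed pool before the proxy work
>             pipeline.addLast("executor", embedRequestor);
>             pipeline.addLast("handler", new ProxyRequestHandler());
>             return pipeline;
>         }
>     });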
>
>
> The proxy HTTP clients all share a single client channel factory, which uses
> cached thread pools whose threads have a long keep-alive (so that they are
> re-used):
>
>     private static final NioClientSocketChannelFactory clientChannelFactory =
>         new NioClientSocketChannelFactory(
>             new ThreadPoolExecutor(0, Integer.MAX_VALUE,
>                                    60L, TimeUnit.MINUTES,
>                                    new SynchronousQueue<Runnable>()),
>             new ThreadPoolExecutor(0, Integer.MAX_VALUE,
>                                    60L, TimeUnit.MINUTES,
>                                    new SynchronousQueue<Runnable>()));
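>
> Each outbound proxy call then builds a ClientBootstrap off that shared
> factory, roughly like this (a sketch; BackendResponseHandler, backendHost and
> backendPort are placeholder names):
>
>     ClientBootstrap client = new ClientBootstrap(clientChannelFactory);
>     client.setPipelineFactory(new ChannelPipelineFactory() {
>         public ChannelPipeline getPipeline() {
>             ChannelPipeline pipeline = Channels.pipeline();
>             pipeline.addLast("codec", new HttpClientCodec());
>             pipeline.addLast("handler", new BackendResponseHandler());
>             return pipeline;
>         }
>     });
>     ChannelFuture connect = client.connect(new InetSocketAddress(backendHost, backendPort));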
>
>
> When I run this through apache bench, with concurrency levels 1 to 100 in
> increments of 5, here's the performance (latency at the 50th, 90th, and 99th
> percentiles, plus requests per second).  This is with the following GC
> settings:
>
>     java -server -Xms1024m -Xmx1024m -XX:MaxNewSize=768m -XX:NewSize=768m
>          -XX:SurvivorRatio=6 -XX:+UseConcMarkSweepGC
>
> (Note that Old Gen never gets full enough to actually force CMS.)
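>
> (The apache bench invocation is along these lines for each concurrency
> level; the URL and request count here are placeholders, not the exact
> command:)
>
>     ab -n 5000 -c 50 http://localhost:8080/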
>
>
>     Concurrency    50% (ms)    90% (ms)     99% (ms)       RPS
>               1      70.928      84.199       93.12       13.7
>               5      74.101      99.795      141.391      62.91
>              10      88.744     146.416      232.333      99.22
>              15     108.265     188.855      305.268     121.44
>              20     130.508     231.05       438.312     134.31
>              25     158.417     297.67       666.237     132.42
>              30     170.459     349.15      2541.908     110.19
>              35     197.119     411.041     4129.77      110.64
>              40     237.777     493.632     3091.222     110.69
>              45     268.308     475.407     8118.155      98.13
>              50     306.447     553.581     6455.245     118.01
>              55     340.214     662.951     3274.517     110.55
>              60     363.786     779.651     5793.395      96.38
>              65     404.513     659.296     3803.272     119.49
>              70     439.377     800.518     7408.627     105.6
>              75     462.924     757.701     2264.736     137.37
>              80     507.593    1117.655    11667.996      82.76
>              85     521.067     968.124     6895.998     111.34
>              90     545.382    1713.862     9650.341      95.52
>              95     613.355    1016.572     2502.987     119.9
>             100     636.438     979.166     5237.865     129.07
>
> My question is: why is Netty's performance degrading so badly at higher
> concurrencies?  Why is the latency so unstable?  Why does the CPU get pegged
> at higher concurrencies?  I noticed when I profiled this code through
> JVisualVM that 1/3 of the CPU time is spent in
> LinkedTransferQueue.awaitMatch.
>
>
> Any suggestions people might have would be greatly appreciated.  We've
> implemented the same system in Node and as a C++ module using ATS, and we are
> seeing far better performance numbers for both.
>
> Thanks,
>
> Baq
>
> _______________________________________________
> netty-users mailing list
> netty-users at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/netty-users
>