Netty HTTP Proxy Performance Issue
Baq Haidri
baqhaidri at gmail.com
Wed Aug 10 20:14:58 EDT 2011
Hi,
I'm prototyping an HTTP service in Netty that accepts a request, proxies it
to several web applications, and pulls their responses together into a single
HTTP response containing an HTML payload.
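Conceptually, the fan-out/aggregate step looks something like this (a rough
sketch; the class and method names are made up for illustration, not my
actual code):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative bookkeeping: one counter per inbound request, decremented
// as each backend response arrives; when it reaches zero, the collected
// fragments are assembled into the single outbound HTTP response.
final class Aggregator {
    private final AtomicInteger pending;
    private final ConcurrentMap<String, String> fragments =
            new ConcurrentHashMap<String, String>();

    Aggregator(int backendCount) {
        this.pending = new AtomicInteger(backendCount);
    }

    // Invoked from each proxy client's response handler.
    void onBackendResponse(String backendName, String html) {
        fragments.put(backendName, html);
        if (pending.decrementAndGet() == 0) {
            writeCombinedResponse(); // merge fragments, write to client channel
        }
    }

    private void writeCombinedResponse() { /* elided */ }
}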
The server channel factory uses the following thread pools:
int numCores = 4 * Runtime.getRuntime().availableProcessors(); // 4x the core count

Executor bossExecutor = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
        60L, TimeUnit.MINUTES,
        new SynchronousQueue<Runnable>()); // effectively a cached pool
Executor workerExecutor = Executors.newFixedThreadPool(numCores);

bootstrap.setFactory(new NioServerSocketChannelFactory(bossExecutor,
        workerExecutor, numCores));
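For completeness, the rest of the server wiring is standard (the port and
the factory name are illustrative):

// Bound as usual once the pipeline factory is set; the pipeline contents
// are shown below.
bootstrap.setPipelineFactory(new EmbedPipelineFactory()); // illustrative name
bootstrap.bind(new InetSocketAddress(8080));              // illustrative port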
The server pipeline contains an ExecutionHandler which wraps an
OrderedMemoryAwareThreadPoolExecutor (the thread factory only changes the
thread name):
final ExecutionHandler embedRequestor = new ExecutionHandler(
        new OrderedMemoryAwareThreadPoolExecutor(numCores,
                memory, // maxChannelMemorySize: 500 KB
                memory, // maxTotalMemorySize: also 500 KB
                60L, TimeUnit.MINUTES,
                new FSThreadFactory("embed-requestor")));
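In case it matters, the handler sits in the pipeline roughly like this (a
sketch; the handler names and the aggregator cap are mine, not the exact
code):

import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.handler.codec.http.HttpChunkAggregator;
import org.jboss.netty.handler.codec.http.HttpRequestDecoder;
import org.jboss.netty.handler.codec.http.HttpResponseEncoder;

// Standard HTTP codec setup, with the ExecutionHandler placed just ahead
// of the business handler so the backend fan-out never blocks an I/O
// worker thread.
ChannelPipeline pipeline = Channels.pipeline();
pipeline.addLast("decoder", new HttpRequestDecoder());
pipeline.addLast("aggregator", new HttpChunkAggregator(1048576)); // 1 MB, illustrative
pipeline.addLast("encoder", new HttpResponseEncoder());
pipeline.addLast("executor", embedRequestor);
pipeline.addLast("handler", new EmbedRequestHandler()); // illustrative name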
The proxy HTTP clients all share a single client channel factory, which uses
cached thread pools (boss and worker) whose threads have a long keep-alive
timeout (so that they're re-used):
private static final NioClientSocketChannelFactory clientChannelFactory =
        new NioClientSocketChannelFactory(
                new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                        60L, TimeUnit.MINUTES,
                        new SynchronousQueue<Runnable>()),  // boss pool
                new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                        60L, TimeUnit.MINUTES,
                        new SynchronousQueue<Runnable>())); // worker pool
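The clients themselves follow the standard ClientBootstrap pattern on top of
that shared factory, roughly (the pipeline factory name and the backend
host/port are illustrative):

import java.net.InetSocketAddress;
import org.jboss.netty.bootstrap.ClientBootstrap;
import org.jboss.netty.channel.ChannelFuture;

// One ClientBootstrap per backend request, all sharing clientChannelFactory
// (and therefore its two thread pools).
ClientBootstrap cb = new ClientBootstrap(clientChannelFactory);
cb.setPipelineFactory(clientPipelineFactory); // HTTP codec + response handler
ChannelFuture connect = cb.connect(new InetSocketAddress("backend.local", 8081));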
When I run this through Apache Bench, with concurrency levels from 1 to 100
(in steps of 5), here's the performance at the 50th, 90th, and 99th
percentiles, as well as requests per second. This is with the following GC
settings:
java -server -Xms1024m -Xmx1024m -XX:MaxNewSize=768m -XX:NewSize=768m
-XX:SurvivorRatio=6 -XX:+UseConcMarkSweepGC (note that Old Gen never gets
full enough to actually force CMS)
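Each row below comes from an ab run along these lines (the request count and
URL are illustrative):

ab -n 20000 -c 50 http://localhost:8080/page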
Concurrency    50% (ms)    90% (ms)    99% (ms)      RPS
          1      70.928      84.199       93.12     13.7
          5      74.101      99.795      141.391    62.91
         10      88.744     146.416      232.333    99.22
         15     108.265     188.855      305.268   121.44
         20     130.508     231.05       438.312   134.31
         25     158.417     297.67       666.237   132.42
         30     170.459     349.15      2541.908   110.19
         35     197.119     411.041     4129.77    110.64
         40     237.777     493.632     3091.222   110.69
         45     268.308     475.407     8118.155    98.13
         50     306.447     553.581     6455.245   118.01
         55     340.214     662.951     3274.517   110.55
         60     363.786     779.651     5793.395    96.38
         65     404.513     659.296     3803.272   119.49
         70     439.377     800.518     7408.627   105.6
         75     462.924     757.701     2264.736   137.37
         80     507.593    1117.655    11667.996    82.76
         85     521.067     968.124     6895.998   111.34
         90     545.382    1713.862     9650.341    95.52
         95     613.355    1016.572     2502.987   119.9
        100     636.438     979.166     5237.865   129.07
My question is: why does Netty's performance degrade so badly at higher
concurrencies? Why is the latency so unstable? Why does the CPU get pegged
at higher concurrencies? When I profiled this code with JVisualVM, I noticed
that about a third of the CPU time is spent in LinkedTransferQueue.awaitMatch.
Any suggestions people might have would be greatly appreciated. We've
implemented the same system in Node and as a C++ module using ATS (Apache
Traffic Server), and we're seeing far better performance numbers from both.
Thanks,
Baq