Question about per-socket concurrency

Thu Aug 13 19:13:01 EDT 2009

Yang Zhang wrote:
> Yang Zhang wrote:
>> Hi, we have a Netty-based server that is running into a bottleneck 
>> that we suspect may be due to the way (socket-wise) concurrency 
>> control works in the Netty reactor.  While we're exploring the code, 
>> we're wondering if anyone has any insight into this so as to expedite 
>> our performance debugging.
>>
>> The system is a simple topic-based publish-subscribe messaging system, 
>> and the workload we're issuing is a handful of publishers publishing 
>> messages to the server, all on separate hosts on a GigE LAN.  Each of 
>> the publishers tops out at just 1000 messages per second, where each 
>> message is 1KB in size.  However, we can keep piling on clients, and 
>> the throughput scales up linearly.
>>
>> From this info, one simple explanation would be that the culprit is a 
>> bottleneck in the client.  Yet the strange thing is that the CPU 
>> utilization of each client is just ~5%.  On the server, CPU 
>> utilization hovers at ~20% when presented with a single publisher, and 
>> grows another ~20% for each additional publisher.  The bottom line is 
>> that we're being held back well before full CPU or network saturation.
>>
>> Is there any synchronization in the reactor core of Netty that could 
>> be causing this per-socket bottleneck?  Thanks in advance for any hints.
> 
> Along these lines, is there any documentation on what the 
> threading/concurrency architecture of Netty looks like?  It has a pool 
> of NioWorkers that it splays onto an executor thread pool, but beyond 
> that things are murky.  Here's what we've learned so far from the source:
> 
> After calling ServerBootstrap.bind(), Netty starts a boss thread that 
> just accepts new connections and registers them with one of the workers 
> from the worker pool in round-robin fashion (pool size defaults to CPU 
> count).  Registration just pushes a new register task for a selector 
> loop to handle.  All workers, and the boss, are executing via the 
> executor thread pool; hence, the executor must support at least two 
> simultaneous threads.
> 
> The workers take turns running the select loop, which also handles other 
> tasks, like register tasks (for these, the selector is properly woken 
> up).  As far as I can tell, a worker continues running a loop so long as 
> there are interested fd's/keys (i.e. forever).
> 
> Furthermore, events seem to be handled in the same thread, via 
> processSelectedKeys() -> read()/write().  This would all suggest that 
> everything is running in the same thread - which of course can't be the 
> case.  Thanks in advance for any clarification.

The bottleneck turned out to be a bug in our own code that explicitly 
imposed rate limiting.

For posterity, updated notes on Netty's concurrency architecture:

After calling ServerBootstrap.bind(), Netty starts a boss thread that 
just accepts new connections and registers them with one of the workers 
from the worker pool in round-robin fashion (pool size defaults to CPU 
count).  Each worker runs its own select loop over just the set of keys 
that have been registered with it.  Workers start lazily on demand and 
run only so long as there are interested fd's/keys.  All selected events 
are handled in the same thread and sent up the pipeline attached to the 
channel (this association is established by the boss as soon as a new 
connection is accepted).

All workers, and the boss, run via the executor thread pool; hence, the 
executor must support at least two simultaneous threads.
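
To make the boss/worker split concrete, here is a minimal sketch in plain 
Java.  It does not use Netty's classes: BossWorkerSketch, Worker, assign() 
and the STOP sentinel are all hypothetical names, connections are just 
strings, and a blocking queue stands in for the selector and its register 
tasks.  It only illustrates round-robin assignment, lazy worker start, and 
the fact that each connection is handled by exactly one worker thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class BossWorkerSketch {
    // Reserved sentinel that tells a worker loop to exit (an assumption of this sketch).
    private static final String STOP = "<stop>";

    /** Hypothetical stand-in for a worker: one loop, one thread, its own connections. */
    static final class Worker implements Runnable {
        final BlockingQueue<String> registrations = new LinkedBlockingQueue<>();
        final List<String> owned = new ArrayList<>(); // written only by this worker's thread
        final AtomicBoolean started = new AtomicBoolean(false);

        /** Called by the boss: queue a "register task" and start the loop lazily. */
        void register(String connection, ExecutorService executor) {
            registrations.add(connection);
            if (started.compareAndSet(false, true)) {
                executor.execute(this);
            }
        }

        @Override
        public void run() {
            try {
                while (true) {
                    String connection = registrations.take();
                    if (STOP.equals(connection)) {
                        return;
                    }
                    owned.add(connection); // a real worker would select on its keys here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs a boss task that hands connections to workers round-robin; returns each worker's share. */
    static List<List<String>> assign(int workerCount, List<String> connections) {
        ExecutorService executor = Executors.newCachedThreadPool();
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            workers.add(new Worker());
        }
        try {
            // The boss also runs on the executor, so the pool needs at least two threads.
            executor.submit(() -> {
                int next = 0;
                for (String connection : connections) {
                    workers.get(next).register(connection, executor);
                    next = (next + 1) % workerCount;
                }
            }).get();
            for (Worker worker : workers) {
                if (worker.started.get()) {
                    worker.registrations.add(STOP);
                }
            }
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        List<List<String>> result = new ArrayList<>();
        for (Worker worker : workers) {
            result.add(worker.owned);
        }
        return result;
    }
}
```

With two workers and connections a, b, c, assign() gives a and c to the 
first worker and b to the second.  A worker that never receives a 
connection is never started.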

A pipeline implements the intercepting filter pattern: it is an ordered 
sequence of handlers.  Whenever a packet is read from the wire, it 
travels upstream, stopping at each handler that can handle upstream 
events.  Writes travel the opposite way, downstream.  Between handlers, 
control flows back through the centralized pipeline, and a linked list 
of contexts (one context object per handler) keeps track of where we 
are in the pipeline.
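
The pipeline description can be sketched as a toy model in plain Java.  
This is not Netty's implementation: PipelineSketch, Pipeline, Context, 
the two handler interfaces, and the trace list are hypothetical names 
made up for illustration, and events are plain strings.  It shows only 
the shape described here: a doubly linked list of contexts, one per 
handler, with control returning to the pipeline between handlers.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    interface Handler {}
    interface UpstreamHandler extends Handler { void handleUpstream(Context ctx, String event); }
    interface DownstreamHandler extends Handler { void handleDownstream(Context ctx, String event); }

    /** One context per handler; it remembers the handler's position in the list. */
    static final class Context {
        final String name;
        final Handler handler;
        final Pipeline pipeline;
        Context prev, next;

        Context(String name, Handler handler, Pipeline pipeline) {
            this.name = name;
            this.handler = handler;
            this.pipeline = pipeline;
        }

        // Handlers never call each other directly; they go back through the pipeline.
        void sendUpstream(String event) { pipeline.dispatchUpstream(next, event); }
        void sendDownstream(String event) { pipeline.dispatchDownstream(prev, event); }
    }

    static final class Pipeline {
        private Context head, tail;
        final List<String> trace = new ArrayList<>(); // records the path an event takes

        void addLast(String name, Handler handler) {
            Context ctx = new Context(name, handler, this);
            if (tail == null) { head = ctx; } else { tail.next = ctx; ctx.prev = tail; }
            tail = ctx;
        }

        void sendUpstream(String event) { dispatchUpstream(head, event); }     // e.g. data read from the wire
        void sendDownstream(String event) { dispatchDownstream(tail, event); } // e.g. a write request

        void dispatchUpstream(Context from, String event) {
            Context ctx = from;
            while (ctx != null && !(ctx.handler instanceof UpstreamHandler)) { ctx = ctx.next; }
            if (ctx == null) { trace.add("dropped:" + event); return; }
            trace.add("up:" + ctx.name);
            ((UpstreamHandler) ctx.handler).handleUpstream(ctx, event);
        }

        void dispatchDownstream(Context from, String event) {
            Context ctx = from;
            while (ctx != null && !(ctx.handler instanceof DownstreamHandler)) { ctx = ctx.prev; }
            if (ctx == null) { trace.add("wire:" + event); return; }
            trace.add("down:" + ctx.name);
            ((DownstreamHandler) ctx.handler).handleDownstream(ctx, event);
        }
    }

    /** Pushes one inbound event through decoder -> app, then the reply back out through encoder. */
    static List<String> demo() {
        Pipeline pipeline = new Pipeline();
        UpstreamHandler decoder = (ctx, event) -> ctx.sendUpstream(event.trim());
        DownstreamHandler encoder = (ctx, event) -> ctx.sendDownstream("[" + event + "]");
        UpstreamHandler app = (ctx, event) -> ctx.sendDownstream("echo " + event);
        pipeline.addLast("decoder", decoder);
        pipeline.addLast("encoder", encoder);
        pipeline.addLast("app", app);
        pipeline.sendUpstream(" hi ");
        return pipeline.trace;
    }
}
```

In demo(), the inbound event skips the encoder on the way up because it 
is not an upstream handler, and the reply skips the decoder on the way 
down for the same reason.  The resulting trace is up:decoder, up:app, 
down:encoder, wire:[echo hi].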
-- 
Yang Zhang
http://www.mit.edu/~y_z/