Question about per-socket concurrency
"이희승 (Trustin Lee)"
trustin at gmail.com
Sun Aug 16 22:10:52 EDT 2009
Hi Yang,
On 08/14/2009 08:13 AM, Yang Zhang wrote:
> Yang Zhang wrote:
>> Yang Zhang wrote:
>>> Hi, we have a Netty-based server that is running into a bottleneck
>>> that we suspect may be due to the way per-socket concurrency
>>> control works in the Netty reactor. While we're exploring the code,
>>> we're wondering if anyone has any insight into this so as to expedite
>>> our performance debugging.
>>>
>>> The system is a simple topic-based publish-subscribe messaging system,
>>> and the workload we're issuing is a handful of publishers publishing
>>> messages to the server, all on separate hosts on a GigE LAN. Each of
>>> the publishers tops out at just 1,000 messages per second, with each
>>> message 1 KB in size. However, we can keep piling on clients, and
>>> the throughput scales up linearly.
>>>
>>> From this info, one simple explanation would be that the culprit is a
>>> bottleneck in the client. Yet the strange thing is that the CPU
>>> utilization of each client is just ~5%. On the server, CPU
>>> utilization hovers at ~20% when presented with a single publisher, and
>>> grows another ~20% for each additional publisher. The bottom line is
>>> that we're being held back well before full CPU or network saturation.
>>>
>>> Is there any synchronization in the reactor core of Netty that could
>>> be causing this per-socket bottleneck? Thanks in advance for any hints.
>>
>> Along these lines, is there any documentation on what the
>> threading/concurrency architecture of Netty looks like? It has a pool
>> of NioWorkers that it splays onto an executor thread pool, but beyond
>> that things are murky. Here's what we've learned so far from the source:
>>
>> After calling ServerBootstrap.bind(), Netty starts a boss thread that
>> just accepts new connections and registers them with one of the workers
>> from the worker pool in round-robin fashion (pool size defaults to CPU
>> count). Registration just enqueues a register task for the worker's
>> selector loop to handle. All workers, and the boss, execute via the
>> executor thread pool; hence, the executor must support at least two
>> simultaneous threads.
>>
>> The workers take turns running the select loop, which also handles other
>> tasks, like register tasks (for these, the selector is properly woken
>> up). As far as I can tell, a worker continues running a loop so long as
>> there are interested fd's/keys (i.e. forever).
>>
>> Furthermore, events seem to be handled in the same thread, via
>> processSelectedKeys() -> read()/write(). This would all suggest that
>> everything is running in the same thread - which of course can't be the
>> case. Thanks in advance for any clarification.
>
> It turned out to be a bug in our own code that was explicitly causing the rate limiting.
>
> For posterity, updated notes on Netty's concurrency architecture:
>
> After calling ServerBootstrap.bind(), Netty starts a boss thread that
> just accepts new connections and registers them with one of the workers
> from the worker pool in round-robin fashion (pool size defaults to CPU
> count). Each worker runs its own select loop over just the set of keys
> that have been registered with it. Workers start lazily on demand and
> run only so long as there are interested fd's/keys. All selected events
> are handled in the same thread and sent up the pipeline attached to the
> channel (this association is established by the boss as soon as a new
> connection is accepted).
>
> All workers, and the boss, run via the executor thread pool; hence, the
> executor must support at least two simultaneous threads.
>
> A pipeline implements the intercepting filter pattern. A pipeline is a
> sequence of handlers. Whenever a packet is read from the wire, it
> travels up the stream, stopping at each handler that can handle upstream
> events. Vice versa for writes. Between each filter, control flows back
> through the centralized pipeline, and a linked list of contexts keeps
> track of where we are in the pipeline (one context object per handler).
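The boss/worker arrangement in these notes can be sketched with plain java.util.concurrent. The class and variable names below are hypothetical and this is not Netty's actual code; it only illustrates the scheduling idea: the boss is a long-running task that hands each accepted connection to a worker chosen round-robin, and because a real boss never returns, it pins one executor thread for good, which is why the shared executor must supply at least two threads.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class BossWorkerDemo {
    public static void main(String[] args) throws Exception {
        int workerCount = 4;   // Netty defaults this to the CPU count
        ExecutorService executor = Executors.newCachedThreadPool();

        AtomicInteger next = new AtomicInteger();
        StringBuilder assignments = new StringBuilder();
        CountDownLatch done = new CountDownLatch(1);

        executor.execute(() -> {                     // the "boss" task
            for (int conn = 0; conn < 6; conn++) {   // accept() would block here
                // Round-robin choice of worker for each accepted connection.
                int worker = next.getAndIncrement() % workerCount;
                assignments.append(worker);          // "register" conn with that worker
            }
            done.countDown();
            // A real boss would loop forever here, occupying this thread;
            // that is why a one-thread executor could never also run a worker.
        });

        done.await();
        System.out.println(assignments);             // prints 012301
        executor.shutdownNow();
    }
}
```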
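The worker's select loop, with register tasks queued from outside and the selector woken up to handle them, can be modeled in miniature with java.nio directly. This is a hypothetical sketch, not Netty's code: a Pipe stands in for an accepted socket, a register task arrives on a queue, events are handled on the loop's own thread, and the loop exits once no interested keys remain.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SelectLoopDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        Queue<Runnable> taskQueue = new ConcurrentLinkedQueue<>();

        Pipe pipe = Pipe.open();                   // stands in for an accepted socket
        pipe.source().configureBlocking(false);

        Thread worker = new Thread(() -> {
            int handled = 0;
            try {
                while (true) {
                    selector.select();
                    // Drain queued tasks first (e.g. register tasks from the boss).
                    for (Runnable task; (task = taskQueue.poll()) != null; ) {
                        task.run();
                    }
                    // Selected events are processed on this same thread.
                    for (SelectionKey key : selector.selectedKeys()) {
                        ByteBuffer buf = ByteBuffer.allocate(16);
                        ((Pipe.SourceChannel) key.channel()).read(buf);
                        System.out.println("read " + buf.position() + " bytes");
                        key.cancel();              // no further interest in this channel
                        handled++;
                    }
                    selector.selectedKeys().clear();
                    selector.selectNow();          // flush cancelled keys
                    if (handled > 0 && selector.keys().isEmpty()) {
                        return;                    // run only while interested keys remain
                    }
                }
            } catch (IOException ignored) { }
        });
        worker.start();

        // "Boss" side: enqueue a register task, then wake the selector up.
        taskQueue.add(() -> {
            try {
                pipe.source().register(selector, SelectionKey.OP_READ);
            } catch (ClosedChannelException ignored) { }
        });
        selector.wakeup();

        pipe.sink().write(ByteBuffer.wrap("hello".getBytes()));
        worker.join();
        selector.close();
    }
}
```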
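The intercepting-filter pipeline with its linked list of per-handler contexts can also be sketched in a few lines. The names below only loosely mirror Netty's ChannelHandler/ChannelHandlerContext and are hypothetical; the point is the mechanism: an upstream event enters at the head, each handler does its work and then hands control back so the context chain can forward the event to the next handler.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineDemo {
    interface Handler {
        void handleUpstream(Context ctx, String event);
    }

    // One context object per handler; the linked list tracks the
    // position in the pipeline, as described in the notes above.
    static final class Context {
        final Handler handler;
        Context next;
        Context(Handler h) { this.handler = h; }
        void sendUpstream(String event) {
            if (next != null) next.handler.handleUpstream(next, event);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Handler decoder = (ctx, e) -> {
            log.add("decoder:" + e);
            ctx.sendUpstream(e + "-decoded");  // control returns to the chain
        };
        Handler logic = (ctx, e) -> log.add("logic:" + e);

        // Build the context chain: decoder -> logic.
        Context c1 = new Context(decoder);
        Context c2 = new Context(logic);
        c1.next = c2;

        // A "packet read from the wire" enters at the head and travels upstream.
        c1.handler.handleUpstream(c1, "pkt");
        System.out.println(log);  // prints [decoder:pkt, logic:pkt-decoded]
    }
}
```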
Good to hear that your problem was resolved.
I like your notes. Would you mind if I include your notes in the
official documentation (Javadoc and user guide)?
Thanks,
Trustin
More information about the netty-users mailing list