Unfair writers/readers ratio (revisited)

Fri Mar 19 00:46:09 EDT 2010

Hi 

        I've built a large-scale application using Netty, and the unfair
writer/reader ratio is becoming a problem.
        I need 3 client machines to flood one server machine, which means
that to test scaling to 100 nodes I need 300 more machines just for the
clients (which is very expensive in EC2 :).
        If I can get this ratio down even to 1.5/1 (from 3/1), that would
already be enormous progress and would allow me to continue my work (my
ultimate goal is to run a 1000 node cluster).
        
        My setup: each client runs several threads, and each thread has its
own set of connections to the servers (to avoid unnecessary contention on
some bottlenecks).
        Each thread has its own set of Netty pipelines: one for each
server and for each "stream" within that server, up to about 32 different
"streams" split across 3-4 different servers. I did this for ease of
abstraction (clients simply request connections to streams, regardless of
where they are).
        For this particular test (more of an I/O test), clients do almost
nothing except serialization and socket writes. Objects are kept in a pool,
so there is no object creation overhead, and serialization is very simple:
each object knows how to write itself to a DataOutput and read itself from
a DataInput.
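
        For reference, a minimal sketch of that self-serializing pattern.
The interface, class, and field names below are illustrative assumptions,
not the actual application code; only DataOutput/DataInput come from the
setup described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical interface: each object writes and reads its own fields.
interface SelfSerializable {
    void writeTo(DataOutput out) throws IOException;
    void readFrom(DataInput in) throws IOException;
}

// Hypothetical pooled message. Instances are reused, so readFrom()
// overwrites every field instead of allocating a new object.
final class Sample implements SelfSerializable {
    long timestamp;
    int streamId;
    double value;

    @Override
    public void writeTo(DataOutput out) throws IOException {
        out.writeLong(timestamp);
        out.writeInt(streamId);
        out.writeDouble(value);
    }

    @Override
    public void readFrom(DataInput in) throws IOException {
        timestamp = in.readLong();
        streamId = in.readInt();
        value = in.readDouble();
    }
}

final class RoundTrip {
    // Serializes 'src' to bytes, then fills the reused 'dst' instance
    // from those bytes. Returns the encoded size in bytes.
    static int copy(Sample src, Sample dst) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream(32);
            src.writeTo(new DataOutputStream(bytes));
            byte[] encoded = bytes.toByteArray();
            dst.readFrom(new DataInputStream(new ByteArrayInputStream(encoded)));
            return encoded.length; // long (8) + int (4) + double (8)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        In this sketch the in-memory byte array only stands in for the
socket; in the real setup the DataOutput would wrap whatever buffer the
encoder writes into.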
        
        The servers cope well: even when flooded (i.e. no more inbound
network I/O) they maintain a stable load (about 65%).
        The clients require more CPU (about 75% each), and I need three
whole client machines to flood one server machine.

        I've tested several configurations, tuning the buffer size on the
client and server side as well as other parameters. I found the optimal
configuration, but my problem didn't go away.
        
        I must be doing something wrong. Any pointers? 

        Some specific questions:
        - Should I make sure that "streams" of the same client to the same
server use the same pipeline? 
        - Can I make serialization any faster (I'm using
ChannelBuffers.dynamicBuffer())?
        - The serialization is performed in a OneToOneEncoder. This is run by
my own threads, right? Not by Netty's client I/O workers?

        On a more positive note, when configured for latency (small batch
flushes and tcpNoDelay on), Netty performs great (total cluster throughput
of about 2.5 GB/sec and latencies of <0.5 sec :)

Any help would be greatly appreciated 
Best Regards 
David Alves 
-- 
View this message in context: http://n2.nabble.com/Unfair-writers-readers-ration-revisited-tp4761149p4761149.html
Sent from the Netty User Group mailing list archive at Nabble.com.