Unfair writer/reader ratio (revisited)

David Alves davidralves at gmail.com
Fri Mar 19 00:22:44 EDT 2010


Hi 

	I've built a large-scale application using Netty, and the unfair writer/reader ratio is becoming problematic.
	I need 3 client machines to flood one server machine, which means that to test scaling to 100 nodes I would need 300 more (which is very expensive on EC2 :).
	If I could get this ratio down even to 1.5/1 (from 3/1), that would already be enormous progress and would let me continue my work (my ultimate goal is to run a 1000-node cluster).
	
	My setup: Each client runs several threads, and each thread has its own set of connections to the servers (to avoid unnecessary contention on shared bottlenecks).
	Each thread has its own set of Netty pipelines (one for each server and for each "stream" within that server, up to about 32 different "streams" split across 3-4 servers). I did this for ease of abstraction (clients simply request connections to streams, regardless of where they live).
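	In rough outline (simplified, with made-up names like StreamConnector and connectStream), the per-client setup looks something like this:

    import java.net.InetSocketAddress;
    import java.util.concurrent.Executors;

    import org.jboss.netty.bootstrap.ClientBootstrap;
    import org.jboss.netty.channel.ChannelFuture;
    import org.jboss.netty.channel.ChannelPipeline;
    import org.jboss.netty.channel.ChannelPipelineFactory;
    import org.jboss.netty.channel.Channels;
    import org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory;

    public class StreamConnector {

        // One shared channel factory means one shared pool of client I/O worker
        // threads; building a factory per thread or per stream multiplies them.
        private final ClientBootstrap bootstrap = new ClientBootstrap(
                new NioClientSocketChannelFactory(
                        Executors.newCachedThreadPool(),    // boss threads
                        Executors.newCachedThreadPool()));  // worker threads

        public StreamConnector() {
            bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
                public ChannelPipeline getPipeline() {
                    // StreamEncoder is sketched further down; the real pipeline
                    // has the application's own handlers after the encoder.
                    return Channels.pipeline(new StreamEncoder());
                }
            });
        }

        // One connection (and therefore one pipeline instance) per "stream".
        public ChannelFuture connectStream(String host, int port) {
            return bootstrap.connect(new InetSocketAddress(host, port));
        }
    }

	Whether the NioClientSocketChannelFactory is shared by all threads or created per thread changes how many client I/O worker threads end up running.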
	For this particular test (more of an I/O test) the clients do little besides serialization and socket writes (objects are kept in a pool, so there is no object-creation overhead, and serialization is very simple: each object knows how to write and read itself to/from a DataOutput/DataInput).
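	The serialization contract looks roughly like this (Wireable and Sample are made-up names; real messages have more fields):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    // Objects that know how to write/read themselves, as described above.
    public interface Wireable {
        void writeTo(DataOutput out) throws IOException;
        void readFrom(DataInput in) throws IOException;
    }

    // Example message; instances come from a pool and are reused.
    class Sample implements Wireable {
        private long id;
        private double value;

        public void writeTo(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeDouble(value);
        }

        public void readFrom(DataInput in) throws IOException {
            id = in.readLong();
            value = in.readDouble();
        }
    }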
	
	The servers hold up well (even when flooded, i.e. no more network I/O coming in) and maintain a stable load (about 65% CPU).
	The clients require more CPU (about 75% each), and I need three whole client machines to flood one server machine.

	I've tested several configurations, tuning the buffer sizes on both the client and server side as well as other parameters. I found the best configuration I could, but the problem didn't go away.
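	For illustration, these are the kinds of per-bootstrap options I've been tuning (the values below are placeholders, not the ones that worked best):

    // Applied to the ClientBootstrap from the sketch above; values are illustrative.
    private void applyOptions(ClientBootstrap bootstrap) {
        bootstrap.setOption("tcpNoDelay", true);
        bootstrap.setOption("sendBufferSize", 128 * 1024);      // SO_SNDBUF
        bootstrap.setOption("receiveBufferSize", 128 * 1024);   // SO_RCVBUF
        // On the server side, the same keys prefixed with "child." apply to
        // accepted channels, e.g. setOption("child.receiveBufferSize", ...).
    }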
	
	I must be doing something wrong. Any pointers?

	Some specific questions:
	- Should I make sure that "streams" from the same client to the same server share the same pipeline?
	- Can I make serialization any faster (I'm using ChannelBuffers.dynamicBuffer())?
	- The serialization is performed in a OneToOneEncoder; this is run by my own threads, right, not by Netty's client I/O workers? (See the encoder sketch after this list.)
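	For context, a simplified version of what my encoder does (it assumes the Wireable type sketched above; the estimated buffer size is just a placeholder):

    import org.jboss.netty.buffer.ChannelBuffer;
    import org.jboss.netty.buffer.ChannelBufferOutputStream;
    import org.jboss.netty.buffer.ChannelBuffers;
    import org.jboss.netty.channel.Channel;
    import org.jboss.netty.channel.ChannelHandlerContext;
    import org.jboss.netty.handler.codec.oneone.OneToOneEncoder;

    public class StreamEncoder extends OneToOneEncoder {

        // My understanding: in Netty 3, downstream handlers like this run in the
        // thread that calls Channel.write() (i.e. my own threads), not in an I/O
        // worker, unless an ExecutionHandler is added to the pipeline.
        @Override
        protected Object encode(ChannelHandlerContext ctx, Channel channel, Object msg)
                throws Exception {
            Wireable w = (Wireable) msg;
            // Giving dynamicBuffer() an estimated size up front avoids repeated
            // grow-and-copy cycles while the object serializes itself.
            ChannelBuffer buf = ChannelBuffers.dynamicBuffer(4096);
            ChannelBufferOutputStream out = new ChannelBufferOutputStream(buf);
            w.writeTo(out);   // ChannelBufferOutputStream implements DataOutput
            return buf;
        }
    }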

	On a more positive note, when configured for latency (small batch flushes and tcpNoDelay on) Netty performs great (total cluster throughput of about 2.5 GB/sec and latencies of <0.5 sec :)

Any help would be greatly appreciated
Best Regards
David Alves

