Non-deterministic packet stream corruption

Yang Zhang yanghatespam at gmail.com
Thu Jun 18 01:48:26 EDT 2009


Hi, we have an application that uses Netty with the following pipeline:

ChannelPipeline pipeline = Channels.pipeline();

// Framing: strip (inbound) / add (outbound) a 4-byte length prefix.
pipeline.addLast("lengthbaseddecoder",
        new LengthFieldBasedFrameDecoder(
                cfg.getMaximumMessageSize(), 0, 4, 0, 4));
pipeline.addLast("lengthprepender", new LengthFieldPrepender(4));

// Protobuf (de)serialization of the framed messages.
pipeline.addLast("protobufdecoder", new ProtobufDecoder(
        PubSubProtocol.PubSubRequest.getDefaultInstance()));
pipeline.addLast("protobufencoder", new ProtobufEncoder());

// Hand upstream events to a thread pool that keeps per-channel ordering.
pipeline.addLast("executor", new ExecutionHandler(
        new OrderedMemoryAwareThreadPoolExecutor(MAX_WORKER_THREADS,
                MAX_CHANNEL_MEMORY_SIZE, MAX_TOTAL_MEMORY_SIZE)));

// Dependency injection.
pipeline.addLast("umbrellahandler", uh);
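For anyone less familiar with these two handlers: as we understand them, LengthFieldPrepender(4) writes a 4-byte big-endian length (not counting the field itself) in front of each encoded message, and LengthFieldBasedFrameDecoder(max, 0, 4, 0, 4) reads that field at offset 0, applies no length adjustment, and strips the 4 header bytes before passing the frame on. Below is a minimal plain-Java model of that wire format (no Netty involved), handy for checking captured bytes by hand. The class and method names are our own, and it is a simplification, not Netty's actual implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the wire format used by the pipeline above.
// NOT Netty code: it only mimics LengthFieldPrepender(4) and
// LengthFieldBasedFrameDecoder(max, 0, 4, 0, 4).
public class FramingSketch {

    // Mimics LengthFieldPrepender(4): 4-byte big-endian length, then the payload.
    static byte[] prepend(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length); // big-endian by default
        buf.putInt(payload.length);
        buf.put(payload);
        return buf.array();
    }

    // Mimics the decoder: length field at offset 0, 4 bytes long, no adjustment,
    // and the 4 header bytes stripped from each returned frame.
    static List<byte[]> decode(byte[] stream, int maxFrameLength) {
        ByteBuffer in = ByteBuffer.wrap(stream);
        List<byte[]> frames = new ArrayList<>();
        while (in.remaining() >= 4) {
            int length = in.getInt();
            if (length < 0 || length > maxFrameLength) {
                // Stand-in for the decoder rejecting an oversized frame.
                throw new IllegalStateException(
                        "frame length " + length + " exceeds " + maxFrameLength);
            }
            if (in.remaining() < length) {
                break; // incomplete frame: a real decoder would wait for more bytes
            }
            byte[] frame = new byte[length];
            in.get(frame);
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        byte[] one = prepend("aaaa".getBytes(StandardCharsets.US_ASCII));
        byte[] stream = new byte[one.length * 2];
        System.arraycopy(one, 0, stream, 0, one.length);
        System.arraycopy(one, 0, stream, one.length, one.length);
        System.out.println(decode(stream, 1 << 20).size()); // prints 2
    }
}
```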

In our application, a client connects to a server and starts sending a
stream of messages, to which the server replies with app-level acks.
However, if we send messages quickly enough, we often (but not always)
see the server receive a bad (truncated) packet.  The stream consists of
a series of app-level frames: each frame should have a few bytes of
header data (including a 4-byte length field inserted by the
LengthFieldPrepender) and a payload of 1024 'a' characters.  But one
frame somehow ends up with fewer 'a' characters (even though its length
field still says 1024), so the decoder reads the next length field from
the middle of the following frame's payload, which yields an intolerably
large length ("aaaa" = 0x61616161).
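To make the arithmetic concrete: the length field is read as a 4-byte big-endian integer, so four ASCII 'a' bytes (0x61 each) decode to 0x61616161 = 1,633,771,873. The sketch below uses a simplified model, assuming each frame is just a 4-byte length followed by 'a' bytes (the real frames also carry protobuf encoding bytes, omitted here). It builds a stream whose first frame declares 1024 bytes but carries only 1000, and shows which length a decoder would read next. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Simplified model: frame = 4-byte big-endian length + payload of 'a' bytes.
// Real frames also contain protobuf encoding bytes, which are omitted here.
public class TruncatedFrameDemo {

    // Builds a frame whose header claims declaredLength but which actually
    // carries actualPayloadBytes bytes of 'a'.
    static byte[] frame(int declaredLength, int actualPayloadBytes) {
        ByteBuffer buf = ByteBuffer.allocate(4 + actualPayloadBytes);
        buf.putInt(declaredLength);
        byte[] payload = new byte[actualPayloadBytes];
        Arrays.fill(payload, (byte) 'a');
        buf.put(payload);
        return buf.array();
    }

    // Reads the first length, skips that many bytes as a length-based decoder
    // would, and returns the next length field it would see.
    static int secondLength(byte[] stream) {
        ByteBuffer in = ByteBuffer.wrap(stream);
        int first = in.getInt();
        in.position(in.position() + first);
        return in.getInt();
    }

    public static void main(String[] args) {
        byte[] bad = frame(1024, 1000);   // declares 1024 but carries only 1000 'a's
        byte[] good = frame(1024, 1024);
        ByteBuffer stream = ByteBuffer.allocate(bad.length + good.length);
        stream.put(bad).put(good);
        int next = secondLength(stream.array());
        // The skip lands inside the second frame's payload, so "aaaa" is read as a length.
        System.out.printf("next length = 0x%08X (%d)%n", next, next);
    }
}
```

With a well-formed first frame the skip lands exactly on the second header and 1024 is read back; with the truncated one it lands 20 bytes into the next payload.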

This error happens whether or not we pass 1 as the third parameter to
the NioClientSocketChannelFactory constructor (the number of worker
threads).  It smells like some sort of race condition: imagine one frame
being written into some low-level buffer and then overwritten, starting
partway through, by another frame.  The fact that the length field is
correct (1024) suggests that the LengthFieldPrepender and ProtobufEncoder
are working properly, and that something deeper down the stack (in
Netty?) is misbehaving.

We determined that the problem probably originates on the client, since
tcpdump/tcpflow shows that the data stream on the wire is indeed
corrupted.  (Also, a separate C++ implementation of the client doesn't
trigger this behavior.)
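Since the corruption is visible in the capture, one way to pin down the first damaged frame is to walk the captured client-to-server byte stream offline. This is a sketch under assumptions: that the capture file holds exactly the raw bytes of one direction of one connection, starting at the first frame (which is how we believe tcpflow writes its per-flow files), and that any declared length above the configured maximum marks the first misaligned read. The class name and the 64 KiB default limit are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical offline checker: walks 4-byte-length-prefixed frames in a
// captured byte stream and reports the first implausible length field.
public class CaptureChecker {

    // Returns the byte offset of the first length field that is negative or
    // larger than maxFrameLength, or -1 if every complete frame looks plausible.
    static int firstBadFrameOffset(byte[] stream, int maxFrameLength) {
        ByteBuffer in = ByteBuffer.wrap(stream);
        while (in.remaining() >= 4) {
            int offset = in.position();
            int length = in.getInt();
            if (length < 0 || length > maxFrameLength) {
                return offset;
            }
            if (in.remaining() < length) {
                break; // capture ends mid-frame
            }
            in.position(in.position() + length); // skip the payload
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] stream = Files.readAllBytes(Paths.get(args[0]));
        int max = args.length > 1 ? Integer.parseInt(args[1]) : 64 * 1024; // hypothetical default
        int offset = firstBadFrameOffset(stream, max);
        System.out.println(offset < 0
                ? "no implausible length found"
                : "first implausible length at byte offset " + offset);
    }
}
```

Usage would be something like `java CaptureChecker <tcpflow-output-file> [maxFrameLength]`. The reported offset is where the bad length is read, which is downstream of the damage; the truncated frame is the one that starts immediately before it.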

We're new to Netty and wonder whether we might be doing something
wrong along the way.  We connect with:

private static ChannelFactory f = new NioClientSocketChannelFactory(
        Executors.newCachedThreadPool(),  // boss executor
        Executors.newCachedThreadPool()); // worker executor

...
public void connect() {
    ClientBootstrap bootstrap = new ClientBootstrap(f);
    bootstrap.setPipelineFactory(ClientChannelPipelineFactory.instance());
    bootstrap.setOption("tcpNoDelay", true);
    bootstrap.setOption("keepAlive", true);
    ChannelFuture fut = bootstrap.connect();
    fut.addListener(...);
}

Let me know if there's any other information which may be useful. 
Thanks in advance for any guesses.
-- 
Yang Zhang
http://www.mit.edu/~y_z/


