Non-deterministic packet stream corruption
Yang Zhang
yanghatespam at gmail.com
Thu Jun 18 01:48:26 EDT 2009
Hi, we have an application that uses Netty with the following pipeline:
ChannelPipeline pipeline = Channels.pipeline();
pipeline.addLast("lengthbaseddecoder",
        new LengthFieldBasedFrameDecoder(
                cfg.getMaximumMessageSize(), 0, 4, 0, 4));
pipeline.addLast("lengthprepender", new LengthFieldPrepender(4));
pipeline.addLast("protobufdecoder", new ProtobufDecoder(
        PubSubProtocol.PubSubRequest.getDefaultInstance()));
pipeline.addLast("protobufencoder", new ProtobufEncoder());
pipeline.addLast("executor", new ExecutionHandler(
        new OrderedMemoryAwareThreadPoolExecutor(MAX_WORKER_THREADS,
                MAX_CHANNEL_MEMORY_SIZE, MAX_TOTAL_MEMORY_SIZE)));
// Dependency injection.
pipeline.addLast("umbrellahandler", uh);
In our application, a client connects to a server and starts sending a
stream of messages, to which the server replies with app-level acks.
However, if we pump messages quickly enough, then often (but not every
time) the server receives a bad (truncated) frame. The stream consists
of a series of app-level frames: each frame should have a few bytes of
header data (including the 4-byte length field inserted by the
LengthFieldPrepender) and a payload of 1024 'a' characters. One frame
somehow ends up with fewer 'a' characters (even though its length field
still says 1024), so the next 4-byte length the decoder reads lands
smack in the middle of the following frame's payload, which translates
into an intolerably large length ("aaaa" = 0x61616161).
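To make that last point concrete, the arithmetic looks like this (purely
an illustration, not code from our app):

// Illustration only: when the decoder's 4-byte length field falls on
// payload bytes, it reads four 'a' (0x61) bytes as a big-endian int.
byte[] aaaa = { 'a', 'a', 'a', 'a' };
int bogusLength = ((aaaa[0] & 0xFF) << 24)
                | ((aaaa[1] & 0xFF) << 16)
                | ((aaaa[2] & 0xFF) << 8)
                |  (aaaa[3] & 0xFF);
// bogusLength == 0x61616161 == 1,633,771,873 -- far beyond
// cfg.getMaximumMessageSize(), so the decoder rejects the frame as too long.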
This error happens whether or not we specify 1 as the third parameter to
the NioClientSocketChannelFactory constructor (the number of worker
threads). It smells like some sort of race condition: imagine one frame
being written to some low-level buffer and then being overwritten,
starting in the middle, by another frame. The fact that the length field
is a correct length (1024) suggests that the LengthFieldPrepender and
ProtobufEncoder are working properly, and that something deeper down the
stack (in Netty?) is misbehaving.
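One thing we're considering to narrow this down (just a sketch, not
something we've wired in yet): add a sanity-check handler at the head of
the pipeline, so that on the outbound path it runs after the
ProtobufEncoder and LengthFieldPrepender and sees the fully encoded
frame, and have it verify that the length field matches the bytes that
follow. If frames still look correct there, the corruption would have to
be happening below the pipeline.

import org.jboss.netty.buffer.ChannelBuffer;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.MessageEvent;
import org.jboss.netty.channel.SimpleChannelDownstreamHandler;

// Sketch: install with pipeline.addFirst("framechecker", new FrameCheckHandler()).
// Downstream events flow from the tail toward the head, so this handler sees
// each outbound frame after it has been protobuf-encoded and length-prepended,
// just before it reaches the socket.
public class FrameCheckHandler extends SimpleChannelDownstreamHandler {
    @Override
    public void writeRequested(ChannelHandlerContext ctx, MessageEvent e)
            throws Exception {
        Object msg = e.getMessage();
        if (msg instanceof ChannelBuffer) {
            ChannelBuffer buf = (ChannelBuffer) msg;
            if (buf.readableBytes() >= 4) {
                int declared = buf.getInt(buf.readerIndex());
                int actual = buf.readableBytes() - 4;
                if (declared != actual) {
                    System.err.println("Frame corrupt before the socket: "
                            + "length field=" + declared + ", payload=" + actual);
                }
            }
        }
        super.writeRequested(ctx, e);  // pass the write on unchanged
    }
}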
We determined that the problem probably originates on the client, since
tcpdump/tcpflow shows that the data stream on the wire is already
corrupted. (Also, a separate C++ implementation of the client doesn't
trigger this behavior.)
We're new to Netty, so we may well be doing something wrong along the
way. We connect with:
private static ChannelFactory f = new NioClientSocketChannelFactory(
        Executors.newCachedThreadPool(),
        Executors.newCachedThreadPool());
...

public void connect() {
    ClientBootstrap bootstrap = new ClientBootstrap(f);
    bootstrap.setPipelineFactory(ClientChannelPipelineFactory.instance());
    bootstrap.setOption("tcpNoDelay", true);
    bootstrap.setOption("keepAlive", true);
    ChannelFuture fut = bootstrap.connect();
    fut.addListener(...);
}
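The send side, for what it's worth, boils down to repeatedly writing
protobuf requests on the connected channel. The sketch below is heavily
simplified and uses placeholder names (pump, numMessages), not our
actual client code:

// Heavily simplified sketch of the send path (placeholder names, not our
// real code): once connected, we keep writing PubSubRequest messages,
// each carrying a 1024-byte payload of 'a' characters, as fast as we can.
void pump(Channel channel, int numMessages) {
    for (int i = 0; i < numMessages; i++) {
        PubSubProtocol.PubSubRequest req = PubSubProtocol.PubSubRequest.newBuilder()
                // ... set the request fields, including the 1024 'a' payload ...
                .build();
        channel.write(req);  // framed downstream by ProtobufEncoder + LengthFieldPrepender
    }
}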
Let me know if there's any other information which may be useful.
Thanks in advance for any guesses.
--
Yang Zhang
http://www.mit.edu/~y_z/