Exception on connect being caught by response handler

jgalyan jgalyan at adbrite.com
Mon Nov 29 13:38:00 EST 2010


First, apologies for my delay; I just got back from traveling over the
Thanksgiving holiday here.

I think maybe I didn't describe my scenario as thoroughly as I should have.

I have a memcached cluster that I'm connecting to via a netty-based client.
We have our own load-balancing algorithm for choosing a host from the
cluster, as well as a mechanism for "marking down" a non-responsive host.
We've customized our memcached to load its data from a file created by
offline processing, so that our servers aren't subject to database latency;
as a result, one host at a time in the cluster goes offline periodically to
re-load the data file (which is updated on a schedule).  This has been in
production for quite some time, and everything works fine with the old,
slower client, or with the new netty-based client when the selected host is
up.

My netty-based client is integrated into a netty-based server, which
responds to requests from another server in our overall system.  The server
has all the usual plumbing (bootstrap, pipeline, etc.); as part of servicing
an incoming request, a subclass of SimpleChannelUpstreamHandler sends two
outbound requests: one to a third-party service over http (which works fine,
even when we get connection refused), and now one to our local memcached via
raw TCP (for different data than we get from the third party, and where I'm
having trouble).

When I call the client bootstrap's connect() method, I attach a new
ChannelFutureListener to the returned ChannelFuture, so that I'm not calling
await() in an I/O thread.  The listener's operationComplete implementation
checks the ChannelFuture for success and, if isSuccess() returns true,
writes the request to the memcached host.  If isSuccess() returns false, I
"mark down" that host for a period of time (since we expect one host at a
time in the cluster to go offline periodically, we just stop sending
requests to it for a short while rather than treating a single refused
connection as an error; an entire cluster being down is handled separately).
I expect ConnectException to be the Throwable returned by getCause() when
the connection is refused because the host is offline, so I had expected to
handle it here.
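
For reference, here's a stripped-down sketch of that listener (Netty 3 /
org.jboss.netty types; buildRequest and markHostDown are stand-ins for our
internal code):

    ChannelFuture cf = clientBootstrap.connect(new InetSocketAddress(host, port));
    cf.addListener(new ChannelFutureListener() {
        public void operationComplete(ChannelFuture future) {
            if (future.isSuccess()) {
                // Connected; write the memcached request from the listener
                // so we never call await() on an I/O thread.
                future.getChannel().write(buildRequest(key));
            } else {
                // Connect failed; this is where I expect to see a
                // ConnectException via getCause() when the host is offline.
                Throwable cause = future.getCause();
                markHostDown(host);
            }
        }
    });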

My netty-based memcached client has all the usual plumbing you'd expect
there as well: client bootstrap, pipeline, etc.  When a connection is
refused because nothing is listening on the port we connect to on the target
host, I expect the client pipeline not to be invoked, but it appears that it
is.  The client pipeline has a custom OneToOneEncoder, a custom
ReplayingDecoder (we've implemented our own chunking logic at the
application level to get around memcached's hard data size limit, so I have
to reassemble the chunks myself), and a custom SimpleChannelUpstreamHandler
that deals with the data returned from memcached.  This handler also
overrides exceptionCaught.
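
The pipeline is wired up in the usual way; the class names below are
stand-ins for our internal handlers:

    clientBootstrap.setPipelineFactory(new ChannelPipelineFactory() {
        public ChannelPipeline getPipeline() {
            ChannelPipeline p = Channels.pipeline();
            p.addLast("encoder", new MemcachedRequestEncoder());    // our OneToOneEncoder
            p.addLast("decoder", new ChunkReassemblingDecoder());   // our ReplayingDecoder
            p.addLast("handler", new MemcachedResponseHandler());   // our SimpleChannelUpstreamHandler
            return p;
        }
    });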

When I step through in the debugger, I see exceptionCaught being called in
the upstream handler, apparently after operationComplete in the
ChannelFutureListener, and then messageReceived being called.  (I set
breakpoints in all three and sent a single request, which generates a single
outbound request.)  I don't know whether the sequence is actually as it
appears, since these calls happen in separate threads, but I'm confused
about why exceptionCaught and messageReceived are called in the upstream
handler at all when the connection is refused.  As I said, I had expected to
find out about the refused connection in the ChannelFutureListener's
operationComplete callback.
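
For clarity, the override is just the standard Netty 3 signature; the body
below is a minimal stand-in for our real handling, with the breakpoint on
the first line:

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, ExceptionEvent e) throws Exception {
        // Breakpoint here is hit on the refused connection, which I
        // didn't expect: I assumed the pipeline wouldn't be invoked
        // at all if the connect failed.
        Throwable cause = e.getCause();
        // ... our real handler logs and cleans up here ...
    }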

I've compared the code for the (perfectly functioning) http pipeline with
the code for the raw TCP memcached pipeline, and apart from the protocol
parts and the actual logic for handling response content, they're identical.
The http side works perfectly even when the target host refuses our
connection; the raw TCP side doesn't behave as expected.  Since the two are
essentially identical, and everything works as expected except where the
ConnectException is handled, I'm not sure where to look to find my bug.

