Shutting down the Netty server takes a long time when using SslHandler.

Trustin Lee tlee at redhat.com
Wed Apr 15 11:31:10 EDT 2009


I found a workaround for the ClosedChannelException and checked the
fix in, too.  It seems to work pretty well now.  However, I'm not sure
if the new behavior will break existing non-Netty clients.  We will
see.

— Trustin Lee, http://gleamynode.net/

On Thu, Apr 16, 2009 at 12:04 AM, Trustin Lee <tlee at redhat.com> wrote:
> Hi Virat,
>
> I've just checked in the workaround for this issue.  SslHandler now
> closes the connection immediately after sending the close_notify and
> swallows a 'connection reset by peer' error if the SSL session is
> closed.  I made this change because otherwise SslHandler would be
> vulnerable to a DoS attack: an attacker could prevent Netty from
> closing a connection forever just by sending a partial SSL message.
>
> I used a preliminary regular expression matcher to swallow the
> exception, so it will not work for localized messages, but it should
> work on most systems.
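>
> For illustration, a rough sketch of what such a message-based check
> might look like (the class name and pattern below are illustrative,
> not Netty's actual code, and the pattern only covers the common
> English JDK messages):
>
>     import java.io.IOException;
>     import java.util.regex.Pattern;
>
>     final class IgnorableExceptions {
>         // Matches the usual English messages; localized JVMs may differ.
>         private static final Pattern CONNECTION_RESET = Pattern.compile(
>                 ".*(connection.*reset|broken.*pipe).*",
>                 Pattern.CASE_INSENSITIVE);
>
>         static boolean isConnectionReset(Throwable t) {
>             if (!(t instanceof IOException)) {
>                 return false;
>             }
>             String message = t.getMessage();
>             return message != null && CONNECTION_RESET.matcher(message).matches();
>         }
>     }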
>
> Another exception that can rarely be raised is ClosedChannelException.
>  It can be raised when SslHandler attempts to send a close_notify
> after the connection is closed.  Unfortunately, it is impossible to
> swallow this exception because ClosedChannelException doesn't convey
> any information about which write request caused it.  I could define a
> new exception that carries such information, but it would cost a lot
> of CPU time because the exception could no longer be cached to avoid
> the cost of filling in the stack trace.  Let me keep thinking about
> how to work around this issue.
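>
> To illustrate the trade-off (this is only a sketch, not the actual
> Netty code): a cached exception stays cheap precisely because it never
> fills in a stack trace, whereas a new exception type carrying
> per-write information would have to be constructed, with its stack
> trace captured, on every failed write.
>
>     import java.nio.channels.ClosedChannelException;
>
>     final class CachedClosedChannelException extends ClosedChannelException {
>         // A single reusable instance; safe only because it carries no
>         // per-write state and no stack trace.
>         static final CachedClosedChannelException INSTANCE =
>                 new CachedClosedChannelException();
>
>         @Override
>         public synchronized Throwable fillInStackTrace() {
>             return this; // skip the expensive stack-trace capture
>         }
>     }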
>
> — Trustin Lee, http://gleamynode.net/
>
> On Wed, Apr 15, 2009 at 11:12 PM, Virat Gohil <virat4lug at gmail.com> wrote:
>> Hi Trustin,
>>
>> Thanks for the in-depth explanation! :)
>>
>> I understand that the new behavior will cause a "connection reset by
>> peer" error on the client side, which is acceptable in our scenario.
>> May I request an enhancement that would allow me to control this
>> behavior through an API exposed by Netty? It could be provided in a
>> later release.
>>
>> Meanwhile, I will use future.awaitUninterruptibly(milliseconds) as a
>> workaround. This will lead to some connections closing properly while
>> others are terminated abruptly.
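>>
>> For example, a minimal sketch of that workaround (the 30-second
>> timeout is an arbitrary value for illustration):
>>
>>     // Wait up to 30 seconds for the group to close, then carry on
>>     // with shutdown even if some channels are still pending.
>>     ChannelGroupFuture future = g.close();
>>     if (!future.awaitUninterruptibly(30000)) {
>>         // Some channels did not close in time; proceed with the rest
>>         // of the shutdown anyway.
>>     }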
>>
>> Thanks,
>>
>> Virat
>>
>> On Wed, Apr 15, 2009 at 7:30 PM, Trustin Lee <tlee at redhat.com> wrote:
>>> I realized that this issue can be resolved by closing the connection
>>> immediately after sending close_notify (new behavior) instead of
>>> waiting for the client to respond with close_notify (current
>>> behavior).  This is legal behavior according to RFC 2246 (7.2.1.
>>> Closure Alerts):
>>>
>>>   Each party is required to send a close_notify alert before closing
>>>   the write side of the connection. It is required that the other party
>>>   respond with a close_notify alert of its own and close down the
>>>   connection immediately, discarding any pending writes. It is not
>>>   required for the initiator of the close to wait for the responding
>>>   close_notify alert before closing the read side of the connection.
>>>
>>> However, doing this can cause a 'connection reset by peer' error on
>>> the client side, because the client will try to respond to the
>>> close_notify while the server is closing the connection.
>>>
>>> Actually, there's nothing we can do about a 'connection reset by
>>> peer' error, and it is safe to ignore the exception.  The problem is
>>> that Netty cannot tell whether a SocketException was raised because
>>> of a connection reset, since a simple string matcher will not work
>>> for localized messages.  If there were a definite way to detect the
>>> connection reset error, then Netty could swallow it, which would
>>> make perfect sense.
>>>
>>> — Trustin Lee, http://gleamynode.net/
>>>
>>>
>>>
>>> On Wed, Apr 15, 2009 at 9:58 PM, Virat Gohil <virat4lug at gmail.com> wrote:
>>>> Hi Trustin,
>>>>
>>>> Thanks for the quick response!! :)
>>>>
>>>> I tried your suggestion on revision 1187; same result. I am willing
>>>> to test more suggestions. :)
>>>>
>>>> Thanks,
>>>>
>>>> Virat
>>>>
>>>>
>>>> On Wed, Apr 15, 2009 at 6:04 PM, Trustin Lee <tlee at redhat.com> wrote:
>>>>> Please try Revision 1187 before making the modification I suggested.
>>>>> I've just checked in the potential fix for this issue.
>>>>>
>>>>> — Trustin Lee, http://gleamynode.net/
>>>>>
>>>>> On Wed, Apr 15, 2009 at 9:25 PM, Trustin Lee <tlee at redhat.com> wrote:
>>>>>> Hi Virat,
>>>>>>
>>>>>> On Wed, Apr 15, 2009 at 8:33 PM, Virat Gohil <virat.gohil at gmail.com> wrote:
>>>>>>> Hi All!
>>>>>>>
>>>>>>> I am facing a small problem shutting down my Netty-based server with
>>>>>>> ~1200 connections.
>>>>>>>
>>>>>>> I am using ChannelGroup as described in the Getting Started guide;
>>>>>>> the code follows:
>>>>>>>
>>>>>>> public void stop() {
>>>>>>>     if (timer != null) {
>>>>>>>         timer.stop();
>>>>>>>     }
>>>>>>>     if (g != null && factory != null) {
>>>>>>>         ChannelGroupFuture future = g.close();
>>>>>>>         future.awaitUninterruptibly();
>>>>>>>         if (ch != null) {
>>>>>>>             ch.unbind();
>>>>>>>         }
>>>>>>>         try {
>>>>>>>             bossExecutor.shutdownNow();
>>>>>>>             workerExecutor.shutdownNow();
>>>>>>>             workerExecutor.awaitTermination(3600, TimeUnit.SECONDS);
>>>>>>>             bossExecutor.awaitTermination(3600, TimeUnit.SECONDS);
>>>>>>>         } catch (InterruptedException e) {
>>>>>>>             // print the exception
>>>>>>>         }
>>>>>>>         factory.releaseExternalResources();
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> The execution gets stuck at future.awaitUninterruptibly().  I tried
>>>>>>> debugging the issue and found the following:
>>>>>>>
>>>>>>> 1. When g.close() is called, the channel group creates a hashtable and
>>>>>>> a new DefaultChannelGroupFuture, which becomes the registered
>>>>>>> listener on all of these channels.
>>>>>>> 2. Whenever channel.close() is called, the DefaultChannelGroupFuture
>>>>>>> gets notified and increments the success/failure count.
>>>>>>> 3. If the success + failure count equals the number of channels in
>>>>>>> the group, the operation is considered finished and the thread waiting
>>>>>>> on the DefaultChannelGroupFuture is released (see the sketch below).
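>>>>>>>
>>>>>>> To make step 3 concrete, a toy version of that counting logic (not
>>>>>>> Netty's actual DefaultChannelGroupFuture) could look like this:
>>>>>>>
>>>>>>>     final class CountingGroupFuture {
>>>>>>>         private final int total;
>>>>>>>         private int successes;
>>>>>>>         private int failures;
>>>>>>>
>>>>>>>         CountingGroupFuture(int totalChannels) {
>>>>>>>             this.total = totalChannels;
>>>>>>>         }
>>>>>>>
>>>>>>>         // Called from each channel's close-future listener.
>>>>>>>         synchronized void childDone(boolean success) {
>>>>>>>             if (success) { successes++; } else { failures++; }
>>>>>>>             if (successes + failures == total) {
>>>>>>>                 notifyAll(); // wakes threads blocked in awaitUninterruptibly()
>>>>>>>             }
>>>>>>>         }
>>>>>>>
>>>>>>>         synchronized void awaitUninterruptibly() {
>>>>>>>             boolean interrupted = false;
>>>>>>>             while (successes + failures < total) {
>>>>>>>                 try {
>>>>>>>                     wait();
>>>>>>>                 } catch (InterruptedException e) {
>>>>>>>                     interrupted = true;
>>>>>>>                 }
>>>>>>>             }
>>>>>>>             if (interrupted) {
>>>>>>>                 Thread.currentThread().interrupt();
>>>>>>>             }
>>>>>>>         }
>>>>>>>     }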
>>>>>>>
>>>>>>> I observed in SslHandler that Channels.close() is called only if the
>>>>>>> decoded frame is null and the engine's inbound side is done:
>>>>>>> SslHandler.java:406 (decode())
>>>>>>>       if (frame == null && engine.isInboundDone()) {
>>>>>>>            synchronized (closeFutures) {
>>>>>>>                for (;;) {
>>>>>>>                    ChannelFuture future = closeFutures.poll();
>>>>>>>                    if (future == null) {
>>>>>>>                        break;
>>>>>>>                    }
>>>>>>>                    Channels.close(ctx, future);
>>>>>>>                }
>>>>>>>            }
>>>>>>>        }
>>>>>>> Sometimes either the frame is not null or the inbound is not done,
>>>>>>> which causes SslHandler to continue decoding.  This results in
>>>>>>> DefaultChannelGroupFuture.childListener.operationComplete() being
>>>>>>> called only after a very long time.
>>>>>>
>>>>>> 1) What is the state of the actual connection?  Is it closed or still connected?
>>>>>>
>>>>>> 2) What happens if you replace:
>>>>>>
>>>>>>        if (frame == null && engine.isInboundDone()) {
>>>>>> with:
>>>>>>
>>>>>>        if (frame == null && engine.isInboundDone() || !channel.isConnected()) {
>>>>>>
>>>>>> ?
>>>>>>
>>>>>>> What we would prefer to do is to abandon the incomplete data in
>>>>>>> SslHandler.decode() and close the channel immediately once the
>>>>>>> server's shutdown method is called. Please let me know if I am
>>>>>>> missing something or if there is another way of achieving a faster
>>>>>>> shutdown.
>>>>>>
>>>>>> You are doing it correctly, and the ChannelGroupFuture should complete
>>>>>> quickly.  Thanks for reporting the problem!
>>>>>>
>>>>>> Trustin
>>>>>>