Shutting down the Netty server takes a long time when using SslHandler.

Virat Gohil virat4lug at gmail.com
Thu Apr 16 04:53:11 EDT 2009


Hi Trustin,

Thank you very much for working on this issue with such dedication :)

I tested your fix using SslHandler.java revision 1202. The
fix seems to be working fine. I will keep testing it for a few more
rounds and see how it behaves.

Thanks!!

Virat


On Wed, Apr 15, 2009 at 9:01 PM, Trustin Lee <tlee at redhat.com> wrote:
> I found a workaround for the ClosedChannelException and checked the
> fix in, too.  It seems to work pretty well now.  However, I'm not sure
> if the new behavior will break existing non-Netty clients.  We will
> see..
>
> — Trustin Lee, http://gleamynode.net/
>
> On Thu, Apr 16, 2009 at 12:04 AM, Trustin Lee <tlee at redhat.com> wrote:
>> Hi Virat,
>>
>> I've just checked in the workaround for this issue.  SslHandler now
>> closes the connection immediately after sending the close_notify and
>> swallows a 'connection reset by peer' error if the SSL session is
>> closed.  I made this change because otherwise SslHandler would be
>> vulnerable to a DoS attack.  An attacker would be able to prevent Netty
>> from closing a connection forever just by sending a partial SSL message.
>>
>> I used a preliminary regular expression matcher to swallow the
>> exception, so it will not work for localized messages, but it will
>> work for most systems.
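>>
>> Roughly, the idea is something like the following sketch (illustrative
>> only, with made-up names; this is not the exact SslHandler code):
>>
>> import java.io.IOException;
>> import java.util.regex.Pattern;
>>
>> // Illustrative sketch only; not the actual SslHandler code.
>> final class IgnorableErrorMatcher {
>>
>>     // Matches the usual English JDK messages such as
>>     // "Connection reset by peer" or "Broken pipe".  A localized JDK
>>     // produces different text, so this check will miss it there.
>>     private static final Pattern IGNORABLE_ERROR = Pattern.compile(
>>             ".*(connection.+reset|broken.+pipe).*", Pattern.CASE_INSENSITIVE);
>>
>>     // True if the exception looks like the harmless error that can be
>>     // swallowed once close_notify has already been sent.
>>     static boolean isIgnorable(Throwable t) {
>>         if (!(t instanceof IOException) || t.getMessage() == null) {
>>             return false;
>>         }
>>         return IGNORABLE_ERROR.matcher(t.getMessage()).matches();
>>     }
>>
>>     private IgnorableErrorMatcher() {}
>> }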
>>
>> Another exception that can be raised, though rarely, is
>> ClosedChannelException.  It can be raised when SslHandler attempts to
>> send a close_notify after the connection is closed.  Unfortunately, it
>> is impossible to swallow this exception because ClosedChannelException
>> doesn't convey any information that tells which write request caused
>> it.  I could define a new exception that carries such information, but
>> it would cost a lot of CPU time because the exception could no longer
>> be cached to avoid the cost of filling in the stack trace.  Let me
>> keep thinking about a way to work around this issue.
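>>
>> To illustrate the trade-off (hypothetical classes, not anything that
>> exists in Netty): a single pre-built exception can be reused because
>> its stack trace is filled in only once, whereas an exception that also
>> carries the failed write request has to be constructed per write.
>>
>> import java.nio.channels.ClosedChannelException;
>>
>> // Hypothetical sketch of the trade-off described above; not part of Netty.
>> final class Cached {
>>     // One shared instance; fillInStackTrace() runs exactly once, at class init.
>>     static final ClosedChannelException CLOSED_CHANNEL_EXCEPTION =
>>             new ClosedChannelException();
>> }
>>
>> // To say *which* write failed, a new instance is needed for every failed
>> // write, so the relatively expensive stack trace is filled in each time.
>> class ClosedChannelWriteException extends ClosedChannelException {
>>     private final Object writeRequest;   // e.g. the message being written
>>
>>     ClosedChannelWriteException(Object writeRequest) {
>>         this.writeRequest = writeRequest;
>>     }
>>
>>     Object writeRequest() {
>>         return writeRequest;
>>     }
>> }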
>>
>> — Trustin Lee, http://gleamynode.net/
>>
>> On Wed, Apr 15, 2009 at 11:12 PM, Virat Gohil <virat4lug at gmail.com> wrote:
>>> Hi Trustin,
>>>
>>> Thanks for the in depth explanation! :)
>>>
>>> I understand that changing to the new behavior will cause a "connection
>>> reset by peer" error on the client side, which is acceptable in our
>>> scenario. May I request an enhancement that would allow me to control
>>> this behavior through some API(s) exposed by Netty? You could provide
>>> it in a later release.
>>>
>>> Meanwhile, I will use future.awaitUninterruptibly(milliseconds) as a
>>> workaround. This will lead to some connections closing properly while
>>> others are terminated.
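>>>
>>> Something along these lines (just a sketch; the 30-second timeout is an
>>> arbitrary example, and g and factory are the fields from the stop()
>>> method in my first mail):
>>>
>>> // Sketch of the timed-wait workaround.
>>> ChannelGroupFuture future = g.close();
>>> if (!future.awaitUninterruptibly(30000)) {
>>>     // Not every channel finished closing in time; stop waiting and
>>>     // continue the shutdown anyway.
>>> }
>>> factory.releaseExternalResources();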
>>>
>>> Thanks,
>>>
>>> Virat
>>>
>>> On Wed, Apr 15, 2009 at 7:30 PM, Trustin Lee <tlee at redhat.com> wrote:
>>>> I realized that this issue can be resolved by closing the connection
>>>> immediately after sending close_notify (new behavior) instead of
>>>> waiting for the client to respond with close_notify (current
>>>> behavior).  This is legal behavior according to RFC 2246 (7.2.1.
>>>> Closure Alerts):
>>>>
>>>>   Each party is required to send a close_notify alert before closing
>>>>   the write side of the connection. It is required that the other party
>>>>   respond with a close_notify alert of its own and close down the
>>>>   connection immediately, discarding any pending writes. It is not
>>>>   required for the initiator of the close to wait for the responding
>>>>   close_notify alert before closing the read side of the connection.
>>>>
>>>> However, doing this can cause a 'connection reset by peer' error on
>>>> the client side, because the client will try to respond to the
>>>> close_notify while the server closes the connection.
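>>>>
>>>> In plain JSSE terms, the new behavior amounts to something like the
>>>> following sketch (a blocking SocketChannel and a bare SSLEngine, for
>>>> illustration only; this is not the actual SslHandler code):
>>>>
>>>> // Send close_notify, then close immediately without waiting for the
>>>> // peer's close_notify.  Uses javax.net.ssl.SSLEngine, java.nio.ByteBuffer
>>>> // and java.nio.channels.SocketChannel.
>>>> void closeOutboundAndSocket(SSLEngine engine, SocketChannel channel)
>>>>         throws IOException {
>>>>     engine.closeOutbound();   // queue the close_notify alert
>>>>     ByteBuffer empty = ByteBuffer.allocate(0);
>>>>     ByteBuffer packet =
>>>>             ByteBuffer.allocate(engine.getSession().getPacketBufferSize());
>>>>     while (!engine.isOutboundDone()) {
>>>>         engine.wrap(empty, packet);   // encode close_notify into 'packet'
>>>>         packet.flip();
>>>>         while (packet.hasRemaining()) {
>>>>             channel.write(packet);
>>>>         }
>>>>         packet.clear();
>>>>     }
>>>>     channel.close();   // do not wait for the responding close_notify
>>>> }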
>>>>
>>>> Actually, there's nothing we can do about the 'connection reset by
>>>> peer' error, and it is safe to ignore the exception.  The problem is
>>>> that Netty cannot tell whether a SocketException has been raised
>>>> because of a connection reset, because a simple string matcher will
>>>> not work for localized messages.  If there were a definite way to
>>>> detect the connection reset error, then Netty could swallow it, which
>>>> would make perfect sense.
>>>>
>>>> — Trustin Lee, http://gleamynode.net/
>>>>
>>>>
>>>>
>>>> On Wed, Apr 15, 2009 at 9:58 PM, Virat Gohil <virat4lug at gmail.com> wrote:
>>>>> Hi Trustin,
>>>>>
>>>>> Thanks for the quick response!! :)
>>>>>
>>>>> I tried your suggestion on revision 1187; same result. I am willing to
>>>>> test more suggestions :)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Virat
>>>>>
>>>>>
>>>>> On Wed, Apr 15, 2009 at 6:04 PM, Trustin Lee <tlee at redhat.com> wrote:
>>>>>> Please try Revision 1187 before making the modification I suggested.
>>>>>> I've just checked in a potential fix for this issue.
>>>>>>
>>>>>> — Trustin Lee, http://gleamynode.net/
>>>>>>
>>>>>> On Wed, Apr 15, 2009 at 9:25 PM, Trustin Lee <tlee at redhat.com> wrote:
>>>>>>> Hi Virat,
>>>>>>>
>>>>>>> On Wed, Apr 15, 2009 at 8:33 PM, Virat Gohil <virat.gohil at gmail.com> wrote:
>>>>>>>> Hi All!
>>>>>>>>
>>>>>>>> I am facing a small problem shutting down my Netty-based server with
>>>>>>>> ~1200 connections.
>>>>>>>>
>>>>>>>> I am using the ChannelGroup as described in the Getting Started
>>>>>>>> guide; the code follows:
>>>>>>>>
>>>>>>>> public void stop()
>>>>>>>> {
>>>>>>>>     if (timer != null)
>>>>>>>>     {
>>>>>>>>         timer.stop();
>>>>>>>>     }
>>>>>>>>     if (g != null && factory != null)
>>>>>>>>     {
>>>>>>>>         ChannelGroupFuture future = g.close();
>>>>>>>>         future.awaitUninterruptibly();
>>>>>>>>         if (ch != null)
>>>>>>>>         {
>>>>>>>>             ch.unbind();
>>>>>>>>         }
>>>>>>>>         try {
>>>>>>>>             bossExecutor.shutdownNow();
>>>>>>>>             workerExecutor.shutdownNow();
>>>>>>>>             workerExecutor.awaitTermination(3600, TimeUnit.SECONDS);
>>>>>>>>             bossExecutor.awaitTermination(3600, TimeUnit.SECONDS);
>>>>>>>>         } catch (InterruptedException e) {
>>>>>>>>             // print the exception
>>>>>>>>         }
>>>>>>>>         factory.releaseExternalResources();
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> The execution gets stuck at future.awaitUninterruptibly(); I tried
>>>>>>>> debugging the issue and found the following (a simplified sketch of
>>>>>>>> the counting is shown after this list):
>>>>>>>>
>>>>>>>> 1. When g.close() is called, the channel group creates a hashtable
>>>>>>>> and creates a new DefaultChannelGroupFuture, which becomes the
>>>>>>>> registered listener on all these channels.
>>>>>>>> 2. Whenever channel.close() is called, the DefaultChannelGroupFuture
>>>>>>>> gets called and increments the success/failure count.
>>>>>>>> 3. If the success + failure count equals the number of channels in
>>>>>>>> the group, the operation is considered finished and the thread
>>>>>>>> waiting on the DefaultChannelGroupFuture is released.
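>>>>>>>>
>>>>>>>> For illustration, here is a much simplified version of that counting,
>>>>>>>> with made-up names (not Netty's actual DefaultChannelGroupFuture):
>>>>>>>>
>>>>>>>> import java.util.concurrent.CountDownLatch;
>>>>>>>> import java.util.concurrent.atomic.AtomicInteger;
>>>>>>>>
>>>>>>>> // Simplified illustration of steps 1-3 above; hypothetical class.
>>>>>>>> final class GroupCloseFuture {
>>>>>>>>     private final int channelCount;
>>>>>>>>     private final AtomicInteger finished = new AtomicInteger();
>>>>>>>>     private final CountDownLatch done = new CountDownLatch(1);
>>>>>>>>
>>>>>>>>     GroupCloseFuture(int channelCount) {
>>>>>>>>         this.channelCount = channelCount;
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     // Called once per channel when its close future completes,
>>>>>>>>     // whether the close succeeded or failed.
>>>>>>>>     void childOperationComplete() {
>>>>>>>>         if (finished.incrementAndGet() == channelCount) {
>>>>>>>>             done.countDown();   // success + failure == channel count
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     // What awaitUninterruptibly() boils down to: block until every
>>>>>>>>     // channel in the group has reported completion.
>>>>>>>>     void awaitUninterruptibly() {
>>>>>>>>         boolean interrupted = false;
>>>>>>>>         while (true) {
>>>>>>>>             try {
>>>>>>>>                 done.await();
>>>>>>>>                 break;
>>>>>>>>             } catch (InterruptedException e) {
>>>>>>>>                 interrupted = true;   // remember and keep waiting
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>         if (interrupted) {
>>>>>>>>             Thread.currentThread().interrupt();
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> }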
>>>>>>>>
>>>>>>>> I observed in SslHandler that Channels.close() is called only if the
>>>>>>>> received frame was null and the inbound side was finished:
>>>>>>>> SslHandler.java:406 (decode()):
>>>>>>>> if (frame == null && engine.isInboundDone()) {
>>>>>>>>     synchronized (closeFutures) {
>>>>>>>>         for (;;) {
>>>>>>>>             ChannelFuture future = closeFutures.poll();
>>>>>>>>             if (future == null) {
>>>>>>>>                 break;
>>>>>>>>             }
>>>>>>>>             Channels.close(ctx, future);
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> }
>>>>>>>> Sometimes either the frame is not null or the inbound side is not
>>>>>>>> finished, which causes SslHandler to continue decoding.  This results
>>>>>>>> in DefaultChannelGroupFuture.childListener.operationComplete() being
>>>>>>>> called only after a very long time.
>>>>>>>
>>>>>>> 1) What is the state of the actual connection?  Is it closed or still connected?
>>>>>>>
>>>>>>> 2) What happens if you replace:
>>>>>>>
>>>>>>>        if (frame == null && engine.isInboundDone()) {
>>>>>>> with:
>>>>>>>
>>>>>>>        if (frame == null && engine.isInboundDone() || !channel.isConnected()) {
>>>>>>>
>>>>>>> ?
>>>>>>>
>>>>>>>> What we would prefer to do is to abandon the incomplete data in
>>>>>>>> SslHandler.decode() and close the channel immediately, as soon as the
>>>>>>>> server's shutdown method is called. Please let me know if I am missing
>>>>>>>> something or if there is another way of achieving a faster shutdown.
>>>>>>>
>>>>>>> You are doing it correctly, and the ChannelGroupFuture should return
>>>>>>> quickly.  Thanks for reporting the problem!
>>>>>>>
>>>>>>> Trustin
>>>>>>>
>>>>>>