Endless loop after shutdown

Trustin Lee (이희승) trustin at gmail.com
Mon Nov 16 00:34:12 EST 2009


Thomas told me additional information regarding this issue recently in
the IRC channel:

<draft> I'm having an endless loop when closing the channel. It
happens rarely and it gets stuck in the NioWorker.run(), which never
exits as there is always a key. The result is that the CPU usage is at
100% (caused by NioWorker.run) and the executor never terminates. I'm
using the latest 3.1.5
<draft> How can it be that selector has still one key after the
channel is closed and gets never removed. I think I had a similar
issue before and I thought that with the synchronized statement in
close, the problem was gone.
<draft> Hmm, probably my mistake... not all channels got closed.
<trustin> hmm.  please feel free to ping me when you find something :)
<draft> ok, I think the patch I send you for 3.1.3 is not useful at
all. I'll take a look at it..
<draft> in NioWorker, line 577 implicitly closes all keys from this
channel, so having the cancel and the close in a synchronized
statement does not make much sense. In an earlier mail, I wrote: "It
is important to first cancel all keys, then close the socket..." I
think this is not true, since closing the socket also cancels the key
and in the next select operation, the key is removed. The JavaDoc
says: "All of a channel's keys are cancelled implicitly when the
channel is closed, whether by invoking its close method or by
interrupting a thread blocked in an I/O operation upon the channel. "
So, I guess its safe to remove the synchronized statement and even the
key canceling.
<trustin> did you send the patch? nothing in my mailbox yet.  I'm
going to sleep, so please post your idea to netty-dev
<draft> ok, I'll have to check the mails, but I think I suggested to
put the synchronized statement there. It was some time ago. Anyway
I'll post to netty-dev. Good night.

To sum up:

1) There's no need to call SelectionKey.cancel() before calling Channel.close()
2) Hence, there's the proposed patch should be be unapplied.

I removed unnecessary SelectionKey.cancel() calls and the
synchronization block.  Please give the latest build a try and let me
know if you find any regression:

    http://hudson.jboss.org/hudson/view/Netty/job/netty/793/

Thanks

— Trustin Lee, http://gleamynode.net/



On Fri, Aug 28, 2009 at 8:11 PM, Thomas Bocek <bocek at ifi.uzh.ch> wrote:
> Hi Trustin,
>
> The problem went away! I run my tests a couple of times and the endless
> loop did not happen anymore in build #59. I also attached a patch for
> those classes to document why the synchronized statement is required
> there (no code touched, only added a comment). Please feel free to
> change the text.
>
> Regards,
>
> Thomas
>
> 이희승 (Trustin Lee) wrote:
>> Thanks Thomas for tracking down the root cause of the problem.  I've
>> just checked in the fix at revision 1673.
>>
>> https://jira.jboss.org/jira/browse/NETTY-214
>>
>> Please grab the snapshot from the CI server and let me know if the
>> problem went away:
>>
>> http://trustin.dyndns.org/hudson/job/netty-trunk-deploy/59/
>>
>> Trustin
>>
>> On Wed, 26 Aug 2009 20:31:05 +0900 (KST)
>> Thomas Bocek <bocek at ifi.uzh.ch> wrote:
>>> Hi Trustin,
>>>
>>> I have another issue with an endless loop after a shutdown which
>>> happens in very rare cases. I had a hard time to investigate this
>>> issue because everything runs fine in the debug mode, but not in the
>>> run mode, so I had to use lots of debug output and I'm not sure if my
>>> conclusion are correct. Anyway, I think here is what happens:
>>>
>>> The way to shutdown in Netty is to:
>>>
>>> 1) cancel all keys
>>> 2) close socket
>>>
>>> as seen in NioWorker and NioDatagramWorker, in the method static void
>>> close(NioSocketChannel channel, ChannelFuture future)
>>>
>>> If a socket is closed without canceling keys, the selectionkeys will
>>> still be in the keyset as far as I understand the Java API: "Keys may
>>> be cancelled and channels may be closed at any time. Hence the
>>> presence of a key in one or more of a selector's key sets does not
>>> imply that the key is valid or that its channel is open." It is
>>> important to first cancel all keys, then close the socket. If not all
>>> keys are canceled, but the socket is closed, the run() method of
>>> NioWorker runs forever and shutdown fails, because the shutdown flag
>>> is only set if the statement (selector.keys().isEmpty()) is true.
>>>
>>> In NioWorker.RegisterTask.run(...), the statement
>>> channel.socket.register(...) adds a key to the keyset and it only does
>>> it if the socket is not closed, so it has to be before 2). However, in
>>> very rare cases (e.g., canceling immediately after connection attempt)
>>> keys are added after 1) and before 2), which leads to the situation,
>>> that the socket is closed, but one key is still in the keyset and the
>>> shutdown fails:
>>>
>>> 1) cancel all keys
>>> *) channel.socket.register(...) -> add key
>>> 2) close socket
>>>
>>> I think the solution is to execute 1) and 2) in a synchronized
>>> statement. I used synchronized(channel.interestOpsLock) as this
>>> synchronization is also used for channel.socket.register(...) and I
>>> used it over 1) and 2), which seems to work (no more endless loops). I
>>> modified the close(...) method in NioWorker (I think NioDatagramWorker
>>> needs a fix too)
>>>
>>> before:
>>>
>>> ...
>>> SelectionKey key = channel.socket.keyFor(selector);
>>> if (key != null) {
>>>  key.cancel();
>>>  }
>>> boolean connected = channel.isConnected();
>>> boolean bound = channel.isBound();
>>> try {
>>>  channel.socket.close();
>>> ...
>>>
>>> after:
>>>
>>> ...
>>> boolean connected = channel.isConnected();
>>> boolean bound = channel.isBound();
>>> try {
>>>  synchronized (channel.interestOpsLock)
>>>  {
>>>   SelectionKey key = channel.socket.keyFor(selector);
>>>   if (key != null) {
>>>    key.cancel();
>>>   }
>>>   channel.socket.close();
>>>  }
>>> ...
>>>
>>> Thomas
>>> _______________________________________________
>>> netty-dev mailing list
>>> netty-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/netty-dev
>>
>>
>>
>
> _______________________________________________
> netty-dev mailing list
> netty-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/netty-dev
>
>



More information about the netty-dev mailing list