[jboss-jira] [JBoss JIRA] (JGRP-1669) UDP should not stop message receiver thread after SocketException is caught

Aleksandr Korostov (JIRA) jira-events at lists.jboss.org
Fri Jul 26 18:33:26 EDT 2013


Aleksandr Korostov created JGRP-1669:
----------------------------------------

             Summary: UDP should not stop message receiver thread after SocketException is caught
                 Key: JGRP-1669
                 URL: https://issues.jboss.org/browse/JGRP-1669
             Project: JGroups
          Issue Type: Enhancement
    Affects Versions: 2.6.13
            Reporter: Aleksandr Korostov
            Assignee: Bela Ban


One of our customers is getting the following error sporadically:
{code}
java.net.SocketException: socket closed
        at java.net.PlainDatagramSocketImpl.receive0(Native Method)
        at java.net.PlainDatagramSocketImpl.receive(Unknown Source)
        at java.net.DatagramSocket.receive(Unknown Source)
        at org.jgroups.protocols.UDP.run(UDP.java:262)
        at java.lang.Thread.run(Unknown Source)
{code}

I'm sure that the socket was not closed by any Java code: the sender thread keeps running and sending messages via the same socket. If mcast_socket.close() had been called, mcast_socket.send() would have thrown a "socket closed" exception too, but it did not.

The main problem is that this error stops the receiver thread, so the node stops receiving UDP messages from other nodes in the cluster.

We cannot reproduce the error in our environment, but I found similar cases on the Internet:

* https://forums.oracle.com/thread/2190450 (read the last post). It looks like an ICMP Time-to-live-exceeded packet could force such an error on the multicast socket.
* https://github.com/elasticsearch/elasticsearch/pull/2783 The same problem was encountered here, but the developers did not manage to find the root cause, so they ended up re-creating the socket on error.

Given all that, I think the UDP receiver thread should check mcast_socket.isClosed() before exiting. If the socket is still open, the thread should continue running (it may be safer to close the old socket and create a new one, though).
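
To illustrate the proposal, here is a rough sketch of a receive loop that only exits when the socket is really closed. It is not the actual UDP.java code: the class name ReceiverLoopSketch, the helper shouldKeepReceiving(), the handle() hook and the 100 ms pause are all made up for this example, and the "socket" field only stands in for mcast_socket.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketException;

// Hypothetical sketch, not the real org.jgroups.protocols.UDP receiver.
final class ReceiverLoopSketch implements Runnable {
    private final DatagramSocket socket;      // stands in for mcast_socket
    private volatile boolean running = true;  // cleared by stop()

    ReceiverLoopSketch(DatagramSocket socket) {
        this.socket = socket;
    }

    // The loop continues only while we were not asked to stop
    // and the socket has not actually been closed.
    static boolean shouldKeepReceiving(DatagramSocket sock, boolean running) {
        return running && sock != null && !sock.isClosed();
    }

    void stop() {
        running = false;
        socket.close(); // unblocks a pending receive()
    }

    // Hypothetical hook; real code would pass the data up the stack.
    void handle(DatagramPacket packet) {
    }

    @Override
    public void run() {
        byte[] buf = new byte[65535];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        while (shouldKeepReceiving(socket, running)) {
            try {
                packet.setLength(buf.length); // reset after the previous receive
                socket.receive(packet);
                handle(packet);
            } catch (SocketException e) {
                // Do not exit blindly: the loop condition re-checks isClosed().
                // If the socket is still open, pause briefly and retry.
                if (!socket.isClosed()) {
                    pause(100);
                }
            } catch (IOException e) {
                // Other I/O errors are treated as transient in this sketch.
                pause(100);
            }
        }
    }

    // Short back-off so a persistent error does not spin the CPU.
    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this shape, a spurious "socket closed" SocketException on a socket that is still open leads to a retry instead of terminating the thread. The safer variant mentioned above (close the old socket and create a new one) would replace the retry branch with socket re-creation, which this sketch leaves out.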


