[jboss-jira] [JBoss JIRA] (JGRP-1669) UDP should not stop message receiver thread after SocketException is caught

Aleksandr Korostov (JIRA) jira-events at lists.jboss.org
Thu Aug 1 16:17:26 EDT 2013


    [ https://issues.jboss.org/browse/JGRP-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794675#comment-12794675 ] 

Aleksandr Korostov commented on JGRP-1669:
------------------------------------------

Is it possible to make a similar fix in a 2.x release (e.g. release a 2.12.3 with that fix)?
                
> UDP should not stop message receiver thread after SocketException is caught
> ---------------------------------------------------------------------------
>
>                 Key: JGRP-1669
>                 URL: https://issues.jboss.org/browse/JGRP-1669
>             Project: JGroups
>          Issue Type: Enhancement
>    Affects Versions: 2.6.13
>         Environment: Windows 2008 R2, Oracle JDK 1.6.0.24
>            Reporter: Aleksandr Korostov
>            Assignee: Bela Ban
>             Fix For: 3.4
>
>
> One of our customers is getting the following error sporadically:
> {code}
> java.net.SocketException: socket closed
>         at java.net.PlainDatagramSocketImpl.receive0(Native Method)
>         at java.net.PlainDatagramSocketImpl.receive(Unknown Source)
>         at java.net.DatagramSocket.receive(Unknown Source)
>         at org.jgroups.protocols.UDP.run(UDP.java:262)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> (Note that I modified the UDP class to log the full stack trace, because the original version of UDP did not log it.)
> I'm sure that the socket was not closed by any Java code: the sender thread keeps running and sending messages via the same socket. If mcast_socket.close() had been called, mcast_socket.send() would have thrown a "socket closed" exception too, but it did not.
> The main problem is that this error stops the receiver thread, so the node stops receiving UDP messages from other nodes in the cluster.
> We cannot reproduce the error in our environment, but I found similar cases on the Internet:
> * https://forums.oracle.com/thread/2190450 (read the last post). It looks like a time-to-live-exceeded ICMP packet could force such an error on the multicast socket.
> * https://github.com/elasticsearch/elasticsearch/pull/2783 Here the same problem was encountered, but the devs did not manage to find the root cause, so they ended up implementing socket re-creation on error.
> Given all that, I think the UDP receiver thread should check mcast_socket.isClosed() before exiting. If the socket is still open, the thread should continue running (though it may be safer to close the old socket and create a new one).
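
To illustrate the behaviour proposed in the description, here is a minimal sketch of a receive loop that only exits when the socket is really closed. It is not the JGroups code and not the actual fix for this issue; the class ResilientReceiver, the method receiveLoop and the Handler callback are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;

// Hypothetical illustration only; not taken from org.jgroups.protocols.UDP.
public class ResilientReceiver {

    /** Hypothetical callback that hands each received datagram to the caller. */
    public interface Handler {
        void handle(DatagramPacket packet);
    }

    /**
     * Receives datagrams until the socket is really closed.
     * Returns how many I/O errors were survived while the socket was still open.
     */
    public static int receiveLoop(DatagramSocket sock, Handler handler) {
        byte[] buf = new byte[65535];
        int survived = 0;
        while (true) {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            try {
                sock.receive(packet);
                handler.handle(packet);
            } catch (IOException e) { // SocketException is a subclass of IOException
                if (sock.isClosed()) {
                    return survived; // genuine close: stop the receiver thread
                }
                survived++; // error on a socket that is still open: keep receiving
            }
        }
    }
}
```

The loop would typically run in its own thread, e.g. new Thread(() -> ResilientReceiver.receiveLoop(sock, handler)).start(). A real implementation would probably also log the exception and back off (or re-create the socket) after repeated errors, so that a permanently broken but still open socket does not turn the loop into a busy spin.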
