[jboss-jira] [JBoss JIRA] (JGRP-1669) UDP should not stop message receiver thread after SocketException is caught
Aleksandr Korostov (JIRA)
jira-events at lists.jboss.org
Thu Aug 1 16:17:26 EDT 2013
[ https://issues.jboss.org/browse/JGRP-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794675#comment-12794675 ]
Aleksandr Korostov commented on JGRP-1669:
------------------------------------------
Is it possible to make a similar fix in the 2.x release line (e.g. a 2.12.3 release containing that fix)?
> UDP should not stop message receiver thread after SocketException is caught
> ---------------------------------------------------------------------------
>
> Key: JGRP-1669
> URL: https://issues.jboss.org/browse/JGRP-1669
> Project: JGroups
> Issue Type: Enhancement
> Affects Versions: 2.6.13
> Environment: Windows 2008 R2, Oracle JDK 1.6.0.24
> Reporter: Aleksandr Korostov
> Assignee: Bela Ban
> Fix For: 3.4
>
>
> One of our customers is getting the following error sporadically:
> {code}
> java.net.SocketException: socket closed
> at java.net.PlainDatagramSocketImpl.receive0(Native Method)
> at java.net.PlainDatagramSocketImpl.receive(Unknown Source)
> at java.net.DatagramSocket.receive(Unknown Source)
> at org.jgroups.protocols.UDP.run(UDP.java:262)
> at java.lang.Thread.run(Unknown Source)
> {code}
> (Note that I modified the UDP class to log the full stack trace, because the original version of UDP did not log it.)
> I'm sure that the socket was not closed by any Java code: the sender thread keeps running and sending messages via the same socket. If mcast_socket.close() had been called, mcast_socket.send() would have thrown a "socket closed" exception too, but it did not.
> The main problem is that this error stops the receiver thread, so the node stops receiving UDP messages from other nodes in the cluster.
> We cannot reproduce the error in our environment, but I found cases of a similar problem on the Internet:
> * https://forums.oracle.com/thread/2190450 (read the last post). It looks like a time-to-live-exceeded ICMP packet could force such an error on the multicast socket.
> * https://github.com/elasticsearch/elasticsearch/pull/2783 Here the same problem was encountered, but the devs did not manage to find the root cause, so they ended up re-creating the socket on error.
> Given all that, I think the UDP receiver thread should check mcast_socket.isClosed() before exiting. If the socket is still open, the thread should continue running (though it may be safer to close the old socket and create a new one).
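
To illustrate the proposal in the description, here is a minimal sketch of a receiver loop that exits only when the socket is really closed. It is not the actual org.jgroups.protocols.UDP code: the class name ResilientReceiver, the helper shouldKeepReceiving() and the handle() callback are hypothetical names chosen for this example, and a plain DatagramSocket stands in for mcast_socket.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketException;

// Hypothetical illustration only; this is NOT the JGroups UDP implementation.
class ResilientReceiver implements Runnable {
    private final DatagramSocket sock;
    private final byte[] buf = new byte[65535];

    ResilientReceiver(DatagramSocket sock) {
        this.sock = sock;
    }

    // The check proposed in the issue: keep running as long as the socket is open.
    static boolean shouldKeepReceiving(DatagramSocket sock) {
        return !sock.isClosed();
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            try {
                sock.receive(packet);
                handle(packet);
            } catch (SocketException e) {
                if (!shouldKeepReceiving(sock)) {
                    break; // the socket was closed on purpose, so stop the thread
                }
                // Socket is still open: log the full stack trace and keep receiving.
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    // Placeholder for passing the received data up the stack.
    void handle(DatagramPacket packet) {
        System.out.println("received " + packet.getLength() + " bytes");
    }
}
```

If receive() kept failing while the socket stayed open, this loop would spin, so a real fix would probably also need a short back-off or the socket re-creation mentioned in the description.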