[jboss-jira] [JBoss JIRA] (JGRP-1669) UDP should not stop message receiver thread after SocketException is caught
Bela Ban (JIRA)
jira-events at lists.jboss.org
Mon Jul 29 07:22:26 EDT 2013
[ https://issues.jboss.org/browse/JGRP-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793427#comment-12793427 ]
Bela Ban commented on JGRP-1669:
--------------------------------
In 3.4, the code looks as follows (used for unicast and multicast sockets):
{code}
public void run() {
    final byte receive_buf[]=new byte[66000];
    final DatagramPacket packet=new DatagramPacket(receive_buf, receive_buf.length);
    while(thread != null && Thread.currentThread().equals(thread)) {
        try {
            receiver_socket.receive(packet);
            ...
        }
        catch(SocketException sock_ex) {
            if(log.isDebugEnabled()) log.debug("receiver socket is closed, exception=" + sock_ex);
            break;
        }
        catch(Throwable ex) {
            if(log.isErrorEnabled())
                log.error("failed receiving packet", ex);
        }
    }
    if(log.isDebugEnabled()) log.debug(name + " thread terminated");
}
{code}
I'm going to change the code that handles the SocketException so that it only falls out of the loop if the socket really is closed. I *hope* there is no condition that causes receive() to throw exceptions endlessly while the socket remains open...
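A minimal sketch of the change described above, assuming a plain java.net.DatagramSocket and java.util.logging in place of the JGroups logger. The class name ReceiverSketch, the helper shouldExit() and the no-op handle() are hypothetical and this is not the actual JGroups 3.4 patch. On a SocketException the loop exits only when the socket reports isClosed(); otherwise it logs the error and keeps receiving.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical receiver loop; names are illustrative, not the JGroups API.
public class ReceiverSketch implements Runnable {
    private static final Logger log = Logger.getLogger(ReceiverSketch.class.getName());
    private final DatagramSocket receiverSocket;
    private volatile Thread thread;

    public ReceiverSketch(DatagramSocket receiverSocket) {
        this.receiverSocket = receiverSocket;
    }

    public void start() {
        thread = new Thread(this, "receiver");
        thread.start();
    }

    public void stop() {
        thread = null;          // signal the loop to end
        receiverSocket.close(); // unblocks receive() with a SocketException
    }

    // Leave the loop only if the socket really is closed.
    static boolean shouldExit(DatagramSocket sock) {
        return sock.isClosed();
    }

    @Override
    public void run() {
        byte[] buf = new byte[66000];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        while (thread != null && Thread.currentThread() == thread) {
            try {
                receiverSocket.receive(packet);
                handle(packet);
            } catch (SocketException sockEx) {
                if (shouldExit(receiverSocket)) {
                    log.fine("receiver socket is closed, exception=" + sockEx);
                    break;
                }
                // Socket still open: log and keep receiving instead of terminating.
                log.log(Level.WARNING, "receive failed on open socket, continuing", sockEx);
            } catch (Throwable ex) {
                log.log(Level.SEVERE, "failed receiving packet", ex);
            }
        }
        log.fine("receiver thread terminated");
    }

    // Placeholder for message processing (hypothetical).
    protected void handle(DatagramPacket packet) {
    }
}
```

If an open socket kept failing, this loop would spin, which is the risk raised in the comment; a real implementation might add a short back-off or a consecutive-failure counter.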
> UDP should not stop message receiver thread after SocketException is caught
> ---------------------------------------------------------------------------
>
> Key: JGRP-1669
> URL: https://issues.jboss.org/browse/JGRP-1669
> Project: JGroups
> Issue Type: Enhancement
> Affects Versions: 2.6.13
> Environment: Windows 2008 R2, Oracle JDK 1.6.0.24
> Reporter: Aleksandr Korostov
> Assignee: Bela Ban
> Fix For: 3.4
>
>
> One of our customers is getting the following error sporadically:
> {code}
> java.net.SocketException: socket closed
> at java.net.PlainDatagramSocketImpl.receive0(Native Method)
> at java.net.PlainDatagramSocketImpl.receive(Unknown Source)
> at java.net.DatagramSocket.receive(Unknown Source)
> at org.jgroups.protocols.UDP.run(UDP.java:262)
> at java.lang.Thread.run(Unknown Source)
> {code}
> (Note that I modified the UDP class to log the full stack trace, because the original version of UDP did not log it.)
> I'm sure that the socket was not closed by any Java code: the sender thread keeps running and sending messages via the same socket (if mcast_socket.close() had been called, mcast_socket.send() would have thrown a "socket closed" exception too, but it did not).
> The main problem is that this error stops the receiver thread, so the node stops receiving UDP messages from the other nodes in the cluster.
> We cannot reproduce the error in our environment, but I found reports of similar problems on the Internet:
> * https://forums.oracle.com/thread/2190450 (read the last post). It looks like a Time-to-live-exceeded ICMP packet could trigger such an error on the multicast socket.
> * https://github.com/elasticsearch/elasticsearch/pull/2783: the same problem was encountered here, but the developers did not find the root cause, so they ended up re-creating the socket on error.
> Given all that, I think the UDP receiver thread should check mcast_socket.isClosed() before exiting. If the socket is still open, the thread should continue running (although it may be safer to close the old socket and create a new one).
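The reporter's fallback, closing the old socket and opening a new one, could look roughly like this. It is a minimal sketch assuming a plain unicast java.net.DatagramSocket bound to a fixed port; the class SocketRecreator and its recover() method are hypothetical, and this is neither the JGroups fix nor the code from the elasticsearch pull request. A real multicast receiver would also have to rejoin its group on the new socket.

```java
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.net.SocketException;

// Hypothetical helper that replaces a receive socket which threw a
// SocketException while still reporting itself as open.
public class SocketRecreator {
    private volatile DatagramSocket socket;
    private final int port;

    public SocketRecreator(int port) throws SocketException {
        DatagramSocket s = open(port);
        this.socket = s;
        this.port = s.getLocalPort(); // remember the actual port (resolves port 0)
    }

    private static DatagramSocket open(int port) throws SocketException {
        DatagramSocket s = new DatagramSocket((SocketAddress) null); // unbound
        s.setReuseAddress(true);
        s.bind(new InetSocketAddress(port));
        return s;
    }

    public DatagramSocket socket() {
        return socket;
    }

    // Called from the receiver's catch(SocketException) block.
    // Returns true if the receiver loop should keep running.
    public synchronized boolean recover(SocketException cause) {
        DatagramSocket old = socket;
        if (old.isClosed()) {
            return false; // closed deliberately: let the thread exit
        }
        old.close(); // discard the possibly broken socket
        try {
            socket = open(port); // rebind to the same port
            return true;
        } catch (SocketException e) {
            return false; // re-creation failed; give up
        }
    }
}
```

The receiver loop would need to re-read socket() on every iteration, because the reference changes after a successful recovery.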