[jboss-jira] [JBoss JIRA] Commented: (JGRP-1261) Get Socket closed exception, the nodes leaves the group and doesn't re-join anymore
Bela Ban (JIRA)
jira-events at lists.jboss.org
Fri Dec 10 10:23:52 EST 2010
[ https://issues.jboss.org/browse/JGRP-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569284#comment-12569284 ]
Bela Ban commented on JGRP-1261:
--------------------------------
This is a very imprecise description, just cut&pasting a log won't cut it. If you have something to reproduce this, I might be willing to take a look, although I don't really support 2.6 (see [1])...
[1] http://community.jboss.org/docs/DOC-14454
> Get Socket closed exception, the nodes leaves the group and doesn't re-join anymore
> -----------------------------------------------------------------------------------
>
> Key: JGRP-1261
> URL: https://issues.jboss.org/browse/JGRP-1261
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.6.10
> Environment: JBoss 5.1GA, RHEL 5.3
> Reporter: Nick Semchenkov
> Assignee: Bela Ban
> Attachments: jgroups-channelfactory-stacks.xml
>
>
> The JBoss server works fine as a cluster member for a while and then all of a sudden gets "Socket closed" exception. After that its not a member of the cluster anymore.
> Please note, we have set shun=false trying to keep it in the cluster but it didnt help
> In the past we had JBoss 4.2 running on the same hardware and had no issues. So It's probably not a networking problem
> Please find below an excerpt from the cluster.log and attached jgroups config
> cluster.log ------------------------------------------------
> 2010-12-10 09:16:14,160 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:16:14,161 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:16:14,161 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:16:14,548 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:7901 (own address=10.170.1.24:7900)
> 2010-12-10 09:17:06,591 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.24:7901: java.net.SocketException: Socket closed
> 2010-12-10 09:17:06,628 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:47583 (own address=10.170.1.24:38487)
> 2010-12-10 09:17:06,638 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:47583 (number=0)
> 2010-12-10 09:17:06,640 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:7901 (own address=10.170.1.24:7900)
> 2010-12-10 09:17:06,644 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,852 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.25:7900: java.net.SocketException: Socket closed
> 2010-12-10 09:17:07,960 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.17:7900: java.net.SocketException: Socket closed
> 2010-12-10 09:17:08,113 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.16:7900: java.net.SocketException: Socket closed
> 2010-12-10 09:17:08,167 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,176 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,177 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,177 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,213 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:33585; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,256 WARN [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:33585; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,297 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.15:7901: java.net.SocketException: Broken pipe
> 2010-12-10 09:17:08,309 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,309 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,310 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.170.1.15:33585, 10.170.1.24:39588]
> 2010-12-10 09:17:08,310 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.17
> 0.1.15:33585, 10.170.1.24:39588]
> 2010-12-10 09:17:08,310 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399
> , 10.170.1.15:33585, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,311 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,311 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.170.1.15:33585, 10.170.1.24:39588]
> 2010-12-10 09:17:08,311 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.17
> 0.1.15:33585, 10.170.1.24:39588]
> 2010-12-10 09:17:08,312 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399
> , 10.170.1.15:33585, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,312 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,314 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|11] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,314 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|11] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17
> 0.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,314 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|11] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585
> , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17
> 0.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17
> 0.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,315 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585
> , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,315 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585
> , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,316 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,316 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,317 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,378 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,378 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17
> 0.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,379 WARN [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585
> , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,379 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,678 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.15:7900: java.net.SocketException: Socket closed
> 2010-12-10 09:17:08,877 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.16:7901: java.net.SocketException: Socket closed
> 2010-12-10 09:17:12,126 WARN [org.jgroups.protocols.VIEW_SYNC] discarding view as I (10.170.1.24:60948) am not member of view ([10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389,
> 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588])
> 2010-12-10 09:17:12,648 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
More information about the jboss-jira
mailing list