[jboss-jira] [JBoss JIRA] Resolved: (JGRP-1261) Get Socket closed exception, the nodes leaves the group and doesn't re-join anymore

Bela Ban (JIRA) jira-events at lists.jboss.org
Fri Dec 10 10:25:53 EST 2010


     [ https://issues.jboss.org/browse/JGRP-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban resolved JGRP-1261.
----------------------------

    Resolution: Rejected


I suggest discuss issues on the JGroups mailing list first, before creating a JIRA issue. Plus, a reproduceable case would help to make people take a look

> Get Socket closed exception, the nodes leaves the group and doesn't re-join anymore
> -----------------------------------------------------------------------------------
>
>                 Key: JGRP-1261
>                 URL: https://issues.jboss.org/browse/JGRP-1261
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.6.10
>         Environment: JBoss 5.1GA, RHEL 5.3
>            Reporter: Nick Semchenkov
>            Assignee: Bela Ban
>         Attachments: jgroups-channelfactory-stacks.xml
>
>
> The JBoss server works fine as a cluster member for a while and then all of a sudden gets "Socket closed" exception. After that its not a member of the cluster anymore.
> Please note, we have set shun=false trying to keep it in the cluster but it didnt help
> In the past we had JBoss 4.2 running on the same hardware and had no issues. So It's probably not a networking problem
> Please find below an excerpt from the cluster.log and attached jgroups config 
> cluster.log ------------------------------------------------
> 2010-12-10 09:16:14,160 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:16:14,161 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:16:14,161 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:16:14,548 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:7901 (own address=10.170.1.24:7900)
> 2010-12-10 09:17:06,591 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.24:7901: java.net.SocketException: Socket closed
> 2010-12-10 09:17:06,628 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:47583 (own address=10.170.1.24:38487)
> 2010-12-10 09:17:06,638 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:47583 (number=0)
> 2010-12-10 09:17:06,640 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:7901 (own address=10.170.1.24:7900)
> 2010-12-10 09:17:06,644 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,645 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)
> 2010-12-10 09:17:06,646 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.170.1.24:39588 (number=0)
> 2010-12-10 09:17:06,852 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.25:7900: java.net.SocketException: Socket closed
> 2010-12-10 09:17:07,960 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.17:7900: java.net.SocketException: Socket closed
> 2010-12-10 09:17:08,113 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.16:7900: java.net.SocketException: Socket closed
> 2010-12-10 09:17:08,167 WARN  [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,176 WARN  [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,177 WARN  [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,177 WARN  [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:54399; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,213 WARN  [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:33585; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,256 WARN  [org.jgroups.protocols.FD] I was suspected by 10.170.1.15:33585; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> 2010-12-10 09:17:08,297 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.15:7901: java.net.SocketException: Broken pipe
> 2010-12-10 09:17:08,309 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,309 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,310 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.170.1.15:33585, 10.170.1.24:39588]
> 2010-12-10 09:17:08,310 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.17
> 0.1.15:33585, 10.170.1.24:39588]
> 2010-12-10 09:17:08,310 WARN  [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399
> , 10.170.1.15:33585, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,311 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,311 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.170.1.15:33585, 10.170.1.24:39588]
> 2010-12-10 09:17:08,311 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399, 10.17
> 0.1.15:33585, 10.170.1.24:39588]
> 2010-12-10 09:17:08,312 WARN  [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:54399
> , 10.170.1.15:33585, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,312 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,314 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|11] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,314 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|11] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17
> 0.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,314 WARN  [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|11] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585
> , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17
> 0.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,315 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17
> 0.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,315 WARN  [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585
> , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,315 WARN  [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585
> , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,316 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,316 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,317 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,313 DEBUG [org.jgroups.protocols.pbcast.FLUSH] Received START_FLUSH at 10.170.1.24:60948 but I am not flush participant, not responding
> 2010-12-10 09:17:08,378 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,378 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.170.1.24:60948] view is [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585, 10.17
> 0.1.15:54399, 10.170.1.24:39588]
> 2010-12-10 09:17:08,379 WARN  [org.jgroups.protocols.pbcast.GMS] I (10.170.1.24:60948) am not a member of view [10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389, 10.170.1.15:33585
> , 10.170.1.15:54399, 10.170.1.24:39588]; discarding view
> 2010-12-10 09:17:08,379 DEBUG [org.jgroups.protocols.pbcast.FLUSH] At 10.170.1.24:60948 received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
> 2010-12-10 09:17:08,678 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.15:7900: java.net.SocketException: Socket closed
> 2010-12-10 09:17:08,877 ERROR [org.jgroups.blocks.ConnectionTable] failed sending data to 10.170.1.16:7901: java.net.SocketException: Socket closed
> 2010-12-10 09:17:12,126 WARN  [org.jgroups.protocols.VIEW_SYNC] discarding view as I (10.170.1.24:60948) am not member of view ([10.170.1.25:56405|12] [10.170.1.25:56405, 10.170.1.17:39656, 10.170.1.16:33984, 10.170.1.16:40389,
> 10.170.1.15:33585, 10.170.1.15:54399, 10.170.1.24:39588])
> 2010-12-10 09:17:12,648 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.170.1.24:39588 (own address=10.170.1.24:60948)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


More information about the jboss-jira mailing list