[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

Tue Sep 23 15:54:20 EDT 2008

    [ https://jira.jboss.org/jira/browse/JBMESSAGING-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12430862#action_12430862 ] 

Clebert Suconic commented on JBMESSAGING-1402:
----------------------------------------------

To reproduce this issue, you need an abrupt failure such as a power loss or a network outage:
something that takes the node down without closing its sockets, so the other members still see the connections as alive.

All the other channels in AS and EAP use both FD and FD_SOCK.
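
For illustration, the failure-detection part of a JGroups 2.x protocol stack that combines the two protocols might look like the fragment below. This is a sketch, not the configuration shipped with JBoss Messaging: the timeout values are assumptions, and the other protocols of the stack are omitted. FD_SOCK suspects a member when its socket closes (for example, a killed process), while FD sends heartbeats and suspects a member that stops answering, which is what should cover a power or network loss where the socket is never closed.

```xml
<!-- Sketch only: values are illustrative assumptions, not shipped defaults -->
<config>
    <!-- ... transport, discovery and merge protocols omitted ... -->

    <!-- Socket-based detection: suspects a member when its TCP socket closes -->
    <FD_SOCK/>

    <!-- Heartbeat-based detection: suspects a member after max_tries missed
         heartbeats of timeout milliseconds each -->
    <FD timeout="10000" max_tries="5" shun="true"/>

    <!-- Double-checks a suspicion before the member is excluded -->
    <VERIFY_SUSPECT timeout="1500"/>

    <!-- ... reliability, group membership and state transfer protocols omitted ... -->
</config>
```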

It would be nice to add a test case for this, but reproducing it requires manual steps. Automating it would need something like virtual machines that we could shut down on demand, and that goes beyond the scope of this task.

Since this has been extensively tested by the JGroups team, I will accept it as tested by JGroups.

> Node failure does not trigger failover for new nodes entering the cluster.
> --------------------------------------------------------------------------
>
>                 Key: JBMESSAGING-1402
>                 URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1402
>             Project: JBoss Messaging
>          Issue Type: Bug
>          Components: JMS Clustering
>    Affects Versions: 1.4.0.SP3.CP02
>            Reporter: Jay Howell
>            Assignee: Clebert Suconic
>             Fix For:  1.4.0.SP3.CP04, 1.4.1.GA
>
>
> When a node fails, users are reporting that the failure doesn't trigger a cluster failover.  When a member starts back up and tries to join, it experiences a failure trying to reconnect to the downed node causing the container not to start.
> Users are getting:
> 2008-07-18 08:42:53,211 144578 DEBUG [org.jboss.messaging.core.impl.postoffice.GroupMember] (main:) We are the first member of the group so no need to wait for state
> 2008-07-18 08:42:53,221 144588 INFO  [STDOUT] (UpHandler (MPING):)
> -------------------------------------------------------
> GMS: address is 69.52.50.155:7900
> -------------------------------------------------------
> 2008-07-18 08:42:56,251 147618 WARN  [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
> 2008-07-18 08:43:02,303 153670 WARN  [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
> 2008-07-18 08:43:10,791 162158 WARN  [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
> 2008-07-18 08:43:15,811 167178 WARN  [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
> 2008-07-18 08:43:20,832 172199 WARN  [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
> 2008-07-18 08:43:25,852 177219 WARN  [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
> which occurs every 5 seconds and prevents the container from starting up.
