[JBoss JIRA] Created: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

Jay Howell (JIRA)

Monday, 21 July 2008

10:04 a.m.

Node failure does not trigger failover for new nodes entering the cluster.
--------------------------------------------------------------------------

Key: JBMESSAGING-1402
URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1402
Project: JBoss Messaging
Issue Type: Bug
Reporter: Jay Howell
Assignee: Tim Fox

When a node fails, users are reporting that the failure doesn't trigger a cluster failover. When a member starts back up and tries to join, it experiences a failure trying to reconnect to the downed node causing the container not to start. Users are getting..

2008-07-18 08:42:53,211 144578 DEBUG [org.jboss.messaging.core.impl.postoffice.GroupMember] (main:) We are the first member of the group so no need to wait for state
2008-07-18 08:42:53,221 144588 INFO [STDOUT] (UpHandler (MPING):)
-------------------------------------------------------
GMS: address is 69.52.50.155:7900
-------------------------------------------------------
2008-07-18 08:42:56,251 147618 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
2008-07-18 08:43:02,303 153670 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
2008-07-18 08:43:10,791 162158 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
2008-07-18 08:43:15,811 167178 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
2008-07-18 08:43:20,832 172199 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
2008-07-18 08:43:25,852 177219 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying

which occurs every 5 seconds and prevents the container from starting up.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
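The retry cadence in the log quoted in this report can be checked directly from the timestamps. The following is a minimal sketch, not part of the original report: it takes the six WARN timestamps copied from the log and computes the gap between consecutive join attempts. The helper name `retry_gaps` is hypothetical.

```python
from datetime import datetime

# Timestamps of the six "timed out, retrying" WARN lines, copied from the report.
JOIN_RETRY_TIMESTAMPS = [
    "2008-07-18 08:42:56,251",
    "2008-07-18 08:43:02,303",
    "2008-07-18 08:43:10,791",
    "2008-07-18 08:43:15,811",
    "2008-07-18 08:43:20,832",
    "2008-07-18 08:43:25,852",
]


def retry_gaps(timestamps):
    """Return the seconds elapsed between consecutive log timestamps."""
    # The log uses a comma before the milliseconds, e.g. "08:42:56,251".
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S,%f") for t in timestamps]
    return [round((b - a).total_seconds(), 3) for a, b in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    for gap in retry_gaps(JOIN_RETRY_TIMESTAMPS):
        print(f"{gap:.3f} s")
```

On these six lines the first two gaps are about 6 and 8.5 seconds and the remaining three are about 5 seconds, which is consistent with the "every 5 seconds" description once the retries settle.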

Jay Howell (JIRA)

Monday, 21 July

10:04 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1402?page=com.atlassian.ji... ]

Jay Howell updated JBMESSAGING-1402:
------------------------------------
Component/s: JMS Clustering
Affects Version/s: 1.4.0.SP3.CP02


Jay Howell (JIRA)

10:30 a.m.

New subject: [JBoss JIRA] Assigned: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1402?page=com.atlassian.ji... ]

Jay Howell reassigned JBMESSAGING-1402:
---------------------------------------
Assignee: Jay Howell (was: Tim Fox)


Jay Howell (JIRA)

10:34 a.m.

New subject: [JBoss JIRA] Resolved: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1402?page=com.atlassian.ji... ]

Jay Howell resolved JBMESSAGING-1402.
-------------------------------------
Fix Version/s: 1.4.0.SP3.CP04
Resolution: Done


Tim Fox (JIRA)

Saturday, 13 September

8:10 a.m.

New subject: [JBoss JIRA] Reopened: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1402?page=com.atlassian.ji... ]

Tim Fox reopened JBMESSAGING-1402:
----------------------------------


Tim Fox (JIRA)

8:14 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1402?page=com.atlassian.ji... ]

Tim Fox updated JBMESSAGING-1402:
---------------------------------
Fix Version/s: 1.4.1.GA


Clebert Suconic (JIRA)

Tuesday, 23 September

12:12 p.m.

New subject: [JBoss JIRA] Assigned: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1402?page=com.atlassian.ji... ]

Clebert Suconic reassigned JBMESSAGING-1402:
--------------------------------------------
Assignee: Clebert Suconic (was: Jay Howell)


Clebert Suconic (JIRA)

2:28 p.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.


Clebert Suconic (JIRA)

2:54 p.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1402?page=com.atlassian.ji... ]

Clebert Suconic commented on JBMESSAGING-1402:
----------------------------------------------

To reproduce this issue you need a panic event, such as a power-down or a network outage: something that takes the node away while keeping the sockets alive. All the other channels in AS and EAP use both FD and FD_SOCK.

It would be nice to add a test case for this, but reproducing it requires some manual steps. To automate it we would need virtual machines, or something similar, that we could shut down on demand, and that goes beyond the scope of this task.

Since this has been extensively tested by the JGroups team, I will accept it as being tested by JGroups.
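For readers unfamiliar with the two protocols named in the comment above: in JGroups, FD_SOCK suspects a member when a TCP connection to it closes, while FD suspects a member that stops answering heartbeats. A node that loses power never closes its connection, so only a heartbeat timeout can notice it. The sketch below illustrates the heartbeat idea in plain Python. It is an illustration only, not JGroups code, and the class name `HeartbeatDetector` and its `timeout` and `max_tries` parameters are hypothetical names chosen for this example.

```python
class HeartbeatDetector:
    """Toy heartbeat-based failure detector (illustration only, not JGroups)."""

    def __init__(self, timeout, max_tries):
        self.timeout = timeout      # seconds allowed per heartbeat (assumed value)
        self.max_tries = max_tries  # missed heartbeats tolerated (assumed value)
        self.last_seen = {}         # member name -> time of last heartbeat

    def heartbeat(self, member, now):
        """Record that a heartbeat from `member` arrived at time `now`."""
        self.last_seen[member] = now

    def suspects(self, now):
        """Return members silent for longer than timeout * max_tries."""
        limit = self.timeout * self.max_tries
        return sorted(m for m, seen in self.last_seen.items() if now - seen > limit)


if __name__ == "__main__":
    detector = HeartbeatDetector(timeout=5.0, max_tries=2)
    detector.heartbeat("node-a", now=0.0)
    detector.heartbeat("node-b", now=0.0)
    # node-b loses power here: it sends nothing more, but no socket is closed.
    detector.heartbeat("node-a", now=8.0)
    print(detector.suspects(now=12.0))
```

In this run only `node-b` is reported, because it has been silent for 12 seconds against a 10 second limit, even though nothing ever closed a connection to it.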


Clebert Suconic (JIRA)

2:56 p.m.

New subject: [JBoss JIRA] Closed: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1402?page=com.atlassian.ji... ]

Clebert Suconic closed JBMESSAGING-1402.
----------------------------------------
Resolution: Done
Assignee: Jay Howell (was: Clebert Suconic)


Clebert Suconic (JIRA)

Thursday, 6 November

8:17 p.m.

New subject: [JBoss JIRA] Closed: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.


Clebert Suconic (JIRA)

8:17 p.m.

New subject: [JBoss JIRA] Reopened: (JBMESSAGING-1402) Node failure does not trigger failover for new nodes entering the cluster.

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1402?page=com.atlassian.ji... ]

Clebert Suconic reopened JBMESSAGING-1402:
------------------------------------------

Node failure does not trigger failover for new nodes entering the cluster.
--------------------------------------------------------------------------

Key: JBMESSAGING-1402
URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1402
Project: JBoss Messaging
Issue Type: Bug
Components: JMS Clustering
Affects Versions: 1.4.0.SP3.CP02
Reporter: Jay Howell
Assignee: Jay Howell
Fix For: 1.4.0.SP3.CP04, 1.4.1.GA, 1.4.2.GA

When a node fails, users are reporting that the failure doesn't trigger a cluster failover. When a member starts back up and tries to join, it experiences a failure trying to reconnect to the downed node causing the container not to start. Users are getting..

2008-07-18 08:42:53,211 144578 DEBUG [org.jboss.messaging.core.impl.postoffice.GroupMember] (main:) We are the first member of the group so no need to wait for state
2008-07-18 08:42:53,221 144588 INFO [STDOUT] (UpHandler (MPING):)
-------------------------------------------------------
GMS: address is 69.52.50.155:7900
-------------------------------------------------------
2008-07-18 08:42:56,251 147618 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
2008-07-18 08:43:02,303 153670 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
2008-07-18 08:43:10,791 162158 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
2008-07-18 08:43:15,811 167178 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
2008-07-18 08:43:20,832 172199 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying
2008-07-18 08:43:25,852 177219 WARN [org.jgroups.protocols.pbcast.GMS] (main:) join(69.52.50.155:7900) sent to 69.52.24.96:7900 timed out, retrying

which occurs every 5 seconds and prevents the container from starting up.
