[JBoss JIRA] Created: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.


Renaud Devarieux (JIRA)

Monday, 29 March 2010

11:11 a.m.

Incoming PingRsp is ignored despite being sent by a Coordinator.
----------------------------------------------------------------

                 Key: JGRP-1179
                 URL: https://jira.jboss.org/jira/browse/JGRP-1179
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 2.6.14, 2.6.9
         Environment: Linux Red Hat Enterprise 5.0 kernel 2.6.18-8.el5 java 1.6.0_18
            Reporter: Renaud Devarieux
            Assignee: Bela Ban


I launch 5 nodes (A, B, C, D, E) in quick succession, nearly simultaneously. They all use the same protocol stack and a single channel to communicate with each other:

UDP(mcast_addr=231.8.8.8;mcast_port=45578):PING(num_initial_members=4):MERGE2:FD:VERIFY_SUSPECT:pbcast.NAKACK:pbcast.STABLE:FRAG2:pbcast.GMS(shun=true):pbcast.FLUSH

As often as not, depending on the timing between the node launches, I end up with 2 views, i.e. {D} and {A B C E}. A merge occurs later, but by then it is a bit late for my application, and I don't think I should have to handle a merge except in the case of a real power or network failure.

I noticed that D was timing out (3000 ms) during the discovery process despite having received the 4 GET_MBRS_RSP messages from the other nodes. D would then decide that there was no coordinator out there and become coordinator itself.

What seems to happen is that D sends two GET_MBRS_REQ messages and A replies to both. At the time of the first reply, A is not yet coordinator. By the time D receives the second response, A has become coordinator, but D ignores that response and doesn't add it to its list of responses.

I have written a workaround in the addResponse method of Discovery.Responses. It seems to work for my case, but I am afraid it might break something else I am not aware of.

    public void addResponse(PingRsp rsp) {
        if(rsp == null)
            return;
        promise.getLock().lock();
        try {
            // Workaround 29/03/2010
            int index = ping_rsps.indexOf(rsp);
            // index == -1 is equivalent to "does not contain"
            if (index == -1) {
                ping_rsps.add(rsp);
                promise.getCond().signalAll();
            } else if (rsp.isCoord()) {
                PingRsp pr = ping_rsps.get(index);
                // Replace the existing element only if it is not already a coordinator
                if (!pr.isCoord()) {
                    ping_rsps.set(index, rsp);
                    promise.getCond().signalAll();
                }
            }
            /*if(!ping_rsps.contains(rsp)) {
                ping_rsps.add(rsp);
                promise.getCond().signalAll();
            }*/ // Old JGroups code
        } finally {
            promise.getLock().unlock();
        }
    }

Regards
Renaud

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
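The behaviour described in the report can be reproduced in isolation with a minimal sketch. It does not use JGroups classes: `Rsp`, `addIfAbsent` and `addOrUpgrade` are hypothetical stand-ins, and the sketch assumes that two responses compare equal when they come from the same sender regardless of the coordinator flag, which is what the reported symptom suggests.

```java
import java.util.ArrayList;
import java.util.List;

class PingRspSketch {

    /** Hypothetical stand-in for a discovery response (not the JGroups PingRsp class). */
    static final class Rsp {
        final String sender;
        final boolean coord;

        Rsp(String sender, boolean coord) {
            this.sender = sender;
            this.coord = coord;
        }

        // Assumption: equality looks at the sender only, not at the coordinator flag.
        @Override
        public boolean equals(Object o) {
            return o instanceof Rsp && ((Rsp) o).sender.equals(sender);
        }

        @Override
        public int hashCode() {
            return sender.hashCode();
        }
    }

    /** Old logic: a second response from a known sender is silently dropped. */
    static void addIfAbsent(List<Rsp> rsps, Rsp rsp) {
        if (!rsps.contains(rsp)) {
            rsps.add(rsp);
        }
    }

    /** Workaround logic: a coordinator response replaces a non-coordinator one. */
    static void addOrUpgrade(List<Rsp> rsps, Rsp rsp) {
        int index = rsps.indexOf(rsp);
        if (index == -1) {
            rsps.add(rsp);
        } else if (rsp.coord && !rsps.get(index).coord) {
            rsps.set(index, rsp);
        }
    }

    /** True if any stored response carries the coordinator flag. */
    static boolean hasCoord(List<Rsp> rsps) {
        for (Rsp r : rsps) {
            if (r.coord) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rsp> oldList = new ArrayList<>();
        List<Rsp> newList = new ArrayList<>();

        // A answers the first request before it is coordinator, the second one after.
        Rsp first = new Rsp("A", false);
        Rsp second = new Rsp("A", true);

        addIfAbsent(oldList, first);
        addIfAbsent(oldList, second);
        addOrUpgrade(newList, first);
        addOrUpgrade(newList, second);

        System.out.println("old logic sees a coordinator: " + hasCoord(oldList));
        System.out.println("workaround sees a coordinator: " + hasCoord(newList));
    }
}
```

With the old logic the list still holds A's first, non-coordinator response, so the joining node finds no coordinator. With the workaround the stored entry is upgraded in place and the list size stays the same.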


Bela Ban (JIRA)

Monday, 29 March

11:36 a.m.

New subject: [JBoss JIRA] Updated: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

[ https://jira.jboss.org/jira/browse/JGRP-1179?page=com.atlassian.jira.plug... ]

Bela Ban updated JGRP-1179:
---------------------------
    Fix Version/s: 2.10

Let's see if this occurs in 2.10, too. If so, fix it in 2.10 and backport the fix to the 2.6 branch.


Karthik Abram (JIRA)

Wednesday, 31 March

9:56 a.m.

New subject: [JBoss JIRA] Commented: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

[ https://jira.jboss.org/jira/browse/JGRP-1179?page=com.atlassian.jira.plug... ]

Karthik Abram commented on JGRP-1179:
-------------------------------------

I'm using 2.9 and I notice that with TCP and TCPPING configured, if I set TCPPING port_range to 5 it discovers the other members, but if I set it to 10 it doesn't. I also noticed the same behavior: GET_MBRS_RSP being passed between the nodes, but the discovery at the end says 0 server, 0 clients, 0 coordinators.

So this is a show-stopper for us now.


Karthik Abram (JIRA)

10 a.m.

New subject: [JBoss JIRA] Commented: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

[ https://jira.jboss.org/jira/browse/JGRP-1179?page=com.atlassian.jira.plug... ]

Karthik Abram commented on JGRP-1179:
-------------------------------------

I meant 2.8, not 2.9.


Karthik Abram (JIRA)

10:06 a.m.

New subject: [JBoss JIRA] Commented: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

[ https://jira.jboss.org/jira/browse/JGRP-1179?page=com.atlassian.jira.plug... ]

Karthik Abram commented on JGRP-1179:
-------------------------------------

Another piece of pertinent information: the initial hosts are all part of the same physical server; they differ only by their port, not by IP address. Looking at the "public Object down(Event evt)" method of TCPPING, I wonder if it works properly only with different physical addresses.


Renaud Devarieux (JIRA)

Thursday, 1 April

4:36 a.m.

New subject: [JBoss JIRA] Commented: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

[ https://jira.jboss.org/jira/browse/JGRP-1179?page=com.atlassian.jira.plug... ]

Renaud Devarieux commented on JGRP-1179:
----------------------------------------

I have not been able to reproduce this issue using 2.10. I have yet to verify whether the check on the logical and physical address of the GET_MBRS_RSP, which allows an existing response to be overwritten, is really the cause of the improvement, but I am confident it is.

However, I ran into other issues tied to PING/Discovery. They are closely related but perhaps worth another issue. Your call, Bela.

Basically, Discovery sends up to n GET_MBRS_REQ messages to discover the members. Each GET_MBRS_REQ triggers a round of GET_MBRS_RSP messages, which increase the initial member count up to the limit at which the Promise blocking the discovery is released. One round of GET_MBRS_RSP may not be sufficient to discover all the members, and the second round of RSPs then completes the count for the Promise. Depending on the order in which the RSPs are received, however, the Promise condition may be signalled before all the RSPs are processed, and those unprocessed RSPs may belong to a coordinator elected between the two REQs sent. => trouble.

About TCPPING I am clueless; I haven't tried anything TCP with JGroups.
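The ordering problem described in this comment can be illustrated with a single-threaded simplification. This is only a sketch, not JGroups code: `Rsp` and `coordSeenAtWakeUp` are hypothetical names, and it assumes that the waiting node takes its decision at the moment the number of distinct responders reaches num_initial_members, ignoring anything that arrives later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DiscoveryOrderSketch {

    /** Hypothetical stand-in for a GET_MBRS_RSP: who answered and whether it was coordinator. */
    static final class Rsp {
        final String sender;
        final boolean coord;

        Rsp(String sender, boolean coord) {
            this.sender = sender;
            this.coord = coord;
        }
    }

    /**
     * Processes responses in arrival order and stops as soon as the number of
     * distinct senders reaches the limit (the point where the waiter is released).
     * Returns true if a coordinator is known at that moment.
     */
    static boolean coordSeenAtWakeUp(List<Rsp> arrivals, int numInitialMembers) {
        Map<String, Boolean> seen = new LinkedHashMap<>();
        for (Rsp rsp : arrivals) {
            Boolean known = seen.get(rsp.sender);
            // One entry per sender; a coordinator response upgrades an earlier one.
            if (known == null || (rsp.coord && !known)) {
                seen.put(rsp.sender, rsp.coord);
            }
            if (seen.size() >= numInitialMembers) {
                break; // waiter wakes up; later responses are not considered
            }
        }
        return seen.containsValue(Boolean.TRUE);
    }

    public static void main(String[] args) {
        // Round 1: A, B and C answer, nobody is coordinator yet.
        // Round 2: E answers, and A answers again, now as coordinator.
        List<Rsp> coordArrivesLate = new ArrayList<>();
        coordArrivesLate.add(new Rsp("A", false));
        coordArrivesLate.add(new Rsp("B", false));
        coordArrivesLate.add(new Rsp("C", false));
        coordArrivesLate.add(new Rsp("E", false)); // count reaches 4 here
        coordArrivesLate.add(new Rsp("A", true));  // never looked at

        List<Rsp> coordArrivesEarly = new ArrayList<>();
        coordArrivesEarly.add(new Rsp("A", false));
        coordArrivesEarly.add(new Rsp("B", false));
        coordArrivesEarly.add(new Rsp("C", false));
        coordArrivesEarly.add(new Rsp("A", true));  // upgrades A before the count is reached
        coordArrivesEarly.add(new Rsp("E", false));

        System.out.println("coordinator response last:  " + coordSeenAtWakeUp(coordArrivesLate, 4));
        System.out.println("coordinator response first: " + coordSeenAtWakeUp(coordArrivesEarly, 4));
    }
}
```

The same five responses lead to different outcomes depending only on whether E's response or A's second response is processed first, which matches the order dependence described above.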


Bela Ban (JIRA)

5:28 a.m.

New subject: [JBoss JIRA] Commented: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

[ https://jira.jboss.org/jira/browse/JGRP-1179?page=com.atlassian.jira.plug... ]

Bela Ban commented on JGRP-1179:
--------------------------------

Renaud: I'm confident the issue is also present in 2.10. As a matter of fact, I applied your fix yesterday, but haven't committed the change yet. I'll run some tests today and commit it, plus I'll backport this to 2.6.x.

Can you create a separate JIRA issue for the other issue you found?


Bela Ban (JIRA)

6:35 a.m.

New subject: [JBoss JIRA] Updated: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

[ https://jira.jboss.org/jira/browse/JGRP-1179?page=com.atlassian.jira.plug... ]

Bela Ban updated JGRP-1179:
---------------------------
    Fix Version/s: 2.6.15


Bela Ban (JIRA)

7:03 a.m.

New subject: [JBoss JIRA] Commented: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

[ https://jira.jboss.org/jira/browse/JGRP-1179?page=com.atlassian.jira.plug... ]

Bela Ban commented on JGRP-1179:
--------------------------------

Karthik: your issue is unrelated to this issue, can you open a separate issue for this?


Bela Ban (JIRA)

7:03 a.m.

New subject: [JBoss JIRA] Resolved: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

[ https://jira.jboss.org/jira/browse/JGRP-1179?page=com.atlassian.jira.plug... ]

Bela Ban resolved JGRP-1179.
----------------------------
    Resolution: Done

Fixed on head (2.10) and the 2.6 branch.


Renaud Devarieux (JIRA)

Monday, 19 April

4:05 a.m.

New subject: [JBoss JIRA] Commented: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

...

