[jboss-jira] [JBoss JIRA] Resolved: (JGRP-1179) Incoming PingRsp is ignored despite being sent by a Coordinator.

Bela Ban (JIRA) jira-events at lists.jboss.org
Thu Apr 1 08:03:38 EDT 2010


     [ https://jira.jboss.org/jira/browse/JGRP-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban resolved JGRP-1179.
----------------------------

    Resolution: Done


Fixed on head (2.10) and the 2.6 branch

> Incoming PingRsp is ignored despite being sent by a Coordinator.
> ----------------------------------------------------------------
>
>                 Key: JGRP-1179
>                 URL: https://jira.jboss.org/jira/browse/JGRP-1179
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.6.9, 2.6.14
>         Environment: Linux Red Hat Enterprise 5.0 kernel 2.6.18-8.el5 java 1.6.0_18
>            Reporter: Renaud Devarieux
>            Assignee: Bela Ban
>             Fix For: 2.6.15, 2.10
>
>
> I launch successively (nearly simultaneously) 5 nodes A B C D E using the same protocol stack and one channel to communicate between themselves.
> UDP(mcast_addr=231.8.8.8;mcast_port=45578):PING(num_initial_members=4):MERGE2:FD:VERIFY_SUSPECT:pbcast.NAKACK:pbcast.STABLE:FRAG2:pbcast.GMS(shun=true):pbcast.FLUSH
> As often as not, depending on the timing between node launches, I end up with 2 views, i.e. {D} and {A B C E}.
> A merge occurs later, but by then it is too late for my application, and I don't think I should have to handle a merge except in the case of a real power or network failure.
> I noticed that D was timing out (3000 ms) during the discovery process despite having received the 4 GET_MBRS_RSP messages from the other nodes. D would then decide that there was no coordinator and become coordinator itself.
> What seems to happen is that D sends two GET_MBRS_REQ messages and A replies to both. At the time of the first reply, A is not yet coordinator. By the time D receives the second response, A has become coordinator, but D ignores that response and doesn't add it to its list of Responses.
> I have written a workaround in the addResponse method of Discovery.Responses. It seems to work for my case, but I am afraid it might break something else I am not aware of.
>         public void addResponse(PingRsp rsp) {
>             if(rsp == null)
>                 return;
>             promise.getLock().lock();
>             try {
>                 // Workaround 29/03/2010
>                 int index = ping_rsps.indexOf(rsp);
>                 // index == -1 means no response from this sender is stored yet
>                 if (index == -1) {
>                     ping_rsps.add(rsp);
>                     promise.getCond().signalAll();
>                 } else if (rsp.isCoord()) {
>                     PingRsp pr = ping_rsps.get(index);
>                     // Replace the stored response only if it was sent
>                     // before the sender became coordinator
>                     if (!pr.isCoord()) {
>                         ping_rsps.set(index, rsp);
>                         promise.getCond().signalAll();
>                     }
>                 }
>                 /* Old JGroups code:
>                 if(!ping_rsps.contains(rsp)) {
>                     ping_rsps.add(rsp);
>                     promise.getCond().signalAll();
>                 }*/
>             }
>             finally {
>                 promise.getLock().unlock();
>             }
>         }
> Regards
> Renaud
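
The race described in the report can be reproduced outside JGroups with a small stand-in. The sketch below is not the JGroups code or the committed fix: ResponsesSketch and Rsp are hypothetical names, and it assumes (as the reported behaviour suggests) that two responses compare equal when they come from the same sender, regardless of the coordinator flag. It shows the replace-on-coordinator rule from the workaround above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Discovery.Responses (not the JGroups class).
public class ResponsesSketch {

    // Hypothetical stand-in for PingRsp. ASSUMPTION: equality is by sender only,
    // so a later response from the same sender "equals" an earlier one.
    static final class Rsp {
        final String sender;
        final boolean coord;

        Rsp(String sender, boolean coord) {
            this.sender = sender;
            this.coord = coord;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Rsp && ((Rsp) o).sender.equals(sender);
        }

        @Override
        public int hashCode() {
            return sender.hashCode();
        }
    }

    private final List<Rsp> rsps = new ArrayList<Rsp>();

    // Adds a response; returns true if the stored list changed.
    synchronized boolean addResponse(Rsp rsp) {
        if (rsp == null)
            return false;
        int index = rsps.indexOf(rsp);
        if (index == -1) {            // first response from this sender
            rsps.add(rsp);
            return true;
        }
        // Same sender seen before: upgrade only from non-coordinator to coordinator.
        if (rsp.coord && !rsps.get(index).coord) {
            rsps.set(index, rsp);
            return true;
        }
        return false;
    }

    synchronized boolean hasCoordinator() {
        for (Rsp r : rsps)
            if (r.coord)
                return true;
        return false;
    }

    synchronized int size() {
        return rsps.size();
    }

    public static void main(String[] args) {
        ResponsesSketch d = new ResponsesSketch();
        d.addResponse(new Rsp("A", false));       // A's first reply: not yet coordinator
        System.out.println(d.hasCoordinator());   // false
        d.addResponse(new Rsp("A", true));        // A's second reply: now coordinator
        System.out.println(d.hasCoordinator());   // true
    }
}
```

With a plain contains() check, the second reply from A would be dropped and hasCoordinator() would stay false, which matches the behaviour observed on D.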

