[jboss-jira] [JBoss JIRA] Created: (JGRP-1182) GET_MBRS_RSP are not all processed, Discovery step ends prematurely.
Renaud Devarieux (JIRA)
jira-events at lists.jboss.org
Thu Apr 1 10:21:37 EDT 2010
GET_MBRS_RSP are not all processed, Discovery step ends prematurely.
--------------------------------------------------------------------
Key: JGRP-1182
URL: https://jira.jboss.org/jira/browse/JGRP-1182
Project: JGroups
Issue Type: Bug
Affects Versions: 2.6.14, 2.6.9, 2.10
Environment: Linux Red Hat Enterprise 5.0 kernel 2.6.18-8.el5 java 1.6.0_18
Reporter: Renaud Devarieux
Assignee: Bela Ban
I launch 5 nodes, A, B, C, D and E, in quick succession (nearly simultaneously) on 5 hosts. All of them use the same protocol stack and a single channel to communicate with each other.
UDP(mcast_addr=231.8.8.8;mcast_port=45578):PING(num_initial_members=5;timeout=800):MERGE2:FD:VERIFY_SUSPECT:pbcast.NAKACK:pbcast.STABLE:FRAG2:pbcast.GMS:pbcast.FLUSH
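The sketch below links the stack string to the discovery behavior. It splits the string into protocols and parameters, assuming the colon-separated `PROTOCOL(key=value;...)` syntax shown above. The class name `StackConfig` is hypothetical, and this is not how JGroups parses its own configuration. It only makes explicit that PING is configured to wait for 5 initial members or 800 ms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for an old-style JGroups property string of the form
// "PROTO(key=value;key=value):PROTO:...". Illustration only; not JGroups code.
class StackConfig {

    static Map<String, Map<String, String>> parse(String stack) {
        Map<String, Map<String, String>> protocols = new LinkedHashMap<>();
        // Protocols are separated by ':' (an IPv6 address would break this naive split).
        for (String entry : stack.split(":")) {
            int open = entry.indexOf('(');
            String name = open < 0 ? entry : entry.substring(0, open);
            Map<String, String> params = new LinkedHashMap<>();
            if (open >= 0) {
                // Parameters sit between '(' and the last ')' and are separated by ';'.
                String body = entry.substring(open + 1, entry.lastIndexOf(')'));
                for (String kv : body.split(";")) {
                    String[] parts = kv.split("=", 2);
                    params.put(parts[0].trim(), parts.length > 1 ? parts[1].trim() : "");
                }
            }
            protocols.put(name.trim(), params);
        }
        return protocols;
    }

    public static void main(String[] args) {
        String stack = "UDP(mcast_addr=231.8.8.8;mcast_port=45578):PING(num_initial_members=5;timeout=800)"
                + ":MERGE2:FD:VERIFY_SUSPECT:pbcast.NAKACK:pbcast.STABLE:FRAG2:pbcast.GMS:pbcast.FLUSH";
        Map<String, String> ping = parse(stack).get("PING");
        // Discovery returns once num_initial_members responses arrive, or after timeout ms.
        System.out.println("num_initial_members=" + ping.get("num_initial_members")
                + ", timeout=" + ping.get("timeout") + "ms"); // prints num_initial_members=5, timeout=800ms
    }
}
```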
Discovery sends up to n GET_MBRS_REQ messages to find the existing members. Each GET_MBRS_REQ triggers a round of GET_MBRS_RSP replies. Each reply increments the initial-member count in the Promise on which discovery blocks, up to its limit (num_initial_members). One round of GET_MBRS_RSP may not be enough to discover all the members, and the second round then completes the Promise's count. However, depending on the order in which the responses arrive, the Promise condition may be signalled before all of them have been processed. The unprocessed responses may come from a coordinator elected between the two requests. This causes trouble.
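The following is a minimal, hypothetical model of the counting Promise described above. Names such as `DiscoveryPromise`, `onResponse` and `await` are illustrative and are not JGroups API. Responses are counted until the limit is reached, and then the blocked discovery thread is woken. Any response that arrives afterwards is dropped, whatever it contains, including a coordinator flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the discovery Promise: not JGroups code.
class DiscoveryPromise {
    private final int numInitialMembers;
    private final List<String> responses = new ArrayList<>();
    private boolean done; // true once discovery has returned (limit reached or timed out)

    DiscoveryPromise(int numInitialMembers) {
        this.numInitialMembers = numInitialMembers;
    }

    // Called for each GET_MBRS_RSP; returns false if the response arrived too late.
    synchronized boolean onResponse(String sender) {
        if (done) {
            return false; // discovery already returned, so this reply is never examined
        }
        responses.add(sender);
        if (responses.size() >= numInitialMembers) {
            done = true;
            notifyAll(); // wake the discovering thread blocked in await()
        }
        return true;
    }

    // Blocks until the limit is reached or the timeout expires, then returns the responses.
    synchronized List<String> await(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        try {
            while (!done) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    break;
                }
                wait(left);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt; return what we have
        }
        done = true; // from now on, late responses are dropped
        return new ArrayList<>(responses);
    }

    public static void main(String[] args) throws InterruptedException {
        DiscoveryPromise promise = new DiscoveryPromise(3);
        Thread receiver = new Thread(() -> {
            for (String sender : new String[] {"n1", "n2", "n3", "n4-coord"}) {
                promise.onResponse(sender);
            }
        });
        receiver.start();
        // n4-coord arrives after the limit (or the timeout), so it is never in this list.
        System.out.println(promise.await(800));
        receiver.join();
    }
}
```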
Example:
A, B, C, D and E are launched
...
D sends GET_MBRS_REQ
D receives 4 GET_MBRS_RSP, from D, A, B and C
A becomes coordinator
D sends a second GET_MBRS_REQ 400 ms after the first
D receives B's GET_MBRS_RSP
D receives E's GET_MBRS_RSP, which reaches the num_initial_members limit; discovery ends after 428 ms
D receives A's GET_MBRS_RSP; A is coordinator, but it is too late: the response is not counted in the set of responses
D becomes coordinator.
We now have two coordinators.
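The example can be replayed deterministically at node D. This sketch makes three assumptions that are not taken from JGroups code. First, responses are keyed by sender, so B's second reply does not raise the count (one way to reconcile the counts above). Second, discovery stops at 5 distinct senders. Third, a node that finds no coordinator flag among the responses it kept makes itself coordinator. `TraceReplay`, `Rsp` and `decide` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical replay of the arrival order seen by D; simplified, not JGroups logic.
class TraceReplay {

    static final class Rsp {
        final String sender;
        final boolean coord; // true if the responder claims to be coordinator

        Rsp(String sender, boolean coord) {
            this.sender = sender;
            this.coord = coord;
        }
    }

    // Returns the member that `self` treats as coordinator after discovery.
    static String decide(String self, Rsp[] arrivals, int numInitialMembers) {
        Map<String, Boolean> collected = new LinkedHashMap<>(); // sender -> is coordinator
        for (Rsp rsp : arrivals) {
            if (collected.size() >= numInitialMembers) {
                break; // discovery already returned; later replies are never examined
            }
            collected.put(rsp.sender, rsp.coord); // a repeated sender does not raise the count
        }
        for (Map.Entry<String, Boolean> e : collected.entrySet()) {
            if (e.getValue()) {
                return e.getKey(); // join the coordinator found during discovery
            }
        }
        return self; // assumption: no coordinator seen, so this node becomes coordinator
    }

    public static void main(String[] args) {
        Rsp[] seenByD = {
            new Rsp("D", false), new Rsp("A", false), new Rsp("B", false), new Rsp("C", false), // first round
            new Rsp("B", false), new Rsp("E", false), // second round: the limit is reached here
            new Rsp("A", true)                        // A, now coordinator, arrives too late
        };
        System.out.println(decide("D", seenByD, 5)); // prints D
    }
}
```

If A's coordinator reply is moved ahead of E's, `decide` returns A instead of D. This is the ordering dependence described above.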
The same can happen if E is quicker and its response arrives in the first round.
I am not yet sure how to solve this problem. Clearly, D should have been warned that A was becoming coordinator, or at least that A was trying to.
Perhaps, if all GET_MBRS traffic were multicast, each new member could eavesdrop on it and use the various REQ and RSP messages to work out who is doing what.
I could see discovery being split into two phases. In the first, a new member "silently" listens to the network. In the second, it actively tries to discover the other members with several GET_MBRS_REQ messages.
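Here is a rough sketch of the proposed two-phase discovery. It assumes GET_MBRS traffic is multicast so that it can be overheard. Everything in it is hypothetical: the class, its methods, and the tie-break of picking the lowest member name when no coordinator has been seen. It only illustrates that a coordinator claim overheard in either phase is taken into account. On its own, it does not guarantee a single coordinator.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical two-phase discovery; none of these names exist in JGroups.
class TwoPhaseDiscovery {
    private final SortedSet<String> members = new TreeSet<>();
    private String coordinator; // first coordinator claim seen in either phase
    private boolean active;     // false during the silent listening phase

    TwoPhaseDiscovery(String self) {
        members.add(self);
    }

    // Called for any GET_MBRS_REQ or GET_MBRS_RSP overheard on the multicast group, in either phase.
    void overhear(String sender, boolean claimsCoordinator) {
        record(sender, claimsCoordinator);
    }

    // Ends the silent phase; the caller would now send its GET_MBRS_REQ rounds.
    void startActivePhase() {
        active = true;
    }

    // Called for each GET_MBRS_RSP answering this node's own requests.
    void onResponse(String sender, boolean isCoordinator) {
        if (!active) {
            throw new IllegalStateException("no GET_MBRS_REQ has been sent yet");
        }
        record(sender, isCoordinator);
    }

    private void record(String sender, boolean coord) {
        members.add(sender);
        if (coord && coordinator == null) {
            coordinator = sender; // keep the first claim; conflicting claims are not handled here
        }
    }

    // Join a coordinator seen in either phase; otherwise (assumption) every node picks
    // the lowest member name it knows, so nodes that saw the same members agree.
    String decide() {
        return coordinator != null ? coordinator : members.first();
    }

    public static void main(String[] args) {
        TwoPhaseDiscovery d = new TwoPhaseDiscovery("D");
        d.overhear("E", false); // E's GET_MBRS_REQ, multicast to the group
        d.overhear("A", true);  // A's multicast GET_MBRS_RSP, already flagged as coordinator
        d.startActivePhase();   // D now sends its own GET_MBRS_REQ rounds
        d.onResponse("B", false);
        System.out.println(d.decide()); // prints A
    }
}
```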