Bela Ban updated JGRP-1393:
---------------------------
Fix Version/s: 3.0.1
Optimization of concurrent joining to a non-existing cluster
------------------------------------------------------------
Key: JGRP-1393
URL: https://issues.jboss.org/browse/JGRP-1393
Project: JGroups
Issue Type: Enhancement
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.0.1, 3.1
When we have no members running yet, and A, B, C and D join a cluster at exactly the same
time, the following can happen:
- A starts, sends a discovery request. B and C reply. A returns after N seconds with
responses from A, B and C.
- B starts, sends a discovery request. A and C reply. B returns after N seconds with
responses from A, B and C.
- C starts, sends a discovery request. A and B reply. C returns after N seconds with
responses from A, B and C.
- D starts, sends a discovery request. A, B and C reply. D returns after N seconds with
responses from A, B, C and D.
Responses are:
A: ABC
B: ABC
C: ABC
D: ABCD
Note that A, B and C don't have D's response.
The algorithm now has every member sort all of the responses, and pick the first as new
coordinator. Say we have the following sorted lists:
A: BAC
B: BAC
C: BAC
D: DBAC
The issue is now that B *and* D will both become coordinator, and a merge is then needed to
establish the correct cluster membership.
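A minimal sketch of the selection step described above, to make the double-coordinator
outcome concrete. This is not JGroups code: the class name, the string addresses and the
fixed ordering D < B < A < C are assumptions chosen to match the sorted lists in this issue.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in JGroups.
final class ConcurrentJoinSketch {
    // Assumed total order on addresses (D sorts first), matching the lists above.
    static final List<String> ORDER = Arrays.asList("D", "B", "A", "C");
    static final Comparator<String> BY_RANK = Comparator.comparingInt(ORDER::indexOf);

    // Each member sorts the responses it collected and picks the first one.
    static String pickCoordinator(Collection<String> responses) {
        return Collections.min(responses, BY_RANK);
    }

    // Returns the members that elect themselves, given each member's responses.
    static Set<String> selfElected(Map<String, List<String>> responsesByMember) {
        Set<String> coords = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : responsesByMember.entrySet()) {
            if (pickCoordinator(e.getValue()).equals(e.getKey())) {
                coords.add(e.getKey());
            }
        }
        return coords;
    }

    // The response sets from this issue: only D has seen D.
    static Map<String, List<String>> issueScenario() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("A", Arrays.asList("A", "B", "C"));
        m.put("B", Arrays.asList("A", "B", "C"));
        m.put("C", Arrays.asList("A", "B", "C"));
        m.put("D", Arrays.asList("A", "B", "C", "D"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(selfElected(issueScenario())); // prints [B, D]
    }
}
```

A, B and C all pick B, while D picks itself, so two members act as coordinator.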
The reason is that A, B and C apparently started slightly sooner than D (we're talking
1-2 ms), so D didn't receive their discovery requests and thus didn't send back a
discovery response.
Even though D started a bit after A, B and C, the latter will still receive D's
discovery *request* (but not response). We can now take advantage of this and simply add
D's address to the discovery responses of every member when we receive D's
discovery *request*, in addition to D's *response*.
This will greatly reduce the chances of a merge having to be done as a result of
concurrent startup.
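The proposed change can be sketched as a response collector that records the sender of
every discovery request as well as every discovery response. This is only an illustration
under assumptions: the class and method names are hypothetical and not the JGroups API,
and the address ordering D < B < A < C is the same assumed order as in the lists above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in JGroups.
final class DiscoveryCollectorSketch {
    // Assumed total order on addresses; unknown addresses are not handled here.
    static final List<String> ORDER = Arrays.asList("D", "B", "A", "C");
    static final Comparator<String> BY_RANK = Comparator.comparingInt(ORDER::indexOf);

    private final SortedSet<String> seen = new TreeSet<>(BY_RANK);

    DiscoveryCollectorSketch(String self) {
        seen.add(self); // a member always counts itself
    }

    // Existing behaviour: a discovery response adds its sender.
    void onResponse(String sender) {
        seen.add(sender);
    }

    // Proposed behaviour: a discovery request also adds its sender.
    void onRequest(String sender) {
        seen.add(sender);
    }

    String coordinator() {
        return seen.first();
    }

    List<String> sortedView() {
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        DiscoveryCollectorSketch a = new DiscoveryCollectorSketch("A");
        a.onResponse("B");
        a.onResponse("C");
        System.out.println(a.coordinator()); // prints B
        a.onRequest("D");                    // D's request arrives, no response yet
        System.out.println(a.coordinator()); // prints D
    }
}
```

With D's request counted, A's sorted list becomes DBAC, the same as D's own, so both
agree on D as coordinator in this scenario.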