[
https://jira.jboss.org/jira/browse/JGRP-937?page=com.atlassian.jira.plugi...
]
Bela Ban commented on JGRP-937:
-------------------------------
DESIGN:
- Periodically compare discovery results with current view
- If there are new members:
  - Compute partition coordinator(s)
  - Start merge with partition coordinators
- Integration with STABLE ?
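
A minimal Java sketch of the first two design steps. It assumes member addresses are plain strings and that each discovery response carries the responder plus the coordinator it reports. MergeCheckSketch, Response and all method names are hypothetical and are not JGroups API:

```java
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: addresses are plain strings, not JGroups Address objects.
final class MergeCheckSketch {

    /** One discovery response: who answered and whom it considers its coordinator. */
    static final class Response {
        final String member;
        final String coordinator;

        Response(String member, String coordinator) {
            this.member = member;
            this.coordinator = coordinator;
        }
    }

    /** Members that answered discovery but are missing from the current view. */
    static Set<String> newMembers(List<String> view, List<Response> responses) {
        Set<String> result = new TreeSet<>();
        for (Response r : responses) {
            if (!view.contains(r.member)) {
                result.add(r.member);
            }
        }
        return result;
    }

    /** Our own coordinator plus every distinct coordinator reported by a responder. */
    static SortedSet<String> partitionCoordinators(String localCoord, List<Response> responses) {
        SortedSet<String> coords = new TreeSet<>();
        coords.add(localCoord);
        for (Response r : responses) {
            if (r.coordinator != null) {
                coords.add(r.coordinator);
            }
        }
        return coords;
    }

    /** A merge is started only if there are new members and more than one coordinator. */
    static boolean mergeNeeded(List<String> view, String localCoord, List<Response> responses) {
        return !newMembers(view, responses).isEmpty()
                && partitionCoordinators(localCoord, responses).size() > 1;
    }
}
```

With view {A,B} and responses from C and D that both report C as coordinator, newMembers yields {C,D} and partitionCoordinators yields {A,C}, so a merge between A and C would be started.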
EXAMPLES:
- Partitions {A,B} and {C,D} (non-overlapping merge)
  - A discovers new members {C,D}, C discovers {A,B}
  - A contacts C to run a merge and C contacts A to run a merge; whoever started first wins
  - A is the merge coordinator and asks C for view and digest info
  - A fetches digests from A and B (multicast with bounded timeout)
  - C fetches digests from C and D (multicast with bounded timeout)
  - C sends this info to A
  - A consolidates views and digests, unicasts new MergeView to itself and C
  - A and C install the MergeView in their partitions
  - C stops being the coordinator
- Partitions with views A: {A} and B,C: {A,B,C}
  - B and C will not do anything because they're not coordinators
  - A discovers B and C
  - A determines that B is the (other) partition coordinator (besides itself)
  - A asks B for merge info (view+digest) for B's partition
  - B multicasts the request to its subgroup and gets responses from B, C *and A* (because it can reach A again!)
  - B returns this merge info to A
  - A consolidates the info from {A} and {A,B,C} into a new MergeView and consolidated digest
  - A installs the new MergeView
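
The view consolidation step in both examples can be sketched in Java as a de-duplicating union of the partition views. This is only an illustration: MergeViewSketch and consolidate are hypothetical names, members are plain strings, and the ordering (merge coordinator's partition first) is an assumption rather than something the design above specifies:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: builds the member list of a merged view from partition views.
final class MergeViewSketch {

    /**
     * Union of all partition views. Insertion order is kept and members that
     * appear in several views (the overlapping case) are listed only once.
     */
    static List<String> consolidate(List<List<String>> partitionViews) {
        Set<String> members = new LinkedHashSet<>();
        for (List<String> view : partitionViews) {
            members.addAll(view);
        }
        return new ArrayList<>(members);
    }
}
```

For the first example, consolidating {A,B} and {C,D} gives {A,B,C,D}; for the second, consolidating {A} and {A,B,C} gives {A,B,C}, with A listed once.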
MERGE4: get rid of shunning and use only merging (getting rid of shunfests)
---------------------------------------------------------------------------
Key: JGRP-937
URL: https://jira.jboss.org/jira/browse/JGRP-937
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 2.8
Attachments: udp-2.6.xml
If we have FD_ALL plus shunning, the following scenario can happen:
- A network partition with subgroups {A} and {B,C,D,E,F,G}
- The partition heals
- A gets heartbeats from all members of the 2nd subgroup
- A's FD_ALL.shun will shun all members of the 2nd subgroup !
- And vice versa; this leads to a shunfest, and large clusters might never merge back again
SOLUTION:
- Get rid of shunning (GMS.shun is false by default anyway, now also set FD/FD_ALL.shun to false)
- MERGE4 periodically compares discovery results to its view
  (This might be done a few times)
- Then MERGE4 initiates a merge between all members who have differing views
- Make sure digests get merged correctly (min/max)
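
One possible reading of the "(min/max)" note, as a hedged Java sketch: assume each digest maps a sender to a {low, high} pair of sequence numbers, and that merging keeps the smallest low and the largest high per sender. DigestMergeSketch and its representation of a digest are hypothetical and do not mirror the JGroups Digest class:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a digest is a map sender -> long[]{low, high}.
final class DigestMergeSketch {

    /** Merges digests per sender: minimum of the low seqnos, maximum of the high seqnos. */
    static Map<String, long[]> merge(List<Map<String, long[]>> digests) {
        Map<String, long[]> merged = new TreeMap<>();
        for (Map<String, long[]> digest : digests) {
            for (Map.Entry<String, long[]> entry : digest.entrySet()) {
                long[] range = entry.getValue();
                long[] current = merged.get(entry.getKey());
                if (current == null) {
                    // First time we see this sender: copy its range.
                    merged.put(entry.getKey(), new long[] {range[0], range[1]});
                } else {
                    current[0] = Math.min(current[0], range[0]);
                    current[1] = Math.max(current[1], range[1]);
                }
            }
        }
        return merged;
    }
}
```

A sender that appears in only one partition's digest keeps its range unchanged; a sender seen by both partitions (the overlapping case) ends up with the widest range reported by either side.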