[jboss-jira] [JBoss JIRA] Commented: (JGRP-937) MERGE4: get rid of shunning and use only merging (getting rid of shunfests)
Bela Ban (JIRA)
jira-events at lists.jboss.org
Fri Apr 3 00:51:36 EDT 2009
[ https://jira.jboss.org/jira/browse/JGRP-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12460409#action_12460409 ]
Bela Ban commented on JGRP-937:
-------------------------------
DESIGN:
- Periodically compare discovery results with current view
- If there are new members:
  - Compute partition coordinator(s)
  - Start merge with partition coordinators
- Integration with STABLE?
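
The first two design steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the MERGE4 implementation: members are plain strings, each discovery response is assumed to carry the responder's current view, and `MergeCheck`, `newMembers` and `partitionCoordinators` are hypothetical names rather than JGroups API. The coordinator rule used here (the first member of a reported view that itself reports that same view) is an assumption, picked because it reproduces the coordinators named in the examples of this comment.

```java
import java.util.*;

class MergeCheck {
    /** Members seen by discovery that are missing from our current view. */
    static Set<String> newMembers(List<String> currentView, Collection<String> discovered) {
        Set<String> result = new TreeSet<>(discovered);
        result.removeAll(currentView);
        return result;
    }

    /**
     * Assumed rule: for every distinct view reported by discovery, the
     * partition coordinator is the first member of that view which itself
     * reports exactly that view.
     */
    static Set<String> partitionCoordinators(Map<String, List<String>> viewsByMember) {
        Set<String> coords = new TreeSet<>();
        for (List<String> view : new HashSet<>(viewsByMember.values())) {
            for (String candidate : view) {
                if (view.equals(viewsByMember.get(candidate))) {
                    coords.add(candidate);
                    break;
                }
            }
        }
        return coords;
    }

    public static void main(String[] args) {
        // A's view is {A}; B and C still have {A,B,C}
        Map<String, List<String>> responses = new HashMap<>();
        responses.put("A", Arrays.asList("A"));
        responses.put("B", Arrays.asList("A", "B", "C"));
        responses.put("C", Arrays.asList("A", "B", "C"));
        System.out.println(newMembers(Arrays.asList("A"), responses.keySet())); // [B, C]
        System.out.println(partitionCoordinators(responses));                   // [A, B]
    }
}
```

With views {A,B} and {C,D}, the same rule yields A and C as the partition coordinators.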
EXAMPLES:
- Partitions {A,B} and {C,D} (non-overlapping merge)
  - A discovers new members {C,D}, C discovers {A,B}
  - A contacts C to run a merge and C contacts A to run a merge; whoever started first wins
  - Assuming A started first, A is the merge coordinator and asks C for view and digest info
  - A fetches digests from A and B (multicast with bounded timeout)
  - C fetches digests from C and D (multicast with bounded timeout)
  - C sends this info to A
  - A consolidates views and digests, unicasts new MergeView to itself and C
  - A and C install the MergeView in their partitions
  - C stops being the coordinator
- Partitions A: {A} and B,C: {A,B,C} (overlapping merge: A appears in both views)
  - B and C will not do anything because they're not coordinators
  - A discovers B and C
  - A determines that B is the (other) partition coordinator (besides itself)
  - A asks B for merge info (view+digest) for B's partition
  - B multicasts the request to its subgroup, gets responses from B, C *and A* (because it can reach A again!)
  - B returns this merge info to A
  - A consolidates the info from {A} and {A,B,C} into a new MergeView and consolidated digest
  - A installs the new MergeView
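
The view consolidation step in both examples might look roughly like the sketch below. It assumes the merged membership is simply the ordered union of the partition views, with the merge coordinator's own view listed first so that it stays at the head; how MERGE4 actually orders the merged view is not stated in this comment. `ViewConsolidation` and `consolidate` are hypothetical names, not JGroups API.

```java
import java.util.*;

class ViewConsolidation {
    /**
     * Ordered union of all partition views. A member that appears in more
     * than one view (the overlapping case) is kept only once, at the
     * position where it was first seen.
     */
    static List<String> consolidate(List<List<String>> partitionViews) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> view : partitionViews) {
            merged.addAll(view);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Non-overlapping: {A,B} and {C,D}
        System.out.println(consolidate(Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("C", "D")))); // [A, B, C, D]
        // Overlapping: {A} and {A,B,C}
        System.out.println(consolidate(Arrays.asList(
                Arrays.asList("A"), Arrays.asList("A", "B", "C")))); // [A, B, C]
    }
}
```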
> MERGE4: get rid of shunning and use only merging (getting rid of shunfests)
> ---------------------------------------------------------------------------
>
> Key: JGRP-937
> URL: https://jira.jboss.org/jira/browse/JGRP-937
> Project: JGroups
> Issue Type: Feature Request
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 2.8
>
> Attachments: udp-2.6.xml
>
>
> If we have FD_ALL plus shunning, the following scenario can happen:
> - A network partition with subgroups {A} and {B,C,D,E,F,G}
> - The partition heals
> - A gets heartbeats from all members of the 2nd subgroup
> - A's FD_ALL.shun will shun all members of the 2nd subgroup!
> - And vice versa; this leads to a shunfest, and large clusters might never merge back again
> SOLUTION:
> - Get rid of shunning (GMS.shun is false by default anyway, now also set FD/FD_ALL.shun to false)
> - MERGE4 periodically compares discovery results to its view
>   - (This might be done a few times)
> - Then MERGE4 initiates a merge between all members who have differing views
> - Make sure digests get merged correctly (min/max)
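
The "min/max" digest merge mentioned in the issue description could be sketched as below. This assumes each digest entry is a per-member pair of a low and a high sequence number, and that merging two entries for the same member takes the minimum of the lows and the maximum of the highs; a real JGroups digest may track more per member. `DigestMerge` and `Range` are hypothetical names used only for illustration.

```java
import java.util.*;

class DigestMerge {
    /** Per-member seqno range; the field names are illustrative. */
    static final class Range {
        final long low, high;
        Range(long low, long high) { this.low = low; this.high = high; }
    }

    /** Merges two digests: min of the low seqnos, max of the high seqnos. */
    static Map<String, Range> merge(Map<String, Range> a, Map<String, Range> b) {
        Map<String, Range> out = new TreeMap<>(a);
        for (Map.Entry<String, Range> e : b.entrySet()) {
            out.merge(e.getKey(), e.getValue(),
                    (x, y) -> new Range(Math.min(x.low, y.low), Math.max(x.high, y.high)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Range> fromA = new HashMap<>();
        fromA.put("A", new Range(5, 20));
        Map<String, Range> fromB = new HashMap<>();
        fromB.put("A", new Range(3, 25));
        fromB.put("B", new Range(1, 10));

        Map<String, Range> merged = merge(fromA, fromB);
        System.out.println(merged.get("A").low + " " + merged.get("A").high); // 3 25
        System.out.println(merged.get("B").low + " " + merged.get("B").high); // 1 10
    }
}
```

A member present in only one digest keeps its range unchanged, which covers the non-overlapping case.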