[jboss-jira] [JBoss JIRA] Commented: (JGRP-937) MERGE4: get rid of shunning and use only merging (getting rid of shunfests)

Bela Ban (JIRA) jira-events at lists.jboss.org
Fri Apr 3 00:51:36 EDT 2009


    [ https://jira.jboss.org/jira/browse/JGRP-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12460409#action_12460409 ] 

Bela Ban commented on JGRP-937:
-------------------------------

DESIGN:

- Periodically compare discovery results with current view
- If there are new members:
  - Compute partition coordinator(s)
  - Start merge with partition coordinators

- Integration with STABLE ?
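
The periodic check above could be sketched roughly as follows. This is only an illustration, not MERGE4 code: the class and method names are hypothetical, members are plain strings, and the rule for picking the other partition's coordinator (first new member in discovery order) is an assumption, since the design does not say how coordinators are computed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of JGroups.
final class MergeCheckSketch {

    /** Members that answered discovery but are missing from the current view. */
    static List<String> newMembers(List<String> discovered, List<String> view) {
        Set<String> unseen = new LinkedHashSet<>(discovered); // keeps discovery order
        unseen.removeAll(view);
        return new ArrayList<>(unseen);
    }

    /**
     * ASSUMPTION: the first new member in discovery order is treated as the
     * other partition's coordinator. Empty means there is nothing to merge.
     */
    static Optional<String> candidateCoordinator(List<String> discovered, List<String> view) {
        return newMembers(discovered, view).stream().findFirst();
    }
}
```

With view {A,B} and discovery results {A,B,C,D}, newMembers yields C and D, and the assumed rule picks C as the coordinator to contact.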


EXAMPLES:

- Partitions {A,B} and {C,D} (non-overlapping merge)
- A discovers new members {C,D}, C discovers {A,B}
- A contacts C to run a merge and C contacts A to run a merge; whoever started first wins
- A is the merge coordinator, asks for view and digest info from C
- A fetches digests from A and B (multicast with bounded timeout)
- C fetches digests from C and D (multicast with bounded timeout)
- C sends this info to A
- A consolidates views and digests, unicasts new MergeView to itself and C
- A and C install the MergeView in their partitions
- C stops being the coordinator
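
The view consolidation step in this example might look like the sketch below. It assumes, for illustration only, that a view is an ordered list of member names and that the merged membership is the ordered union of the partition views; the class name is hypothetical and the real MergeView carries more information than this.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of JGroups.
final class ViewConsolidationSketch {

    /** Ordered union of the partition views; a member listed twice is kept once. */
    static List<String> consolidate(List<List<String>> partitionViews) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> view : partitionViews) {
            merged.addAll(view); // duplicates are ignored, first position wins
        }
        return new ArrayList<>(merged);
    }
}
```

For the partitions {A,B} and {C,D} this produces the membership A, B, C, D, with A first, which matches A remaining the coordinator after C steps down.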


- Partitions {A} and {A,B,C} (overlapping merge): A's view is {A}, B's and C's view is {A,B,C}
- B and C will not do anything because they're not coordinators
- A discovers B and C
- A determines that B is the (other) partition coordinator (besides itself)
- A asks B for merge info (view+digest) for B's partition
- B multicasts the request to its subgroup, gets responses from B, C *and A* (because it can reach A again !)
- B returns this merge info to A
- A consolidates the info from {A} and {A,B,C} into a new MergeView and consolidated digest
- A installs the new MergeView
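
In this overlapping case A's digest can arrive twice (once from A itself, once through B's multicast), so the consolidation has to tolerate duplicate entries. Below is a minimal sketch of the min/max merge mentioned in the issue description. It assumes a simplified digest that maps each member to a (low, high) sequence-number range; Range and DigestMergeSketch are hypothetical types, not the JGroups Digest class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of JGroups.
final class DigestMergeSketch {

    /** Simplified per-member seqno range (assumption for illustration). */
    static final class Range {
        final long low;
        final long high;

        Range(long low, long high) {
            this.low = low;
            this.high = high;
        }
    }

    /** Merges two digests: per member, keep the minimum low and the maximum high. */
    static Map<String, Range> merge(Map<String, Range> left, Map<String, Range> right) {
        Map<String, Range> out = new HashMap<>(left);
        right.forEach((member, r) -> out.merge(member, r,
                (a, b) -> new Range(Math.min(a.low, b.low), Math.max(a.high, b.high))));
        return out;
    }
}
```

A member that appears in only one digest is copied over unchanged; a member that appears in both ends up with the widest range seen.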



> MERGE4: get rid of shunning and use only merging (getting rid of shunfests)
> ---------------------------------------------------------------------------
>
>                 Key: JGRP-937
>                 URL: https://jira.jboss.org/jira/browse/JGRP-937
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 2.8
>
>         Attachments: udp-2.6.xml
>
>
> If we have FD_ALL plus shunning, the following scenario can happen:
> - A network partition with subgroups {A} and {B,C,D,E,F,G}
> - The partition heals
> - A gets heartbeats from all members of the 2nd subgroup
> - A's FD_ALL.shun will shun all members of the 2nd subgroup !
> - And vice versa, this leads to a shunfest and large clusters might never merge back again
> SOLUTION:
> - Get rid of shunning (GMS.shun is false by default anyway, now also set FD/FD_ALL.shun to false)
> - MERGE4 periodically compares discovery results to its view
> (- This might be done a few times)
> - Then MERGE4 initiates a merge between all members who have differing views
> - Make sure digests get merged correctly (min/max)


        


