[jboss-jira] [JBoss JIRA] (JGRP-1379) Make merging more scalable / robust

Wed Nov 2 10:59:45 EDT 2011

    [ https://issues.jboss.org/browse/JGRP-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639295#comment-12639295 ] 

Bela Ban edited comment on JGRP-1379 at 11/2/11 10:57 AM:
----------------------------------------------------------

We can reduce the number of merge responses using the algorithm below.

On a merge request from P:

#1 If we're the coordinator --> send a response
OR
#2 If P is the coordinator of our sub-cluster --> send a response
Else suppress the response

E.g. if we have sub-clusters {A,B,C} and {D,E,F,G}, with coords A and D:
- A multicasts a discovery request
- B and C see that the request's sender is A (their coord), so they reply
- D replies, too, because it is the coord of its own sub-cluster
- However, E, F and G don't reply, as A is not their coord and they are not coords themselves

This is important if we have large clusters and many sub-clusters, as it cuts the number of responses each request triggers.
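
The two rules above can be sketched in a few lines of Java. This is only an illustration, not the JGroups code: the class MergeResponseFilter, the method shouldRespond and the use of plain strings as member addresses are all made up for the example. It assumes A is the coord of B and C, and D is the coord of E, F and G.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the response-suppression rule; not part of JGroups.
final class MergeResponseFilter {

    /**
     * Returns true if 'self' should answer a request sent by 'sender'.
     * 'ownCoord' is the coordinator of the sub-cluster 'self' belongs to.
     */
    static boolean shouldRespond(String self, String ownCoord, String sender) {
        boolean weAreCoord = Objects.equals(self, ownCoord);         // rule #1
        boolean senderIsOurCoord = Objects.equals(sender, ownCoord); // rule #2
        return weAreCoord || senderIsOurCoord;                       // else: suppress
    }

    public static void main(String[] args) {
        // member -> coordinator of its sub-cluster (assumed layout, see above)
        Map<String, String> coordOf = new LinkedHashMap<>();
        coordOf.put("B", "A");
        coordOf.put("C", "A");
        coordOf.put("D", "D");
        coordOf.put("E", "D");
        coordOf.put("F", "D");
        coordOf.put("G", "D");

        String sender = "A"; // A multicasts the request
        for (Map.Entry<String, String> e : coordOf.entrySet()) {
            boolean reply = shouldRespond(e.getKey(), e.getValue(), sender);
            System.out.println(e.getKey() + (reply ? " replies" : " stays silent"));
        }
    }
}
```

With this layout, B, C and D reply and E, F and G stay silent, so A still learns about the other sub-cluster through its coord D.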


> Make merging more scalable / robust
> -----------------------------------
>
>                 Key: JGRP-1379
>                 URL: https://issues.jboss.org/browse/JGRP-1379
>             Project: JGroups
>          Issue Type: Task
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.0
>
>
> Make the MERGE2/GMS/Merger code more robust and scale better in large clusters.
> - While a merge is going on, stop sending out discovery requests. This reduces unnecessary traffic, especially in large clusters where discovery responses include the entire view of a sub-cluster.
> - If we start a merge, or receive a MERGE-REQUEST, start a timer which cancels the merge after <merge_timeout * 2> milliseconds. This is similar to the MergeKiller code, and prevents stale merges, e.g. those left behind by a crashed merge leader.
> - If we have merge participants A,B,C,D,E but A only receives merge responses from itself, B and D, then don't cancel the merge, but instead proceed with merging A, B and D. This is currently not done; instead, a merge is cancelled when we don't get responses from every participant.
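
The last point of the description (proceed with whoever answered) could look roughly like the sketch below. It is not the Merger code: the class PartialMerge and both methods are hypothetical, members are plain strings, and the rule that at least two participants are needed for a merge to be worth running is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of merging with a subset of participants; not part of JGroups.
final class PartialMerge {

    /** Keeps only the asked participants that actually responded, in the order asked. */
    static List<String> participantsToMerge(List<String> asked, Set<String> responded) {
        List<String> result = new ArrayList<>();
        for (String participant : asked) {
            if (responded.contains(participant)) {
                result.add(participant);
            }
        }
        return result;
    }

    /** Assumption: a merge is only worth running with two or more participants. */
    static boolean shouldProceed(List<String> participants) {
        return participants.size() >= 2;
    }

    public static void main(String[] args) {
        List<String> asked = List.of("A", "B", "C", "D", "E");
        Set<String> responded = Set.of("A", "B", "D"); // C and E never answered

        List<String> merging = participantsToMerge(asked, responded);
        System.out.println("merging " + merging + ", proceed=" + shouldProceed(merging));
        // prints: merging [A, B, D], proceed=true
    }
}
```

The sub-clusters of C and E are simply left out of this round. Presumably a later merge attempt would pick them up.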
