[jboss-jira] [JBoss JIRA] (JGRP-1379) Make merging more scalable / robust
Bela Ban (Issue Comment Edited) (JIRA)
jira-events at lists.jboss.org
Wed Nov 2 10:59:45 EDT 2011
[ https://issues.jboss.org/browse/JGRP-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639295#comment-12639295 ]
Bela Ban edited comment on JGRP-1379 at 11/2/11 10:57 AM:
----------------------------------------------------------
We can reduce the number of merge responses using the algorithm below.
On a merge request from P:
#1 If we're the coordinator --> send a response
OR
#2 If P is the coordinator of our sub-cluster --> send a response
Else suppress the response
E.g. if we have sub-clusters {A,B,C} and {D,E,F,G}, with coords A and D:
- A multicasts a discovery request
- B and C see that the request's sender is A (their coord), so they reply
- D replies, too, because it is the coord of its own sub-cluster
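- However, E, F and G don't reply, as A is not their coord and they are not coords themselves
This is important if we have large clusters and many sub-clusters, because each foreign sub-cluster then sends a single response (from its coord) instead of one per member.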
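
The suppression rule above can be sketched roughly as follows. This is only an illustration, not the actual MERGE2/GMS code: the class MergeResponsePolicy and its methods are hypothetical names, members are modelled as plain strings, and the first element of each sub-cluster list is assumed to be its coord.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the rule; not part of the JGroups API. */
final class MergeResponsePolicy {
    private MergeResponsePolicy() {}

    /**
     * Decides whether the local member should answer a request sent by P.
     *
     * @param self        the local member
     * @param coordinator the coord of the local member's sub-cluster
     * @param requester   the sender P of the request
     */
    static boolean shouldRespond(String self, String coordinator, String requester) {
        boolean weAreCoordinator = self.equals(coordinator);            // rule #1
        boolean requesterIsOurCoord = requester.equals(coordinator);    // rule #2
        return weAreCoordinator || requesterIsOurCoord;                 // else: suppress
    }

    /**
     * Returns the members of one sub-cluster that would reply to the requester.
     * Assumption: the first element of subCluster is its coord.
     */
    static List<String> responders(List<String> subCluster, String requester) {
        String coord = subCluster.get(0);
        List<String> result = new ArrayList<>();
        for (String member : subCluster) {
            if (shouldRespond(member, coord, requester)) {
                result.add(member);
            }
        }
        return result;
    }
}
```

For two disjoint sub-clusters {A,B,C} and {D,E,F,G} and a request from A, responders(...) yields A, B and C for the first sub-cluster (A answers its own request as coord) and only D for the second.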
> Make merging more scalable / robust
> -----------------------------------
>
> Key: JGRP-1379
> URL: https://issues.jboss.org/browse/JGRP-1379
> Project: JGroups
> Issue Type: Task
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 3.0
>
>
> Make the MERGE2/GMS/Merger code more robust and scale better in large clusters.
> - While a merge is going on, stop sending out discovery requests. This reduces unnecessary traffic, especially in large clusters where discovery responses include the entire view of a sub-cluster.
> - If we start a merge, or receive a MERGE-REQUEST, start a timer which cancels the merge after <merge_timeout * 2> milliseconds. This is similar to the MergeKiller code, and prevents stale merges, e.g. those caused by a crashed merge leader.
> - If we have merge participants A,B,C,D,E but A only receives merge responses from itself, B and D, then don't cancel the merge, but instead proceed with merging A, B and D. This is currently not done; instead, a merge is cancelled when we don't get responses from every participant.
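
The last two points of the description (the cancel deadline of merge_timeout * 2 and proceeding with only the participants that responded) could look something like the sketch below. It is a simplified model under stated assumptions, not the Merger implementation: MergeAttempt and all of its methods are made-up names, members are plain strings, and the caller passes in the current time instead of a real timer being scheduled.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of one merge attempt; not part of the JGroups API. */
final class MergeAttempt {
    private final Set<String> participants;
    private final Set<String> responders = new LinkedHashSet<>();
    private final long deadlineMillis;

    /**
     * @param participants       the members asked to take part in the merge
     * @param startMillis        time at which the merge was started
     * @param mergeTimeoutMillis the configured merge timeout
     */
    MergeAttempt(Collection<String> participants, long startMillis, long mergeTimeoutMillis) {
        this.participants = new LinkedHashSet<>(participants);
        // The merge is considered stale after merge_timeout * 2 milliseconds.
        this.deadlineMillis = startMillis + 2 * mergeTimeoutMillis;
    }

    /** Records a merge response; responses from non-participants are ignored. */
    void onResponse(String member) {
        if (participants.contains(member)) {
            responders.add(member);
        }
    }

    /** True once the cancel deadline has been reached. */
    boolean isStale(long nowMillis) {
        return nowMillis >= deadlineMillis;
    }

    /**
     * The members to merge: only the participants that responded, in
     * participant order, instead of cancelling when some are missing.
     */
    Set<String> membersToMerge() {
        Set<String> result = new LinkedHashSet<>(participants);
        result.retainAll(responders);
        return result;
    }
}
```

With participants A,B,C,D,E and responses from A, B and D only, membersToMerge() returns A, B and D, and the attempt is reported as stale once merge_timeout * 2 milliseconds have elapsed since it started.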