[jboss-jira] [JBoss JIRA] (JGRP-1910) MERGE3: Do not lose any members from view during a series of merges
Bela Ban (JIRA)
issues at jboss.org
Thu Mar 12 03:59:19 EDT 2015
[ https://issues.jboss.org/browse/JGRP-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049253#comment-13049253 ]
Bela Ban commented on JGRP-1910:
--------------------------------
Oops, the above algorithm doesn't work, e.g. in {{OverlappingMergeTest.testMergeWithDifferentPartitions()}}:
{noformat}
A: {A,C,B}
B: {A,C,B}
C: {A,C,B}
D: {B,A,C,D}
{noformat}
With the new algorithm, the potential merge leaders are A and B. If we now sort \[A,B\] and B happens to come first, B will be the merge leader and A won't do anything. However, *B will never become coordinator, nor will it leave, so the merge will never get started!*
We therefore have to revert the new algorithm; merge leaders are always picked from the actual senders of view-ids that are also view creators. In the above example, only A would be selected as potential merge leader.
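To make the difference between the two selection rules concrete, here is a minimal standalone sketch. It is not the MERGE3 code: the class and method names are made up for illustration, members are plain strings, and it assumes that the first member of a reported view is that view's creator. It only computes the candidate sets for the four views listed above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in JGroups.
public class MergeLeaderSketch {

    // Rule that had to be reverted: every creator named in any reported view
    // is a candidate, whether or not it considers itself a coordinator.
    static SortedSet<String> creatorsOfAllViews(Map<String, List<String>> viewsBySender) {
        SortedSet<String> candidates = new TreeSet<>();
        for (List<String> view : viewsBySender.values()) {
            candidates.add(view.get(0)); // assumption: first member is the creator
        }
        return candidates;
    }

    // Restored rule: a sender is a candidate only if it created the view it reports.
    static SortedSet<String> sendersThatAreCreators(Map<String, List<String>> viewsBySender) {
        SortedSet<String> candidates = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : viewsBySender.entrySet()) {
            String sender = entry.getKey();
            String creator = entry.getValue().get(0);
            if (sender.equals(creator)) {
                candidates.add(sender);
            }
        }
        return candidates;
    }

    // The views from the failing test scenario, keyed by the member reporting them.
    static Map<String, List<String>> exampleViews() {
        Map<String, List<String>> views = new LinkedHashMap<>();
        views.put("A", Arrays.asList("A", "C", "B"));
        views.put("B", Arrays.asList("A", "C", "B"));
        views.put("C", Arrays.asList("A", "C", "B"));
        views.put("D", Arrays.asList("B", "A", "C", "D"));
        return views;
    }

    public static void main(String[] args) {
        Map<String, List<String>> views = exampleViews();
        System.out.println("creators of all views:     " + creatorsOfAllViews(views));
        System.out.println("senders that are creators: " + sendersThatAreCreators(views));
    }
}
```

Under these assumptions the first rule yields A and B, because D's view names B as its creator even though B itself reports a view created by A. The second rule yields only A, which matches the outcome described above.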
> MERGE3: Do not lose any members from view during a series of merges
> -------------------------------------------------------------------
>
> Key: JGRP-1910
> URL: https://issues.jboss.org/browse/JGRP-1910
> Project: JGroups
> Issue Type: Bug
> Reporter: Radim Vansa
> Assignee: Bela Ban
> Fix For: 3.6.3
>
> Attachments: SplitMergeFailFastTest.java, SplitMergeTest.java
>
>
> When the connection between nodes is re-established, MERGE3 should merge the cluster together. This often involves not a single MergeView but a series of such events. The problematic property of this protocol is that some of those views can lack certain members, even though they are reachable.
> This causes a problem in Infinispan, since the cache cannot be fully rebalanced before another merge arrives, and all owners of a certain segment can be gradually removed from (and added again to) the view. This is detected not as a partition but as crashed nodes, and losing all owners means data loss.
> Removing members from view should be the role of FDx protocols, not MERGEx.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)