[jboss-jira] [JBoss JIRA] (JGRP-1910) MERGE3: Do not lose any members from view during a series of merges

Bela Ban (JIRA) issues at jboss.org
Wed Mar 11 08:43:19 EDT 2015


    [ https://issues.jboss.org/browse/JGRP-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048898#comment-13048898 ] 

Bela Ban commented on JGRP-1910:
--------------------------------

OK, so the problem is this: if we have a split like the one below:
{noformat}
0: [0|9] (3) [0, 2, 1]
1: [0|9] (3) [0, 2, 1]
2: [0|9] (3) [0, 2, 1]
3: [3|9] (2) [3, 4]
4: [3|9] (2) [3, 4]
5: [5|10] (3) [5, 6, 7]
6: [5|10] (3) [5, 6, 7]
7: [5|10] (3) [5, 6, 7]
8: [9|9] (2) [9, 8]
9: [9|9] (2) [9, 8]
{noformat}

then the members send {{INFO}} messages (in {{MERGE3}}) containing their view-id, e.g. 7 sends {{\[5|10\]}}.

Every member multicasts an {{INFO}} message at a random interval in {{\[MERGE3.min_interval..MERGE3.max_interval\]}}.
The coordinators of the subclusters (0, 3, 5 and 9) check every {{MERGE3.check_interval}} ms whether they've received different view-ids and initiate a merge if so.
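To illustrate the view-id check described above, here's a minimal sketch of what a coordinator does conceptually. This is *not* the actual {{MERGE3}} code; the class and method names are made up for illustration:
{noformat}
// Illustrative sketch of the coordinator-side view-id check (not the real MERGE3 impl).
import org.jgroups.Address;
import org.jgroups.ViewId;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

class ViewIdCheckSketch {
    // view-ids reported via INFO messages since the last check (sender -> view-id)
    private final Map<Address,ViewId> reported = new HashMap<>();

    synchronized void onInfo(Address sender, ViewId view_id) {
        reported.put(sender, view_id);
    }

    // Run by a coordinator every check_interval ms: a merge is only initiated
    // if more than one distinct view-id has been received. If INFO messages
    // were dropped, some view-ids may simply be missing at this point.
    synchronized boolean mergeNeeded() {
        return new HashSet<>(reported.values()).size() > 1;
    }
}
{noformat}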

The problem with your test is that it drops {{INFO}} messages (these are not retransmitted, as {{MERGE3}} sits below {{UNICASTX}} and {{NAKACKX}}), so when a coordinator runs the view-id check, it may not have received all view-ids.

Example:
{noformat}
5: I will be the merge leader. Starting the merge task. Coords: [5], views: {5=[5|10] (3) [5, 6, 7], 1=[0|9] (3) [0, 2, 1]}
9: I will be the merge leader. Starting the merge task. Coords: [9, 3], views: {4=[3|9] (2) [3, 4], 9=[9|9] (2) [9, 8], 3=[3|9] (2) [3, 4], 7=[5|11] (6) [5, 0, 2, 1, 6, 7]}
{noformat}

Here, we can see that 5 and 9 were both merge leaders and started merges at almost the same time. Depending on which merge leader manages to contact which members, more than one merge will be needed to end up with a fully merged view.
E.g. if 5 succeeds in contacting itself, 3 and 0 *before 9 contacts them*, then a merge including 0, 3 and 5 will ensue. 9 will not merge with anyone else, as it started a merge of its own and therefore rejects 5's merge request.

The problem can be mitigated (but not eliminated altogether) by reducing {{min_interval}} and {{max_interval}} and increasing {{check_interval}} in {{MERGE3}}: this way, coordinators are more likely to receive {{INFO}} messages from everyone (despite the message drops), which reduces the chance of multiple merge leaders being chosen.
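For example, this tuning could be applied programmatically along the following lines (a sketch only, assuming the JGroups 3.x API; the config file name and the interval values are illustrative, and the same attributes can equally be set in the XML stack configuration):
{noformat}
// Sketch: tuning MERGE3 programmatically (JGroups 3.x API assumed).
// The stack file name and the interval values below are illustrative only.
import org.jgroups.JChannel;
import org.jgroups.protocols.MERGE3;

public class Merge3Tuning {
    public static void main(String[] args) throws Exception {
        JChannel ch = new JChannel("udp.xml"); // any stack that contains MERGE3
        MERGE3 merge3 = (MERGE3) ch.getProtocolStack().findProtocol(MERGE3.class);
        if (merge3 != null) {
            merge3.setValue("min_interval", 1000L);    // multicast INFO more often ...
            merge3.setValue("max_interval", 3000L);
            merge3.setValue("check_interval", 12000L); // ... and check view-ids less often
        }
        ch.connect("demo-cluster");
        // ... use the channel ...
        ch.close();
    }
}
{noformat}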

> MERGE3: Do not lose any members from view during a series of merges
> -------------------------------------------------------------------
>
>                 Key: JGRP-1910
>                 URL: https://issues.jboss.org/browse/JGRP-1910
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Radim Vansa
>            Assignee: Bela Ban
>             Fix For: 3.6.3
>
>         Attachments: SplitMergeFailFastTest.java, SplitMergeTest.java
>
>
> When connection between nodes is re-established, MERGE3 should merge the cluster together. This often does not involve a single MergeView but a series of such events. The problematic property of this protocol is that some of those views can lack certain members, though these are reachable.
> This causes problems in Infinispan: the cache cannot be fully rebalanced before another merge arrives, and all owners of a certain segment can gradually be removed from (and added back to) the view. This is not detected as a partition but as crashed nodes -> losing all owners means data loss.
> Removing members from the view should be the role of the FDx protocols, not MERGEx.



--
This message was sent by Atlassian JIRA
(v6.3.11#6341)

