Bela Ban commented on JGRP-1910:
--------------------------------
OK, so the problem is this: if we have a split like the one below:
{noformat}
0: [0|9] (3) [0, 2, 1]
1: [0|9] (3) [0, 2, 1]
2: [0|9] (3) [0, 2, 1]
3: [3|9] (2) [3, 4]
4: [3|9] (2) [3, 4]
5: [5|10] (3) [5, 6, 7]
6: [5|10] (3) [5, 6, 7]
7: [5|10] (3) [5, 6, 7]
8: [9|9] (2) [9, 8]
9: [9|9] (2) [9, 8]
{noformat}
, the members send {{INFO}} messages (in {{MERGE3}}) containing their view-id; e.g. 7
sends {{\[5|10\]}}. (Each line above reads {{member: \[view-creator|view-id\] (size)
\[members\]}}.)
Every member multicasts an {{INFO}} message at a random interval in
{{\[MERGE3.min_interval..MERGE3.max_interval\]}}.
The coordinators of the subclusters (0, 3, 5 and 9) check every {{MERGE3.check_interval}}
ms whether they've received different view-ids, and initiate a merge if so.
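For illustration, here's a minimal sketch of that check (class and method names are made
up; this is not the actual {{MERGE3}} implementation): the coordinator collects the
view-ids reported via {{INFO}} and needs a merge as soon as it has seen more than one
distinct view-id.
{noformat}
import java.util.*;

// Not the real MERGE3 code: a minimal model of the coordinator's periodic check.
public class ViewConsistencyChecker {
    // view-ids reported via INFO since the last check, e.g. "7" -> "[5|10]"
    private final Map<String,String> reportedViewIds = new HashMap<>();

    // Called whenever an INFO message arrives; dropped INFOs never show up here.
    public synchronized void onInfo(String sender, String viewId) {
        reportedViewIds.put(sender, viewId);
    }

    // Called by the coordinator every check_interval ms.
    public synchronized boolean mergeNeeded() {
        Set<String> distinct = new HashSet<>(reportedViewIds.values());
        return distinct.size() > 1; // differing view-ids => split detected, merge
    }
}
{noformat}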
The problem with your test is that it drops {{INFO}} messages (these are not
retransmitted, as {{MERGE3}} sits below {{UNICASTX}} and {{NAKACKX}}), so when a
coordinator starts the view-id check, it may not have received all view-ids.
Example:
{noformat}
5: I will be the merge leader. Starting the merge task. Coords: [5], views: {5=[5|10] (3)
[5, 6, 7], 1=[0|9] (3) [0, 2, 1]}
9: I will be the merge leader. Starting the merge task. Coords: [9, 3], views: {4=[3|9]
(2) [3, 4], 9=[9|9] (2) [9, 8], 3=[3|9] (2) [3, 4], 7=[5|11] (6) [5, 0, 2, 1, 6, 7]}
{noformat}
Here we can see that 5 and 9 both became merge leaders and started merges at almost the
same time. Depending on which merge leader manages to contact which members, more than
one merge may be needed to end up with a fully merged view.
E.g. if 5 succeeds in contacting itself, 3 and 0 *before 9 contacts them*, then a merge
including 0, 3 and 5 will ensue. 9 will not merge with anyone else, as it started a merge
of its own and therefore rejects 5's merge request.
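To illustrate the rejection, here's a simplified model (made-up names, not the actual
merge code): a coordinator that has already started its own merge refuses to take part in
another leader's merge until its own merge is done.
{noformat}
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model of why concurrent merge leaders reject each other.
public class MergeParticipant {
    private final AtomicBoolean mergeInProgress = new AtomicBoolean(false);

    // Called when this coordinator decides to lead a merge itself (e.g. 9 above).
    public boolean startOwnMerge() {
        return mergeInProgress.compareAndSet(false, true);
    }

    // Called when another merge leader (e.g. 5) asks this coordinator to join.
    public boolean handleMergeRequest(String leader) {
        if (!mergeInProgress.compareAndSet(false, true)) {
            System.out.println("rejecting merge request from " + leader +
                               ": already running a merge");
            return false; // this is what 9 does with 5's request
        }
        return true;
    }

    // Called when the merge completes or times out.
    public void mergeDone() {
        mergeInProgress.set(false);
    }
}
{noformat}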
The problem can be mitigated (but not eliminated altogether) by reducing {{min_interval}}
and {{max_interval}} and increasing {{check_interval}} in {{MERGE3}}: this way,
coordinators are more likely to have received {{INFO}} messages from everyone (despite
the message drops) by the time they run the check, which reduces the chances of multiple
merge leaders being chosen.
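A sketch of such tuning (the values below are illustrative only, and I'm assuming the
standard {{MERGE3}} property setters here):
{noformat}
import org.jgroups.JChannel;
import org.jgroups.protocols.MERGE3;

// Sketch: make INFO messages more frequent and the view-id check less frequent.
// The values are illustrative only; tune them for your environment.
public class Merge3Tuning {
    public static void tune(JChannel ch) {
        MERGE3 merge = (MERGE3)ch.getProtocolStack().findProtocol(MERGE3.class);
        if (merge == null)
            return; // no MERGE3 in the stack
        merge.setMinInterval(500);     // INFO sent every 0.5-2s
        merge.setMaxInterval(2000);
        merge.setCheckInterval(12000); // view-id check only every 12s
    }
}
{noformat}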
MERGE3: Do not lose any members from view during a series of merges
-------------------------------------------------------------------
Key: JGRP-1910
URL: https://issues.jboss.org/browse/JGRP-1910
Project: JGroups
Issue Type: Bug
Reporter: Radim Vansa
Assignee: Bela Ban
Fix For: 3.6.3
Attachments: SplitMergeFailFastTest.java, SplitMergeTest.java
When the connection between nodes is re-established, MERGE3 should merge the cluster back
together. This often does not involve a single MergeView but a series of such events. The
problematic property of this protocol is that some of those views can lack certain
members, even though these are still reachable.
This causes problems in Infinispan: the cache cannot be fully rebalanced before another
merge arrives, and all owners of a certain segment can be gradually removed from (and
added back to) the view. Since this is not detected as a partition but as crashed nodes,
losing all owners means data loss.
Removing members from view should be the role of FDx protocols, not MERGEx.