[ https://issues.jboss.org/browse/JGRP-1910?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on JGRP-1910:
-----------------------------------
The ultimate goal is to 'not lose any members from view during a series of
merges', as the JIRA title suggests, unless a member is suspected by an FD protocol. So,
if messages can be lost or delayed, any algorithm that bases the view on a subset of
responses from the nodes (because we don't want to block for all responses and
implement guaranteed delivery) can lose some nodes in the new merge view. If a view that
is sent to a node is automatically installed there, you cannot avoid the situation where
two concurrent merge views are installed one after another, each missing some members.
Therefore, the node has to reject some views, and I've suggested the criteria above.
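A minimal sketch of one possible rejection rule, assuming member names are plain strings rather than JGroups `Address` objects: a node installs a proposed view only if it keeps every member of the currently installed view that no FD protocol has suspected. The class `ViewAcceptance` and its methods are hypothetical and are not necessarily the criteria suggested in the earlier comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not JGroups code: members are plain Strings.
final class ViewAcceptance {

    // Hypothetical rule: accept a proposed view only if it keeps every member
    // of the installed view that no FDx protocol has suspected.
    static boolean accept(Set<String> installed, Set<String> proposed, Set<String> suspected) {
        for (String member : installed) {
            if (!proposed.contains(member) && !suspected.contains(member)) {
                return false; // a reachable member would be dropped: reject
            }
        }
        return true;
    }

    // Feeds proposed views to one node in order and returns the views it installs.
    static List<Set<String>> installAll(Set<String> initial, List<Set<String>> proposals,
                                        Set<String> suspected) {
        List<Set<String>> installedViews = new ArrayList<>();
        Set<String> installed = new TreeSet<>(initial);
        for (Set<String> proposed : proposals) {
            if (accept(installed, proposed, suspected)) {
                installed = new TreeSet<>(proposed);
                installedViews.add(installed);
            }
        }
        return installedViews;
    }

    public static void main(String[] args) {
        // Node A is in subgroup {A, B}; {C, D} is the other subgroup.
        Set<String> initial = Set.of("A", "B");
        List<Set<String>> proposals = List.of(
                Set.of("A", "B", "C"),      // merge view built without D's response
                Set.of("A", "B", "D"),      // concurrent merge view built without C's response
                Set.of("A", "B", "C", "D")  // complete merge view
        );
        System.out.println(installAll(initial, proposals, Set.of()));
        // prints [[A, B, C], [A, B, C, D]]

        // If an FD protocol had suspected C, dropping C would be allowed.
        System.out.println(accept(Set.of("A", "B", "C"), Set.of("A", "B", "D"), Set.of("C")));
        // prints true
    }
}
```

With this rule the view {A, B, D} is rejected because it would drop the reachable member C, so the node moves from {A, B, C} directly to the complete view instead of installing two views that each lack a member.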
MERGE3: Do not lose any members from view during a series of merges
-------------------------------------------------------------------
Key: JGRP-1910
URL: https://issues.jboss.org/browse/JGRP-1910
Project: JGroups
Issue Type: Bug
Reporter: Radim Vansa
Assignee: Bela Ban
Fix For: 3.6.3
Attachments: SplitMergeFailFastTest.java, SplitMergeTest.java
When the connection between nodes is re-established, MERGE3 should merge the cluster
together. This often involves not a single MergeView but a series of such events. The
problematic property of this protocol is that some of those views can lack certain
members, even though these members are reachable.
This causes problems in Infinispan: the cache cannot be fully rebalanced before another
merge arrives, so all owners of a certain segment can be gradually removed from (and
added back to) the view. Because this is detected not as a partition but as crashed
nodes, losing all owners means data loss.
Removing members from view should be the role of FDx protocols, not MERGEx.
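The data-loss scenario can be illustrated with a deliberately simplified model, assuming that members missing from a view are treated as crashed, that a segment's owners are simply trimmed to the members of each new view, and that no rebalance completes between two merge views. `SegmentOwnership` is a hypothetical class, not Infinispan code.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model (not Infinispan code): owners are trimmed to
// each new view, and no rebalance completes in between, so owners that are
// dropped are never replaced.
final class SegmentOwnership {

    // Returns the owners left after installing the views in order; an empty
    // result means every copy of the segment is considered lost.
    static Set<String> ownersAfter(Set<String> initialOwners, List<Set<String>> views) {
        Set<String> owners = new TreeSet<>(initialOwners);
        for (Set<String> view : views) {
            // Members missing from the view are treated as crashed nodes,
            // not as a partition, so their copies are dropped.
            owners.retainAll(view);
        }
        return owners;
    }

    public static void main(String[] args) {
        Set<String> owners = Set.of("A", "B"); // segment with two owners
        List<Set<String>> views = List.of(
                Set.of("A", "C", "D"),      // merge view missing B
                Set.of("B", "C", "D"),      // next merge view: B is back, A is missing
                Set.of("A", "B", "C", "D")  // all members are back
        );
        System.out.println(ownersAfter(owners, views)); // prints []
    }
}
```

Both owners A and B stay reachable throughout and the last view is complete, yet in this model no owner of the segment remains.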
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)