[
https://issues.jboss.org/browse/JGRP-2092?page=com.atlassian.jira.plugin....
]
Bela Ban commented on JGRP-2092:
--------------------------------
OK, after thinking long about this, below are the steps that can lead to a situation where
there is no coordinator. The only case I found is caused by merging itself; regular
leaves, joins or crashes cannot lead to such a scenario.
The cluster nodes are \{A,B,C,D\}. Now there's a split between \{A\} and \{B,C,D\}.
The views are now:
{noformat}
A: A
B: BCD
C: BCD
D: BCD
{noformat}
The partition heals and a merge starts with merge participants A and B and merge leader
B.
A and B agree on the new MergeView \{D,C,B,A\} (this is possible, as lexical UUID sorting
is used by default), and each of them multicasts that view in its own partition.
Let's say that D dropped the new view because its thread pool was full. Before B can
retransmit the view to D, B crashes. The views are now:
{noformat}
A: DCBA
B: - // crashed
C: DCBA
D: BCD // old view
{noformat}
Now B is suspected and removed from all views:
{noformat}
A: DCA
C: DCA
D: CD // old view
{noformat}
Because B crashed, D will never get the correct view \{D,C,A\}, in which it would be the
coordinator and could thus start a successful merge.
Therefore these steps lead to a scenario where a merge will never happen! Note that if a
new member (e.g. E) joined as _singleton_ (view: \{E\}), then a successful merge would
ensue as E would be the merge leader.
Although the above steps are an edge case with a very small probability of occurring,
if this does happen, there's no way MERGE3 and GMS in their current state can
resolve it, as a merge will never even start.
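
The end state above can be checked with a small stand-alone sketch. It uses plain Java collections rather than the JGroups API, and the class and method names ({{NoCoordinatorSketch}}, {{selfCoordinators}}, {{finalViews}}) are made up for illustration. The only assumption carried over from JGroups is that the first member of a view is its coordinator; a member "considers itself coordinator" when it is first in its own view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: models views as lists of member names, not JGroups View objects.
class NoCoordinatorSketch {

    /** Returns the members that are first in their own view, i.e. that consider themselves coordinator. */
    static List<String> selfCoordinators(Map<String, List<String>> views) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : views.entrySet()) {
            List<String> view = entry.getValue();
            if (!view.isEmpty() && view.get(0).equals(entry.getKey()))
                result.add(entry.getKey());
        }
        return result;
    }

    /** The views after B was suspected and removed, as listed in the comment above. */
    static Map<String, List<String>> finalViews() {
        Map<String, List<String>> views = new LinkedHashMap<>();
        views.put("A", Arrays.asList("D", "C", "A"));
        views.put("C", Arrays.asList("D", "C", "A"));
        views.put("D", Arrays.asList("C", "D")); // derived from D's old view
        return views;
    }

    public static void main(String[] args) {
        Map<String, List<String>> views = finalViews();
        // Nobody is first in its own view, so nobody acts as coordinator.
        System.out.println(selfCoordinators(views)); // prints []

        // A new member E joining as a singleton is first in its own view.
        views.put("E", Collections.singletonList("E"));
        System.out.println(selfCoordinators(views)); // prints [E]
    }
}
```

The empty result for the first call corresponds to the "no coordinator" situation; the second call mirrors the note about a singleton E being able to act as merge leader.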
MERGE3: merge never happens
---------------------------
Key: JGRP-2092
URL:
https://issues.jboss.org/browse/JGRP-2092
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.6.11, 4.0
Attachments: jgroups.txt
(Reported by Neal Dillman)
In the case below, a merge doesn't seem to happen. Write a unit test to reproduce
this.
{noformat}
Host A view: B, X, Y, Z, A (where B should be coordinator)
Host B view: C, Q, R, S, B (where C should be coordinator)
Host C view: A, M, N, O, C (where A should be coordinator)
{noformat}
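
As a rough illustration of why no merge starts in the reported case, the sketch below follows the "who do I think is coordinator" chain across the three hosts. It is plain Java, not a JGroups unit test; the names {{CoordinatorCycleSketch}}, {{followChain}} and {{reportedViews}} are hypothetical, and only the views of hosts A, B and C are modelled because the report does not give the others. It assumes, as in JGroups, that the first member of a view is the coordinator.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: views are lists of member names taken from the report.
class CoordinatorCycleSketch {

    /** The coordinator of a view is its first member. */
    static String coordinatorOf(List<String> view) {
        return view.get(0);
    }

    /**
     * Starting at 'start', repeatedly jumps to the member that the current host
     * believes is coordinator. Returns the host that considers itself coordinator,
     * or null if the chain loops or reaches a host whose view is unknown.
     */
    static String followChain(Map<String, List<String>> views, String start) {
        Set<String> visited = new HashSet<>();
        String current = start;
        while (visited.add(current)) {
            List<String> view = views.get(current);
            if (view == null)
                return null; // view of this host is not part of the report
            String coord = coordinatorOf(view);
            if (coord.equals(current))
                return current; // found a host that is first in its own view
            current = coord;
        }
        return null; // came back to an already visited host: a cycle
    }

    /** The three views from the report. */
    static Map<String, List<String>> reportedViews() {
        Map<String, List<String>> views = new LinkedHashMap<>();
        views.put("A", Arrays.asList("B", "X", "Y", "Z", "A"));
        views.put("B", Arrays.asList("C", "Q", "R", "S", "B"));
        views.put("C", Arrays.asList("A", "M", "N", "O", "C"));
        return views;
    }

    public static void main(String[] args) {
        Map<String, List<String>> views = reportedViews();
        for (String host : views.keySet())
            System.out.println(host + " expects coordinator " + coordinatorOf(views.get(host)));
        // A -> B -> C -> A: the chain loops without reaching a self-elected coordinator.
        System.out.println(followChain(views, "A")); // prints null
    }
}
```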