[jboss-jira] [JBoss JIRA] (JGRP-2092) MERGE3: merge never happens

Bela Ban (JIRA) issues at jboss.org
Wed Sep 14 09:59:00 EDT 2016


    [ https://issues.jboss.org/browse/JGRP-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293340#comment-13293340 ] 

Bela Ban commented on JGRP-2092:
--------------------------------

OK, after thinking long about this, below are the steps that can lead to a situation where there is no coordinator. The only case I found is caused by merging itself; regular leaves, joins or crashes cannot lead to such a scenario.

The cluster nodes are \{A,B,C,D\}. Now there's a split between \{A\} and \{B,C,D\}. The views are now:
{noformat}
A: A
B: BCD
C: BCD
D: BCD
{noformat}

The partition heals and a merge starts with merge participants A and B and merge leader B.

A and B agree on the new MergeView \{D,C,B,A\} (this ordering is possible, as lexical UUID sorting is used by default), and each multicasts that view in its own partition.

Let's say that D drops the new view because its thread pool is full, and that B crashes before it can retransmit the view. The views are now:
{noformat}
A: DCBA
B: - // crashed
C: DCBA
D: BCD // old view
{noformat}

Now B is suspected and removed from all views:
{noformat}
A: DCA
C: DCA
D: CD // old view
{noformat}
 
Because B crashed, D will never get the correct view \{D,C,A\}, in which it would be the coordinator and could thus start a successful merge.

Therefore these steps lead to a scenario where a merge will never happen! Note that if a new member (e.g. E) joined as _singleton_ (view: \{E\}), then a successful merge would ensue as E would be the merge leader.
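
To make the final state easier to check, here is a minimal sketch in plain Java. It is not JGroups code: the class and method names are hypothetical, and it models only the assumption that a member considers itself coordinator when it is the first member of its own view. With the views listed above, no member qualifies; adding the singleton E gives exactly one candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not JGroups code: "first member of a view is its coordinator". */
class NoCoordinatorSketch {

    /** Returns the members that are first in their own view. */
    static List<String> selfCoordinators(Map<String, List<String>> views) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : views.entrySet()) {
            List<String> view = e.getValue();
            if (!view.isEmpty() && view.get(0).equals(e.getKey()))
                result.add(e.getKey());
        }
        return result;
    }

    /** Views after B was suspected and removed, as listed in the comment. */
    static Map<String, List<String>> stuckViews() {
        Map<String, List<String>> views = new LinkedHashMap<>();
        views.put("A", Arrays.asList("D", "C", "A"));
        views.put("C", Arrays.asList("D", "C", "A"));
        views.put("D", Arrays.asList("C", "D")); // stale view: D missed the MergeView
        return views;
    }

    public static void main(String[] args) {
        Map<String, List<String>> views = stuckViews();
        System.out.println("coordinators: " + selfCoordinators(views));

        // A new member joining as a singleton is first in its own view.
        views.put("E", Collections.singletonList("E"));
        System.out.println("coordinators after E joins: " + selfCoordinators(views));
    }
}
```

Under these assumptions the first line prints an empty list and the second prints `[E]`, which matches the reasoning above: nobody can lead a merge until a member appears that is first in its own view.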

Although the above steps are an edge case with a very small probability of occurring, if this does happen, there's no way MERGE3 and GMS in their current state can resolve it, as a merge will never even start.

> MERGE3: merge never happens
> ---------------------------
>
>                 Key: JGRP-2092
>                 URL: https://issues.jboss.org/browse/JGRP-2092
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.6.11, 4.0
>
>         Attachments: jgroups.txt
>
>
> (Reported by Neal Dillman)
> In the case below, a merge doesn't seem to happen. Write a unit test to reproduce this.
> {noformat}
> Host A view: B, X, Y, Z, A  (where B should be coordinator)
> Host B view: C, Q, R, S, B (where C should be coordinator)
> Host C view: A, M, N, O, C (where A should be coordinator)
> {noformat}
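
The reported case can be read as a cycle: each host names another host as coordinator, and none names itself. The sketch below is a hypothetical illustration in plain Java (not JGroups code, and not the unit test requested above). It assumes the coordinator is the first member of each view and uses only the three views given in the report; the views of X, Y, Z and the other members are unknown and are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: each host knows only the first member of its own view. */
class CoordinatorCycleSketch {

    /**
     * Follows "who do I think is coordinator" links from start. Returns the host
     * that names itself, or null if the chain loops or leaves the known hosts.
     */
    static String findActualCoordinator(String start, Map<String, String> believedCoord) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null && seen.add(current)) {
            String next = believedCoord.get(current);
            if (current.equals(next))
                return current;
            current = next;
        }
        return null;
    }

    /** First member of each reported view. */
    static Map<String, String> reportedCase() {
        Map<String, String> m = new HashMap<>();
        m.put("A", "B"); // Host A view: B, X, Y, Z, A
        m.put("B", "C"); // Host B view: C, Q, R, S, B
        m.put("C", "A"); // Host C view: A, M, N, O, C
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> m = reportedCase();
        for (String host : Arrays.asList("A", "B", "C"))
            System.out.println(host + " -> " + findActualCoordinator(host, m));
    }
}
```

For all three hosts the lookup returns null: following the chain from A leads to B, then C, then back to A, so no host in this set acts as coordinator.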



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)

