[jboss-jira] [JBoss JIRA] (JGRP-1876) MERGE3 : Strange number and content of subgroups

Bela Ban (JIRA) issues at jboss.org
Fri Jan 23 06:59:49 EST 2015


    [ https://issues.jboss.org/browse/JGRP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034581#comment-13034581 ] 

Bela Ban commented on JGRP-1876:
--------------------------------

I looked at the PR and I'm skeptical: it delays possible merges in favor of waiting for INFO messages *only* from coordinators. Consider the following scenario:
* Subgroups are A1=AB, C1=C, D1=DE, F1=FG, H1=HIJ
* C gets the INFO messages:
** From B: A1
** From C: C1 (itself)
** From D: D1
** From F: F1
** From H: H1
The information about view A1 was not sent by A, but by B. Your code will now simply return from the merge attempt because of this single INFO message sent by B instead of A. This prevents a partial merge between views C1, D1, F1 and H1!
This problem gets worse with an increasing number of subclusters: the likelihood that at least one _non-coordinator_ sends an INFO message increases, and a single such message stops a potential partial merge.
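
To make the scenario concrete, here is a minimal sketch in plain Java. It is not JGroups code: the Info class, the strictMerge and partialMerge methods and the abortProbability helper are hypothetical names invented for illustration. strictMerge models the behavior criticized above (return as soon as one INFO comes from a non-coordinator). partialMerge is just one possible lenient policy for contrast, not a description of what MERGE3 does. abortProbability assumes, purely for illustration, that each of n subclusters independently has its INFO sent by a non-coordinator with probability p.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PartialMergeSketch {

    /** Hypothetical stand-in for an INFO message: the sender and the view it reports. */
    static final class Info {
        final String sender; // member that sent the INFO message
        final String view;   // view it reports, e.g. "A1"
        final String coord;  // coordinator of that view, e.g. "A"

        Info(String sender, String view, String coord) {
            this.sender = sender;
            this.view = view;
            this.coord = coord;
        }
    }

    /** The INFO messages that C receives in the scenario above. */
    static List<Info> scenario() {
        return Arrays.asList(
            new Info("B", "A1", "A"),  // A1 is reported by B, not by its coordinator A
            new Info("C", "C1", "C"),
            new Info("D", "D1", "D"),
            new Info("F", "F1", "F"),
            new Info("H", "H1", "H"));
    }

    /** Strict policy: a single INFO from a non-coordinator aborts the whole attempt. */
    static List<String> strictMerge(List<Info> infos) {
        for (Info info : infos) {
            if (!info.sender.equals(info.coord)) {
                return Collections.emptyList();
            }
        }
        return partialMerge(infos);
    }

    /** Lenient policy (assumption): skip non-coordinator reports, merge the rest. */
    static List<String> partialMerge(List<Info> infos) {
        List<String> views = new ArrayList<>();
        for (Info info : infos) {
            if (info.sender.equals(info.coord) && !views.contains(info.view)) {
                views.add(info.view);
            }
        }
        return views;
    }

    /** Chance that at least one of n subclusters reports via a non-coordinator. */
    static double abortProbability(double p, int n) {
        return 1.0 - Math.pow(1.0 - p, n);
    }

    public static void main(String[] args) {
        List<Info> infos = scenario();
        System.out.println("strict:  " + strictMerge(infos));
        System.out.println("partial: " + partialMerge(infos));
        System.out.println("abort probability, p=0.1, n=5:   " + abortProbability(0.1, 5));
        System.out.println("abort probability, p=0.1, n=210: " + abortProbability(0.1, 210));
    }
}
```

With the five INFO messages from the scenario, strictMerge yields nothing to merge, while partialMerge still yields C1, D1, F1 and H1. Under the independence assumption, abortProbability grows toward 1 as n increases, which is the effect described above for a growing number of subclusters.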

> MERGE3 : Strange number and content of subgroups
> ------------------------------------------------
>
>                 Key: JGRP-1876
>                 URL: https://issues.jboss.org/browse/JGRP-1876
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.4.2
>            Reporter: Karim AMMOUS
>            Assignee: Bela Ban
>             Fix For: 3.5.1, 3.6, 3.6.2
>
>         Attachments: 4Subgroups.zip, ChannelCreator.java, DkeJgrpAddress.java, JGRP-1876-1.pdf, karim-logs-files.zip, MergeTest4.java, MergeViewWith210Subgroups.log, SplitMergeTest.java, views.txt
>
>
> Using JGroups 3.4.2, a split occurred and a merge was processed successfully, but the number of subgroups is wrong (210 instead of 2).
> The final mergeView is correct and contains 210 members.
> Here is an extract of subviews: 
> {code}
> INFO | Incoming-18,cluster,term-ETJ101697729-31726:host:192.168.56.6:1:CL(GROUP01)[F] | [MyMembershipListener.java:126] | (middleware) | MergeView view ID = [serv-ZM2BU35940-58033:vt-14:192.168.55.55:1:CL(GROUP01)[F]|172]
> 210 subgroups 
> [....
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [term-ETJ104215245-11092:host:192.168.56.72:1:CL(GROUP01)[F]]
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [serv-ZM2BU38960-6907:asb:192.168.55.52:1:CL(GROUP01)[F]]
> [term-ETJ101697729-31726:host:192.168.56.6:1:CL(GROUP01)[F]|171] (1) [term-ETJ101697729-31726:host:192.168.56.6:1:CL(GROUP01)[F]]
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [serv-ZM2BU47533-55240:vt-14:192.168.55.57:1:CL(GROUP01)[F]]
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [serv-ZM2BU35943-49435:asb:192.168.55.51:1:CL(GROUP01)[F]]
> ....]
> {code}
> I wasn't able to reproduce that with a simple program. But I observed that the merge was preceded by an ifdown/ifup on host 192.168.56.6. That member lost all other members, but it is still present in their views.
> Example:  
> {code}
> {A, B, C} => {A, B, C} and {C} => {A, B, C}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.11#6341)