[jboss-jira] [JBoss JIRA] (JGRP-1876) MERGE3 : Strange number and content of subgroups

Karim AMMOUS (JIRA) issues at jboss.org
Fri Jan 23 09:53:49 EST 2015


    [ https://issues.jboss.org/browse/JGRP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034645#comment-13034645 ] 

Karim AMMOUS commented on JGRP-1876:
------------------------------------

By "effects", I mean consequences.

1/ Yes. But that temporary merge causes a member to be lost, and moreover that member is a coord.
2/ Not just an optimization.
On B's side: it lost its coord A even though A was available and periodically sending heartbeats.
In the configuration, we added a failure detection protocol (FD_ALL), configured "timeout = 3.5 * interval", and added VERIFY_SUSPECT to be sure that a member is excluded only if it is really dead.
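
To make the expectation concrete, here is a minimal sketch of the two-step exclusion rule described above. It is only a model: the class and method names are hypothetical, the interval value is assumed, and none of this is JGroups code. It shows that a member whose heartbeats are missed is first suspected, and is excluded only if a direct check also fails.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative model only: these names are hypothetical and this is not JGroups code.
final class SuspicionSketch {

    // Ratio quoted in the comment: timeout = 3.5 * interval.
    static long timeoutFor(long intervalMs) {
        return Math.round(3.5 * intervalMs);
    }

    // FD_ALL-like step: suspect every member whose last heartbeat is older than the timeout.
    static List<String> suspects(Map<String, Long> lastHeartbeatMs, long nowMs, long timeoutMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeatMs.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // VERIFY_SUSPECT-like step: keep only suspects that also fail a direct liveness check.
    static List<String> confirmedDead(List<String> suspects, Predicate<String> answersCheck) {
        List<String> result = new ArrayList<>();
        for (String member : suspects) {
            if (!answersCheck.test(member)) {
                result.add(member);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long interval = 10_000;                 // assumed value, not taken from the issue
        long timeout = timeoutFor(interval);
        Map<String, Long> last = new LinkedHashMap<>();
        last.put("A", 0L);                      // A's heartbeats stopped arriving
        last.put("C", 30_000L);
        List<String> suspected = suspects(last, 40_000, timeout);
        // A still answers the direct check, so it should not be excluded.
        System.out.println(confirmedDead(suspected, m -> !m.equals("A")));
    }
}
```

Under this model, a coord that is alive and answering should never reach the exclusion step, which is why losing A during the merge is surprising.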

Recently you added to JGroups the ability to implement coord election to avoid coord changes and bring more stability to a cluster ([JGRP-592|https://issues.jboss.org/browse/JGRP-592] and [MembershipChangePolicy|http://www.jgroups.org/manual/html/user-advanced.html#MembershipChangePolicy]). The case discussed here shows that the merge process runs counter to that policy: the coord is lost and there are two coord changes.
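
As an illustration of the stability I have in mind, here is a rough sketch of one possible merge ordering rule. It deliberately uses plain strings and a hypothetical class instead of the real MembershipChangePolicy interface, so it assumes nothing about the JGroups API. The assumed rule is: the largest subview keeps its order (and therefore its coord), and members from the other subviews are appended after it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: plain strings stand in for member addresses; not JGroups code.
final class StableCoordMergeSketch {

    // Each subview is an ordered member list whose first element is that subview's coord.
    static List<String> mergedMembership(List<List<String>> subviews) {
        List<String> largest = Collections.emptyList();
        for (List<String> subview : subviews) {
            if (subview.size() > largest.size()) {
                largest = subview;              // on ties, the first largest subview wins
            }
        }
        // Start with the largest subview so that its coord stays first.
        LinkedHashSet<String> merged = new LinkedHashSet<>(largest);
        for (List<String> subview : subviews) {
            merged.addAll(subview);             // duplicates are ignored, order is preserved
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // The case from the issue: {A, B, C} and {C} merge back into {A, B, C}.
        List<List<String>> subviews = Arrays.asList(
                Arrays.asList("C"),
                Arrays.asList("A", "B", "C"));
        System.out.println(mergedMembership(subviews));
    }
}
```

With such a rule the merge of {A, B, C} and {C} would leave A as coord, so B would see no coord change at all.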

> MERGE3 : Strange number and content of subgroups
> ------------------------------------------------
>
>                 Key: JGRP-1876
>                 URL: https://issues.jboss.org/browse/JGRP-1876
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.4.2
>            Reporter: Karim AMMOUS
>            Assignee: Bela Ban
>             Fix For: 3.5.1, 3.6, 3.6.2
>
>         Attachments: 4Subgroups.zip, ChannelCreator.java, DkeJgrpAddress.java, JGRP-1876-1.pdf, karim-logs-files.zip, MergeTest4.java, MergeViewWith210Subgroups.log, SplitMergeTest.java, views.txt
>
>
> Using JGroups 3.4.2, a split occurred and a merge was processed successfully, but the number of subgroups is wrong (210 instead of 2).
> The final mergeView is correct and contains 210 members.
> Here is an extract of subviews: 
> {code}
> INFO | Incoming-18,cluster,term-ETJ101697729-31726:host:192.168.56.6:1:CL(GROUP01)[F] | [MyMembershipListener.java:126] | (middleware) | MergeView view ID = [serv-ZM2BU35940-58033:vt-14:192.168.55.55:1:CL(GROUP01)[F]|172]
> 210 subgroups 
> [....
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [term-ETJ104215245-11092:host:192.168.56.72:1:CL(GROUP01)[F]]
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [serv-ZM2BU38960-6907:asb:192.168.55.52:1:CL(GROUP01)[F]]
> [term-ETJ101697729-31726:host:192.168.56.6:1:CL(GROUP01)[F]|171] (1) [term-ETJ101697729-31726:host:192.168.56.6:1:CL(GROUP01)[F]]
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [serv-ZM2BU47533-55240:vt-14:192.168.55.57:1:CL(GROUP01)[F]]
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [serv-ZM2BU35943-49435:asb:192.168.55.51:1:CL(GROUP01)[F]]
> ....]
> {code}
> I wasn't able to reproduce that with a simple program. But I observed that the merge was preceded by an ifdown/ifup on host 192.168.56.6. That member lost all other members, but it was still present in their views.
> Example:  
> {code}
> {A, B, C} => {A, B, C} and {C} => {A, B, C}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.11#6341)

