[jboss-jira] [JBoss JIRA] (JGRP-1876) MERGE3 : Strange number and content of subgroups
Karim AMMOUS (JIRA)
issues at jboss.org
Thu Jan 22 07:02:49 EST 2015
[ https://issues.jboss.org/browse/JGRP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034323#comment-13034323 ]
Karim AMMOUS commented on JGRP-1876:
------------------------------------
I'm trying to address the original problem for which I created this issue.
*Issue description*
*T-0* : C is isolated from A and B. So two views are installed : {noformat}[A|1](2){A,B} on (A,B) and [C|0](1){C} on (C){noformat}
*T-1* : Stop isolation of C
*T-2* : C detects inconsistent views {noformat}([A|1] vs [C|0]){noformat} based only on the MergeInfo coming from B and from itself (C).
A's MergeInfo is missing, even though A is a coord, because it was sent while C was still isolated.
*T-3* : C decides to install a new view {noformat}[C|A](2){C,B}{noformat}. All members install C's view except A, since A isn't in that view.
{noformat}
A : [A|1](2){A,B}
B : [C|A](2){C,B}
C : [C|A](2){C,B}
{noformat}
*T-4* : A detects that the views are still inconsistent (asymmetric) and decides to merge again. Resulting view: {noformat}[A|2](3){A,B,C}{noformat}
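To make the T-2 situation concrete, here is a minimal sketch in plain Java. It is not JGroups code: the {{ReportedView}} and {{MissingCoordCheck}} names are hypothetical and only model what the merge leader knows, namely which member sent a MergeInfo and which coord is named in the viewId it carried. The check lists coords that appear in a viewId but never reported themselves.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model (not the JGroups API): one MergeInfo as seen by the merge leader. */
final class ReportedView {
    final String sender;      // member that sent the MergeInfo
    final String viewCreator; // coord named in the viewId, e.g. "A" in [A|1]
    final long viewNumber;    // e.g. 1 in [A|1]

    ReportedView(String sender, String viewCreator, long viewNumber) {
        this.sender = sender;
        this.viewCreator = viewCreator;
        this.viewNumber = viewNumber;
    }
}

final class MissingCoordCheck {
    /** Returns the coords named in some viewId from which no MergeInfo was received. */
    static Set<String> missingCoords(Collection<ReportedView> reports) {
        Set<String> senders = new TreeSet<>();
        Set<String> creators = new TreeSet<>();
        for (ReportedView r : reports) {
            senders.add(r.sender);
            creators.add(r.viewCreator);
        }
        // A coord is "missing" if a viewId names it but it never reported itself.
        creators.removeAll(senders);
        return creators;
    }
}
```

With the reports from T-2 (B reporting [A|1] and C reporting [C|0]), the check returns A: the leader is about to merge without having heard from a coord that is named in one of the viewIds.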
*Effects*
1/ When a network problem (isolation, flapping, etc.) occurs on one member, all other members (like B) may lose their coord.
2/ Two coord changes occurred : A -> C -> A
3/ C installs a MergeView with strange subgroups: {noformat} [A|1](1)[B], [A|1](1)[D], [A|1](1)[E], ..., [C|0](1)[C]. {noformat} (see enclosed file "MergeViewWith210Subgroups.log").
*Fix*
Only the third effect has been fixed, thanks to your patch.
Please take a look at the [PR|https://github.com/belaban/JGroups/pull/200] that should fix the root cause.
Principle: If the merge leader didn't receive a MergeInfo from a real coord (one that appears in a viewId), it checks the "age" of the oldest viewId received.
If that viewId is not really old (age < check_interval - min_interval), the merge process is interrupted.
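The principle above can be sketched in plain Java as follows. This is only an illustration of the rule as stated in this comment, not the code from the PR: the {{MergeAgeCheck}} class and its method are hypothetical, and how the age of a viewId is measured is left to the caller. {{check_interval}} and {{min_interval}} are the MERGE3 properties, passed in here as milliseconds.

```java
/** Hypothetical helper (not the PR code): decides whether the merge leader should back off. */
final class MergeAgeCheck {
    /**
     * @param allCoordsReported true if every coord named in a received viewId sent its MergeInfo
     * @param oldestViewAgeMs   age of the oldest received viewId (measurement is an assumption)
     * @param checkIntervalMs   MERGE3 check_interval
     * @param minIntervalMs     MERGE3 min_interval
     * @return true if the merge should be interrupted for now
     */
    static boolean shouldInterruptMerge(boolean allCoordsReported,
                                        long oldestViewAgeMs,
                                        long checkIntervalMs,
                                        long minIntervalMs) {
        if (allCoordsReported) {
            return false; // every real coord was heard from, so the merge can proceed
        }
        // A coord is missing: only go on if the information is already old enough,
        // i.e. the missing coord had enough time to send its MergeInfo.
        long threshold = checkIntervalMs - minIntervalMs;
        return oldestViewAgeMs < threshold;
    }
}
```

In the scenario above, C would interrupt its merge at T-2 as long as the oldest viewId it holds is younger than check_interval - min_interval, which gives A's next MergeInfo a chance to arrive before any view is installed.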
> MERGE3 : Strange number and content of subgroups
> ------------------------------------------------
>
> Key: JGRP-1876
> URL: https://issues.jboss.org/browse/JGRP-1876
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.4.2
> Reporter: Karim AMMOUS
> Assignee: Bela Ban
> Fix For: 3.5.1, 3.6, 3.6.2
>
> Attachments: 4Subgroups.zip, ChannelCreator.java, DkeJgrpAddress.java, JGRP-1876-1.pdf, karim-logs-files.zip, MergeTest4.java, MergeViewWith210Subgroups.log, SplitMergeTest.java, views.txt
>
>
> Using JGroups 3.4.2, a split occurred and a merge was processed successfully, but the number of subgroups is wrong (210 instead of 2).
> The final mergeView is correct and contains 210 members.
> Here is an extract of subviews:
> {code}
> INFO | Incoming-18,cluster,term-ETJ101697729-31726:host:192.168.56.6:1:CL(GROUP01)[F] | [MyMembershipListener.java:126] | (middleware) | MergeView view ID = [serv-ZM2BU35940-58033:vt-14:192.168.55.55:1:CL(GROUP01)[F]|172]
> 210 subgroups
> [....
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [term-ETJ104215245-11092:host:192.168.56.72:1:CL(GROUP01)[F]]
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [serv-ZM2BU38960-6907:asb:192.168.55.52:1:CL(GROUP01)[F]]
> [term-ETJ101697729-31726:host:192.168.56.6:1:CL(GROUP01)[F]|171] (1) [term-ETJ101697729-31726:host:192.168.56.6:1:CL(GROUP01)[F]]
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [serv-ZM2BU47533-55240:vt-14:192.168.55.57:1:CL(GROUP01)[F]]
> [term-ETJ100691812-36873:host:192.168.56.16:1:CL(GROUP01)[F]|170] (1) [serv-ZM2BU35943-49435:asb:192.168.55.51:1:CL(GROUP01)[F]]
> ....]
> {code}
> I wasn't able to reproduce that with a simple program. But I observed that the merge was preceded by an ifdown/ifup on host 192.168.56.6. That member lost all other members, but it was still present in their views.
> Example:
> {code}
> {A, B, C} => {A, B, C} and {C} => {A, B, C}
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)