[jboss-jira] [JBoss JIRA] (JGRP-1705) MERGE3 handles fake merge views
Bela Ban (JIRA)
jira-events at lists.jboss.org
Wed Sep 25 03:22:45 EDT 2013
[ https://issues.jboss.org/browse/JGRP-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12807119#comment-12807119 ]
Bela Ban edited comment on JGRP-1705 at 9/25/13 3:21 AM:
---------------------------------------------------------
I see 2 scenarios where this can happen:
h5. Member sends old view before new view is received
The InfoSender task at a member kicks in just before a new view is received, and thus the member sends the old view. Example:
* The view is \{A,B,C,D\}
* A installs new view A|5 (the old view is A|4)
* Just before receiving A|5, C's InfoSender sends an INFO with view=A|4 to the coordinator (A)
* Everybody else sends an INFO with view=A|5 to A
* A now has 2 views, A|4 and A|5, and starts the merge process
Actually, before this happens, the coordinator asks A (itself) and C for their actual views and only starts the merge process if the actual views differ; this acts as a second line of defense against a spurious merge.
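A minimal Java sketch of this check follows. It assumes ViewIds can be modelled as plain strings such as {{A|4}}. The coordinator groups incoming INFOs by ViewId. When it sees more than one ViewId, it asks one member per ViewId for its current view and merges only if those answers still differ. The class and method names ({{SpuriousMergeSketch}}, {{collect()}}, {{shouldMerge()}}) are hypothetical. This is not the MERGE3 code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical model of the coordinator's merge decision; not the MERGE3 implementation.
// ViewIds are modelled as plain strings such as "A|4" (assumption).
public class SpuriousMergeSketch {

    // Builds the 'views' table: ViewId -> members that reported it in an INFO message.
    public static Map<String, Set<String>> collect(Map<String, String> infoBySender) {
        Map<String, Set<String>> views = new HashMap<>();
        infoBySender.forEach((sender, viewId) ->
                views.computeIfAbsent(viewId, k -> new TreeSet<>()).add(sender));
        return views;
    }

    // Second line of defense: ask one member per reported ViewId for its actual view
    // and only merge if those actual views still differ.
    public static boolean shouldMerge(Map<String, Set<String>> views,
                                      Function<String, String> fetchActualView) {
        if (views.size() <= 1)
            return false; // all INFOs carried the same ViewId
        Set<String> actual = new TreeSet<>();
        for (Set<String> senders : views.values())
            actual.add(fetchActualView.apply(senders.iterator().next()));
        return actual.size() > 1;
    }

    public static void main(String[] args) {
        // C's InfoSender fired just before C received A|5; everybody else reports A|5.
        Map<String, String> infos = new HashMap<>();
        infos.put("A", "A|5");
        infos.put("B", "A|5");
        infos.put("C", "A|4");
        infos.put("D", "A|5");

        Map<String, Set<String>> views = collect(infos);
        System.out.println(views.size()); // 2 distinct ViewIds reported

        // By the time the coordinator asks, C has installed A|5 as well, so no merge.
        System.out.println(shouldMerge(views, member -> "A|5")); // false
    }
}
```

With C's stale A|4 INFO, the table has 2 entries. C has already installed A|5 by the time it is asked, so {{shouldMerge()}} returns false and no spurious merge is started.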
h5. Coordinator's 'views' table contains outdated information
The coordinator maintains a 'views' table, which is a hashmap keyed by ViewId. The problem is that if the coordinator receives 2 INFO messages from the same sender but with different ViewIds, the merge process is triggered as well:
* C sends INFO=A|5
* The views table contains A|5=[C]
* A installs A|6 and C sends INFO=A|6
* The views table contains A|5=[C], A|6=[C] // omitting the INFO messages from other members
Again, fetching the real views before triggering a merge should remedy this, but there's still a time window where this can happen.
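The following hypothetical Java sketch (not JGroups code; all names are made up, and it needs Java 16+ for the record) replays the sequence above. A table keyed only by ViewId keeps C under both A|5 and A|6. For comparison, a hypothetical table that remembers only the latest ViewId per sender collapses C's two INFOs into one entry.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical replay of the stale 'views' table scenario; not JGroups code.
public class StaleViewsTableSketch {

    // One INFO message: the sender and the ViewId it reported (ViewIds as strings, an assumption).
    public record Info(String sender, String viewId) {}

    // Table keyed by ViewId: a sender is added under each ViewId it reports and never removed.
    public static Map<String, Set<String>> byViewId(List<Info> infos) {
        Map<String, Set<String>> table = new LinkedHashMap<>();
        for (Info info : infos)
            table.computeIfAbsent(info.viewId(), k -> new TreeSet<>()).add(info.sender());
        return table;
    }

    // Hypothetical alternative for comparison: keep only the latest ViewId per sender.
    public static Map<String, String> latestPerSender(List<Info> infos) {
        Map<String, String> latest = new HashMap<>();
        for (Info info : infos)
            latest.put(info.sender(), info.viewId()); // a later INFO overwrites an earlier one
        return latest;
    }

    public static void main(String[] args) {
        // C reports A|5, then A installs A|6 and C reports A|6 (other members' INFOs omitted).
        List<Info> infos = List.of(new Info("C", "A|5"), new Info("C", "A|6"));

        System.out.println(byViewId(infos)); // {A|5=[C], A|6=[C]}: 2 ViewIds, so a merge would be triggered
        System.out.println(new TreeSet<>(latestPerSender(infos).values())); // [A|6]
    }
}
```

Keying by sender would only narrow the problem. Members whose INFO for A|6 has not arrived yet would still appear under A|5, so the time window described above remains.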
h5. Update
Actually, this can also happen *when we fetch the actual views*: if a new view {{A|5}} is installed at the same time as the ViewConsistencyChecker fetches the actual views, some members might return {{A|4}} and others {{A|5}}.
> MERGE3 handles fake merge views
> -------------------------------
>
> Key: JGRP-1705
> URL: https://issues.jboss.org/browse/JGRP-1705
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.4
> Reporter: Karim AMMOUS
> Assignee: Bela Ban
> Fix For: 3.4
>
> Attachments: JIRA-MERGE-1subGroup-Shutdown.log, JIRA-MERGE-1subGroup.log
>
>
> Under some conditions, MERGE3 handles fake merge views. After the "merge", the number of members forming the cluster is still the same and the merge view contains only one subgroup.
> Scenario:
> - Form a large cluster (1115 mbrs)
> - Enable MERGE3 TRACE logging on the coordinator member (172.29.190.5)
> - Reboot one machine (172.29.190.5) hosting 7 JVMs
> - The coordinator installs a new view with 1108 members.
> When new members join the cluster, we observe a merge view formed by only one subgroup.
> The coordinator's log file is attached to this JIRA.