[jboss-jira] [JBoss JIRA] (JGRP-1705) MERGE3 handles fake merge views

Bela Ban (JIRA) jira-events at lists.jboss.org
Wed Sep 25 03:22:45 EDT 2013


    [ https://issues.jboss.org/browse/JGRP-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12807119#comment-12807119 ] 

Bela Ban edited comment on JGRP-1705 at 9/25/13 3:21 AM:
---------------------------------------------------------

I see 2 scenarios where this can happen:

h5. Member sends old view before new view is received
The InfoSender task at a member kicks in just before a new view is received, and thus the member sends the old view. Example:
* The view is \{A,B,C,D\}
* A installs new view A|5 (the old view is A|4)
* Just before receiving A|5, C's InfoSender sends an INFO with view=A|4 to the coordinator (A)
* Everybody else sends an INFO with view=A|5 to A
* A now has 2 views A|4 and A|5 and starts the merge process

Actually, before this happens, the coordinator asks A (itself) and C for their actual views and only starts the merge process if the actual views differ; this acts as a second line of defense against a spurious merge.
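
The check described above can be sketched roughly as follows. This is a minimal illustration, not the MERGE3 code: view ids are modelled as plain strings, the class and method names ({{SpuriousMergeCheck}}, {{shouldMerge}}) are hypothetical, and for simplicity it assumes the coordinator asks every member that sent an INFO rather than just A and C.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical sketch: view ids are plain strings such as "A|5";
// none of these names are taken from the JGroups code base.
class SpuriousMergeCheck {

    /** True if the collection holds more than one distinct view id. */
    static boolean differ(Collection<String> viewIds) {
        return new HashSet<>(viewIds).size() > 1;
    }

    /**
     * @param reported view id each member sent in its INFO message
     * @param actual   view id each member returns when asked directly
     * @return true only if both the reported and the actual views differ
     */
    static boolean shouldMerge(Map<String, String> reported, Map<String, String> actual) {
        if (!differ(reported.values()))
            return false; // all INFO messages agree, nothing to check
        // Second line of defense: compare the views the reporters hold right now
        // (assumption: every reporter is asked, not only one member per view)
        Collection<String> current = new HashSet<>();
        for (String member : reported.keySet())
            current.add(actual.get(member));
        return differ(current);
    }

    public static void main(String[] args) {
        Map<String, String> reported = new HashMap<>();
        reported.put("A", "A|5");
        reported.put("B", "A|5");
        reported.put("C", "A|4"); // C's InfoSender fired just before C received A|5
        reported.put("D", "A|5");

        Map<String, String> actual = new HashMap<>();
        for (String member : reported.keySet())
            actual.put(member, "A|5"); // by now everybody has installed A|5

        System.out.println(shouldMerge(reported, actual)); // false
    }
}
```

With the stale INFO from C the reported views differ, but because the actual views all agree on A|5 no merge is started.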


h5. Coordinator's 'views' table contains outdated information
The coordinator maintains a 'views' table, which is a hashmap keyed by ViewId. The problem is that if the coord receives 2 INFO messages from the same sender, but with different ViewIds, then the merge process is triggered as well:
* C sends INFO=A|5
* The views table contains A|5=[C]
* A installs A|6 and C sends INFO=A|6
* The views table contains A|5=[C], A|6=[C] // omitting the INFO messages from other members

Again, fetching the real views before triggering a merge should remedy this, but there's still a time window where this can happen.
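
To make the stale-entry problem concrete, here is a small sketch of a table keyed by ViewId next to a table keyed by sender. Everything in it is an assumption made for illustration: view ids and members are plain strings, {{ViewsTableDemo}} and its methods are hypothetical names, and the sender-keyed table is just one conceivable way to avoid the stale entry, not necessarily what MERGE3 does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; names and types are not taken from the JGroups code base.
class ViewsTableDemo {
    // Table as described above: view id -> members that reported it
    final Map<String, Set<String>> byViewId = new TreeMap<>();
    // Assumed alternative: sender -> most recently reported view id
    final Map<String, String> bySender = new HashMap<>();

    /** Records an INFO message from sender carrying viewId in both tables. */
    void onInfo(String sender, String viewId) {
        byViewId.computeIfAbsent(viewId, k -> new TreeSet<>()).add(sender);
        bySender.put(sender, viewId); // replaces the sender's older entry
    }

    /** Number of different views seen when keyed by view id (stale entries included). */
    int distinctViewsKeyedByViewId() {
        return byViewId.size();
    }

    /** Number of different views seen when only each sender's latest INFO counts. */
    int distinctViewsKeyedBySender() {
        return new HashSet<>(bySender.values()).size();
    }

    public static void main(String[] args) {
        ViewsTableDemo table = new ViewsTableDemo();
        table.onInfo("C", "A|5"); // C sends INFO=A|5
        table.onInfo("C", "A|6"); // A installs A|6 and C sends INFO=A|6

        System.out.println(table.distinctViewsKeyedByViewId()); // 2
        System.out.println(table.distinctViewsKeyedBySender()); // 1
    }
}
```

Keyed by ViewId, the single member C shows up under two views and looks like a split; keyed by sender, only its latest INFO counts.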

Update: actually, this can also happen *when we fetch the actual views*: if a new view {{A|5}} is installed at the same time as the ViewConsistencyChecker fetches the actual views, then some members might return {{A|4}} and others {{A|5}}.
                
> MERGE3 handles fake merge views
> -------------------------------
>
>                 Key: JGRP-1705
>                 URL: https://issues.jboss.org/browse/JGRP-1705
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.4
>            Reporter: Karim AMMOUS
>            Assignee: Bela Ban
>             Fix For: 3.4
>
>         Attachments: JIRA-MERGE-1subGroup-Shutdown.log, JIRA-MERGE-1subGroup.log
>
>
> Under some conditions, MERGE3 handles fake merge views. After the "merge", the number of members forming the cluster is still the same and the mergeView contains only one subgroup.
> Scenario:
> - Form a large cluster (1115 mbrs)
> - Enable the TRACE logging level for MERGE3 on the coordinator member (172.29.190.5)
> - Reboot one machine (172.29.190.5) hosting 7 JVMs
> - The coordinator installs a new view with 1108 mbrs.
> When new members join the cluster, we observe a merge view formed by only one subgroup.
> The coordinator log file is attached to this JIRA.
