[jboss-jira] [JBoss JIRA] (JGRP-1484) SEQUENCER and merge-views broken

Bela Ban (JIRA) jira-events at lists.jboss.org
Mon Sep 3 12:22:32 EDT 2012


    [ https://issues.jboss.org/browse/JGRP-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715457#comment-12715457 ] 

Bela Ban edited comment on JGRP-1484 at 9/3/12 12:20 PM:
---------------------------------------------------------

Hmm. I assume the reason you have SEQUENCER *below* GMS is that the order of views and messages is the same everywhere.

I've thought about how to keep this property in the case of a merge, but so far haven't come up with a good solution.

In your suggested solution, when a merge participant gets a (unicast) INSTALL_MERGE_VIEW, it installs the MergeView directly. However, this destroys the property above, as the order of the view installation with respect to messages is not the same across all members.

I experimented with marking the MergeView multicast as NO_TOTAL_ORDER, i.e. bypassing SEQUENCER for the MergeView installation: this would allow a merge participant/coordinator to install a new MergeView. However, the consequence is the same as with your solution: we'd lose the ordering between MergeViews and messages.
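
To illustrate what losing that ordering means, here is a minimal toy sketch in Java. It is not JGroups code: the class ViewOrderSketch, the deliveryLog helper, the message names m1..m3 and the receivers B and C are all made up for illustration. It assumes the regular messages are delivered in the same total order everywhere, while the MergeView arrives out of band at a different point on each member.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only (hypothetical names, not the JGroups API). */
class ViewOrderSketch {
    /**
     * Builds one member's delivery log: the sequenced messages keep their
     * total order, and the view is inserted after 'arrivedAfter' messages,
     * i.e. wherever it happened to arrive when it bypasses the sequencer.
     */
    static List<String> deliveryLog(List<String> sequenced, String view, int arrivedAfter) {
        List<String> log = new ArrayList<>(sequenced);
        log.add(arrivedAfter, view);
        return log;
    }

    public static void main(String[] args) {
        List<String> sequenced = Arrays.asList("m1", "m2", "m3");
        // Assumption: the view reaches B after one message and C after three.
        List<String> atB = deliveryLog(sequenced, "MergeView", 1);
        List<String> atC = deliveryLog(sequenced, "MergeView", 3);
        System.out.println("B: " + atB);
        System.out.println("C: " + atC);
        System.out.println("same order: " + atB.equals(atC));
    }
}
```

In this sketch the messages themselves stay in the same order on both members; only the position of the view relative to them differs, which is exactly the property that gets lost.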

The reason why we can't simply multicast the MergeView through SEQUENCER is that the coordinator (as we still see it) might be in a different partition, and it might discard forwarded messages as it is not the new coordinator in that partition.

E.g. {A,B} and {C,D} where B and C are merge coordinators would work, as A and C would multicast the MergeView locally, and then the new coordinator (e.g. C) would be visible to all members.

However, the case below wouldn't work:
A: {D,A}
B,C,D: {C,B,D}

If the coordinator in A's partition is D, then the INSTALL_MERGE_VIEW would trigger a multicast to disseminate the new MergeView, and A (the merge participant) would forward it to D. Say the new coordinator of the MergeView is C: D would then discard the forwarded message, as it is not the coordinator. Even if A forwarded the VIEW directly to the new coordinator C, C's broadcast of the VIEW would be discarded by A, as C is not in A's (old) view.
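
The two discards in this case can be traced with a small toy model in Java. This is only a sketch under stated assumptions, not JGroups code: the class MergeForwardSketch, the Member type and the helpers forwardAccepted, broadcastAccepted and failingCase are hypothetical names. It assumes that the first member of a view is its coordinator, that a forwarded multicast is only accepted by a member that is coordinator in its own view, and that a multicast from a sender outside the receiver's view is dropped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only (hypothetical names, not the JGroups API). */
class MergeForwardSketch {
    /** A member and the view it has installed; the first entry is the coordinator. */
    static final class Member {
        final String name;
        final List<String> view;

        Member(String name, String... view) {
            this.name = name;
            this.view = Arrays.asList(view);
        }

        String coordinator() { return view.get(0); }
        boolean isCoordinator() { return coordinator().equals(name); }
    }

    /** The sender forwards to the coordinator of its own view; the target accepts only if it is coordinator in its own view. */
    static boolean forwardAccepted(Member sender, Map<String, Member> members) {
        Member target = members.get(sender.coordinator());
        return target != null && target.isCoordinator();
    }

    /** A receiver drops a multicast whose sender is not in its current view. */
    static boolean broadcastAccepted(Member sender, Member receiver) {
        return receiver.view.contains(sender.name);
    }

    /** The failing case from the text: A has {D,A}; B, C and D have {C,B,D}. */
    static Map<String, Member> failingCase() {
        Map<String, Member> m = new HashMap<>();
        m.put("A", new Member("A", "D", "A"));
        for (String n : new String[] {"B", "C", "D"})
            m.put(n, new Member(n, "C", "B", "D"));
        return m;
    }

    public static void main(String[] args) {
        Map<String, Member> m = failingCase();
        // A forwards to D, which is not coordinator in its own view {C,B,D}.
        System.out.println("A -> D forward accepted: " + forwardAccepted(m.get("A"), m));
        // C broadcasts, but C is not in A's old view {D,A}.
        System.out.println("C -> A broadcast accepted: " + broadcastAccepted(m.get("C"), m.get("A")));
    }
}
```

Both lines print false: the forward from A is dropped at D, and a broadcast by C is dropped at A, so under these assumptions the new view never reaches A through either route.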
                
> SEQUENCER and merge-views broken
> --------------------------------
>
>                 Key: JGRP-1484
>                 URL: https://issues.jboss.org/browse/JGRP-1484
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.0.10
>            Reporter: David Hotham
>            Assignee: Bela Ban
>             Fix For: 3.2
>
>
> Here's a new way in which putting SEQUENCER below GMS is broken.
> Start with A having view B|4 [B,A], while B, C and D all have view B|7 [B,C,D].
> Now we start a merge, in which B is coordinator.  B creates the view C|8 [C, D, A, B].
> (I've opened a pull request saying that B surely shouldn't issue a view where the ViewID says that C was the creator.  But I think that this is incidental, and not key to the bug that I'm reporting here).
> Now B sends INSTALL_MERGE_VIEW to B (a coordinator) and A (a merge participant, per Util.determineMergeParticipants).
> B gets this first and broadcasts the new view to [B, C, D].  In particular, B is now not a coordinator.
> Then A gets the INSTALL_MERGE_VIEW, and it too tries broadcasting the new view.  SEQUENCER gets involved, and forwards the broadcast to B (as the coordinator in the old view).  B discards this; it's no longer a coordinator.
> So the new view is not installed at A.  All future broadcasts from A are forwarded to B, who discards them.  The group is fractured, and none of A's broadcasts are delivered.
> I'm not sure what the right fix would be.  I wonder whether things should be arranged so that in a merge:
> -  coordinators behave as today, broadcasting the new view to their own sub-groups
> -  but mere participants do not do this: they should just have the new view installed on them.
