[jboss-jira] [JBoss JIRA] (JGRP-1484) SEQUENCER and merge-views broken
Bela Ban (JIRA)
jira-events at lists.jboss.org
Mon Sep 3 12:22:32 EDT 2012
[ https://issues.jboss.org/browse/JGRP-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715457#comment-12715457 ]
Bela Ban edited comment on JGRP-1484 at 9/3/12 12:20 PM:
---------------------------------------------------------
Hmm. I assume the reason you have SEQUENCER *below* GMS is that the order of views and messages is the same everywhere.
I've thought about how to keep this property in the case of a merge, but so far haven't come up with a good solution.
In your suggested solution, when a merge participant gets a (unicast) INSTALL_MERGE_VIEW, it installs the MergeView directly. However, this destroys the property above, as the order of the view installation with respect to messages is not the same across all members.
I experimented with marking the MergeView multicast as NO_TOTAL_ORDER, i.e. bypassing SEQUENCER for the MergeView installation: this would allow a merge participant/coordinator to install a new MergeView. However, the consequence is the same as with your solution: we'd lose the ordering between MergeViews and messages.
The reason why we can't simply multicast the MergeView through SEQUENCER is that the coordinator (as we still see it) might be in a different partition, and it might discard forwarded messages as it is not the new coordinator in that partition.
E.g. {A,B} and {C,D} where B and C are merge coordinators would work, as A and C would multicast the MergeView locally, and then the new coordinator (e.g. C) would be visible to all members.
However, the case below wouldn't work:
A: {D,A}
B,C,D: {C,B,D}
If the coordinator in A's partition is D, then the INSTALL_MERGE_VIEW would trigger a multicast to disseminate the new MergeView, and A (the merge participant) would forward it to D. Say the new coordinator of the MergeView is C: D would then discard the forwarded message, as it is not the coordinator. Even if A forwarded the VIEW to C (the new coord) instead, C's broadcast of the VIEW would be discarded by A, as C is not in A's (old) view.
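
To make the two failure paths in this example concrete, here is a toy model in plain Java. It is not JGroups code: the Member class and its methods are hypothetical, and it assumes only two rules taken from the description above, namely that a member sequences forwarded messages only while it is coordinator of its own view, and that a receiver drops broadcasts from senders outside its current view.

```java
import java.util.Arrays;
import java.util.List;

public class SequencerMergeSketch {

    /** Hypothetical, simplified member: the first entry of its view is the coordinator. */
    static final class Member {
        final String name;
        final List<String> view;

        Member(String name, String... view) {
            this.name = name;
            this.view = Arrays.asList(view);
        }

        String coordinator() { return view.get(0); }

        boolean isCoordinator() { return name.equals(coordinator()); }

        /** Assumption: forwarded messages are only sequenced by a coordinator. */
        boolean acceptsForward() { return isCoordinator(); }

        /** Assumption: broadcasts from senders outside the current view are dropped. */
        boolean acceptsBroadcastFrom(String sender) { return view.contains(sender); }
    }

    public static void main(String[] args) {
        // Views from the example: A sees {D,A}; C and D see {C,B,D}.
        Member a = new Member("A", "D", "A");
        Member c = new Member("C", "C", "B", "D");
        Member d = new Member("D", "C", "B", "D");

        // Path 1: A forwards the MergeView to the coordinator of its old view.
        System.out.println("A forwards to: " + a.coordinator());
        System.out.println("D accepts forward: " + d.acceptsForward());

        // Path 2: A forwards to C instead; C sequences it, but A drops C's broadcast.
        System.out.println("C accepts forward: " + c.acceptsForward());
        System.out.println("A accepts broadcast from C: " + a.acceptsBroadcastFrom("C"));
    }
}
```

In this model both paths leave A without the MergeView: D refuses the forward because it is not a coordinator in its own view, and C's broadcast never gets past A's membership check.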
> SEQUENCER and merge-views broken
> --------------------------------
>
> Key: JGRP-1484
> URL: https://issues.jboss.org/browse/JGRP-1484
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.0.10
> Reporter: David Hotham
> Assignee: Bela Ban
> Fix For: 3.2
>
>
> Here's a new way in which putting SEQUENCER below GMS is broken.
> Start with A having view B|4 [B,A], while B, C and D all have view B|7 [B,C,D].
> Now we start a merge, in which B is coordinator. B creates the view C|8 [C, D, A, B].
> (I've opened a pull request saying that B surely shouldn't issue a view where the ViewID says that C was the creator. But I think that this is incidental, and not key to the bug that I'm reporting here).
> Now B sends INSTALL_MERGE_VIEW to B (a coordinator) and A (a merge participant, per Util.determineMergeParticipants).
> B gets this first and broadcasts the new view to [B, C, D]. In particular, B is now not a coordinator.
> Then A gets the INSTALL_MERGE_VIEW, and it too tries broadcasting the new view. SEQUENCER gets involved, and forwards the broadcast to B (as the coordinator in the old view). B discards this; it's no longer a coordinator.
> So the new view is not installed at A. All future broadcasts from A are forwarded to B, who discards them. The group is fractured, and none of A's broadcasts are delivered.
> I'm not sure what the right fix would be. I wonder whether things should be arranged so that in a merge:
> - coordinators behave as today, broadcasting the new view to their own sub-groups
> - but mere participants do not do this: they should just have the new view installed on them.
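
The scenario in the report, and the alternative it proposes, can be sketched with a small simulation. This is a toy model in plain Java, not JGroups code: the class, the map of views and the broadcastView helper are all hypothetical. It assumes a broadcast is forwarded to the coordinator of the sender's view and is discarded if that member no longer heads its own view. It deliberately ignores the ordering of views relative to messages, which is the concern raised in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeInstallSketch {

    /** Member name -> current view; the first entry of a view is its coordinator. */
    static Map<String, List<String>> initialViews() {
        Map<String, List<String>> views = new LinkedHashMap<>();
        views.put("A", Arrays.asList("B", "A"));       // B|4
        views.put("B", Arrays.asList("B", "C", "D"));  // B|7
        views.put("C", Arrays.asList("B", "C", "D"));
        views.put("D", Arrays.asList("B", "C", "D"));
        return views;
    }

    /** Toy sequencer: forward to the sender's coordinator, which delivers to its own view. */
    static void broadcastView(Map<String, List<String>> views, String sender, List<String> newView) {
        String coord = views.get(sender).get(0);
        boolean stillCoordinator = views.get(coord).get(0).equals(coord);
        if (!stillCoordinator) {
            return; // forwarded message is discarded
        }
        for (String member : new ArrayList<>(views.get(coord))) {
            views.put(member, newView);
        }
    }

    /** Runs the merge; the flag selects today's behaviour or the proposed one. */
    static Map<String, List<String>> merge(boolean participantsInstallDirectly) {
        Map<String, List<String>> views = initialViews();
        List<String> mergeView = Arrays.asList("C", "D", "A", "B"); // C|8

        broadcastView(views, "B", mergeView); // B handles INSTALL_MERGE_VIEW first
        if (participantsInstallDirectly) {
            views.put("A", mergeView);        // proposal: install on the participant
        } else {
            broadcastView(views, "A", mergeView); // today: A tries to broadcast too
        }
        return views;
    }

    public static void main(String[] args) {
        System.out.println("current:  A has " + merge(false).get("A"));
        System.out.println("proposed: A has " + merge(true).get("A"));
    }
}
```

With the flag off, A keeps its old view because B has already stepped down by the time A's forward arrives; with the flag on, A ends up in the merged view along with B, C and D.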