[
https://issues.jboss.org/browse/JGRP-1484?page=com.atlassian.jira.plugin....
]
Bela Ban edited comment on JGRP-1484 at 9/3/12 12:20 PM:
---------------------------------------------------------
Hmm. I assume the reason you have SEQUENCER *below* GMS is to ensure that views and
messages are delivered in the same order everywhere.
I've thought about how to keep this property in the case of a merge, but so far
haven't come up with a good solution.
In your suggested solution, when a merge participant gets a (unicast) INSTALL_MERGE_VIEW,
it installs the MergeView directly. However, this destroys the property above, as the
order of the view installation with respect to messages is not the same across all
members.
I experimented with marking the MergeView multicast as NO_TOTAL_ORDER, i.e. bypassing
SEQUENCER for the MergeView installation: this would allow a merge participant/coordinator
to install a new MergeView. However, the consequence of this is the same as with your
solution: we'd lose the ordering between MergeViews and messages.
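The loss of ordering can be sketched with a toy model. This is not JGroups code: the class, the methods and the two members X and Y are hypothetical, and the only assumption is that sequenced messages are delivered in seqno order while a bypassing view is delivered whenever it arrives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, not JGroups code: every name here is hypothetical.
class OrderingSketch {
    /** One member's delivery log: the order in which items reach the application. */
    static class Member {
        final List<String> delivered = new ArrayList<>();
        private long nextSeqno = 1;

        // Totally ordered path: deliver only the item carrying the next expected seqno.
        boolean deliverOrdered(long seqno, String item) {
            if (seqno != nextSeqno) return false; // a real protocol would buffer here
            delivered.add(item);
            nextSeqno++;
            return true;
        }

        // Bypass path: delivered at whatever point the item happens to arrive.
        void deliverUnordered(String item) {
            delivered.add(item);
        }
    }

    /** Returns the delivery logs of two members that receive the view at different times. */
    static List<List<String>> run() {
        Member x = new Member();
        Member y = new Member();

        // At X, the view arrives between m1 and m2.
        x.deliverOrdered(1, "m1");
        x.deliverUnordered("MergeView");
        x.deliverOrdered(2, "m2");

        // At Y, the same view arrives after both messages.
        y.deliverOrdered(1, "m1");
        y.deliverOrdered(2, "m2");
        y.deliverUnordered("MergeView");

        return Arrays.asList(x.delivered, y.delivered);
    }

    public static void main(String[] args) {
        List<List<String>> logs = run();
        System.out.println("X: " + logs.get(0));
        System.out.println("Y: " + logs.get(1));
    }
}
```

Both members see m1 before m2, but they disagree on where the MergeView falls relative to m2, which is exactly the property that is lost.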
The reason why we can't simply multicast the MergeView through SEQUENCER is that the
coordinator (as we still see it) might be in a different partition, and it might discard
forwarded messages as it is not the new coordinator in that partition.
E.g. a merge of {A,B} and {C,D}, where B and C are merge coordinators, would work, as A and C
would multicast the MergeView locally, and then the new coordinator (e.g. C) would be visible to
all members.
However, the case below wouldn't work:
A: {D,A}
B,C,D: {C,B,D}
If the coordinator in A's partition is D, then the INSTALL_MERGE_VIEW would trigger a
multicast to disseminate the new MergeView, and A (the merge participant) would forward it
to D. Say the new coordinator of the MergeView is C: D would then discard the forwarded
message, as it is not the coordinator. Even if A forwarded the VIEW directly to C (the new
coordinator), C's broadcast of the VIEW would be discarded by A, as C is not in
A's (old) view.
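The two discard cases can be sketched as follows. This is a toy model under stated assumptions, not the SEQUENCER implementation: the first member of a view is taken to be its coordinator, a member only sequences forwarded messages if it is the coordinator of its own view, and a broadcast is dropped if its sender is not in the receiver's view. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not JGroups code: every name here is hypothetical.
class MergeForwardSketch {
    // Each member's current view; the first element is assumed to be the coordinator.
    static final Map<String, List<String>> VIEWS = new HashMap<>();
    static {
        VIEWS.put("A", Arrays.asList("D", "A"));
        VIEWS.put("B", Arrays.asList("C", "B", "D"));
        VIEWS.put("C", Arrays.asList("C", "B", "D"));
        VIEWS.put("D", Arrays.asList("C", "B", "D"));
    }

    static String coordinatorSeenBy(String member) {
        return VIEWS.get(member).get(0);
    }

    /** Outcome of 'sender' forwarding a multicast to 'target' for sequencing. */
    static String forwardViaSequencer(String sender, String target) {
        // The target sequences the message only if it is coordinator in its own view.
        if (!coordinatorSeenBy(target).equals(target)) {
            return "discarded by " + target + ": not coordinator";
        }
        // The target broadcasts; the sender drops it if the target is not in its view.
        if (!VIEWS.get(sender).contains(target)) {
            return "broadcast discarded by " + sender + ": " + target + " not in view";
        }
        return "delivered at " + sender;
    }

    public static void main(String[] args) {
        // A forwards to D, the coordinator of A's (old) view.
        System.out.println(forwardViaSequencer("A", "D"));
        // A forwards to C, the new coordinator.
        System.out.println(forwardViaSequencer("A", "C"));
    }
}
```

In this model neither path gets the MergeView delivered at A: forwarding to D fails at D, and forwarding to C fails at A.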
SEQUENCER and merge-views broken
--------------------------------
Key: JGRP-1484
URL: https://issues.jboss.org/browse/JGRP-1484
Project: JGroups
Issue Type: Bug
Affects Versions: 3.0.10
Reporter: David Hotham
Assignee: Bela Ban
Fix For: 3.2
Here's a new way in which putting SEQUENCER below GMS is broken.
Start with A having view B|4 [B,A], while B, C and D all have view B|7 [B,C,D].
Now we start a merge, in which B is coordinator. B creates the view C|8 [C, D, A, B].
(I've opened a pull request saying that B surely shouldn't issue a view where the
ViewID says that C was the creator. But I think that this is incidental, and not key to
the bug that I'm reporting here.)
Now B sends INSTALL_MERGE_VIEW to B (a coordinator) and A (a merge participant, per
Util.determineMergeParticipants).
B gets this first and broadcasts the new view to [B, C, D]. In particular, B is no longer
a coordinator.
Then A gets the INSTALL_MERGE_VIEW, and it too tries broadcasting the new view.
SEQUENCER gets involved, and forwards the broadcast to B (as the coordinator in the old
view). B discards this; it's no longer a coordinator.
So the new view is not installed at A. All future broadcasts from A are forwarded to B,
who discards them. The group is fractured, and none of A's broadcasts are delivered.
I'm not sure what the right fix would be. I wonder whether things should be arranged
so that in a merge:
- coordinators behave as today, broadcasting the new view to their own sub-groups
- but mere participants do not do this: they should just have the new view installed on
them.
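The arrangement suggested in the two points above could look roughly like this. It is only a sketch of the decision, not GMS code: the enum, the method name and the "first member of the view is the coordinator" rule are assumptions made for illustration, and it says nothing about how the ordering between views and messages would be preserved.

```java
import java.util.Arrays;
import java.util.List;

// Toy model, not JGroups code: every name here is hypothetical.
class MergeInstallSketch {
    enum Action { BROADCAST_TO_SUBGROUP, INSTALL_LOCALLY }

    /**
     * Decides what a member does on receiving INSTALL_MERGE_VIEW.
     * A member is treated as coordinator if it is first in its current view.
     */
    static Action onInstallMergeView(String self, List<String> currentView) {
        boolean isCoordinator = !currentView.isEmpty() && currentView.get(0).equals(self);
        return isCoordinator ? Action.BROADCAST_TO_SUBGROUP : Action.INSTALL_LOCALLY;
    }

    public static void main(String[] args) {
        // B is coordinator of [B, C, D]: it broadcasts the new view to its sub-group.
        System.out.println("B: " + onInstallMergeView("B", Arrays.asList("B", "C", "D")));
        // A is a mere participant in [B, A]: it just installs the new view.
        System.out.println("A: " + onInstallMergeView("A", Arrays.asList("B", "A")));
    }
}
```

With the views from this report, A would install the new view directly instead of forwarding a broadcast to B.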