[jboss-jira] [JBoss JIRA] (JGRP-2276) MERGE3: a dead member as merge leader will never trigger a merge
Bela Ban (JIRA)
issues at jboss.org
Tue Jun 12 08:41:00 EDT 2018
[ https://issues.jboss.org/browse/JGRP-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589852#comment-13589852 ]
Bela Ban commented on JGRP-2276:
--------------------------------
The root cause is actually not a merge issue but the way we install a new view when the coordinator leaves: the *coordinator* creates the new view V (excluding itself), broadcasts it and waits until all acks have been received or a timeout ({{view_ack_collection_timeout}}) occurs.
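A minimal sketch of the leave sequence just described, assuming a latch-based ack collector and simulating the broadcast with a delivery predicate. `LeavingCoordinator`, `installFinalView` and `ack` are hypothetical names, not JGroups API; only `view_ack_collection_timeout` comes from the comment. It illustrates the key point: the leaving coordinator stops waiting once the timeout expires, whether or not every member acknowledged V.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch of a coordinator that installs a final view V before leaving. */
class LeavingCoordinator {
    private final Set<String> pendingAcks = ConcurrentHashMap.newKeySet();
    private volatile CountDownLatch latch = new CountDownLatch(0);

    /**
     * "Broadcasts" V (which excludes this coordinator) and blocks until every member
     * has acked or viewAckCollectionTimeoutMs has elapsed, whichever comes first.
     * Returns the members that never acked; the leave proceeds regardless.
     */
    Set<String> installFinalView(List<String> viewMembers,
                                 Predicate<String> deliveredTo,
                                 long viewAckCollectionTimeoutMs) {
        pendingAcks.clear();
        pendingAcks.addAll(viewMembers);
        latch = new CountDownLatch(pendingAcks.size());
        // Simulated broadcast: a member acks only if V was actually delivered to it.
        for (String member : viewMembers) {
            if (deliveredTo.test(member)) ack(member);
        }
        try {
            latch.await(viewAckCollectionTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status for the caller
        }
        return new HashSet<>(pendingAcks);
    }

    /** Records an ack for V; duplicate or unexpected acks are ignored. */
    void ack(String member) {
        if (pendingAcks.remove(member)) latch.countDown();
    }
}

class LeaveDemo {
    public static void main(String[] args) {
        LeavingCoordinator coord2 = new LeavingCoordinator();
        // V = 3|6 = {3,4,5,6,7}; the message carrying V is dropped at member 7.
        Set<String> noAck = coord2.installFinalView(
                List.of("3", "4", "5", "6", "7"), m -> !m.equals("7"), 100);
        System.out.println("no ack from " + noAck); // prints "no ack from [7]"
    }
}
```

In the sketch the returned set only reports which members missed V; nothing re-sends it.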
If a member (say member 7) drops the message carrying V, then STABLE has no chance to trigger a retransmission of V: all members (including 7) remove the coordinator (member 2) from their retransmission tables, so V is never retransmitted.
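The retransmission argument can be sketched with a toy table that maps each sender to the highest sequence number received from it. All names here are hypothetical and this is not the JGroups implementation; the sketch assumes in-order delivery, so a dropped last message (V) can only be detected by comparing local state against a stability digest. Once the entry for the departed coordinator is removed, the same digest no longer produces a retransmission request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-member retransmission state: highest seqno received from each sender. */
class RetransmitTable {
    private final Map<String, Long> highestReceived = new HashMap<>();

    void addSender(String sender) {
        highestReceived.putIfAbsent(sender, 0L);
    }

    /** Simplification: messages arrive in order, so a loss can only be at the tail. */
    void receive(String sender, long seqno) {
        highestReceived.computeIfPresent(sender, (s, hr) -> Math.max(hr, seqno));
    }

    /** Removing a sender (e.g. the departed coordinator) discards everything known about it. */
    void removeSender(String sender) {
        highestReceived.remove(sender);
    }

    /**
     * Compares a stability digest (sender -> highest seqno seen elsewhere) with local
     * state and returns the retransmission requests this member would issue.
     */
    List<String> retransmitRequests(Map<String, Long> digest) {
        List<String> requests = new ArrayList<>();
        digest.forEach((sender, highest) -> {
            Long local = highestReceived.get(sender);
            // Senders no longer in the table are skipped: nobody asks for their messages.
            if (local != null && local < highest) {
                requests.add(sender + ":" + (local + 1) + "-" + highest);
            }
        });
        return requests;
    }
}
```

For member 7, having received seqnos 1 to 4 from member 2 and missed V as seqno 5, a digest `{2=5}` yields the request `2:5-5`; after `removeSender("2")` the same digest yields nothing.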
{{ASYM_ENCRYPT}} merely happens to be what drops the message, because it doesn't have the new secret key yet; that drop is what triggers the problem above. We should therefore fix neither {{ASYM_ENCRYPT}} nor {{MERGE3}}, but rather the way a coordinator leaves gracefully.
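To see why the drop happens, here is a sketch of an encrypting receiver, assuming (hypothetically) that each message is tagged with an identifier of the symmetric key it was encrypted with, and that a receiver drops messages whose key it doesn't have yet. `DecryptingReceiver` and its methods are illustrative only; no real decryption is performed.

```java
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical receiver side of an encrypting protocol; actual decryption is elided. */
class DecryptingReceiver {
    private byte[] keyVersion;

    DecryptingReceiver(byte[] initialKeyVersion) {
        this.keyVersion = initialKeyVersion.clone();
    }

    /** Installs a new shared key once this member has obtained it. */
    void installKey(byte[] newKeyVersion) {
        this.keyVersion = newKeyVersion.clone();
    }

    /** Returns the payload, or empty if the message used a key this member doesn't have yet. */
    Optional<String> receive(byte[] msgKeyVersion, String payload) {
        if (!Arrays.equals(keyVersion, msgKeyVersion)) {
            return Optional.empty(); // dropped: nothing here queues or re-requests the message
        }
        return Optional.of(payload); // a real implementation would decrypt here
    }
}
```

A receiver still on key version `{1}` returns empty for a message tagged `{2}`; only after `installKey(new byte[] {2})` would the same message be accepted, by which time, in the scenario above, nobody is left to resend it.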
On second thought, we should probably also add code to {{MERGE3}} to handle the dead-merge-leader scenario, as a second line of defense...
> MERGE3: a dead member as merge leader will never trigger a merge
> ----------------------------------------------------------------
>
> Key: JGRP-2276
> URL: https://issues.jboss.org/browse/JGRP-2276
> Project: JGroups
> Issue Type: Bug
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 4.0.12
>
>
> When one or more members have a view whose coordinator is a dead member, and that dead member becomes one of the subgroup coordinators and _happens to be chosen as merge leader_, a merge will never ensue.
> Example:
> * Member 2 (with view 2|5) was the previous coordinator and left the cluster, installing view 3|6 before stopping
> ** View 2|5=\{2,3,4,5,6,7\}; view 3|6=\{3,4,5,6,7\}
> * Member 7 didn't get view 3|6 and still has view 2|5
> * Everybody else has view 3|6
> * MERGE3 gets the following views:
> ** 2|5: 7 // member 7 has this view
> ** 3|6: 3,4,5,6 // members 3,4,5 and 6 have this view
> * 2 and 3 are added to a _sorted set_ and the first member of the set (2) is chosen as merge leader. 3 doesn't take any action, as it notices it won't be the merge leader
> ** The reason 2 was first in the sorted set is that (possibly by coincidence) its UUID is *lower* than that of 3. If this wasn't the case, 3 would be merge leader and start (and successfully complete) a merge. However, with dead member 2 being picked as merge leader, a merge will never be triggered!
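The selection rule in this example can be reproduced with a sorted set. The sketch below assumes String member names in place of UUIDs (so "2" sorts before "3", as in the example) and adds a hypothetical `pickLiveLeader` guard, in the spirit of the "second line of defense" mentioned in the comment, that only considers coordinators heard from via INFO messages in the current round. It is neither the MERGE3 code nor the eventual fix.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of merge-leader selection; Strings stand in for member UUIDs. */
class MergeLeaderSelection {
    /** As described in the issue: sort the subgroup coordinators and take the first one. */
    static String pickLeader(Collection<String> subgroupCoords) {
        return new TreeSet<>(subgroupCoords).first();
    }

    /**
     * Hypothetical guard: only coordinators that sent an INFO message in the current round
     * are eligible, so a dead former coordinator cannot be picked. Returns null if none qualify.
     */
    static String pickLiveLeader(Collection<String> subgroupCoords, Set<String> infoSenders) {
        TreeSet<String> live = new TreeSet<>();
        for (String coord : subgroupCoords) {
            if (infoSenders.contains(coord)) live.add(coord);
        }
        return live.isEmpty() ? null : live.first();
    }

    public static void main(String[] args) {
        // Creators of the two reported views: 2|5 (from 7) and 3|6 (from 3, 4, 5 and 6).
        List<String> coords = List.of("2", "3");
        // Dead member 2 sends nothing, so only live members appear as INFO senders.
        Set<String> infoSenders = Set.of("3", "4", "5", "6", "7");
        System.out.println(pickLeader(coords));                  // prints 2
        System.out.println(pickLiveLeader(coords, infoSenders)); // prints 3
    }
}
```

With the plain rule, dead member 2 wins and 3 stays passive; with the guard, 3 becomes merge leader and can start the merge.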