[jboss-jira] [JBoss JIRA] (JGRP-2470) JDBC_PING can face a split-brain issue when restarting a coordinator node
Masafumi Miura (Jira)
issues at jboss.org
Wed Aug 12 02:50:00 EDT 2020
[ https://issues.redhat.com/browse/JGRP-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384134#comment-14384134 ]
Masafumi Miura commented on JGRP-2470:
--------------------------------------
I saw [the fix|https://github.com/belaban/JGroups/commit/9b5aeb5ea28d99fb104dafe2e98e6ee9146b5eb5#diff-6b7306e1d2f7cbc9fddffed0e74b452cR134] that sets {{is_coord}} to false in {{Discovery#stop()}}. In that case, when will {{removeAll(cluster_name)}} in {{JDBC_PING#stop()}} ever be invoked?
I think it will never be invoked, because {{super.stop()}} has already reset {{is_coord}} by the time it is checked. If so, shouldn't this dead code be removed to avoid confusion?
{code:title=jgroups/src/org/jgroups/protocols/JDBC_PING.java}
@Override
public void stop() {
    super.stop();
    if(is_coord) // always false here because it's set to false in super.stop()
        removeAll(cluster_name);
}
{code}
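To make the ordering argument concrete, here is a minimal sketch that models only the two {{stop()}} methods. {{MiniDiscovery}} and {{MiniJdbcPing}} are hypothetical stand-ins, not the real JGroups classes; the only behavior assumed from the fix is that the base-class {{stop()}} sets {{is_coord}} to false. Under that assumption, the subclass checks the flag after it has already been reset, so {{removeAll()}} is never reached.

```java
// Hypothetical, heavily simplified model of the stop() chain discussed above.
// MiniDiscovery and MiniJdbcPing are NOT the real JGroups classes.
public class StopOrderDemo {

    static class MiniDiscovery {
        protected boolean is_coord = true; // assume this node was the coordinator

        public void stop() {
            // Mirrors the referenced fix: the base class resets is_coord on stop
            is_coord = false;
        }
    }

    static class MiniJdbcPing extends MiniDiscovery {
        int removeAllCalls = 0;

        @Override
        public void stop() {
            super.stop();   // is_coord is now false
            if (is_coord)   // so this branch can never be taken
                removeAll("cluster");
        }

        // Hypothetical stand-in that only counts invocations
        void removeAll(String clusterName) {
            removeAllCalls++;
        }
    }

    /** Stops a node that was coordinator and reports how often removeAll() ran. */
    public static int removeAllCallsAfterStop() {
        MiniJdbcPing ping = new MiniJdbcPing();
        ping.stop();
        return ping.removeAllCalls;
    }

    public static void main(String[] args) {
        // prints "removeAll() calls after stop(): 0"
        System.out.println("removeAll() calls after stop(): " + removeAllCallsAfterStop());
    }
}
```

In this model the {{if(is_coord)}} branch is unreachable, which is the point of the question above: the check survives in the source but no longer has any effect.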
> JDBC_PING can face a split-brain issue when restarting a coordinator node
> -------------------------------------------------------------------------
>
> Key: JGRP-2470
> URL: https://issues.redhat.com/browse/JGRP-2470
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.1.9, 4.0.22
> Reporter: Masafumi Miura
> Assignee: Radoslav Husar
> Priority: Major
> Fix For: 4.2.5, 5.0.1
>
>
> After [the change|https://github.com/belaban/JGroups/commit/215cdb6] for JGRP-2199, JDBC_PING deletes all entries from the table during the shutdown of the coordinator node.
> This behavior can cause a split brain when a coordinator node is restarted. In the following scenario, all entries are lost, so the restarting node cannot find any information about existing nodes in the table and does not join the existing cluster.
> 0. node1 and node2 form a cluster; node1 is the coordinator.
> 1. A restart of node1 is triggered.
> 2. node1 removes its own entry from the table.
> 3. node2 becomes the new coordinator.
> 4. node2 updates its entry in the table.
> 5. node1 clears all entries from the table, including node2's.
> 6. node1 starts again.
> 7. node1 does not join the existing cluster because the table contains no node information.
> Note: If step 5 happens before step 4, the split brain does not occur. However, steps 4 and 5 run on different nodes and can happen concurrently, so their order is undefined. For example, if the shutdown of node1 takes a long time, this issue is very likely to occur.
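The race between steps 4 and 5 can be illustrated with a small simulation. This is a sketch under stated assumptions: the "table" is an in-memory map standing in for the JDBC_PING database table (not its real schema or API), node addresses are made-up placeholders, and discovery is reduced to "read whatever is in the table". It runs both interleavings and shows that only "step 4 before step 5" leaves node1 with nothing to discover.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of the restart scenario described in the issue.
// The map stands in for the JDBC_PING table; keys and values are placeholders.
public class RestartRaceDemo {

    /** Returns the entries node1 sees when it restarts (step 6). */
    public static Map<String, String> runScenario(boolean clearBeforeUpdate) {
        Map<String, String> table = new LinkedHashMap<>();
        // Step 0: node1 (coordinator) and node2 both have entries
        table.put("node1", "addr1");
        table.put("node2", "addr2");

        // Step 2: node1 removes its own entry during shutdown
        table.remove("node1");

        // Steps 4 and 5 run on different nodes, so either order is possible
        if (clearBeforeUpdate) {
            table.clear();               // step 5: node1 clears all entries
            table.put("node2", "addr2"); // step 4: node2, now coordinator, writes its entry
        } else {
            table.put("node2", "addr2"); // step 4
            table.clear();               // step 5 also wipes node2's fresh entry
        }

        // Step 6: node1 restarts and reads the table to discover existing members
        return new LinkedHashMap<>(table);
    }

    public static void main(String[] args) {
        for (boolean clearFirst : new boolean[] {true, false}) {
            Map<String, String> seen = runScenario(clearFirst);
            String outcome = seen.isEmpty()
                    ? "no members found, node1 does not join the existing cluster (split brain)"
                    : "finds " + seen.keySet() + ", node1 can join the existing cluster";
            System.out.println((clearFirst ? "step 5 before step 4: " : "step 4 before step 5: ") + outcome);
        }
    }
}
```

Because nothing orders node1's shutdown against node2's write, a slow shutdown on node1 pushes step 5 later and makes the empty-table outcome more likely, matching the note above.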
--
This message was sent by Atlassian Jira
(v7.13.8#713008)