[jboss-jira] [JBoss JIRA] (JGRP-2479) [GSS](7.3.x) JDBC_PING has a possibility not to discover other members and create singleton clusters (split-brain issue) when restarting a coordinator node
Masafumi Miura (Jira)
issues at jboss.org
Wed May 27 21:01:17 EDT 2020
Masafumi Miura created JGRP-2479:
------------------------------------
Summary: [GSS](7.3.x) JDBC_PING has a possibility not to discover other members and create singleton clusters (split-brain issue) when restarting a coordinator node
Key: JGRP-2479
URL: https://issues.redhat.com/browse/JGRP-2479
Project: JGroups
Issue Type: Bug
Affects Versions: 4.1.9, 4.0.22
Reporter: Masafumi Miura
Assignee: Radoslav Husar
Fix For: 4.2.5
After [the change|https://github.com/belaban/JGroups/commit/215cdb6] for JGRP-2199, JDBC_PING deletes all entries from the table during the shutdown of the coordinator node.
This behavior can cause a split-brain when a coordinator node is restarted. If all entries are lost, as in the following scenario, the restarting node cannot find any information about the existing nodes in the table, so it does not join the existing cluster and forms a singleton cluster instead.
0. node1 and node2 form a cluster. node1 is the coordinator.
1. A restart of node1 is triggered.
2. node1 removes its own node information from the table.
3. node2 becomes the new coordinator.
4. node2 updates its node information in the table.
5. node1 clears all entries from the table.
6. node1 starts again.
7. node1 does not join the existing cluster because there is no node information in the table.
Note: If step 5 happens before step 4, the split-brain does not occur. However, steps 4 and 5 run on different nodes and can therefore execute in parallel, so their order is undefined. For example, if the shutdown of node1 takes a long time, step 5 is likely to run after step 4, and the issue is likely to occur.
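
The ordering dependency between steps 4 and 5 can be illustrated with a small toy model. This is only a sketch: it uses an in-memory map in place of the database table and does not call any JGroups code. The class name PingTableRace, the method membersSeenByRestartedNode1 and the row values are hypothetical names chosen for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the shared discovery table. This is NOT the JDBC_PING implementation;
// the map stands in for the database table and all names are hypothetical.
class PingTableRace {

    // Replays the scenario and returns the members that node1 finds in the
    // table after it restarts (step 7).
    static Set<String> membersSeenByRestartedNode1(boolean clearBeforeNode2Update) {
        Map<String, String> table = new LinkedHashMap<>(); // address -> illustrative row payload
        table.put("node1", "coordinator");                 // step 0: both nodes are registered
        table.put("node2", "member");

        table.remove("node1");                             // step 2: node1 removes its own entry

        Runnable node2Update = () -> table.put("node2", "coordinator"); // steps 3-4
        Runnable node1ClearAll = table::clear;                          // step 5

        // Steps 4 and 5 run on different nodes, so either order is possible.
        if (clearBeforeNode2Update) {
            node1ClearAll.run();
            node2Update.run();
        } else {
            node2Update.run();
            node1ClearAll.run();
        }

        // Steps 6-7: node1 restarts and reads whatever is left in the table.
        return new TreeSet<>(table.keySet());
    }

    public static void main(String[] args) {
        // Step 4 before step 5: the table is empty, node1 finds nobody (split-brain).
        System.out.println("step 4 then step 5: " + membersSeenByRestartedNode1(false));
        // Step 5 before step 4: node2's entry survives, node1 can find it.
        System.out.println("step 5 then step 4: " + membersSeenByRestartedNode1(true));
    }
}
```

In this model, the restarted node1 sees an empty table only when node2's update lands before node1's clear, which corresponds to the failing order described in the note.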
--
This message was sent by Atlassian Jira
(v7.13.8#713008)