]
Nick Sawadsky commented on JGRP-2288:
-------------------------------------
I entered JGRP-2320 to track these.
S3_PING: Under certain conditions, subclusters fail to merge after
network partition
------------------------------------------------------------------------------------
Key: JGRP-2288
URL: https://issues.jboss.org/browse/JGRP-2288
Project: JGroups
Issue Type: Bug
Affects Versions: 3.6.15
Reporter: Nick Sawadsky
Assignee: Bela Ban
Priority: Major
Fix For: 4.0.16, 3.6.17
Repro steps:
1. Set up a cluster of four nodes, two on one machine (Host 1) and two on another (Host
2). Let's call the nodes A, B, C, and D.
2. Configure all 4 nodes with S3_PING as the discovery mechanism. Set
remove_all_files_on_view_change to true.
3. Start up nodes in the order A, B, C, D.
4. In the S3 bucket, there should be a single file with all four nodes listed. Node A
should be flagged as the coordinator. Ensure that the UUID for node B is larger than the
UUID for node C, when compared as two's complement integers. If this is not the case,
shut down all nodes and restart in order. Repeat until the desired relationship is
achieved. Note that with two's complement, a UUID having a first hex digit of 8 or
higher is treated as negative for comparison purposes. So, for example, a UUID starting
with 'a' is less than a UUID starting with 'b' which is less than a UUID
starting with '1'.
5. On Host 1, use iptables to block all traffic going to and coming from Host 2.
sudo iptables -A INPUT -s <Host 2 IP addr> -j DROP
sudo iptables -A OUTPUT -d <Host 2 IP addr> -j DROP
6. Allow a few minutes for the nodes to detect the network partition. Eventually you
should see two files in the S3 bucket.
7. Using Ctrl-C, stop node A.
8. You should soon find only a single file in the bucket, containing a single entry for
node B. This is a result of the remove_all_files_on_view_change setting on S3_PING, which
we set to true to avoid accumulation of old files in the bucket.
9. Resolve the network partition:
sudo iptables -F OUTPUT
sudo iptables -F INPUT
10. You will find that, even after many minutes, the subclusters are not merged.
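The ordering described in step 4 can be checked with a small sketch. It assumes the comparison works on the two 64-bit halves of the UUID as signed longs, most significant half first; the class and method names are made up for illustration and are not part of JGroups.

```java
import java.util.UUID;

// Illustrative helper (not JGroups code): orders UUIDs by treating each
// 64-bit half as a signed two's complement value, as step 4 describes.
final class SignedUuidOrder {
    static int compareSigned(UUID x, UUID y) {
        // Assumption: the most significant half decides first, then the least significant half.
        int c = Long.compare(x.getMostSignificantBits(), y.getMostSignificantBits());
        return c != 0 ? c : Long.compare(x.getLeastSignificantBits(), y.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        UUID a = UUID.fromString("a0000000-0000-0000-0000-000000000000");
        UUID b = UUID.fromString("b0000000-0000-0000-0000-000000000000");
        UUID one = UUID.fromString("10000000-0000-0000-0000-000000000000");
        // A first hex digit of 8 or higher sets the sign bit, so 'a' and 'b' are negative.
        System.out.println(compareSigned(a, b) < 0);   // 'a...' is less than 'b...'
        System.out.println(compareSigned(b, one) < 0); // 'b...' is less than '1...'
    }
}
```

Running this against the UUIDs listed in the S3 file is a quick way to confirm that node B's UUID is larger than node C's before continuing with step 5.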
I believe the reason why the subclusters are never merged is as follows:
- MERGE3 on nodes B, C and D uses S3_PING to find members to send INFO messages to. Each
one finds only node B in the discovery file. As a result, only node B's view
consistency checker has anything to work with.
- On node B, the consistency checker can see that there are two coordinators, B and C.
However, node C has a lower UUID, so node B defers to it to perform the merge. Node C
never performs the merge because, as mentioned above, it is not receiving any INFO
messages.
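To make the reasoning above concrete, here is a toy model of the situation. It is not the MERGE3 implementation: it only assumes that every node sends its coordinator to each member listed in the discovery file, and that a coordinator starts a merge only if it has seen more than one coordinator and is itself the lowest one. All names are hypothetical.

```java
import java.util.*;

// Toy model (not MERGE3 code) of who, if anyone, would start the merge.
final class MergeDeadlockModel {
    // coordOf: node -> coordinator of its current view.
    // discoveryFile: members found via discovery (the INFO targets).
    // uuidOrder: all nodes, lowest UUID first.
    static String mergeStarter(Map<String, String> coordOf,
                               List<String> discoveryFile,
                               List<String> uuidOrder) {
        Comparator<String> byUuid = Comparator.comparingInt(uuidOrder::indexOf);
        Map<String, SortedSet<String>> coordsSeenBy = new HashMap<>();
        // Every node reports its coordinator to each member listed in the file.
        for (String coord : coordOf.values())
            for (String target : discoveryFile)
                coordsSeenBy.computeIfAbsent(target, k -> new TreeSet<>(byUuid)).add(coord);
        for (String node : uuidOrder) {
            SortedSet<String> seen = coordsSeenBy.get(node);
            boolean isCoord = node.equals(coordOf.get(node));
            // A coordinator merges only if it saw several coordinators and is the lowest.
            if (isCoord && seen != null && seen.size() > 1 && seen.first().equals(node))
                return node;
        }
        return null; // nobody starts a merge
    }

    public static void main(String[] args) {
        Map<String, String> coordOf = new HashMap<>();
        coordOf.put("B", "B");
        coordOf.put("C", "C");
        coordOf.put("D", "C");
        List<String> uuidOrder = Arrays.asList("C", "B", "D"); // C has the lower UUID
        // File lists only B: B defers to C, and C hears nothing. Prints null.
        System.out.println(mergeStarter(coordOf, Arrays.asList("B"), uuidOrder));
        // File lists B and C: C sees both coordinators. Prints C.
        System.out.println(mergeStarter(coordOf, Arrays.asList("B", "C"), uuidOrder));
    }
}
```

In this model the merge happens as soon as node C appears in the discovery file, which matches the suspicion that the missing file entry is what blocks the merge.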
I think this problem would affect FILE_PING as well, and other protocols derived from
FILE_PING. Looking at the latest 4.x code, it appears the problem still exists there.
I think the crux of the issue is that the coordinator on Host 2 (node C) does not
re-create its discovery file after it is deleted by node B. Would it be reasonable for
FILE_PING.findMembers() to create the discovery file if the node is a coordinator and the
file doesn't exist?
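A rough sketch of the check I have in mind, written against a plain local file rather than the real FILE_PING or S3_PING code. The class, method, and parameter names are hypothetical, and the one-member-per-line format is only a placeholder for the actual discovery file format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical stand-in for the proposed check; not the JGroups implementation.
final class DiscoveryFileSketch {
    // If this node is a coordinator and its discovery file is missing,
    // write the file again. Returns true if the file was re-created.
    static boolean recreateIfMissing(Path file, boolean isCoordinator, List<String> members) {
        if (!isCoordinator || Files.exists(file))
            return false; // not our job, or nothing to repair
        try {
            // Placeholder format: one member per line.
            Files.write(file, members, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

If something like this ran on each discovery round, node C would restore its file shortly after node B deleted it, and the other nodes would find C again.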