Bela Ban updated JGRP-937:
--------------------------
Attachment: udp-2.6.xml
To reproduce:
- Start 20 nodes on different machines
- Use udp-2.6.xml (attached). It configures the DELAY protocol with a 500 ms delay
on sending messages
- Pull some network cables, power off the switch, etc.
- The cluster never merges again
(Merging does work when FD is used.)
MERGE4: get rid of shunning and use only merging (getting rid of
shunfests)
---------------------------------------------------------------------------
Key: JGRP-937
URL: https://jira.jboss.org/jira/browse/JGRP-937
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 2.6.9
Attachments: udp-2.6.xml
If we have FD_ALL plus shunning, the following scenario can happen:
- A network partition with subgroups {A} and {B,C,D,E,F,G}
- The partition heals
- A gets heartbeats from all members of the 2nd subgroup
- With FD_ALL.shun enabled, A shuns all members of the second subgroup!
- Conversely, the members of the second subgroup shun A. This mutual shunning
(a "shunfest") can prevent large clusters from ever merging back again
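The following Java toy model illustrates the scenario above. It is not JGroups code: the class and method names are hypothetical, and it assumes, as described above, that a member with shunning enabled shuns any heartbeat sender missing from its current view. After the partition heals, every member hears every other member's heartbeats, so each side shuns the other.

```java
import java.util.*;

/**
 * Hypothetical toy model of the FD_ALL shunning scenario (not JGroups code).
 * Assumption: a member with shun=true shuns every heartbeat sender that is
 * not in its own current view.
 */
public class ShunfestSketch {

    /** Returns, for every receiver, the set of heartbeat senders it would shun. */
    static Map<String, Set<String>> shunDecisions(Map<String, List<String>> views, boolean shun) {
        Map<String, Set<String>> shunned = new TreeMap<>();
        for (String receiver : views.keySet()) {
            Set<String> targets = new TreeSet<>();
            for (String sender : views.keySet()) {
                // After the partition heals, every member receives every other member's heartbeats
                boolean unknownSender = !sender.equals(receiver)
                        && !views.get(receiver).contains(sender);
                if (shun && unknownSender) {
                    targets.add(sender);
                }
            }
            shunned.put(receiver, targets);
        }
        return shunned;
    }

    public static void main(String[] args) {
        // Subgroups {A} and {B,C,D,E,F,G}, each member holding its subgroup's view
        List<String> left = List.of("A");
        List<String> right = List.of("B", "C", "D", "E", "F", "G");
        Map<String, List<String>> views = new TreeMap<>();
        views.put("A", left);
        for (String m : right) {
            views.put(m, right);
        }

        // {A=[B, C, D, E, F, G], B=[A], C=[A], D=[A], E=[A], F=[A], G=[A]}
        System.out.println(shunDecisions(views, true));
        // Every set is empty: nobody is shunned
        System.out.println(shunDecisions(views, false));
    }
}
```

With `shun` set to false, no member is shunned, which leaves the view mismatch for a merge protocol to resolve.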
SOLUTION:
- Get rid of shunning (GMS.shun is already false by default; now also set FD.shun
and FD_ALL.shun to false)
- MERGE4 periodically compares discovery results with its own view (possibly repeating
this a few times before acting)
- Then MERGE4 initiates a merge between all members who have differing views
- Make sure digests get merged correctly (min/max)
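A minimal Java sketch of the proposed MERGE4 steps, assuming a simplified model rather than the actual JGroups API: `ViewId`, `Entry`, `mergeParticipants` and `mergeDigests` are hypothetical names. It assumes that discovery returns, for each responding member, the view that member reports; that the merge is run between the coordinators of the distinct views; and that "min/max" means keeping the lowest `low` seqno and the highest delivered/received seqnos per sender.

```java
import java.util.*;

/**
 * Hypothetical sketch of the MERGE4 idea (not the actual JGroups implementation):
 * compare discovery results with the local view and, if views differ, pick the
 * coordinators to merge; then combine their digests using min/max per sender.
 */
public class Merge4Sketch {

    /** A view identified by its coordinator and a logical id (assumed model). */
    record ViewId(String coordinator, long id) {}

    /** Per-sender seqno range; field names are illustrative assumptions. */
    record Entry(long low, long highestDelivered, long highestReceived) {}

    /**
     * discovered maps each responding member to the ViewId it reports.
     * Returns the coordinators of all distinct views, or an empty set if every
     * member reports the local view (no merge needed).
     */
    static Set<String> mergeParticipants(ViewId localView, Map<String, ViewId> discovered) {
        Set<ViewId> distinct = new HashSet<>(discovered.values());
        distinct.add(localView);
        if (distinct.size() == 1) {
            return Set.of(); // all views agree
        }
        Set<String> coords = new TreeSet<>();
        for (ViewId v : distinct) {
            coords.add(v.coordinator());
        }
        return coords;
    }

    /** Merges digests: lowest "low", highest "highestDelivered"/"highestReceived" per sender. */
    static Map<String, Entry> mergeDigests(List<Map<String, Entry>> digests) {
        Map<String, Entry> result = new TreeMap<>();
        for (Map<String, Entry> d : digests) {
            d.forEach((sender, e) -> result.merge(sender, e, (a, b) -> new Entry(
                    Math.min(a.low(), b.low()),
                    Math.max(a.highestDelivered(), b.highestDelivered()),
                    Math.max(a.highestReceived(), b.highestReceived()))));
        }
        return result;
    }

    public static void main(String[] args) {
        ViewId viewA = new ViewId("A", 5);
        ViewId viewB = new ViewId("B", 7);
        Map<String, ViewId> discovered = new TreeMap<>();
        discovered.put("A", viewA);
        for (String m : List.of("B", "C", "D")) {
            discovered.put(m, viewB);
        }
        System.out.println(mergeParticipants(viewA, discovered)); // [A, B]

        Map<String, Entry> d1 = Map.of("A", new Entry(3, 10, 12));
        Map<String, Entry> d2 = Map.of("A", new Entry(1, 8, 15), "B", new Entry(0, 4, 4));
        System.out.println(mergeDigests(List.of(d1, d2)));
    }
}
```

Repeating the comparison a few times before merging, as suggested above, would simply mean calling `mergeParticipants` again on fresh discovery results and acting only once the mismatch persists.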