Moving forward...
The test SplitBrainTest.testDemonstrateSplitBrain shows how to reach a split-brain state
where the same messages are consumed by two different consumers.
To prevent this split-brain, where the live node remains active once the backup node
has been activated, the strategy would be:
when the live node loses its replicating connection
- this can be because the backup node has been activated or has crashed, or because the
network between the live and backup nodes is cut
- to check whether the live node is isolated, it sends a message to the other nodes
- if it reaches the quorum, it stays alive
- otherwise, it has been cut off from both the backup and the other cluster nodes, so it
kills itself => the backup is the only active node
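A minimal Java sketch of the isolation check above, assuming each other node can be probed with a simple yes/no answer. `IsolationCheck`, `shouldStayAlive` and the use of `BooleanSupplier` as a stand-in for "send a message and wait for the reply" are hypothetical names, not HornetQ API; the quorum value is simply passed in by the caller.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not HornetQ API: each BooleanSupplier stands for
// "send a message to that node and report whether it answered".
final class IsolationCheck {

    /**
     * Called when the live node loses its replicating connection.
     *
     * @param probes one probe per other member (backup and other live nodes)
     * @param quorum votes required, counting the live node itself
     * @return true if the live node should stay alive, false if it should kill itself
     */
    static boolean shouldStayAlive(List<BooleanSupplier> probes, int quorum) {
        int votes = 1; // the live node always votes for itself
        for (BooleanSupplier probe : probes) {
            if (probe.getAsBoolean()) {
                votes++; // this node answered, so the live node can still reach it
            }
        }
        return votes >= quorum;
    }

    public static void main(String[] args) {
        // 4 members (live, backup, 2 other live nodes) => assumed quorum of 3
        List<BooleanSupplier> probes = List.of(
                () -> false, // backup unreachable
                () -> true,  // other live node #1 answered
                () -> true); // other live node #2 answered
        System.out.println(shouldStayAlive(probes, 3) ? "stay alive" : "kill itself");
    }
}
```

The live node counts its own vote plus every node that answered, so in the example it stays alive even though the backup is unreachable.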
However, this won't solve the split-brain which may occur when the network is cut
between the live & backup nodes but the live node remains connected to other cluster
nodes.
In that case, the live node will reach the quorum and remain active while the backup node
has also been activated.
What is the required quorum?
The simplest solution is to require a majority of the members, the members being:
- the live node
- the backup node
- the other live nodes of the cluster
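The majority rule boils down to simple arithmetic. This is a hypothetical helper (`Quorum`, `members` and `majority` are illustrative names) that counts the live node, its backup and the other live nodes, and requires a strict majority of floor(n / 2) + 1.

```java
// Hypothetical helper: membership = live node + backup node + other live nodes.
final class Quorum {

    /** Number of members taking part in the vote. */
    static int members(int otherLiveNodes) {
        return 2 + otherLiveNodes; // live + backup + the other live nodes
    }

    /** Strict majority of the members: floor(n / 2) + 1 (integer division). */
    static int majority(int members) {
        return members / 2 + 1;
    }

    public static void main(String[] args) {
        for (int others = 0; others <= 3; others++) {
            int n = members(others);
            System.out.printf("other live nodes=%d members=%d majority=%d%n",
                    others, n, majority(n));
        }
    }
}
```

With no other live nodes there are only 2 members and the majority is 2: a live node that has lost its backup holds a single vote and can never reach the quorum on its own.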
Given the special relation between the backup and the live node, the live node should pay
special attention to a response from the backup:
- if the backup does not reply => the network is still cut between the live and backup
nodes, or the backup node has crashed
- if it replies => the network failure was transient. In that case, the backup
response should include an "active" boolean
- if the backup node is active, the live node should kill itself
- else, the live node can continue to live (and perhaps it can also reopen its
replicating connection to the backup)
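The decision the live node takes on the backup's answer could look like the sketch below. It assumes the reply is modelled as an `Optional<Boolean>` (empty when the backup did not reply, otherwise the "active" flag); `BackupReplyHandler`, `Decision` and `onBackupReply` are hypothetical names. What to do when there is no reply is not settled in the list above, so falling back to the quorum check is an assumption.

```java
import java.util.Optional;

// Hypothetical sketch: how the live node could act on the backup's reply.
// An empty Optional means "no reply"; otherwise the Boolean is the assumed
// "active" flag carried in the backup's response.
final class BackupReplyHandler {

    enum Decision {
        USE_QUORUM, // no reply: network still cut or backup crashed (assumed: rely on the quorum check)
        KILL_SELF,  // backup is active: stop to avoid two active nodes
        STAY_ALIVE  // backup is passive: transient failure, replication may be reopened
    }

    static Decision onBackupReply(Optional<Boolean> backupActive) {
        if (!backupActive.isPresent()) {
            return Decision.USE_QUORUM;
        }
        return backupActive.get() ? Decision.KILL_SELF : Decision.STAY_ALIVE;
    }

    public static void main(String[] args) {
        System.out.println(onBackupReply(Optional.empty()));
        System.out.println(onBackupReply(Optional.of(true)));
        System.out.println(onBackupReply(Optional.of(false)));
    }
}
```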
If there are no other nodes in the cluster, we can't apply this strategy.
Another thing worth mentioning: the live & backup should be on the same LAN while the
other cluster nodes may be on a WAN.
To sum up, I need to think about it more...
View the original post :
http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4209575#...