This is related to
https://jira.jboss.org/jira/browse/JBMESSAGING-1194
Starting with the simplest case, it appears that we can very easily have a split-brain
between a live node and its backup node.
1. normal use case
C1 is connected to live node L1
L1 is replicated to the backup node B1
| C1 - - - B1
| \ |
| \ |
| \ |
| \ |
| L1*
|
| *: live node
|
2. Backup activation is triggered by the 1st client connection to the backup node
Network cable is unplugged from L1
1. C1 will failover to B1
2. B1 will be activated and becomes live
3. L1 is still alive
4. L1 will be informed that the connection to B1 is dead
-> it will stop replicating to B1 but it remains alive
=> 2 live nodes: split-brain!
| C1 - - - B1*
| X X
| X X
| X X
| X X
| L1*
|
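The four steps above can be replayed in a tiny model. This is only a sketch to make the end state explicit: the class, fields and method names are hypothetical and do not come from the JBoss Messaging code base, and the "activation on first client connection" rule is simply encoded as an assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the scenario; none of these names exist in JBoss Messaging.
class SplitBrainSketch {

    static class Node {
        final String name;
        boolean live;         // does this node consider itself live?
        boolean replicating;  // is it still replicating to its peer?

        Node(String name, boolean live) {
            this.name = name;
            this.live = live;
        }
    }

    // Replays steps 1 to 4 and returns the names of the nodes that
    // consider themselves live once the cable has been unplugged from L1.
    static List<String> liveNodesAfterCableUnplugged() {
        Node l1 = new Node("L1", true);
        Node b1 = new Node("B1", false);
        l1.replicating = true;

        // Steps 1 and 2: C1 fails over to B1, and that first client
        // connection activates B1.
        b1.live = true;

        // Steps 3 and 4: L1 notices the connection to B1 is dead and stops
        // replicating, but nothing tells it to stop being live.
        l1.replicating = false;

        List<String> liveNodes = new ArrayList<>();
        for (Node node : new Node[] {l1, b1}) {
            if (node.live) {
                liveNodes.add(node.name);
            }
        }
        return liveNodes;
    }

    public static void main(String[] args) {
        // prints "Live nodes: [L1, B1]"
        System.out.println("Live nodes: " + liveNodesAfterCableUnplugged());
    }
}
```

Nothing in the sequence ever sets `live` back to false on L1, which is exactly the split-brain.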
3. If the network failure was transient (someone plugs the cable back into L1)
1. another client C2 joins and connects to L1 as its live node
2. C1 will continue to use B1 as its live node
| C1 - - - B1*
| /
| /
| /
| /
| C2 - - - L1*
|
To sum up, we can reach a split-brain after a transient network failure on the live node.
In the code, I have not seen any heartbeat between the live node and the backup node,
so L1 will never be informed when it can reach B1 again.
Likewise B1, the original backup, never checks connectivity to the original live node L1.
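For illustration only, here is a minimal sketch of the kind of heartbeat bookkeeping that seems to be missing. It is not taken from JBoss Messaging: `PeerHeartbeatMonitor`, its methods and the timeout value are all assumptions, and the clock is passed in explicitly so the logic can be followed without any networking.

```java
// Hypothetical sketch; not part of JBoss Messaging.
class PeerHeartbeatMonitor {
    private final long timeoutMillis;
    private long lastHeartbeatMillis;

    PeerHeartbeatMonitor(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = startMillis;
    }

    // The peer counts as reachable while its last heartbeat is recent enough.
    boolean isPeerReachable(long nowMillis) {
        return nowMillis - lastHeartbeatMillis <= timeoutMillis;
    }

    // Records a heartbeat from the peer. Returns true when this heartbeat
    // ends a period in which the peer was considered unreachable, i.e. the
    // moment at which both nodes could find out that they are both live.
    boolean onHeartbeat(long nowMillis) {
        boolean peerCameBack = !isPeerReachable(nowMillis);
        lastHeartbeatMillis = nowMillis;
        return peerCameBack;
    }

    public static void main(String[] args) {
        PeerHeartbeatMonitor monitor = new PeerHeartbeatMonitor(5000, 0);
        System.out.println(monitor.isPeerReachable(3000)); // within the timeout
        System.out.println(monitor.isPeerReachable(6000)); // cable unplugged, timed out
        System.out.println(monitor.onHeartbeat(7000));     // cable replugged, peer is back
    }
}
```

A node on either side could use the "peer came back" signal to notice that both nodes are live and start whatever resolution is chosen; the sketch deliberately says nothing about what that resolution should be.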