[jboss-jira] [JBoss JIRA] Resolved: (JBAS-4229) HASingletonController doesn't handle "split brain" correctly.

Wed Mar 28 18:52:57 EDT 2007

     [ http://jira.jboss.com/jira/browse/JBAS-4229?page=all ]

Brian Stansberry resolved JBAS-4229.
------------------------------------

    Resolution: Done

See http://wiki.jboss.org/wiki/Wiki.jsp?page=HASingletonAndClusterMerges for docs.

There are of course unit tests of the basic functionality.  I also manually tested a setup with two cluster nodes running HA-JMS against a common database.  A test client continually created sessions and sent messages while MDBs read the messages.  I then forced a cluster partition and a subsequent merge.  HA-JMS appeared to handle it properly; i.e. the surviving master restarted without issue.

> HASingletonController doesn't handle "split brain" correctly.
> -------------------------------------------------------------
>
>                 Key: JBAS-4229
>                 URL: http://jira.jboss.com/jira/browse/JBAS-4229
>             Project: JBoss Application Server
>          Issue Type: Bug
>      Security Level: Public(Everyone can see) 
>          Components: Clustering
>    Affects Versions: JBossAS-4.0.3 SP1, JBossAS-4.2.0.CR1, JBossAS-4.0.5.GA, JBossAS-4.0.4.GA
>            Reporter: Adrian Brock
>         Assigned To: Brian Stansberry
>             Fix For: JBossAS-4.2.0.GA
>
>
> The HASingletonController doesn't understand the "split brain" problem.
> Take for example a JBossMQ destination (queue) which needs to be "restored" from the database
> when a new singleton is elected.
> The scenario goes as follows:
> STEP1 (original state):
> cluster=server1, server2
> master=server1
> STEP2 (unplug server1 from the network):
> cluster=server2
> master=server2
> server2 will now restore the queue from the database
> BUT! server1 has the view
> cluster=server1
> master=server1
> STEP3 (plug server1 back into the network)
> cluster=server1, server2
> master=server1
> We are back to the original state, but since server1 thinks it never left the cluster,
> it doesn't restore the changes (from the database) that server2 made to the queue between STEP2 and STEP3.
> This is because it doesn't restart the HASingleton on server1.
> If there were a third server (server3) to arbitrate, this wouldn't be a problem.
> Both server2 and server3 would agree that server2 is the master and that server1 has left and must rejoin.
> There needs to be some extra processing in the "merge" when two masters agree to form a cluster
> that ensures that whoever they elect as the new master gets its HASingleton(s) restarted.
> Otherwise, it won't have the up-to-date state from the other master.
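
The scenario above, and the extra merge processing it asks for, can be illustrated with a small self-contained simulation. This is only a sketch under stated assumptions, not the actual HASingletonController code: the Node class, its viewChanged(view, merge) callback, and the "first member of the view is master" election rule are all hypothetical names and rules chosen for illustration. The point it shows is that a node which was master before a merge and is elected master again still restarts its singleton, so it reloads state the other partition's master may have changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitBrainSketch {

    /** Hypothetical stand-in for one cluster node hosting an HA singleton. */
    static class Node {
        final String name;
        boolean master;
        final List<String> events = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        /**
         * Hypothetical view-change callback.
         * Assumption: the first member of the view is elected master.
         * 'merge' is true when the new view joins previously split partitions.
         */
        void viewChanged(List<String> view, boolean merge) {
            boolean wasMaster = master;
            boolean electedMaster = view.get(0).equals(name);

            // Stop if we lose mastership, or if a merge forces a restart.
            if (wasMaster && (!electedMaster || merge)) {
                stopSingleton();
            }
            // Start if we gain mastership, or to complete a merge restart.
            if (electedMaster && (!wasMaster || merge)) {
                startSingleton();
            }
        }

        void startSingleton() {
            master = true;
            events.add("start"); // e.g. restore the queue from the database
        }

        void stopSingleton() {
            master = false;
            events.add("stop");
        }
    }

    public static void main(String[] args) {
        Node server1 = new Node("server1");
        Node server2 = new Node("server2");

        // STEP1: original state, both nodes share one view.
        List<String> full = Arrays.asList("server1", "server2");
        server1.viewChanged(full, false);
        server2.viewChanged(full, false);

        // STEP2: partition, each node sees only itself.
        server1.viewChanged(Arrays.asList("server1"), false);
        server2.viewChanged(Arrays.asList("server2"), false);

        // STEP3: merge, both masters agree on server1 again.
        server1.viewChanged(full, true);
        server2.viewChanged(full, true);

        System.out.println("server1: " + server1.events); // [start, stop, start]
        System.out.println("server2: " + server2.events); // [start, stop]
    }
}
```

Without the merge flag in the two conditions, server1 would record only a single "start" and would never reload the queue state that server2 wrote between STEP2 and STEP3, which is the bug described above.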
