[jboss-jira] [JBoss JIRA] Resolved: (JBAS-4229) HASingletonController doesn't handle "split brain" correctly.
Brian Stansberry (JIRA)
jira-events at lists.jboss.org
Wed Mar 28 18:52:57 EDT 2007
[ http://jira.jboss.com/jira/browse/JBAS-4229?page=all ]
Brian Stansberry resolved JBAS-4229.
------------------------------------
Resolution: Done
See http://wiki.jboss.org/wiki/Wiki.jsp?page=HASingletonAndClusterMerges for docs.
There are of course unit tests of the basic functionality. I also manually tested a setup with two cluster nodes running HA-JMS against a common database. A test client continually created sessions and sent messages while MDBs read the messages. I then forced a cluster partition and a subsequent merge. HA-JMS appeared to handle it properly, i.e. the surviving master restarted without issue.
> HASingletonController doesn't handle "split brain" correctly.
> -------------------------------------------------------------
>
> Key: JBAS-4229
> URL: http://jira.jboss.com/jira/browse/JBAS-4229
> Project: JBoss Application Server
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Clustering
> Affects Versions: JBossAS-4.0.3 SP1, JBossAS-4.2.0.CR1, JBossAS-4.0.5.GA, JBossAS-4.0.4.GA
> Reporter: Adrian Brock
> Assigned To: Brian Stansberry
> Fix For: JBossAS-4.2.0.GA
>
>
> The HASingletonController doesn't understand the "split brain" problem.
> Take for example a JBossMQ destination (queue) which needs to be "restored" from the database
> when a new singleton is elected.
> The scenario goes as follows:
> STEP1 (original state):
> cluster=server1, server2
> master=server1
> STEP2 (unplug server1 from the network):
> cluster=server2
> master=server2
> server2 will now restore the queue from the database
> BUT! server1 has the view
> cluster=server1
> master=server1
> STEP3 (plug server1 back into the network):
> cluster=server1, server2
> master=server1
> We are back to the original state, but since server1 thinks it never left the cluster,
> it doesn't restore the changes (from the database) that server2 made to the queue between STEP2 and STEP3.
> This is because it doesn't restart the HASingleton on server1.
> If there were a third server (server3) to arbitrate, this wouldn't be a problem.
> Both server2 and server3 would agree that server2 is the master and that server1 has left and must rejoin.
> There needs to be some extra processing in the "merge" when two masters agree to form a cluster
> that ensures that whoever they elect as the new master gets its HASingleton(s) restarted.
> Otherwise, it won't have the up-to-date state from the other master.
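
For illustration only, the scenario quoted above can be modelled in a few lines of Java. This is a minimal sketch under stated assumptions, not the HASingletonController code or the actual fix: the class SplitBrainSketch, the Node type, the onMerge method, the restartOnMerge flag, and the "first member of the merged view wins" election rule are all hypothetical names and simplifications. The shared list stands in for the database, and cachedQueue stands in for the queue state that the singleton restores on start.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the reported scenario; none of these names are JBoss APIs.
class SplitBrainSketch {

    /** A cluster member whose singleton caches queue state loaded from a shared store. */
    static final class Node {
        final String name;
        boolean master;
        List<String> cachedQueue = new ArrayList<>();

        Node(String name) { this.name = name; }

        /** Starting the singleton "restores" the queue from the database. */
        void startSingleton(List<String> database) {
            cachedQueue = new ArrayList<>(database);
            master = true;
        }

        void stopSingleton() { master = false; }
    }

    /**
     * Handles a merge of two partitions. With restartOnMerge == false this mimics
     * the reported behaviour: a node that already believes it is master and is
     * elected again does nothing, so it keeps its stale state.
     */
    static Node onMerge(List<Node> mergedView, List<String> database, boolean restartOnMerge) {
        Node elected = mergedView.get(0); // assumption: first member of the view wins
        for (Node n : mergedView) {
            if (n != elected && n.master) {
                n.stopSingleton(); // the losing master steps down
            }
        }
        if (!elected.master || restartOnMerge) {
            elected.stopSingleton();
            elected.startSingleton(database); // reload up-to-date state
        }
        return elected;
    }

    /** Runs STEP1 to STEP3 and returns server1's view of the queue after the merge. */
    static List<String> runScenario(boolean restartOnMerge) {
        List<String> database = new ArrayList<>(Arrays.asList("m1"));
        Node server1 = new Node("server1");
        Node server2 = new Node("server2");

        // STEP1: server1 is master of {server1, server2}.
        server1.startSingleton(database);

        // STEP2: server1 is unplugged; server2 elects itself and restores the queue,
        // while server1 still believes it is master of its own one-node cluster.
        server2.startSingleton(database);
        database.add("m2");            // a change made through server2 during the split
        server2.cachedQueue.add("m2");

        // STEP3: the partitions merge and server1 is elected master again.
        onMerge(Arrays.asList(server1, server2), database, restartOnMerge);
        return server1.cachedQueue;
    }

    public static void main(String[] args) {
        System.out.println("without restart on merge: " + runScenario(false)); // [m1]
        System.out.println("with restart on merge:    " + runScenario(true));  // [m1, m2]
    }
}
```

Without the restart, server1 ends up as master holding only the state it had before the split; with the restart, it reloads the change server2 made between STEP2 and STEP3.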
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira