[jboss-jira] [JBoss JIRA] Updated: (JBAS-4229) HASingletonController doesn't handle "split brain" correctly.

Tue May 22 01:03:52 EDT 2007

     [ http://jira.jboss.com/jira/browse/JBAS-4229?page=all ]

Brian Stansberry updated JBAS-4229:
-----------------------------------

    Description: 
The HASingletonController doesn't understand the "split brain" problem.

Take for example a JBossMQ destination (queue) which needs to be "restored" from the database
when a new singleton is elected.

The scenario goes as follows:

STEP1 (original state):
cluster=server1, server2
master=server1

STEP2 (unplug server1 from the network):
cluster=server2
master=server2
server2 will now restore the queue from the database

BUT! server1 has the view
cluster=server1
master=server1

STEP3 (plug server1 back into the network)
cluster=server1, server2
master=server1

We are back to the original state, but since server1 thinks it never left the cluster,
it doesn't restore the changes (from the database) that server2 made to the queue between STEP2 and STEP3.
This is because it doesn't restart the HASingleton on server1.
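
The three steps can be replayed in a small model. This is only a sketch: NaiveSingletonController, viewChanged and the "first member of the view is master" rule are hypothetical stand-ins chosen for illustration, not the actual HASingletonController code. It assumes the controller starts the singleton only when its own master status changes from false to true.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model; NOT the real HASingletonController API. */
class NaiveSingletonController {
    private final String self;
    private boolean master;
    private int restoreCount; // times the singleton was started (queue restored)

    NaiveSingletonController(String self) {
        this.self = self;
    }

    /** Assumption: the first member of the view is the elected master. */
    void viewChanged(List<String> view) {
        boolean nowMaster = !view.isEmpty() && view.get(0).equals(self);
        if (nowMaster && !master) {
            restoreCount++; // singleton started: restore the queue from the database
        }
        master = nowMaster;
    }

    int getRestoreCount() {
        return restoreCount;
    }

    boolean isMaster() {
        return master;
    }

    public static void main(String[] args) {
        NaiveSingletonController server1 = new NaiveSingletonController("server1");
        server1.viewChanged(Arrays.asList("server1", "server2")); // STEP1
        server1.viewChanged(Arrays.asList("server1"));            // STEP2, as seen by server1
        server1.viewChanged(Arrays.asList("server1", "server2")); // STEP3
        // server1 was master in every view it saw, so it restored only once, at STEP1.
        System.out.println("server1 restores: " + server1.getRestoreCount());
    }
}
```

In this model server1 never sees itself lose mastership, so nothing prompts it to reload what server2 wrote while the cluster was partitioned.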

There needs to be some extra processing in the "merge" when two masters agree to form a cluster
that ensures that whoever they elect as the new master gets its HASingleton(s) restarted.
Otherwise, it won't have the up-to-date state from the other master.
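
One possible shape for that extra processing, as a sketch only: MergeAwareSingletonController and the "merged" flag are hypothetical names, and the "first member of the view is master" rule is an assumption made for the example. It does not describe how the fix is or will be implemented. The idea is that a view produced by a merge makes the elected master restart its singleton even if it already considered itself master.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of merge-aware handling; NOT the real HASingletonController API. */
class MergeAwareSingletonController {
    private final String self;
    private boolean master;
    private int restoreCount; // times the singleton was (re)started

    MergeAwareSingletonController(String self) {
        this.self = self;
    }

    /**
     * @param view   current cluster members; assumption: the first one is master
     * @param merged true if this view is the result of merging partitions
     */
    void viewChanged(List<String> view, boolean merged) {
        boolean nowMaster = !view.isEmpty() && view.get(0).equals(self);
        // Start on becoming master, and also restart when a merge leaves us as
        // master, because another partition may have changed the shared state.
        if (nowMaster && (!master || merged)) {
            restoreCount++; // (re)start singleton: restore the queue from the database
        }
        master = nowMaster;
    }

    int getRestoreCount() {
        return restoreCount;
    }

    boolean isMaster() {
        return master;
    }

    public static void main(String[] args) {
        MergeAwareSingletonController server1 = new MergeAwareSingletonController("server1");
        server1.viewChanged(Arrays.asList("server1", "server2"), false); // STEP1
        server1.viewChanged(Arrays.asList("server1"), false);            // STEP2, as seen by server1
        server1.viewChanged(Arrays.asList("server1", "server2"), true);  // STEP3: merge
        System.out.println("server1 restores: " + server1.getRestoreCount());
    }
}
```

Here server1 restores at STEP1 and again at STEP3, so it picks up the changes server2 made between STEP2 and STEP3. A node that is not master after the merge does nothing extra.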

  was:
The HASingletonController doesn't understand the "split brain" problem.

Take for example a JBossMQ destination (queue) which needs to be "restored" from the database
when a new singleton is elected.

The scenario goes as follows:

STEP1 (original state):
cluster=server1, server2
master=server1

STEP2 (unplug server1 from the network):
cluster=server2
master=server2
server2 will now restore the queue from the database

BUT! server1 has the view
cluster=server1
master=server1

STEP3 (plug server1 back into the network)
cluster=server1, server2
master=server1

We are back to the original state, but since server1 thinks it never left the cluster,
it doesn't restore the changes (from the database) that server2 made to the queue between STEP2 and STEP3.
This is because it doesn't restart the HASingleton on server1.

If there were a third server (server3) to arbitrate, this wouldn't be a problem.
Both server2 and server3 would agree that server2 is the master and that server1 has left and must rejoin.

There needs to be some extra processing in the "merge" when two masters agree to form a cluster
that ensures that whoever they elect as the new master gets its HASingleton(s) restarted.
Otherwise, it won't have the up-to-date state from the other master.

Edited the description to remove the discussion of how the presence of a third server would avoid the problem. In the case of a merge, it would not.

> HASingletonController doesn't handle "split brain" correctly.
> -------------------------------------------------------------
>
>                 Key: JBAS-4229
>                 URL: http://jira.jboss.com/jira/browse/JBAS-4229
>             Project: JBoss Application Server
>          Issue Type: Bug
>      Security Level: Public (Everyone can see)
>          Components: Clustering
>    Affects Versions: JBossAS-4.0.3 SP1, JBossAS-4.0.5.GA, JBossAS-4.0.4.GA, JBossAS-4.2.0.CR1
>            Reporter: Adrian Brock
>         Assigned To: Brian Stansberry
>             Fix For: JBossAS-4.2.0.CR2
