[jboss-jira] [JBoss JIRA] (JBAS-9456) JBoss 6.0.0 fails to restart HA Singletons after recovering from a split brain

Robert Hayward (Created) (JIRA) jira-events at lists.jboss.org
Thu Dec 8 10:35:44 EST 2011


JBoss 6.0.0 fails to restart HA Singletons after recovering from a split brain 
-------------------------------------------------------------------------------

                 Key: JBAS-9456
                 URL: https://issues.jboss.org/browse/JBAS-9456
             Project: Legacy JBoss Application Server 6 
          Issue Type: Bug
      Security Level: Public (Everyone can see)
          Components: Clustering
    Affects Versions: 6.0.0.Final
         Environment: Any
            Reporter: Robert Hayward
            Assignee: Paul Ferraro


We run JBoss 6.0.0 clustered across two boxes, with a number of HA Singletons deployed. A brief network outage caused the cluster to split, and the HA Singletons started up on the second box as well. After the network issues were resolved, the JBoss instances correctly re-clustered, but the HA Singletons remained running on both boxes.
I believe they should all have stopped automatically, and only the HA Singletons on the master node should have started back up.

I've finally tracked the issue down to common/lib/jboss-ha-server-core.jar, using the source code at
http://grepcode.com/snapshot/repository.jboss.org/nexus/content/repositories/releases/org.jboss.cluster/jboss-ha-server-core/1.0.0.Final

The bug is in the file:
org/jboss/ha/core/framework/server/DistributedReplicantManagerImpl.java

In the method:
   /**
    * Add a replicant to the replicants map.
    * @param key replicant key name
    * @param nodeName name of the node that adds this replicant
    * @param replicant Serialized representation of the replica
    * @return true, if this replicant was newly added to the map, false otherwise
    */
   protected boolean addReplicant(String key, String nodeName, Serializable replicant)
   {
      ConcurrentMap<String, Serializable> map = new ConcurrentHashMap<String, Serializable>();
      
      ConcurrentMap<String, Serializable> existingMap = this.replicants.putIfAbsent(key, map);
      
      return (((existingMap != null) ? existingMap : map).put(nodeName, replicant) != null);
   }

The last line of the method should be changed to:
      return (((existingMap != null) ? existingMap : map).put(nodeName, replicant) == null);

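addReplicant() should return true if the replicant wasn't previously in the map, which is the case when Map.put() returns null (put() returns the previous value for the key, or null if there was none). The current code tests for != null, so it returns the opposite of what the Javadoc promises. It looks like the return value of this method is only checked when merging a split cluster.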
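
To illustrate the return-value semantics, here is a minimal standalone sketch of the method with the proposed == null comparison. The class name AddReplicantSketch, the key and node names, and the string "stubs" are made up for the demonstration; the replicants field is only assumed to be a map of maps, as the snippet above implies. It is not the actual DistributedReplicantManagerImpl.

```java
import java.io.Serializable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AddReplicantSketch
{
   // Hypothetical stand-in for the replicants field: key -> (node name -> replicant).
   private final ConcurrentMap<String, ConcurrentMap<String, Serializable>> replicants =
      new ConcurrentHashMap<String, ConcurrentMap<String, Serializable>>();

   // Same body as the method quoted above, but comparing with == null.
   protected boolean addReplicant(String key, String nodeName, Serializable replicant)
   {
      ConcurrentMap<String, Serializable> map = new ConcurrentHashMap<String, Serializable>();

      ConcurrentMap<String, Serializable> existingMap = this.replicants.putIfAbsent(key, map);

      // put() returns the previous value for nodeName, or null if there was none,
      // so null means the replicant was newly added.
      return (((existingMap != null) ? existingMap : map).put(nodeName, replicant) == null);
   }

   public static void main(String[] args)
   {
      AddReplicantSketch sketch = new AddReplicantSketch();

      // First registration for node1: newly added.
      System.out.println(sketch.addReplicant("HASingleton", "node1", "stub-1")); // true
      // node1 registers again under the same key: replaces an existing entry.
      System.out.println(sketch.addReplicant("HASingleton", "node1", "stub-2")); // false
      // A different node under the same key: newly added.
      System.out.println(sketch.addReplicant("HASingleton", "node2", "stub-3")); // true
   }
}
```

With the original != null comparison, the three calls would return false, true, false instead, i.e. the method would report "newly added" only when it had actually overwritten an existing entry.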

Probably affects JBoss 6.1.0 - not sure about 7.X.X though.

