[jboss-jira] [JBoss JIRA] Created: (JBMESSAGING-1864) Deadlock in creating consumer during failover time

Yong Hao Gao (JIRA) jira-events at lists.jboss.org
Mon May 16 04:39:00 EDT 2011


Deadlock in creating consumer during failover time
--------------------------------------------------

                 Key: JBMESSAGING-1864
                 URL: https://issues.jboss.org/browse/JBMESSAGING-1864
             Project: JBoss Messaging
          Issue Type: Bug
          Components: JMS Clustering
    Affects Versions: 1.4.8.GA, 1.4.0.SP3.CP12
            Reporter: Yong Hao Gao
            Assignee: Yong Hao Gao
             Fix For: 1.4.0.SP3.CP13, 1.4.8.SP1


While a node is performing failover for a dead node, a deadlock can occur if a new binding is created at the same time (triggered by creating a new subscription on a topic). To reproduce it, you need to manually 'instrument' the code to create suitable timing:

1 set up a 2-node cluster with node0 and node1

2 kill node0 and let the failover happen. Let it stop at MessagingPostOffice.performFailover(), after the line

pm.mergeTransactions(failedNodeID.intValue(), thisNodeID);

So at this point the write lock is just about to be obtained.

3 create a consumer (it must be a new subscriber) on node1 and let the calling thread proceed to MessagingPostOffice.addBindingInMemory(), stopping before the line:

      clusterNotifier.sendNotification(notification);

The ClusterConnectionManager is the listener that handles the notification, so this call will acquire the lock on ClusterConnectionManager (its notify() method is synchronized).

4 then resume step 2; the failover thread will grab the write lock and proceed. Along the way, stop it again at MessagingPostOffice.removeBindingInMemory(), before the line:

      clusterNotifier.sendNotification(notification);

Now this failover thread holds the write lock of the post office and is about to get the lock on ClusterConnectionManager (because it is the listener to handle the notification in its synchronized notify() method).

5 Resume step 3. The consumer-creating thread grabs the lock on ClusterConnectionManager. As it goes on, it calls the MessagingPostOffice.getBindings() method, which tries to get the read lock of MessagingPostOffice. However, at this moment the failover thread is holding the write lock, so the consumer-creating thread cannot get the read lock and has to wait for it.

6 Now resume step 4. Because the consumer-creating thread already acquired the lock on ClusterConnectionManager in step 5, the failover thread cannot enter the synchronized notify() method and cannot proceed. Meanwhile the failover thread holds the write lock and never releases it, so the consumer-creating thread will never get the read lock. The two threads are deadlocked.
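
The lock ordering described in the steps above can be sketched in isolation. The following is only an illustrative model, not JBoss Messaging code: the class FailoverDeadlockSketch, the postOfficeLock field and the connectionManagerMonitor object are hypothetical stand-ins for the post office read/write lock and the monitor behind ClusterConnectionManager's synchronized notify(). So that the demo terminates instead of hanging, the consumer thread uses a timed tryLock() where the real code path would block indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class FailoverDeadlockSketch {
    // Hypothetical stand-in for the post office read/write lock.
    static final ReentrantReadWriteLock postOfficeLock = new ReentrantReadWriteLock();
    // Hypothetical stand-in for the monitor taken by the synchronized notify() method.
    static final Object connectionManagerMonitor = new Object();

    /** Replays the interleaving; returns true if the read lock could not be obtained. */
    static boolean reproduce() {
        CountDownLatch writeLockHeld = new CountDownLatch(1);
        CountDownLatch monitorHeld = new CountDownLatch(1);
        boolean[] readLockTimedOut = {false};

        Thread failover = new Thread(() -> {
            postOfficeLock.writeLock().lock();        // step 4: failover takes the write lock
            try {
                writeLockHeld.countDown();
                monitorHeld.await();                  // wait until the consumer thread owns the monitor
                synchronized (connectionManagerMonitor) {
                    // step 6: the notification would be handled here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                postOfficeLock.writeLock().unlock();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                writeLockHeld.await();
                synchronized (connectionManagerMonitor) {   // step 5: inside notify()
                    monitorHeld.countDown();
                    // Models getBindings() asking for the read lock. Timed only so the
                    // sketch terminates; an untimed lock() here would wait forever.
                    if (postOfficeLock.readLock().tryLock(500, TimeUnit.MILLISECONDS)) {
                        postOfficeLock.readLock().unlock();
                    } else {
                        readLockTimedOut[0] = true;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        failover.start();
        consumer.start();
        try {
            failover.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return readLockTimedOut[0];
    }

    public static void main(String[] args) {
        System.out.println("read lock timed out: " + reproduce());
    }
}
```

In this model the failover thread can only release the write lock after it has entered the monitor, and the consumer thread only leaves the monitor after its read-lock attempt ends, so the timed attempt always expires. With untimed acquisition on both sides, as in the scenario reported here, neither thread could ever make progress.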


