[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1760) Racing Condition in JBM cluster startup

Howard Gao (JIRA) jira-events at lists.jboss.org
Mon Nov 16 06:21:29 EST 2009


    [ https://jira.jboss.org/jira/browse/JBMESSAGING-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12494804#action_12494804 ] 

Howard Gao commented on JBMESSAGING-1760:
-----------------------------------------

To avoid this problem, I'm trying to synchronize the two queue activation operations: the one in MessagingPostOffice.addBindingsFromCluster() and the one in TopicService.startService().
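
A minimal sketch of what such synchronization could look like, not the actual patch. SketchQueue and SynchronizedActivation are hypothetical stand-ins rather than the real JBM classes, and the single-argument setPagingParams() is simplified for illustration. The point is only that the isActive() check and the setPagingParams() call happen under the same lock that guards activation, so the other thread cannot activate the queue in between.

```java
/** Hypothetical stand-in for a JBM queue; not the real class. */
class SketchQueue {
    private boolean active;
    private int fullSize = -1;

    boolean isActive() { return active; }

    void activate() { active = true; }

    /** Like the behaviour described in the issue: rejects changes once active. */
    void setPagingParams(int fullSize) {
        if (active) {
            throw new IllegalStateException("cannot set paging params on an active queue");
        }
        this.fullSize = fullSize;
    }

    int getFullSize() { return fullSize; }
}

/** Hypothetical helper showing both code paths sharing one lock per queue. */
class SynchronizedActivation {

    /** Post office path (assumed shape): activate while holding the queue's lock. */
    static void activateFromCluster(SketchQueue queue) {
        synchronized (queue) {
            queue.activate();
        }
    }

    /**
     * Topic service path (assumed shape): check and configure atomically.
     * Returns true if the paging params were applied, false if the queue
     * was already active and the call was skipped.
     */
    static boolean configureIfInactive(SketchQueue queue, int fullSize) {
        synchronized (queue) {
            if (queue.isActive()) {
                return false;
            }
            queue.setPagingParams(fullSize);
            return true;
        }
    }
}
```

With this shape, whichever thread takes the lock first wins: either the params are set before activation, or the topic service path sees an active queue and skips the call. In neither case is setPagingParams() invoked on an active queue.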


> Racing Condition in JBM cluster startup
> ---------------------------------------
>
>                 Key: JBMESSAGING-1760
>                 URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1760
>             Project: JBoss Messaging
>          Issue Type: Bug
>          Components: JMS Clustering
>    Affects Versions: 1.4.0.SP3.CP09, 1.4.5.GA, 1.4.6.GA
>            Reporter: Howard Gao
>            Assignee: Howard Gao
>             Fix For: 1.4.0.SP3.CP10, 1.4.6.GA.SP1, 1.4.7.GA
>
>         Attachments: MessagingPostOffice.java, StatusObject.java, TopicService.java
>
>
> To reproduce:
> a. Set up a 2-node cluster (node0 and node1) and deploy a distributed topic service.
> b. Set FailoverOnNodeLeave to false. Start the cluster and create a durable subscriber on the topic. Then shut down both nodes. This leaves two bindings in the DB.
> c. Now set FailoverOnNodeLeave to true and start the cluster again. The remaining steps require deliberately controlling the timing.
> d. Shut down node1. node0 will detect that the node has left and perform failover. During failover it deletes the binding belonging to node1 from the DB, leaving only one binding.
> e. Shut down node0 once the failover completes.
> f. Start node0 and node1 in exactly the following order:
> 1. Let node1's post office start up, then hold it there. Don't let the topic service load and start yet.
> 2. Let node0's post office start up. It will load its binding into the post office and multicast the binding request to node1.
> 3. On receiving the multicast, node1's post office will create and add a local binding and then activate the queue. To reproduce this issue, make node1 wait just before activating the queue.
> 4. Then let node1 go on to load the topic service. The topic service (in its startService() method) gets the bindings from the post office and calls setPagingParams() on each queue in the bindings. Before each call it checks whether the queue is already active; if it is, it skips the call. Because the queue was not activated in the previous step, the check finds the queue inactive and the call goes ahead. Again, deliberately stop right before calling setPagingParams().
> 5. Now resume step 3, so that the queue is activated.
> 6. Right after the queue is activated, resume step 4. setPagingParams() is called, and because the queue is already active, an IllegalStateException is thrown.
> Note: The steps above work for the current release (CP09). In JBM 1.4.0.SP3-CP04, the isActive() check had not yet been added to the TopicService.startService() method, so the error is much easier to reproduce there, because no manipulation is needed to get around the check.
> Also, in a real deployment FailoverOnNodeLeave does not need to have been set to false at any point. It can be true all along.
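
The interleaving in steps 3 to 6 above is a check-then-act race, and it can be forced deterministically outside the server. The following is a rough sketch under stated assumptions: RacyQueue and RaceRepro are hypothetical stand-ins, not JBM classes, and the latches play the role of the manual pauses described in the steps.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for a JBM queue; not the real class. */
class RacyQueue {
    private volatile boolean active;

    boolean isActive() { return active; }

    void activate() { active = true; }

    void setPagingParams(int fullSize) {
        if (active) {
            throw new IllegalStateException("queue is already active");
        }
    }
}

class RaceRepro {
    /**
     * Forces the ordering from steps 3 to 6 and returns whatever the
     * "topic service" thread threw, or null if it threw nothing.
     */
    static Throwable run() throws InterruptedException {
        RacyQueue queue = new RacyQueue();
        CountDownLatch checked = new CountDownLatch(1);   // step 4: isActive() check passed
        CountDownLatch activated = new CountDownLatch(1); // step 5: queue activated
        Throwable[] failure = new Throwable[1];

        Thread topicService = new Thread(() -> {
            try {
                if (!queue.isActive()) {          // step 4: queue is not active yet
                    checked.countDown();
                    activated.await();            // pause right before setPagingParams()
                    queue.setPagingParams(100);   // step 6: queue is active by now
                }
            } catch (Throwable t) {
                failure[0] = t;
            }
        });

        Thread postOffice = new Thread(() -> {
            try {
                checked.await();                  // step 3: wait just before activating
                queue.activate();                 // step 5
                activated.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        topicService.start();
        postOffice.start();
        topicService.join();
        postOffice.join();
        return failure[0];
    }
}
```

Calling RaceRepro.run() returns the IllegalStateException raised by the unguarded setPagingParams() call, which mirrors the failure described in step 6.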
