[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1796) Deadlock during simultaneous shutdown of nodes in a cluster

Wed Mar 24 02:50:38 EDT 2010

    [ https://jira.jboss.org/jira/browse/JBMESSAGING-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12521608#action_12521608 ] 

Howard Gao commented on JBMESSAGING-1796:
-----------------------------------------

Bela Ban wrote:
> > I agree, the best thing to do here is to run calculateFailoverMap() on 
> > a separate thread. It's a bad idea to (1) run long actions or (2) 
> > send messages inside of a viewAccepted() callback, as this blocks 
> > JGroups.
> >
> > Brian Stansberry wrote:
>> >> So, input from Howard and Bela is needed. (I reattached the stack 
>> >> traces.)  The deadlock is:
>> >>
>> >> ShutdownHook: JBM locks MessagingPostOffice, later JGroups wants to 
>> >> lock GMS.members
>> >>
>> >> JGroups Incoming Thread: JGroups locks GMS.members, later JBM wants 
>> >> to lock MessagingPostOffice
>> >>
>> >> In neither place where the lock is taken does the code taking the 
>> >> lock have any idea that later the other lock is wanted; i.e. it's not 
>> >> a simple coding error.
>> >>
>> >> Howard, a general practice that I always encourage in JGroups apps is 
>> >> to have another thread available to asynchronously process events 
>> >> triggered by a JGroups MembershipListener.viewAccepted() callback. In 
>> >> this case that would be 
>> >> org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener.viewAccepted. 
>> >> Let the JGroups thread cache the membership information in some 
>> >> object and then signal another thread to handle it. Let the JGroups 
>> >> thread promptly return. A simple way to do that is to encapsulate the 
>> >> current viewAccepted logic in an anonymous Runnable and pass the 
>> >> Runnable to a java.util.concurrent.ExecutorService to execute.
>> >>
>> >> If that's not doable, then either the JGroups side or the JBM side 
>> >> is going to need to reduce the scope of its locks. I doubt that's a 
>> >> simple thing for either side, though.
>> >>
>> >>
>> >>
>> >> On 02/26/2010 09:59 AM, Colin Mondesir wrote:
>>> >>> Hi Brian,
>>> >>>
>>> >>> See attached.
>>> >>>
>>> >>>
>>> >>> ----- Original Message -----
>>> >>> From: "Brian Stansberry"<brian.stansberry at redhat.com>
>>> >>> To: "Colin Mondesir"<cmondesi at redhat.com>
>>> >>> Cc: "jboss-support-clustering"<jboss-support-clustering at redhat.com>, 
>>> >>> "jboss-support-messaging"<jboss-support-messaging at redhat.com>
>>> >>> Sent: Friday, February 26, 2010 2:27:12 PM GMT +00:00 GMT Britain, 
>>> >>> Ireland, Portugal
>>> >>> Subject: Re: Deadlock on server shutdown (case #552253)
>>> >>>
>>> >>> No, it's not a known issue, at least not to me.  What's the stack trace
>>> >>> for the other thread, Incoming-5,10.33.0.23:50125 ?
>>> >>>
>>> >>> On 02/26/2010 04:34 AM, Colin Mondesir wrote:
>>>> >>>> https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=552253&gid=1354 
>>>> >>>>
>>>> >>>>
>>>> >>>> A customer is getting a deadlock when simultaneously shutting down 
>>>> >>>> two clustered servers (EAP 5.0). One instance shuts down 
>>>> >>>> correctly, but the other does not.
>>>> >>>>
>>>> >>>> Extract from a thread dump of the failed server:
>>>> >>>>
>>>> >>>> Found one Java-level deadlock:
>>>> >>>> =============================
>>>> >>>> "JBoss Shutdown Hook":
>>>> >>>>     waiting to lock monitor 0x1182ebac (object 0x96b64c80, a org.jgroups.Membership),
>>>> >>>>     which is held by "Incoming-5,10.33.0.23:50125"
>>>> >>>> "Incoming-5,10.33.0.23:50125":
>>>> >>>>     waiting to lock monitor 0x873b0fb0 (object 0x96abd160, a org.jboss.messaging.core.impl.postoffice.MessagingPostOffice),
>>>> >>>>     which is held by "JBoss Shutdown Hook"
>>>> >>>>
>>>> >>>> Java stack information for the threads listed above:
>>>> >>>> ===================================================
>>>> >>>> "JBoss Shutdown Hook":
>>>> >>>>           at org.jgroups.protocols.pbcast.GMS.determineCoordinator(GMS.java:564)
>>>> >>>>           - waiting to lock <0x96b64c80> (a org.jgroups.Membership)
>>>> >>>>           at org.jgroups.protocols.pbcast.ParticipantGmsImpl.leave(ParticipantGmsImpl.java:57)
>>>> >>>>           at org.jgroups.protocols.pbcast.GMS.down(GMS.java:886)
>>>> >>>>           at org.jgroups.protocols.FC.down(FC.java:434)
>>>> >>>>           at org.jgroups.protocols.FRAG2.down(FRAG2.java:154)
>>>> >>>>           at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:209)
>>>> >>>>           at org.jgroups.protocols.pbcast.FLUSH.down(FLUSH.java:291)
>>>> >>>>           at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:461)
>>>> >>>>           at org.jgroups.JChannel.down(JChannel.java:1514)
>>>> >>>>           at org.jgroups.JChannel.disconnect(JChannel.java:500)
>>>> >>>>           - locked <0x96ae96f0> (a org.jgroups.JChannel)
>>>> >>>>           at org.jgroups.JChannel._close(JChannel.java:1730)
>>>> >>>>           at org.jgroups.JChannel.close(JChannel.java:516)
>>>> >>>>           - locked <0x96ae96f0> (a org.jgroups.JChannel)
>>>> >>>>           at org.jboss.messaging.core.impl.postoffice.GroupMember.stop(GroupMember.java:225)
>>>> >>>>           at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.stop(MessagingPostOffice.java:425)
>>>> >>>>
>>>> >>>>
>>>> >>>> Is this a known issue?
>>> >>>
>>> >>>
>> >>
>> >>
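
For illustration, the suggestion quoted above (let the JGroups thread cache the view, return promptly, and hand the real work to an ExecutorService) could look roughly like the sketch below. It deliberately avoids the JGroups and JBM classes so that it stands alone: AsyncViewHandler, postOfficeLock, processLatestView() and the plain String member list are hypothetical stand-ins, and this is not necessarily the fix that was committed for this issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a membership listener; not the real JBM class.
final class AsyncViewHandler {
    // One worker thread keeps view processing ordered.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // Most recent view that has not been processed yet.
    private final AtomicReference<List<String>> latestView = new AtomicReference<List<String>>();
    // Stand-in for the MessagingPostOffice monitor (assumption).
    private final Object postOfficeLock = new Object();
    private volatile List<String> processedView;

    // Called on the group-communication thread: cache the view and return
    // at once, without taking the post office lock on this thread.
    public void viewAccepted(List<String> members) {
        latestView.set(Collections.unmodifiableList(new ArrayList<String>(members)));
        executor.execute(new Runnable() {
            public void run() {
                processLatestView();
            }
        });
    }

    // Runs on the worker thread; only here is the post office lock taken.
    private void processLatestView() {
        List<String> view = latestView.getAndSet(null);
        if (view == null) {
            return; // an earlier task already handled the newest view
        }
        synchronized (postOfficeLock) {
            // The real code would recompute the failover map here.
            processedView = view;
        }
    }

    public List<String> getProcessedView() {
        return processedView;
    }

    // Must be called WITHOUT holding postOfficeLock; otherwise waiting for
    // the worker could deadlock in the same way as before.
    public boolean stop(long timeoutMillis) throws InterruptedException {
        executor.shutdown();
        return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

With this shape the JGroups incoming thread never waits for the post office lock while holding GMS.members, which breaks the lock cycle shown in the thread dump. The trade-off is that view changes are applied slightly later, and the shutdown path has to stop the worker without holding the lock the worker needs.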


> Deadlock during simultaneous shutdown of nodes in a cluster
> -----------------------------------------------------------
>
>                 Key: JBMESSAGING-1796
>                 URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1796
>             Project: JBoss Messaging
>          Issue Type: Bug
>          Components: JMS Clustering
>    Affects Versions: 1.4.0.SP3.CP10, 1.4.6.GA
>            Reporter: Howard Gao
>            Assignee: Howard Gao
>             Fix For: 1.4.0.SP3.CP11, 1.4.6.GA.SP1
>
>

