[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1796) Dead locak while simultaneously shutdown of nodes in a cluster

Wednesday, 24 March 2010



    [ https://jira.jboss.org/jira/browse/JBMESSAGING-1796?page=com.atlassian.ji... ]

Howard Gao commented on JBMESSAGING-1796:
-----------------------------------------

Bela Ban wrote:
...
 > I agree, the best thing to do here is to run calculateFailoverMap() on
 > a separate thread. It's a bad idea to (1) run long actions or (2)
 > send messages inside of a viewAccepted() callback, as this blocks
 > JGroups.
 >
 > Brian Stansberry wrote:
> >> So, input from Howard and Bela is needed. (I reattached the stack 
> >> traces.)  The deadlock is:
> >>
> >> ShutdownHook: JBM locks MessagingPostOffice, later JGroups wants to 
> >> lock GMS.members
> >>
> >> JGroups Incoming Thread: JGroups locks GMS.members, later JBM wants 
> >> to lock MessagingPostOffice
> >>
> >> In neither place where the lock is taken does the code taking the 
> >> lock have any idea that later the other lock is wanted; i.e. it's not 
> >> a simple coding error.
> >>
> >> Howard, a general practice that I always encourage in JGroups apps is 
> >> to have another thread available to asynchronously process events 
> >> triggered by a JGroups MembershipListener.viewAccepted() callback. In 
> >> this case that would be 
> >> org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener.viewAccepted.
> >> Let the JGroups thread cache the membership information in some 
> >> object and then signal another thread to handle it. Let the JGroups 
> >> thread promptly return. A simple way to do that is to encapsulate the 
> >> current viewAccepted logic in an anonymous Runnable and pass the 
> >> Runnable to a java.util.concurrent.ExecutorService to execute.
> >>
> >> If that's not doable, then either the JGroups side or the JBM side 
> >> is going to need to reduce the scope of the locks. I doubt that's a 
> >> simple thing though for either side.
> >>
> >>
> >>
> >> On 02/26/2010 09:59 AM, Colin Mondesir wrote:
>> >>> Hi Brian,
>> >>>
>> >>> See attached.
>> >>>
>> >>>
>> >>> ----- Original Message -----
>> >>> From: "Brian Stansberry" <brian.stansberry(a)redhat.com>
>> >>> To: "Colin Mondesir" <cmondesi(a)redhat.com>
>> >>> Cc: "jboss-support-clustering" <jboss-support-clustering(a)redhat.com>,
>> >>> "jboss-support-messaging" <jboss-support-messaging(a)redhat.com>
>> >>> Sent: Friday, February 26, 2010 2:27:12 PM GMT +00:00 GMT Britain, 
>> >>> Ireland, Portugal
>> >>> Subject: Re: Deadlock on server shutdown (case #552253)
>> >>>
>> >>> No, it's not a known issue, at least not to me.  What's the stack trace
>> >>> for the other thread, Incoming-5,10.33.0.23:50125 ?
>> >>>
>> >>> On 02/26/2010 04:34 AM, Colin Mondesir wrote:
>>> >>>> https://enterprise.redhat.com/issue-tracker/?module=issues&action=vie...
>>> >>>>
>>> >>>>
>>> >>>> A customer is getting a deadlock when simultaneously shutting down 
>>> >>>> two clustered servers (EAP 5.0): one instance shuts down 
>>> >>>> correctly but the other does not.
>>> >>>>
>>> >>>> Extract from a thread dump of the failed server:
>>> >>>>
>>> >>>> Found one Java-level deadlock:
>>> >>>> =============================
>>> >>>> "JBoss Shutdown Hook":
>>> >>>>     waiting to lock monitor 0x1182ebac (object 0x96b64c80, a org.jgroups.Membership),
>>> >>>>     which is held by "Incoming-5,10.33.0.23:50125"
>>> >>>> "Incoming-5,10.33.0.23:50125":
>>> >>>>     waiting to lock monitor 0x873b0fb0 (object 0x96abd160, a org.jboss.messaging.core.impl.postoffice.MessagingPostOffice),
>>> >>>>     which is held by "JBoss Shutdown Hook"
>>> >>>>
>>> >>>> Java stack information for the threads listed above:
>>> >>>> ===================================================
>>> >>>> "JBoss Shutdown Hook":
>>> >>>>           at org.jgroups.protocols.pbcast.GMS.determineCoordinator(GMS.java:564)
>>> >>>>           - waiting to lock <0x96b64c80> (a org.jgroups.Membership)
>>> >>>>           at org.jgroups.protocols.pbcast.ParticipantGmsImpl.leave(ParticipantGmsImpl.java:57)
>>> >>>>           at org.jgroups.protocols.pbcast.GMS.down(GMS.java:886)
>>> >>>>           at org.jgroups.protocols.FC.down(FC.java:434)
>>> >>>>           at org.jgroups.protocols.FRAG2.down(FRAG2.java:154)
>>> >>>>           at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:209)
>>> >>>>           at org.jgroups.protocols.pbcast.FLUSH.down(FLUSH.java:291)
>>> >>>>           at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:461)
>>> >>>>           at org.jgroups.JChannel.down(JChannel.java:1514)
>>> >>>>           at org.jgroups.JChannel.disconnect(JChannel.java:500)
>>> >>>>           - locked <0x96ae96f0> (a org.jgroups.JChannel)
>>> >>>>           at org.jgroups.JChannel._close(JChannel.java:1730)
>>> >>>>           at org.jgroups.JChannel.close(JChannel.java:516)
>>> >>>>           - locked <0x96ae96f0> (a org.jgroups.JChannel)
>>> >>>>           at org.jboss.messaging.core.impl.postoffice.GroupMember.stop(GroupMember.java:225)
>>> >>>>           at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.stop(MessagingPostOffice.java:425)
>>> >>>>
>>> >>>>
>>> >>>> Is this a known issue?
>> >>>
>> >>>
> >>
> >> 
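
Brian's suggestion quoted above (hand the view change to another thread and let the JGroups thread return promptly) could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the actual JBM fix: AsyncViewHandler, handleView, postOfficeLock and the List<String> view are hypothetical stand-ins for GroupMember$ControlMembershipListener, the existing viewAccepted logic, the MessagingPostOffice monitor and the JGroups view object, so that the example runs without JGroups on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the membership listener; not a real JBM or JGroups class. */
class AsyncViewHandler {
    // A single worker thread processes view changes in the order they arrive.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Stands in for the MessagingPostOffice monitor (assumption for illustration).
    private final Object postOfficeLock = new Object();
    // Guarded by postOfficeLock.
    private final List<List<String>> processed = new ArrayList<List<String>>();

    /** Called on the (simulated) JGroups thread: copy the view, queue it, return at once. */
    void viewAccepted(List<String> members) {
        final List<String> snapshot = new ArrayList<String>(members);
        worker.execute(new Runnable() {
            @Override
            public void run() {
                handleView(snapshot);
            }
        });
    }

    /** Runs on the worker thread, so waiting for the application lock never blocks the caller. */
    private void handleView(List<String> snapshot) {
        synchronized (postOfficeLock) {
            // Placeholder for the real work, e.g. recalculating a failover map.
            processed.add(snapshot);
        }
    }

    /** Returns a copy of the views handled so far. */
    List<List<String>> processedViews() {
        synchronized (postOfficeLock) {
            return new ArrayList<List<String>>(processed);
        }
    }

    /** Stops accepting new views and waits up to five seconds for queued ones to finish. */
    boolean shutdown() {
        worker.shutdown();
        try {
            return worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape the thread that delivers the view never waits for the application lock, which is the property Brian and Bela ask for. One caveat of the sketch: shutdown() should not be called while the caller holds postOfficeLock, since waiting for the worker there would block until the timeout.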

...
 Dead locak while simultaneously shutdown of nodes in a cluster
 --------------------------------------------------------------

                 Key: JBMESSAGING-1796
                 URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1796
             Project: JBoss Messaging
          Issue Type: Bug
          Components: JMS Clustering
    Affects Versions: 1.4.0.SP3.CP10, 1.4.6.GA
            Reporter: Howard Gao
            Assignee: Howard Gao
             Fix For: 1.4.0.SP3.CP11, 1.4.6.GA.SP1

 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
