[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1796) Deadlock while simultaneously shutting down nodes in a cluster
Howard Gao (JIRA)
jira-events at lists.jboss.org
Wed Mar 24 02:50:38 EDT 2010
[ https://jira.jboss.org/jira/browse/JBMESSAGING-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12521608#action_12521608 ]
Howard Gao commented on JBMESSAGING-1796:
-----------------------------------------
Bela Ban wrote:
> > I agree, the best thing to do here is to run calculateFailoverMap() on
> > a separate thread. It's a bad idea to (1) run long actions or (2)
> > send messages inside of a viewAccepted() callback, as this blocks
> > JGroups.
> >
> > Brian Stansberry wrote:
>> >> So, input from Howard and Bela is needed. (I reattached the stack
>> >> traces.) The deadlock is:
>> >>
>> >> ShutdownHook: JBM locks MessagingPostOffice, later JGroups wants to
>> >> lock GMS.members
>> >>
>> >> JGroups Incoming Thread: JGroups locks GMS.members, later JBM wants
>> >> to lock MessagingPostOffice
>> >>
>> >> In neither place where the lock is taken does the code taking the
>> >> lock have any idea that later the other lock is wanted; i.e. it's not
>> >> a simple coding error.
>> >>
>> >> Howard, a general practice that I always encourage in JGroups apps is
>> >> to have another thread available to asynchronously process events
>> >> triggered by a JGroups MembershipListener.viewAccepted() callback. In
>> >> this case that would be
>> >> org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener.viewAccepted.
>> >> Let the JGroups thread cache the membership information in some
>> >> object and then signal another thread to handle it. Let the JGroups
>> >> thread promptly return. A simple way to do that is to encapsulate the
>> >> current viewAccepted logic in an anonymous Runnable and pass the
>> >> Runnable to a java.util.concurrent.ExecutorService to execute.
>> >>
>> >> If that's not doable, then either the JGroups side or the JBM side
>> >> is going to need to reduce the scope of its locks. I doubt that's a
>> >> simple thing for either side, though.
>> >>
>> >>
>> >>
>> >> On 02/26/2010 09:59 AM, Colin Mondesir wrote:
>>> >>> Hi Brian,
>>> >>>
>>> >>> See attached.
>>> >>>
>>> >>>
>>> >>> ----- Original Message -----
>>> >>> From: "Brian Stansberry"<brian.stansberry at redhat.com>
>>> >>> To: "Colin Mondesir"<cmondesi at redhat.com>
>>> >>> Cc: "jboss-support-clustering"<jboss-support-clustering at redhat.com>,
>>> >>> "jboss-support-messaging"<jboss-support-messaging at redhat.com>
>>> >>> Sent: Friday, February 26, 2010 2:27:12 PM GMT +00:00 GMT Britain,
>>> >>> Ireland, Portugal
>>> >>> Subject: Re: Deadlock on server shutdown (case #552253)
>>> >>>
>>> >>> No, it's not a known issue, at least not to me. What's the stack trace
>>> >>> for the other thread, Incoming-5,10.33.0.23:50125 ?
>>> >>>
>>> >>> On 02/26/2010 04:34 AM, Colin Mondesir wrote:
>>>> >>>> https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=552253&gid=1354
>>>> >>>>
>>>> >>>>
>>>> >>>> A customer is getting a deadlock when simultaneously shutting down
>>>> >>>> two clustered servers (EAP 5.0); one instance shuts down
>>>> >>>> correctly but the other does not.
>>>> >>>>
>>>> >>>> Extract from a thread dump of the failed server:
>>>> >>>>
>>>> >>>> Found one Java-level deadlock:
>>>> >>>> =============================
>>>> >>>> "JBoss Shutdown Hook":
>>>> >>>> waiting to lock monitor 0x1182ebac (object 0x96b64c80, a org.jgroups.Membership),
>>>> >>>> which is held by "Incoming-5,10.33.0.23:50125"
>>>> >>>> "Incoming-5,10.33.0.23:50125":
>>>> >>>> waiting to lock monitor 0x873b0fb0 (object 0x96abd160, a org.jboss.messaging.core.impl.postoffice.MessagingPostOffice),
>>>> >>>> which is held by "JBoss Shutdown Hook"
>>>> >>>>
>>>> >>>> Java stack information for the threads listed above:
>>>> >>>> ===================================================
>>>> >>>> "JBoss Shutdown Hook":
>>>> >>>> at org.jgroups.protocols.pbcast.GMS.determineCoordinator(GMS.java:564)
>>>> >>>> - waiting to lock <0x96b64c80> (a org.jgroups.Membership)
>>>> >>>> at org.jgroups.protocols.pbcast.ParticipantGmsImpl.leave(ParticipantGmsImpl.java:57)
>>>> >>>> at org.jgroups.protocols.pbcast.GMS.down(GMS.java:886)
>>>> >>>> at org.jgroups.protocols.FC.down(FC.java:434)
>>>> >>>> at org.jgroups.protocols.FRAG2.down(FRAG2.java:154)
>>>> >>>> at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:209)
>>>> >>>> at org.jgroups.protocols.pbcast.FLUSH.down(FLUSH.java:291)
>>>> >>>> at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:461)
>>>> >>>> at org.jgroups.JChannel.down(JChannel.java:1514)
>>>> >>>> at org.jgroups.JChannel.disconnect(JChannel.java:500)
>>>> >>>> - locked <0x96ae96f0> (a org.jgroups.JChannel)
>>>> >>>> at org.jgroups.JChannel._close(JChannel.java:1730)
>>>> >>>> at org.jgroups.JChannel.close(JChannel.java:516)
>>>> >>>> - locked <0x96ae96f0> (a org.jgroups.JChannel)
>>>> >>>> at org.jboss.messaging.core.impl.postoffice.GroupMember.stop(GroupMember.java:225)
>>>> >>>> at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.stop(MessagingPostOffice.java:425)
>>>> >>>>
>>>> >>>>
>>>> >>>> Is this a known issue?
>>> >>>
>>> >>>
>> >>
>> >>
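
For reference, the approach suggested above (have viewAccepted() hand the work to an ExecutorService and return promptly) might look roughly like the sketch below. This is not the JBM or JGroups code: AsyncViewHandler, handleView(), postOfficeLock and the use of plain strings for member addresses are hypothetical stand-ins, chosen so the example compiles without JGroups on the classpath. A single-threaded executor is assumed so that views are processed in the order they arrive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a membership listener; not the real GroupMember class.
class AsyncViewHandler {
    // One worker thread, so queued views are handled in arrival order.
    private final ExecutorService viewExecutor = Executors.newSingleThreadExecutor();

    // Stand-in for the monitor that the post office logic synchronizes on.
    private final Object postOfficeLock = new Object();

    // Records what the worker thread processed (illustration only).
    private final List<String> processed = new CopyOnWriteArrayList<String>();

    // Would be called on the group communication thread. It copies the
    // membership, queues the work and returns without taking postOfficeLock.
    public void viewAccepted(List<String> members) {
        final List<String> snapshot = new ArrayList<String>(members);
        viewExecutor.execute(new Runnable() {
            public void run() {
                handleView(snapshot);
            }
        });
    }

    // Runs on the worker thread; the long-running or locking work goes here
    // (e.g. something like recalculating a failover map).
    private void handleView(List<String> members) {
        synchronized (postOfficeLock) {
            processed.add(String.join(",", members));
        }
    }

    // Stops accepting new views and waits for queued ones to finish.
    // Returns false if the timeout elapsed or the wait was interrupted.
    public boolean shutdown(long timeoutMillis) {
        viewExecutor.shutdown();
        try {
            return viewExecutor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public List<String> processedViews() {
        return processed;
    }
}
```

With this shape the calling thread never waits for postOfficeLock, so it cannot be one half of a lock-order cycle like the one in the thread dump. Note that shutdown() in this sketch is called without holding postOfficeLock; if a stop method waited for the executor while holding the same lock the queued task needs, it would stall until the timeout.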
> Deadlock while simultaneously shutting down nodes in a cluster
> --------------------------------------------------------------
>
> Key: JBMESSAGING-1796
> URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1796
> Project: JBoss Messaging
> Issue Type: Bug
> Components: JMS Clustering
> Affects Versions: 1.4.0.SP3.CP10, 1.4.6.GA
> Reporter: Howard Gao
> Assignee: Howard Gao
> Fix For: 1.4.0.SP3.CP11, 1.4.6.GA.SP1
>
>
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira