]
Howard Gao commented on JBMESSAGING-1796:
-----------------------------------------
Bela Ban wrote:
> I agree, the best thing to do here is to run calculateFailoverMap() on
> a separate thread. It's a bad idea to (1) run long actions or (2)
> send messages inside of a viewAccepted() callback, as this blocks
> JGroups.
>
> Brian Stansberry wrote:
> >> So, input from Howard and Bela is needed. (I reattached the stack
> >> traces.) The deadlock is:
> >>
> >> ShutdownHook: JBM locks MessagingPostOffice, later JGroups wants to
> >> lock GMS.members
> >>
> >> JGroups Incoming Thread: JGroups locks GMS.members, later JBM wants
> >> to lock MessagingPostOffice
> >>
> >> In neither place where the lock is taken does the code taking the
> >> lock have any idea that later the other lock is wanted; i.e. it's not
> >> a simple coding error.
> >>
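To make the inversion above concrete, here is a minimal Java sketch of the two lock orders. Everything in it is an assumption for illustration: LockOrderSketch and takeBoth are invented names, and the two ReentrantLocks are only stand-ins for the MessagingPostOffice and GMS.members monitors. The real code uses synchronized, which would hang forever, so the sketch uses tryLock with a timeout in order to terminate.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: not JBM or JGroups code.
class LockOrderSketch {

    // Takes 'first', waits until the other thread holds its own first lock,
    // then tries 'second'. Returns true only if 'second' was acquired in time.
    private static boolean takeBoth(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch bothHoldFirst,
                                    CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean gotSecond = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (gotSecond) {
                second.unlock();
            }
            // Keep holding 'first' until both threads have finished trying,
            // so neither attempt can succeed late.
            bothTried.countDown();
            bothTried.await();
            return gotSecond;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Index 0: the "shutdown hook" order, index 1: the "incoming thread" order.
    static boolean[] run() {
        ReentrantLock postOffice = new ReentrantLock(); // stand-in for MessagingPostOffice
        ReentrantLock members = new ReentrantLock();    // stand-in for GMS.members
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];

        Thread shutdownHook = new Thread(() ->
            gotSecond[0] = takeBoth(postOffice, members, bothHoldFirst, bothTried));
        Thread incoming = new Thread(() ->
            gotSecond[1] = takeBoth(members, postOffice, bothHoldFirst, bothTried));

        shutdownHook.start();
        incoming.start();
        try {
            shutdownHook.join();
            incoming.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return gotSecond;
    }

    public static void main(String[] args) {
        boolean[] got = run();
        System.out.println("shutdown hook got second lock: " + got[0]);
        System.out.println("incoming thread got second lock: " + got[1]);
    }
}
```

Neither thread gets its second lock, because each one is holding the lock the other is waiting for. With plain monitors there is no timeout, which is the hang seen in the thread dump.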
> >> Howard, a general practice that I always encourage in JGroups apps is
> >> to have another thread available to asynchronously process events
> >> triggered by a JGroups MembershipListener.viewAccepted() callback. In
> >> this case that would be
> >> org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener.viewAccepted.
> >> Let the JGroups thread cache the membership information in some
> >> object and then signal another thread to handle it. Let the JGroups
> >> thread promptly return. A simple way to do that is to encapsulate the
> >> current viewAccepted logic in an anonymous Runnable and pass the
> >> Runnable to a java.util.concurrent.ExecutorService to execute.
> >>
> >> If that's not doable, then either the JGroups side or the JBM side
> >> is going to need to reduce the scope of the locks. I doubt that's a
> >> simple thing though for either side.
> >>
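The hand-off Brian describes could look roughly like the sketch below. It is only an illustration under stated assumptions: ViewHandoffSketch, handleView and shutdownAndWait are hypothetical names, the view is modelled as a plain list of strings rather than a JGroups View, and none of this is the actual GroupMember code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only: a hypothetical listener, not the real GroupMember code.
class ViewHandoffSketch {
    // Single worker thread, so views are handled in the order they arrive.
    private final ExecutorService viewExecutor = Executors.newSingleThreadExecutor();
    private final List<List<String>> handledViews =
        Collections.synchronizedList(new ArrayList<>());

    // Would be called on the JGroups thread: snapshot the view, queue it, return.
    void viewAccepted(List<String> members) {
        List<String> snapshot = new ArrayList<>(members);
        viewExecutor.execute(() -> handleView(snapshot));
    }

    // Slow work (for example recalculating a failover map) runs on the worker,
    // so any locks it needs are not taken on the JGroups thread.
    private void handleView(List<String> snapshot) {
        handledViews.add(snapshot);
    }

    // Stops accepting views and waits for queued ones to finish.
    // Returns false on timeout or interruption.
    boolean shutdownAndWait(long timeoutMillis) {
        viewExecutor.shutdown();
        try {
            return viewExecutor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Copy of the views handled so far, in handling order.
    List<List<String>> handledViews() {
        synchronized (handledViews) {
            return new ArrayList<>(handledViews);
        }
    }

    public static void main(String[] args) {
        ViewHandoffSketch listener = new ViewHandoffSketch();
        listener.viewAccepted(Arrays.asList("nodeA", "nodeB"));
        listener.viewAccepted(Arrays.asList("nodeA"));
        System.out.println("drained: " + listener.shutdownAndWait(5000));
        System.out.println("handled: " + listener.handledViews());
    }
}
```

Because viewAccepted only copies the view and queues a task, the calling thread returns without waiting for whatever locks handleView needs. One thing to watch in real code: once the executor has been shut down, a later execute() is rejected with a RejectedExecutionException, so the shutdown ordering between the listener and the channel would need some care.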
> >>
> >>
> >> On 02/26/2010 09:59 AM, Colin Mondesir wrote:
>> >>> Hi Brian,
>> >>>
>> >>> See attached.
>> >>>
>> >>>
>> >>> ----- Original Message -----
>> >>> From: "Brian Stansberry"<brian.stansberry(a)redhat.com>
>> >>> To: "Colin Mondesir"<cmondesi(a)redhat.com>
>> >>> Cc: "jboss-support-clustering"<jboss-support-clustering(a)redhat.com>,
>> >>> "jboss-support-messaging"<jboss-support-messaging(a)redhat.com>
>> >>> Sent: Friday, February 26, 2010 2:27:12 PM GMT +00:00 GMT Britain,
>> >>> Ireland, Portugal
>> >>> Subject: Re: Deadlock on server shutdown (case #552253)
>> >>>
>> >>> No, it's not a known issue, at least not to me. What's the stack trace
>> >>> for the other thread, Incoming-5,10.33.0.23:50125 ?
>> >>>
>> >>> On 02/26/2010 04:34 AM, Colin Mondesir wrote:
>>> >>>> https://enterprise.redhat.com/issue-tracker/?module=issues&action=vie...
>>> >>>>
>>> >>>>
>>> >>>> A customer is getting a deadlock when simultaneously shutting down
>>> >>>> two clustered servers (EAP 5.0); one instance shuts down
>>> >>>> correctly but the other does not.
>>> >>>>
>>> >>>> Extract from a thread dump of the failed server:
>>> >>>>
>>> >>>> Found one Java-level deadlock:
>>> >>>> =============================
>>> >>>> "JBoss Shutdown Hook":
>>> >>>>   waiting to lock monitor 0x1182ebac (object 0x96b64c80, a org.jgroups.Membership),
>>> >>>>   which is held by "Incoming-5,10.33.0.23:50125"
>>> >>>> "Incoming-5,10.33.0.23:50125":
>>> >>>>   waiting to lock monitor 0x873b0fb0 (object 0x96abd160, a org.jboss.messaging.core.impl.postoffice.MessagingPostOffice),
>>> >>>>   which is held by "JBoss Shutdown Hook"
>>> >>>>
>>> >>>> Java stack information for the threads listed above:
>>> >>>> ===================================================
>>> >>>> "JBoss Shutdown Hook":
>>> >>>>   at org.jgroups.protocols.pbcast.GMS.determineCoordinator(GMS.java:564)
>>> >>>>   - waiting to lock <0x96b64c80> (a org.jgroups.Membership)
>>> >>>>   at org.jgroups.protocols.pbcast.ParticipantGmsImpl.leave(ParticipantGmsImpl.java:57)
>>> >>>>   at org.jgroups.protocols.pbcast.GMS.down(GMS.java:886)
>>> >>>>   at org.jgroups.protocols.FC.down(FC.java:434)
>>> >>>>   at org.jgroups.protocols.FRAG2.down(FRAG2.java:154)
>>> >>>>   at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:209)
>>> >>>>   at org.jgroups.protocols.pbcast.FLUSH.down(FLUSH.java:291)
>>> >>>>   at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:461)
>>> >>>>   at org.jgroups.JChannel.down(JChannel.java:1514)
>>> >>>>   at org.jgroups.JChannel.disconnect(JChannel.java:500)
>>> >>>>   - locked <0x96ae96f0> (a org.jgroups.JChannel)
>>> >>>>   at org.jgroups.JChannel._close(JChannel.java:1730)
>>> >>>>   at org.jgroups.JChannel.close(JChannel.java:516)
>>> >>>>   - locked <0x96ae96f0> (a org.jgroups.JChannel)
>>> >>>>   at org.jboss.messaging.core.impl.postoffice.GroupMember.stop(GroupMember.java:225)
>>> >>>>   at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.stop(MessagingPostOffice.java:425)
>>> >>>>
>>> >>>>
>>> >>>> Is this a known issue?
>> >>>
>> >>>
> >>
> >>
Deadlock while simultaneously shutting down nodes in a cluster
--------------------------------------------------------------
Key: JBMESSAGING-1796
URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1796
Project: JBoss Messaging
Issue Type: Bug
Components: JMS Clustering
Affects Versions: 1.4.0.SP3.CP10, 1.4.6.GA
Reporter: Howard Gao
Assignee: Howard Gao
Fix For: 1.4.0.SP3.CP11, 1.4.6.GA.SP1
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: