[jboss-jira] [JBoss JIRA] Commented: (JGRP-553) Deadlock in FD protocol
Bela Ban (JIRA)
jira-events at lists.jboss.org
Tue Jul 10 10:07:55 EDT 2007
[ http://jira.jboss.com/jira/browse/JGRP-553?page=comments#action_12368376 ]
Bela Ban commented on JGRP-553:
-------------------------------
Oops, actually, your solution works too: if I move startBroadcastTask() into the suspected_mbrs lock scope, then the order of lock acquisition for addSuspectedMember() is:
suspected_mbrs --> bcast_mutex --> suspected_mbrs (already held)
> Deadlock in FD protocol
> -----------------------
>
> Key: JGRP-553
> URL: http://jira.jboss.com/jira/browse/JGRP-553
> Project: JGroups
> Issue Type: Bug
> Reporter: Bela Ban
> Assigned To: Bela Ban
> Fix For: 2.5, 2.4.1 SP4
>
>
> [submitted by Oleg Afanasiev (xbalanque at mail.ru)]
> I found the following deadlock in protocol FD:
> Found one Java-level deadlock:
> =============================
> "IncomingPacketHandler (channel=+slee_5servers)":
> waiting to lock monitor 0x0000002b80fcd488 (object 0x0000002aa7788330, a java.lang.Object),
> which is held by "Timer-4"
> "Timer-4":
> waiting to lock monitor 0x0000002b80fcd848 (object 0x0000002aa778adc8, a java.util.Vector),
> which is held by "IncomingPacketHandler (channel=+slee_5servers)"
> Java stack information for the threads listed above:
> ===================================================
> "IncomingPacketHandler (channel=+slee_5servers)":
> at org.jgroups.protocols.FD$Broadcaster.stopBroadcastTask (FD.java:602)
> - waiting to lock <0x0000002aa7788330> (a java.lang.Object)
> at org.jgroups.protocols.FD$Broadcaster.adjustSuspectedMembers (FD.java:649)
> - locked <0x0000002aa778adc8> (a java.util.Vector)
> at org.jgroups.protocols.FD.down(FD.java:318)
> - locked <0x0000002aa7083c08> (a org.jgroups.protocols.FD)
> at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
> at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
> at org.jgroups.stack.Protocol.down(Protocol.java:559)
> at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
> at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
> at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:480)
> at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
> at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
> at org.jgroups.protocols.UNICAST.down(UNICAST.java:391)
> at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
> at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
> at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:287)
> at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
> at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
> at org.jgroups.protocols.VIEW_SYNC.down(VIEW_SYNC.java:166)
> at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
> at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
> at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:509)
> - locked <0x0000002aa76ee7e0> (a org.jgroups.Membership)
> at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:422)
> at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange (CoordGmsImpl.java:568)
> ...
> "Timer-4":
> at java.util.Vector.clone(Vector.java:638)
> - waiting to lock <0x0000002aa778adc8> (a java.util.Vector)
> at org.jgroups.protocols.FD$Broadcaster.startBroadcastTask (FD.java:587)
> - locked <0x0000002aa7788330> (a java.lang.Object)
> at org.jgroups.protocols.FD$Broadcaster.addSuspectedMember (FD.java:621)
> at org.jgroups.protocols.FD$Monitor.run(FD.java:541)
> at org.jgroups.util.TimeScheduler$TaskWrapper.run (TimeScheduler.java:204)
> at java.util.TimerThread.mainLoop(Timer.java:512)
> at java.util.TimerThread.run(Timer.java:462)
> The source of deadlock is the following piece of code:
> If addSuspectedMember from own timer task and adjustSuspectedMembers
> from GMS arrive simultaneously, we've got a deadlock on method clone().
> protected final class Broadcaster {
> final Vector suspected_mbrs=new Vector(7);
> BroadcastTask task=null;
> private final Object bcast_mutex=new Object();
> Vector getSuspectedMembers() {
> return suspected_mbrs;
> }
> /**
> * Starts a new task, or - if already running - adds the
> argument to the running task.
> * @param suspect
> */
> private void startBroadcastTask(Address suspect) {
> synchronized(bcast_mutex) {
> if(task == null || task.cancelled()) {
> task=new BroadcastTask((Vector)
> suspected_mbrs.clone()); /// this clone method is synchronized and
> causes a deadlock
> task.addSuspectedMember(suspect);
> task.run(); // run immediately the first time
> timer.add(task); // then every timeout
> milliseconds, until cancelled
> if(log.isTraceEnabled())
> log.trace("BroadcastTask started");
> }
> else {
> task.addSuspectedMember(suspect);
> }
> }
> }
> private void stopBroadcastTask() {
> synchronized(bcast_mutex) {
> if(task != null) {
> task.stop();
> task=null;
> }
> }
> }
> /** Adds a suspected member. Starts the task if not yet
> running */
> protected void addSuspectedMember(Address mbr) {
> if(mbr == null) return;
> if(!members.contains(mbr)) return;
> boolean added=false;
> synchronized(suspected_mbrs) {
> if(!suspected_mbrs.contains(mbr)) {
> suspected_mbrs.addElement(mbr);
> added=true;
> }
> }
> if(added)
> startBroadcastTask(mbr);
> }
> void removeSuspectedMember(Address suspected_mbr) {
> if(suspected_mbr == null) return;
> if(log.isDebugEnabled()) log.debug("member is " +
> suspected_mbr);
> synchronized(suspected_mbrs) {
> suspected_mbrs.removeElement(suspected_mbr);
> if(suspected_mbrs.isEmpty())
> stopBroadcastTask();
> }
> }
> void removeAll() {
> synchronized(suspected_mbrs) {
> suspected_mbrs.removeAllElements();
> stopBroadcastTask();
> }
> }
> /** Removes all elements from suspected_mbrs that are
> <em>not</em> in the new membership */
> void adjustSuspectedMembers(List new_mbrship) {
> if(new_mbrship == null || new_mbrship.isEmpty()) return;
> StringBuffer sb=new StringBuffer();
> synchronized(suspected_mbrs) {
> sb.append("suspected_mbrs: ").append(suspected_mbrs);
> suspected_mbrs.retainAll(new_mbrship);
> if(suspected_mbrs.isEmpty())
> stopBroadcastTask();
> sb.append(", after adjustment: ").append
> (suspected_mbrs);
> log.debug(sb.toString());
> }
> }
> }
> The proposed solution is to move startBroadcastTask() back inside
> synchronized(suspected_mbrs)
> block in the addSuspectedMember method. So it'll look like:
> /** Adds a suspected member. Starts the task if not yet
> running */
> protected void addSuspectedMember(Address mbr) {
> if(mbr == null) return;
> if(!members.contains(mbr)) return;
> boolean added=false;
> synchronized(suspected_mbrs) {
> if(!suspected_mbrs.contains(mbr)) {
> suspected_mbrs.addElement(mbr);
> startBroadcastTask(mbr);
> }
> }
> }
> Or we may use a separate monitor for operations that use
> suspected_mbrs to synchronize.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
More information about the jboss-jira
mailing list