[jboss-jira] [JBoss JIRA] Commented: (JGRP-553) Deadlock in FD protocol

Bela Ban (JIRA) jira-events at lists.jboss.org
Tue Jul 10 10:07:55 EDT 2007


    [ http://jira.jboss.com/jira/browse/JGRP-553?page=comments#action_12368376 ] 
            
Bela Ban commented on JGRP-553:
-------------------------------

Oops, actually, your solution works too: if I move startBroadcastTask() into the suspected_mbrs lock scope, then the order of lock acquisition for addSuspectedMember() is:
suspected_mbrs --> bcast_mutex --> suspected_mbrs (already held)



> Deadlock in FD protocol
> -----------------------
>
>                 Key: JGRP-553
>                 URL: http://jira.jboss.com/jira/browse/JGRP-553
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Bela Ban
>         Assigned To: Bela Ban
>             Fix For: 2.5, 2.4.1 SP4
>
>
> [submitted by Oleg Afanasiev (xbalanque at mail.ru)]
> I found the following deadlock in protocol FD:
> Found one Java-level deadlock:
> =============================
> "IncomingPacketHandler (channel=+slee_5servers)":
>    waiting to lock monitor 0x0000002b80fcd488 (object 0x0000002aa7788330, a java.lang.Object),
>    which is held by "Timer-4"
> "Timer-4":
>    waiting to lock monitor 0x0000002b80fcd848 (object 0x0000002aa778adc8, a java.util.Vector),
>    which is held by "IncomingPacketHandler (channel=+slee_5servers)"
> Java stack information for the threads listed above:
> ===================================================
> "IncomingPacketHandler (channel=+slee_5servers)":
>          at org.jgroups.protocols.FD$Broadcaster.stopBroadcastTask (FD.java:602)
>          - waiting to lock <0x0000002aa7788330> (a java.lang.Object)
>          at org.jgroups.protocols.FD$Broadcaster.adjustSuspectedMembers (FD.java:649)
>          - locked <0x0000002aa778adc8> (a java.util.Vector)
>          at org.jgroups.protocols.FD.down(FD.java:318)
>          - locked <0x0000002aa7083c08> (a org.jgroups.protocols.FD)
>          at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
>          at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
>          at org.jgroups.stack.Protocol.down(Protocol.java:559)
>          at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
>          at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
>          at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:480)
>          at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
>          at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
>          at org.jgroups.protocols.UNICAST.down(UNICAST.java:391)
>          at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
>          at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
>          at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:287)
>          at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
>          at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
>          at org.jgroups.protocols.VIEW_SYNC.down(VIEW_SYNC.java:166)
>          at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
>          at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
>          at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:509)
>          - locked <0x0000002aa76ee7e0> (a org.jgroups.Membership)
>          at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:422)
>          at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange (CoordGmsImpl.java:568)
>          ...
> "Timer-4":
>          at java.util.Vector.clone(Vector.java:638)
>          - waiting to lock <0x0000002aa778adc8> (a java.util.Vector)
>          at org.jgroups.protocols.FD$Broadcaster.startBroadcastTask (FD.java:587)
>          - locked <0x0000002aa7788330> (a java.lang.Object)
>          at org.jgroups.protocols.FD$Broadcaster.addSuspectedMember (FD.java:621)
>          at org.jgroups.protocols.FD$Monitor.run(FD.java:541)
>          at org.jgroups.util.TimeScheduler$TaskWrapper.run (TimeScheduler.java:204)
>          at java.util.TimerThread.mainLoop(Timer.java:512)
>          at java.util.TimerThread.run(Timer.java:462)
> The source of deadlock is the following piece of code:
> If addSuspectedMember from own timer task and adjustSuspectedMembers  
> from GMS arrive simultaneously, we've got a deadlock on method clone().
>      protected final class Broadcaster {
>          final Vector suspected_mbrs=new Vector(7);
>          BroadcastTask task=null;
>          private final Object bcast_mutex=new Object();
>          Vector getSuspectedMembers() {
>              return suspected_mbrs;
>          }
>          /**
>           * Starts a new task, or - if already running - adds the
> argument to the running task.
>           * @param suspect
>           */
>          private void startBroadcastTask(Address suspect) {
>              synchronized(bcast_mutex) {
>                  if(task == null || task.cancelled()) {
>                      task=new BroadcastTask((Vector) 
> suspected_mbrs.clone());    /// this clone method is synchronized and  
> causes a deadlock
>                      task.addSuspectedMember(suspect);
>                      task.run();      // run immediately the first time
>                      timer.add(task); // then every timeout
> milliseconds, until cancelled
>                      if(log.isTraceEnabled())
>                          log.trace("BroadcastTask started");
>                  }
>                  else {
>                      task.addSuspectedMember(suspect);
>                  }
>              }
>          }
>          private void stopBroadcastTask() {
>              synchronized(bcast_mutex) {
>                  if(task != null) {
>                      task.stop();
>                      task=null;
>                  }
>              }
>          }
>          /** Adds a suspected member. Starts the task if not yet  
> running */
>          protected void addSuspectedMember(Address mbr) {
>              if(mbr == null) return;
>              if(!members.contains(mbr)) return;
>              boolean added=false;
>              synchronized(suspected_mbrs) {
>                  if(!suspected_mbrs.contains(mbr)) {
>                      suspected_mbrs.addElement(mbr);
>                      added=true;
>                  }
>              }
>              if(added)
>                  startBroadcastTask(mbr);
>          }
>          void removeSuspectedMember(Address suspected_mbr) {
>              if(suspected_mbr == null) return;
>              if(log.isDebugEnabled()) log.debug("member is " +
> suspected_mbr);
>              synchronized(suspected_mbrs) {
>                  suspected_mbrs.removeElement(suspected_mbr);
>                  if(suspected_mbrs.isEmpty())
>                      stopBroadcastTask();
>              }
>          }
>          void removeAll() {
>              synchronized(suspected_mbrs) {
>                  suspected_mbrs.removeAllElements();
>                  stopBroadcastTask();
>              }
>          }
>          /** Removes all elements from suspected_mbrs that are
> <em>not</em> in the new membership */
>          void adjustSuspectedMembers(List new_mbrship) {
>              if(new_mbrship == null || new_mbrship.isEmpty()) return;
>              StringBuffer sb=new StringBuffer();
>              synchronized(suspected_mbrs) {
>                  sb.append("suspected_mbrs: ").append(suspected_mbrs);
>                  suspected_mbrs.retainAll(new_mbrship);
>                  if(suspected_mbrs.isEmpty())
>                      stopBroadcastTask();
>                  sb.append(", after adjustment: ").append 
> (suspected_mbrs);
>                  log.debug(sb.toString());
>              }
>          }
>      }
> The proposed solution is to move startBroadcastTask() back inside  
> synchronized(suspected_mbrs)
> block in the addSuspectedMember method. So it'll look like:
>          /** Adds a suspected member. Starts the task if not yet  
> running */
>          protected void addSuspectedMember(Address mbr) {
>              if(mbr == null) return;
>              if(!members.contains(mbr)) return;
>              boolean added=false;
>              synchronized(suspected_mbrs) {
>                  if(!suspected_mbrs.contains(mbr)) {
>                      suspected_mbrs.addElement(mbr);
>                      startBroadcastTask(mbr);
>                  }
>              }
>          }
> Or we may use a separate monitor for operations that use  
> suspected_mbrs to synchronize.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        



More information about the jboss-jira mailing list