[jboss-jira] [JBoss JIRA] Created: (JGRP-553) Deadlock in FD protocol

Bela Ban (JIRA) jira-events at lists.jboss.org
Tue Jul 10 09:23:48 EDT 2007


Deadlock in FD protocol
-----------------------

                 Key: JGRP-553
                 URL: http://jira.jboss.com/jira/browse/JGRP-553
             Project: JGroups
          Issue Type: Bug
            Reporter: Bela Ban
         Assigned To: Bela Ban
             Fix For: 2.4.1 SP4, 2.5


[submitted by Oleg Afanasiev (xbalanque at mail.ru)]

I found the following deadlock in protocol FD:

Found one Java-level deadlock:
=============================
"IncomingPacketHandler (channel=+slee_5servers)":
   waiting to lock monitor 0x0000002b80fcd488 (object 0x0000002aa7788330, a java.lang.Object),
   which is held by "Timer-4"

"Timer-4":
   waiting to lock monitor 0x0000002b80fcd848 (object 0x0000002aa778adc8, a java.util.Vector),
   which is held by "IncomingPacketHandler (channel=+slee_5servers)"

Java stack information for the threads listed above:
===================================================
"IncomingPacketHandler (channel=+slee_5servers)":
         at org.jgroups.protocols.FD$Broadcaster.stopBroadcastTask (FD.java:602)
         - waiting to lock <0x0000002aa7788330> (a java.lang.Object)
         at org.jgroups.protocols.FD$Broadcaster.adjustSuspectedMembers (FD.java:649)
         - locked <0x0000002aa778adc8> (a java.util.Vector)
         at org.jgroups.protocols.FD.down(FD.java:318)
         - locked <0x0000002aa7083c08> (a org.jgroups.protocols.FD)
         at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
         at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
         at org.jgroups.stack.Protocol.down(Protocol.java:559)
         at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
         at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
         at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:480)
         at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
         at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
         at org.jgroups.protocols.UNICAST.down(UNICAST.java:391)
         at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
         at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
         at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:287)
         at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
         at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
         at org.jgroups.protocols.VIEW_SYNC.down(VIEW_SYNC.java:166)
         at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java: 499)
         at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
         at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:509)
         - locked <0x0000002aa76ee7e0> (a org.jgroups.Membership)
         at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:422)
         at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange (CoordGmsImpl.java:568)
         ...

"Timer-4":
         at java.util.Vector.clone(Vector.java:638)
         - waiting to lock <0x0000002aa778adc8> (a java.util.Vector)
         at org.jgroups.protocols.FD$Broadcaster.startBroadcastTask (FD.java:587)
         - locked <0x0000002aa7788330> (a java.lang.Object)
         at org.jgroups.protocols.FD$Broadcaster.addSuspectedMember (FD.java:621)
         at org.jgroups.protocols.FD$Monitor.run(FD.java:541)
         at org.jgroups.util.TimeScheduler$TaskWrapper.run (TimeScheduler.java:204)
         at java.util.TimerThread.mainLoop(Timer.java:512)
         at java.util.TimerThread.run(Timer.java:462)


The source of deadlock is the following piece of code:
If addSuspectedMember from own timer task and adjustSuspectedMembers  
from GMS arrive simultaneously, we've got a deadlock on method clone().

     protected final class Broadcaster {
         final Vector suspected_mbrs=new Vector(7);
         BroadcastTask task=null;
         private final Object bcast_mutex=new Object();


         Vector getSuspectedMembers() {
             return suspected_mbrs;
         }

         /**
          * Starts a new task, or - if already running - adds the
argument to the running task.
          * @param suspect
          */
         private void startBroadcastTask(Address suspect) {
             synchronized(bcast_mutex) {
                 if(task == null || task.cancelled()) {
                     task=new BroadcastTask((Vector) 
suspected_mbrs.clone());    /// this clone method is synchronized and  
causes a deadlock
                     task.addSuspectedMember(suspect);
                     task.run();      // run immediately the first time
                     timer.add(task); // then every timeout
milliseconds, until cancelled
                     if(log.isTraceEnabled())
                         log.trace("BroadcastTask started");
                 }
                 else {
                     task.addSuspectedMember(suspect);
                 }
             }
         }

         private void stopBroadcastTask() {
             synchronized(bcast_mutex) {
                 if(task != null) {
                     task.stop();
                     task=null;
                 }
             }
         }

         /** Adds a suspected member. Starts the task if not yet  
running */
         protected void addSuspectedMember(Address mbr) {
             if(mbr == null) return;
             if(!members.contains(mbr)) return;
             boolean added=false;
             synchronized(suspected_mbrs) {
                 if(!suspected_mbrs.contains(mbr)) {
                     suspected_mbrs.addElement(mbr);
                     added=true;
                 }
             }
             if(added)
                 startBroadcastTask(mbr);
         }

         void removeSuspectedMember(Address suspected_mbr) {
             if(suspected_mbr == null) return;
             if(log.isDebugEnabled()) log.debug("member is " +
suspected_mbr);
             synchronized(suspected_mbrs) {
                 suspected_mbrs.removeElement(suspected_mbr);
                 if(suspected_mbrs.isEmpty())
                     stopBroadcastTask();
             }
         }

         void removeAll() {
             synchronized(suspected_mbrs) {
                 suspected_mbrs.removeAllElements();
                 stopBroadcastTask();
             }
         }

         /** Removes all elements from suspected_mbrs that are
<em>not</em> in the new membership */
         void adjustSuspectedMembers(List new_mbrship) {
             if(new_mbrship == null || new_mbrship.isEmpty()) return;
             StringBuffer sb=new StringBuffer();
             synchronized(suspected_mbrs) {
                 sb.append("suspected_mbrs: ").append(suspected_mbrs);
                 suspected_mbrs.retainAll(new_mbrship);
                 if(suspected_mbrs.isEmpty())
                     stopBroadcastTask();
                 sb.append(", after adjustment: ").append 
(suspected_mbrs);
                 log.debug(sb.toString());
             }
         }
     }


The proposed solution is to move startBroadcastTask() back inside  
synchronized(suspected_mbrs)
block in the addSuspectedMember method. So it'll look like:

         /** Adds a suspected member. Starts the task if not yet  
running */
         protected void addSuspectedMember(Address mbr) {
             if(mbr == null) return;
             if(!members.contains(mbr)) return;
             boolean added=false;
             synchronized(suspected_mbrs) {
                 if(!suspected_mbrs.contains(mbr)) {
                     suspected_mbrs.addElement(mbr);
                     startBroadcastTask(mbr);
                 }
             }
         }

Or we may use a separate monitor for operations that use  
suspected_mbrs to synchronize.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        



More information about the jboss-jira mailing list