[jboss-jira] [JBoss JIRA] Created: (JGRP-750) Deadlock between GroupRequest and FLUSH during concurrent startup.

Thu May 1 06:30:18 EDT 2008

Deadlock between GroupRequest and FLUSH during concurrent startup.
------------------------------------------------------------------

                 Key: JGRP-750
                 URL: http://jira.jboss.com/jira/browse/JGRP-750
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 2.6.3
         Environment: Debian etch (i386), Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
            Reporter: Robert Newson
         Assigned To: Bela Ban

We've been having more trouble with concurrent startup and now think
we've isolated a deadlock between FLUSH and GroupRequest.

We have four boxes that join a channel and use MessageDispatcher
immediately after connecting. This frequently blocks indefinitely.

GroupRequest.execute() obtains a lock, then a subsequent view change
comes in and tries to do likewise. As a result, all Incoming threads
end up blocked waiting for the lock, and the only way it can be
released is for a stop_flush message to arrive. With every incoming
thread blocked, that message is never processed.
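
The pattern described above (every pool thread waiting on something that only another task on the same pool can release) can be sketched in isolation. This is a minimal illustration using plain java.util.concurrent, not JGroups code: the class name, the latch standing in for the lock, and the task standing in for stop_flush handling are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only: plain java.util.concurrent, not JGroups code.
final class PoolStarvationSketch {

    /**
     * Fills a fixed pool with handlers that block on a latch, then queues the
     * one task that would open the latch. Returns true if that task ran
     * within timeoutMs.
     */
    static boolean releaseRuns(int poolSize, int blockedHandlers, long timeoutMs) {
        ExecutorService incoming = Executors.newFixedThreadPool(poolSize);
        CountDownLatch released = new CountDownLatch(1); // stands in for the held lock
        try {
            for (int i = 0; i < blockedHandlers; i++) {
                incoming.execute(() -> awaitQuietly(released)); // handler stuck on the lock
            }
            incoming.execute(released::countDown); // stands in for handling stop_flush
            return released.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            incoming.shutdownNow(); // interrupt any handlers that are still blocked
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("2 threads, 2 blocked: " + releaseRuns(2, 2, 500));
        System.out.println("3 threads, 2 blocked: " + releaseRuns(3, 2, 500));
    }
}
```

When the number of blocked handlers equals the pool size, the releasing task stays queued and the call times out; with one spare thread it runs and the handlers are freed. Whether this matches the actual cause in JGroups is only what the report above suggests, not something the sketch proves.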

In the attached unit test, if you add the following after the call to connect("A"), the test passes, implying a deadlock:

if (j == 0) {
  Thread.sleep(500);
}

Additionally, and this is more speculative, the wait/notify code in pbcast does not appear to account for spurious wakeups. I don't know under what circumstances they occur, and I don't believe we're seeing any at this time, but it should be fixed at some stage.
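
For reference, the usual guard against spurious wakeups is to call wait() inside a loop that re-checks the condition, as the Object.wait() documentation recommends. The sketch below is a generic example, not the pbcast code: the class, field, and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Generic guarded-wait sketch; names are illustrative, not taken from pbcast.
final class GuardedWaitSketch {
    private final Object lock = new Object();
    private boolean ready = false; // the condition the waiter actually cares about

    /** Waits until signalReady() has been called or the timeout elapses. */
    boolean awaitReady(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (lock) {
            // Loop, not "if": a wakeup without ready == true just waits again.
            while (!ready) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // timed out; also avoids wait(0), which waits forever
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    /** Sets the condition and wakes every waiter so each can re-check it. */
    void signalReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }
}
```

Because the waiter decides based on the flag rather than on the wakeup itself, an early or stray wakeup costs one extra loop iteration and nothing more.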
