[jboss-jira] [JBoss JIRA] Created: (JGRP-750) Deadlock between GroupRequest and FLUSH during concurrent startup.
Robert Newson (JIRA)
jira-events at lists.jboss.org
Thu May 1 06:30:18 EDT 2008
Deadlock between GroupRequest and FLUSH during concurrent startup.
------------------------------------------------------------------
Key: JGRP-750
URL: http://jira.jboss.com/jira/browse/JGRP-750
Project: JGroups
Issue Type: Bug
Affects Versions: 2.6.3
Environment: Debian etch (i386), Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
Reporter: Robert Newson
Assigned To: Bela Ban
We've been having more trouble with concurrent startup and now think
we've isolated a deadlock between FLUSH and GroupRequest that occurs
when several members start at the same time.
We have four boxes that join a channel and use MessageDispatcher
immediately after connecting. This frequently blocks indefinitely.
GroupRequest.execute() obtains a lock, then a subsequent view change
comes in which does likewise. The upshot is that all Incoming threads
end up blocked waiting for the lock, and the only way it can be
released is for a stop_flush message to be processed. With all
incoming threads blocked, that never happens.
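The starvation pattern described above can be modelled without JGroups at all. The following is only a sketch under stated assumptions, not JGroups code: a fixed-size executor stands in for the Incoming thread pool, a CountDownLatch stands in for the lock that only stop_flush releases, and the class and method names (PoolStarvationSketch, releaseRan) are hypothetical. When every pool thread is waiting on the latch, the task that would release it sits in the queue and never runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the reported starvation; none of this is JGroups API.
final class PoolStarvationSketch {

    /**
     * Submits blockedTasks tasks that wait for a "stop_flush" latch, then one
     * task that releases the latch. Returns true if the releasing task ran
     * within timeoutMs, false if it was starved of a thread.
     */
    public static boolean releaseRan(int poolSize, int blockedTasks, long timeoutMs)
            throws InterruptedException {
        // Stand-in for the pool of Incoming threads.
        ExecutorService incoming = Executors.newFixedThreadPool(poolSize);
        // Stand-in for the lock that is only released when stop_flush is handled.
        CountDownLatch stopFlush = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockedTasks; i++) {
                // Each task occupies a pool thread until stop_flush is delivered.
                incoming.submit(() -> {
                    stopFlush.await();
                    return null;
                });
            }
            // Delivering stop_flush itself needs a free pool thread.
            Runnable deliverStopFlush = stopFlush::countDown;
            Future<?> delivered = incoming.submit(deliverStopFlush);
            try {
                delivered.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // every thread is blocked, so the release never ran
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        } finally {
            // Interrupt the waiting tasks so the demo can exit.
            incoming.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pool=2, blocked=2, released: " + releaseRan(2, 2, 200));
        System.out.println("pool=2, blocked=1, released: " + releaseRan(2, 1, 2000));
    }
}
```

With as many blocked tasks as pool threads the release times out; leaving one thread free lets it through. That matches the observation that the hang only clears if stop_flush can still be processed.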
In the attached unit test, adding the following after the call to connect("A") makes the test pass, which implies a timing-dependent deadlock:
    if (j == 0) {
        Thread.sleep(500);
    }
Additionally, and this is more speculative, it seems the wait/notify code in pbcast does not account for spurious wakeups. I don't know under what circumstances they happen, and I don't believe we're seeing spurious wakeups at this time, but it should be fixed at some stage.
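For reference, the usual way to guard against spurious wakeups is to re-check the awaited condition in a loop around wait(), as the Object.wait() documentation recommends. The sketch below is a generic illustration, not the pbcast code; the class GuardedWaitSketch and its methods are hypothetical names. It also recomputes the remaining timeout on each pass so that an early wakeup does not extend the total wait.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a guarded wait; not taken from JGroups.
final class GuardedWaitSketch {
    private final Object lock = new Object();
    private boolean ready; // guarded by lock

    /** Waits up to timeoutMs for ready to become true; returns whether it did. */
    public boolean awaitReady(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (lock) {
            // Loop on the condition: a wakeup alone proves nothing.
            while (!ready) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // timed out; also avoids wait(0), which waits forever
                }
                lock.wait(remainingMs);
            }
            return true;
        }
    }

    /** Sets the condition and wakes all waiters. */
    public void signalReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    /** Wakes waiters without changing the condition, imitating a spurious wakeup. */
    public void wakeWithoutSignal() {
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedWaitSketch gate = new GuardedWaitSketch();
        Thread waiter = new Thread(() -> {
            try {
                System.out.println("ready=" + gate.awaitReady(5000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        gate.wakeWithoutSignal(); // must not let the waiter through
        gate.signalReady();       // the real signal
        waiter.join();
    }
}
```

A waiter written with a plain if instead of the while loop would proceed after wakeWithoutSignal() even though the condition was never set.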