[jboss-jira] [JBoss JIRA] Updated: (JGRP-750) Deadlock between GroupRequest and FLUSH during concurrent startup.

Thu May 1 06:38:45 EDT 2008

     [ http://jira.jboss.com/jira/browse/JGRP-750?page=all ]

Robert Newson updated JGRP-750:
-------------------------------

    Attachment: ConcurrentStartupWithGroupRequestTest.java

Updated version. I had the connect() inside the loop rather than outside. It still fails, sadly.

There's a Thread.sleep line that is commented out. If you uncomment it, the test passes.

> Deadlock between GroupRequest and FLUSH during concurrent startup.
> ------------------------------------------------------------------
>
>                 Key: JGRP-750
>                 URL: http://jira.jboss.com/jira/browse/JGRP-750
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.6.3
>         Environment: Debian etch (i386), Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
>            Reporter: Robert Newson
>         Assigned To: Bela Ban
>         Attachments: ConcurrentStartupWithGroupRequestTest.java, ConcurrentStartupWithGroupRequestTest.java, stacktrace.txt
>
>
> We've been having more trouble with concurrent startup and now think
> we've isolated a deadlock between FLUSH and GroupRequest during
> concurrent startup.
> We have four boxes that join a channel and use MessageDispatcher
> immediately after connecting. This frequently blocks indefinitely.
> GroupRequest.execute() obtains a lock, and a subsequent view change
> then tries to obtain the same lock. The upshot is that all Incoming
> threads end up blocked waiting for that lock, and the only thing that
> can release it is the arrival of a stop_flush message. With every
> Incoming thread blocked, that message is never processed.
> In the attached unit test, if you add the following after the call to connect("A"), the test passes, which suggests a timing-dependent deadlock:
> if (j == 0) {
>   Thread.sleep(500);
> }
> Additionally, and this is more speculative, it seems the wait/notify code in pbcast does not account for spurious wakeups. I don't know under what circumstances they occur, and I don't believe we're seeing any at this time, but it should be fixed at some point.
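
The lock-ordering hang described in the report can be illustrated with a deliberately simplified, hypothetical model. This is not JGroups code: a single delivery thread stands in for the Incoming thread pool and processes events strictly in order, a ReentrantLock stands in for the lock taken by GroupRequest.execute(), and the requester keeps that lock until it sees a stop_flush event. The class name, event names, single delivery thread and timeouts are all assumptions, chosen so the sketch terminates instead of hanging. It is written for Java 8 or later (it uses lambdas), which is newer than the 1.6 VM in the report.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, heavily simplified model of the reported hang; this is NOT JGroups code.
// One delivery thread processes incoming events strictly in order. The requester holds a
// shared lock while it waits for a stop_flush event that only that thread can deliver.
public class LockOrderingDeadlockModel {

    /** Returns true if the requester saw stop_flush before its timeout expired. */
    public static boolean runScenario(boolean holdLockWhileWaiting) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();            // stands in for the GroupRequest lock
        CountDownLatch stopFlushSeen = new CountDownLatch(1);
        BlockingQueue<String> incoming = new LinkedBlockingQueue<>();

        Thread deliveryThread = new Thread(() -> {
            try {
                while (true) {
                    String event = incoming.take();
                    if (event.equals("VIEW_CHANGE")) {
                        // View handling needs the same lock. A bounded tryLock replaces an
                        // unbounded lock() so the model terminates instead of hanging.
                        if (!lock.tryLock(500, TimeUnit.MILLISECONDS)) {
                            return; // stuck: the stop_flush queued behind us is never processed
                        }
                        lock.unlock();
                    } else if (event.equals("STOP_FLUSH")) {
                        stopFlushSeen.countDown();
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The view change is queued ahead of stop_flush, as in the concurrent-startup race.
        incoming.put("VIEW_CHANGE");
        incoming.put("STOP_FLUSH");

        boolean released;
        if (holdLockWhileWaiting) {
            lock.lock();                                     // lock taken before waiting
            try {
                deliveryThread.start();
                released = stopFlushSeen.await(1, TimeUnit.SECONDS);
            } finally {
                lock.unlock();
            }
        } else {
            deliveryThread.start();                          // wait without holding the lock
            released = stopFlushSeen.await(1, TimeUnit.SECONDS);
        }
        deliveryThread.join();
        return released;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lock held while waiting: stop_flush seen = " + runScenario(true));
        System.out.println("lock not held:           stop_flush seen = " + runScenario(false));
    }
}
```

With the lock held while waiting, the queued view change cannot acquire it, so the stop_flush behind it is never delivered and runScenario(true) returns false. When the requester does not hold the lock while it waits, both events get through and runScenario(false) returns true. The model only shows why the ordering matters; it says nothing about where the fix belongs in JGroups.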

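On the spurious-wakeup point: the Javadoc for Object.wait() states that spurious wakeups can occur and recommends always calling wait() inside a loop that re-tests the condition being waited for. The following is a minimal sketch of that idiom, not pbcast code; GuardedFlag and its methods are hypothetical names. The fragile form is if (!ready) wait(), which proceeds as soon as the thread wakes, whether or not the condition actually holds.

```java
// Minimal sketch of the guarded-wait idiom; GuardedFlag is hypothetical, not pbcast code.
public class GuardedFlag {
    private boolean ready = false; // the condition waiters care about, guarded by "this"

    public synchronized void awaitReady() throws InterruptedException {
        // Re-check the condition in a loop: wait() may return without any notify
        // (a spurious wakeup), or after a notify that was meant for another condition.
        while (!ready) {
            wait();
        }
    }

    public synchronized void setReady() {
        ready = true;
        notifyAll(); // wake every waiter; each one re-checks the condition itself
    }

    public synchronized boolean isReady() {
        return ready;
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedFlag flag = new GuardedFlag();
        Thread waiter = new Thread(() -> {
            try {
                flag.awaitReady(); // returns only once ready is actually true
                System.out.println("waiter saw ready=" + flag.isReady());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        Thread.sleep(100);
        flag.setReady();
        waiter.join();
    }
}
```

Running main prints "waiter saw ready=true". Using notifyAll() rather than notify() is a conservative choice: since every waiter re-tests its own condition, waking extra threads costs a little CPU but cannot cause a missed signal.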
-- 
This message is automatically generated by JIRA.