[jboss-jira] [JBoss JIRA] Commented: (JGRP-668) Deadlock condition in BARRIER
Bela Ban (JIRA)
jira-events at lists.jboss.org
Thu Feb 28 08:21:42 EST 2008
[ http://jira.jboss.com/jira/browse/JGRP-668?page=comments#action_12400827 ]
Bela Ban commented on JGRP-668:
-------------------------------
There is a unit test for the 2nd case: BARRIERTest.testThreadsBlockedOnMutex()
> Deadlock condition in BARRIER
> -----------------------------
>
> Key: JGRP-668
> URL: http://jira.jboss.com/jira/browse/JGRP-668
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.6.1
> Reporter: Bela Ban
> Assigned To: Bela Ban
> Fix For: 2.7, 2.6.3
>
> Attachments: BARRIER.java.patch, barrier_deadlock.txt, jgroups_protocol.xml
>
>
> Hey Bela et al:
> We've been fighting a lossy network (UDP receive errors) on a cluster of 50
> machines and managed to produce 2 coordinators who refused to MERGE3. A
> closer examination revealed this stack trace
> (http://www.nabble.com/file/p14991972/barrier_deadlock.txt),
> which showed that there was one thread trying to satisfy a STATE_REQ msg
> blocked down in BARRIER.closeBarrier() waiting for the in_flight_threads to
> empty, another thread was trying to service another STATE_REQ and was
> blocked trying to lock the state_requesters table up in
> STATE_TRANSFER.handleStateReq(), and another 12 threads blocked waiting for
> the barrier to open in BARRIER.up().
> We quickly found a problematic deadlock condition in BARRIER; here's the
> patch to fix it: http://www.nabble.com/file/p14991972/BARRIER.java.patch
> However, we cannot see an easy way to handle 2 STATE_REQ messages
> arriving one right after the other. They will both enter the
> in_flight_threads set and only one will come back down to lock the barrier
> and will wait forever for the other one to leave in_flight_threads. If we
> let the 2nd come down too, it may come back up before the in_flight_threads
> is clear, since all it does is see that the barrier is closed and return.
> Although we may have fixed part of the deadlock we saw, we are looking into
> switching to the FLUSH protocol instead because of the 2 STATE_REQ issue.
> Just curious about others' feedback on this issue and whether more folks
> are using FLUSH or BARRIER?
> Thanks much for an [otherwise] great code stack. We are excited to be using
> it in our distributed database system project.
> gray
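
The two-STATE_REQ scenario described in the report can be modelled with a small, self-contained Java sketch. This is not the JGroups BARRIER source: the class BarrierSketch, its enter()/leave() methods and the twoRequestersDrain() helper are hypothetical names made up for illustration. It also assumes, based only on the stack-trace description above, that the first requester holds the state_requesters lock while it calls closeBarrier(). A timeout is added so the demo terminates, whereas the report describes a wait that never ends.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model; NOT the JGroups BARRIER source.
class BarrierSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private final Set<Thread> inFlight = ConcurrentHashMap.newKeySet();

    // A thread passing up through the barrier registers itself.
    void enter() {
        inFlight.add(Thread.currentThread());
    }

    // A thread returning from the upper layers deregisters and wakes waiters.
    void leave() {
        lock.lock();
        try {
            inFlight.remove(Thread.currentThread());
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits until no thread other than the caller is in flight.
    // Returns false on timeout or interruption (the timeout is an addition
    // for this demo; the report describes an unbounded wait).
    boolean closeBarrier(long timeoutMs) {
        Thread self = Thread.currentThread();
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (inFlight.stream().anyMatch(t -> t != self)) {
                if (nanos <= 0L) {
                    return false;
                }
                nanos = changed.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    // Two requesters: the first holds the (assumed) state_requesters lock while
    // closing the barrier; the second is in flight and blocked on that lock.
    static boolean twoRequestersDrain(long timeoutMs) {
        BarrierSketch barrier = new BarrierSketch();
        ReentrantLock stateRequesters = new ReentrantLock();
        CountDownLatch firstHoldsLock = new CountDownLatch(1);
        CountDownLatch secondInFlight = new CountDownLatch(1);
        AtomicBoolean drained = new AtomicBoolean();

        Thread first = new Thread(() -> {
            barrier.enter();
            stateRequesters.lock();
            try {
                firstHoldsLock.countDown();
                secondInFlight.await();
                drained.set(barrier.closeBarrier(timeoutMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                stateRequesters.unlock();
                barrier.leave();
            }
        });
        Thread second = new Thread(() -> {
            try {
                firstHoldsLock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            barrier.enter();
            secondInFlight.countDown();
            stateRequesters.lock();   // blocks until the first requester gives up
            stateRequesters.unlock();
            barrier.leave();
        });
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return drained.get();
    }

    public static void main(String[] args) {
        BarrierSketch solo = new BarrierSketch();
        solo.enter();
        System.out.println("lone requester drained: " + solo.closeBarrier(100));
        System.out.println("two requesters drained: " + twoRequestersDrain(200));
    }
}
```

In this model a lone requester can close the barrier because it does not count itself as in flight, while with two requesters the close can only time out: the second thread cannot leave the in-flight set until it gets the lock the first one is holding. Without the timeout, both threads would wait on each other indefinitely.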
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira