[jboss-jira] [JBoss JIRA] Updated: (JGRP-668) Deadlock condition in BARRIER
Gray Watson (JIRA)
jira-events at lists.jboss.org
Mon Jan 21 11:41:21 EST 2008
[ http://jira.jboss.com/jira/browse/JGRP-668?page=all ]
Gray Watson updated JGRP-668:
-----------------------------
Attachment: BARRIER.java.patch
Patch to BARRIER that handles the race condition between close and in_flight_threads but leaves the problem of 2 state transfer requests in flight at once.
> Deadlock condition in BARRIER
> -----------------------------
>
> Key: JGRP-668
> URL: http://jira.jboss.com/jira/browse/JGRP-668
> Project: JGroups
> Issue Type: Bug
> Reporter: Bela Ban
> Assigned To: Bela Ban
> Fix For: 2.7
>
> Attachments: BARRIER.java.patch, barrier_deadlock.txt
>
>
> Hey Bela et al:
> We've been fighting a lossy network (UDP receive errors) on a cluster of 50
> machines and managed to produce 2 coordinators who refused to MERGE3. A
> closer examination revealed this stack trace
> (http://www.nabble.com/file/p14991972/barrier_deadlock.txt), which showed
> three things: one thread trying to satisfy a STATE_REQ msg was blocked
> down in BARRIER.closeBarrier() waiting for the in_flight_threads to empty;
> another thread trying to service another STATE_REQ was blocked trying to
> lock the state_requesters table up in STATE_TRANSFER.handleStateReq(); and
> another 12 threads were blocked waiting for the barrier to open in
> BARRIER.up().
> We quickly found that we had a deadlock condition in BARRIER that was
> problematic; here's the patch to fix this:
> http://www.nabble.com/file/p14991972/BARRIER.java.patch
> However, we cannot see an easy way to fix 2 STATE_REQ messages coming one
> right after the other. They will both enter the in_flight_threads set, and
> only one will come back down to lock the barrier, where it will wait
> forever for the other one to leave in_flight_threads. If we let the 2nd
> come down too, it may come back up before in_flight_threads is clear,
> since all it does is see that the barrier is closed and return.
> Although we may have fixed part of the deadlock we saw, we are looking into
> switching to the FLUSH protocol instead because of the 2 STATE_REQ issue.
> Just curious as to others' feedback about this issue and whether more folks
> are using FLUSH or BARRIER?
> Thanks much for an [otherwise] great code stack. We are excited to be using
> it in our distributed database system project.
> gray
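
The two-STATE_REQ stall described in the report can be reproduced in a toy model. The sketch below is not the JGroups BARRIER code and not the attached patch. ToyBarrier, enter(), leave(), close() and twoRequests() are hypothetical names invented for illustration. It also assumes that the first request thread holds the state_requesters lock while it closes the barrier, which the stack trace description suggests but does not state outright. A timeout stands in for "waits forever" so that the demo terminates.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of the scenario in the report; NOT the real JGroups BARRIER. */
class ToyBarrier {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private final Set<Thread> inFlight = new HashSet<>();
    private boolean closed;

    /** A thread passing up through the barrier registers itself as in flight. */
    void enter() {
        lock.lock();
        try { inFlight.add(Thread.currentThread()); } finally { lock.unlock(); }
    }

    /** A thread that has finished its work upstream deregisters itself. */
    void leave() {
        lock.lock();
        try { inFlight.remove(Thread.currentThread()); changed.signalAll(); } finally { lock.unlock(); }
    }

    /**
     * Closes the barrier and waits until no OTHER thread is in flight.
     * Returns false if the timeout expires first (a real barrier would hang).
     */
    boolean close(long timeoutMs) throws InterruptedException {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            closed = true;
            // The closer is itself in flight, so it must not wait for itself.
            while (inFlight.size() - (inFlight.contains(Thread.currentThread()) ? 1 : 0) > 0) {
                if (nanos <= 0) return false;
                nanos = changed.awaitNanos(nanos);
            }
            return true;
        } finally { lock.unlock(); }
    }

    /** Two requests in flight at once: returns what close() reported to the first one. */
    static boolean twoRequests(long timeoutMs) throws InterruptedException {
        ToyBarrier barrier = new ToyBarrier();
        Object stateRequesters = new Object();   // stand-in for the state_requesters table
        CountDownLatch firstHoldsLock = new CountDownLatch(1);
        CountDownLatch secondInFlight = new CountDownLatch(1);
        boolean[] drained = new boolean[1];

        Thread first = new Thread(() -> {
            barrier.enter();
            try {
                synchronized (stateRequesters) {  // assumption: lock held while closing
                    firstHoldsLock.countDown();
                    secondInFlight.await();
                    drained[0] = barrier.close(timeoutMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                barrier.leave();
            }
        });
        Thread second = new Thread(() -> {
            try {
                firstHoldsLock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            barrier.enter();                      // now counted in inFlight
            secondInFlight.countDown();
            try {
                synchronized (stateRequesters) { /* would handle the request here */ }
            } finally {
                barrier.leave();                  // only reachable once 'first' gives up
            }
        });
        first.start();
        second.start();
        first.join();
        second.join();
        return drained[0];
    }

    public static void main(String[] args) throws InterruptedException {
        ToyBarrier single = new ToyBarrier();
        single.enter();
        System.out.println("single request drained: " + single.close(100));
        System.out.println("two requests drained: " + twoRequests(200));
    }
}
```

In this model a single request closes the barrier immediately, because the closer ignores its own in_flight_threads entry. With two requests, the first can only give up on the timeout: the second cannot leave the in-flight set until it obtains the lock the first is holding.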
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira