[jbosscache-dev] ClusteredCacheLoader deadlocks and JBCCACHE-1103

Fri Jun 15 08:09:35 EDT 2007

Jason T. Greene wrote:
> That should be ok though because the CCL will still timeout on that
> lock. The original problem was that the same thread with the CCL lock
> was blocking on an FC lock so that the CCL lock would never be released
> (since the FC lock was higher in the stack).

See my previous reply. Yes, it blocked on FC.down() because it didn't 
receive credits in up(). But up() wasn't called because there was a 
replication message ahead of it in the queue that blocked on the FQN 
held by the CCL.

So to tackle this, my suggestions were, in this order:
#1 Don't hold a lock while making a synchronous cluster method call. 
That's a big no-no, especially in pre-2.5 releases. We had lots of bugs 
in the clustering code due to such code. Then Brian cleaned up all of 
it... :-)
#2 The timeout mechanism in JGroups which uses threads. Ugly, and a 
hack, and only needed for 2.4. As I argued, this will avoid the 
deadlock, but it will constantly time out (assuming some traffic).

The root cause of this is #1.
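
To illustrate #1, here is a minimal sketch of the general pattern: take the 
lock only around local state, release it before the synchronous remote 
call, and re-acquire it to store the result. This is not the actual 
ClusteredCacheLoader or JGroups code. The class name, the single nodeLock, 
and the remoteCall function (a stand-in for the synchronous cluster method 
call) are all hypothetical, and it uses modern Java for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical sketch: not the JBoss Cache or JGroups API.
class LockFreeRemoteLoad {
    private final ReentrantLock nodeLock = new ReentrantLock();
    private final Map<String, String> local = new HashMap<>();
    // Stand-in for a synchronous cluster method call (assumption).
    private final Function<String, String> remoteCall;

    LockFreeRemoteLoad(Function<String, String> remoteCall) {
        this.remoteCall = remoteCall;
    }

    String get(String fqn) {
        String value;
        // Hold the lock only while reading local state.
        nodeLock.lock();
        try {
            value = local.get(fqn);
        } finally {
            nodeLock.unlock();
        }
        if (value != null) {
            return value;
        }

        // No lock is held here, so a blocked remote call cannot
        // keep other threads from acquiring nodeLock.
        String fetched = remoteCall.apply(fqn);
        if (fetched == null) {
            return null;
        }

        // Re-acquire the lock to store the result. Another thread may
        // have stored a value in the meantime, so keep the existing one.
        nodeLock.lock();
        try {
            String existing = local.putIfAbsent(fqn, fetched);
            return existing != null ? existing : fetched;
        } finally {
            nodeLock.unlock();
        }
    }

    // Exposed only so a caller can check the lock state.
    boolean lockHeldByCurrentThread() {
        return nodeLock.isHeldByCurrentThread();
    }
}
```

The trade-off is that two threads can both miss locally and both go 
remote for the same FQN. The putIfAbsent on re-acquire keeps the result 
consistent at the cost of a redundant call.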

-- 
Bela Ban
Lead JGroups / JBoss Clustering team
JBoss - a division of Red Hat