[jbosscache-dev] ClusteredCacheLoader deadlocks and JBCCACHE-1103

Bela Ban bela at jboss.com
Fri Jun 15 03:09:25 EDT 2007

I looked at 1103 in a bit more detail and concluded that the change in 
JGroups (http://jira.jboss.com/jira/browse/JGRP-533) would not help. The 
underlying issue is that (a) 2.4.1 has a single incoming request queue 
and (b) the ClusteredCacheLoader holds a lock while making a 
cluster-wide call. Let's look at an example:

   1. The CCL acquires a lock on Fqn-A and makes a cluster-wide call
   2. We get a replication message for A (RM-A), so someone made an
      update to A and is now trying to commit the change. RM-A tries to
      acquire the lock on A, but is blocked because the CCL holds it.
   3. Now a result for the CCL call arrives. It is not processed (single
      queue) until RM-A has been processed. However, RM-A cannot be
      processed until the CCL call completes and releases the lock. In
      this case, the only way for the CCL call to complete is via a
      timeout, as it will never get its result.

So even if I implemented 533, it wouldn't help, as the interleaving 
between CCL calls and RM messages for the same FQNs would lead to timeouts.
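
The interleaving in the three steps above can be reproduced in miniature. The following is only a sketch under stated assumptions: a single-threaded executor stands in for the single incoming queue, a ReentrantLock stands in for the lock on Fqn-A, and all class, method and variable names are hypothetical. It does not use any JGroups or JBoss Cache API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

class SingleQueueTimeoutDemo {

    /** Returns true if the simulated CCL call ended in a timeout. */
    static boolean cclCallTimesOut(long timeoutMillis) throws Exception {
        // Assumption: one thread models the single incoming request queue.
        ExecutorService incomingQueue = Executors.newSingleThreadExecutor();
        ReentrantLock lockOnA = new ReentrantLock(); // models the lock on Fqn-A
        CompletableFuture<String> cclResult = new CompletableFuture<>();

        lockOnA.lock(); // step 1: the CCL locks A, then makes its call
        try {
            // Step 2: RM-A arrives and blocks the queue thread on the lock.
            incomingQueue.execute(() -> {
                lockOnA.lock();
                try {
                    // the replicated update to A would be applied here
                } finally {
                    lockOnA.unlock();
                }
            });
            // Step 3: the response to the CCL call is queued behind RM-A.
            incomingQueue.execute(() -> cclResult.complete("value-from-remote"));

            try {
                cclResult.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false; // response was delivered in time
            } catch (TimeoutException e) {
                return true;  // response was stuck behind RM-A
            }
        } finally {
            lockOnA.unlock();          // only now can RM-A proceed
            incomingQueue.shutdown();  // queued tasks still run to completion
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("CCL call timed out: " + cclCallTimesOut(200));
    }
}
```

The response is sitting in the queue the whole time, but the queue thread is parked inside RM-A waiting for the lock that the caller holds, so the caller can only leave via the timeout.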

Now, a possible solution to 1103 is that we make the CCL call *without* 
holding a lock. When we get the result(s), only *then* do we acquire a 
lock and update Fqn-A. We also need to check whether Fqn-A was updated 
in the meantime and then decide which value to return (the value set by 
the RM or the one returned by the CCL call).
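
A minimal sketch of that approach, assuming a per-node version counter is used to detect an intervening update, might look as follows. The class and method names are hypothetical and not JBoss Cache API, and preferring the RM value when both are present is just one possible policy, chosen here for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

class LockFreeRemoteLoad {
    private final ReentrantLock lock = new ReentrantLock();
    private String value;   // local value for the node, null if not loaded
    private long version;   // incremented on every local write

    /** Applies a replicated update (the RM path). */
    void applyReplication(String newValue) {
        lock.lock();
        try {
            value = newValue;
            version++;
        } finally {
            lock.unlock();
        }
    }

    /** Loads the value; the remote call runs without the lock held. */
    String load(Supplier<String> remoteCall) {
        long seen;
        lock.lock();
        try {
            if (value != null) {
                return value;   // already present locally
            }
            seen = version;     // remember the state before the call
        } finally {
            lock.unlock();
        }

        String remote = remoteCall.get();   // cluster-wide call, no lock held

        lock.lock();
        try {
            if (version != seen) {
                return value;   // an RM updated the node in the meantime
            }
            value = remote;     // nobody interfered: store the remote value
            version++;
            return value;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockFreeRemoteLoad quiet = new LockFreeRemoteLoad();
        System.out.println(quiet.load(() -> "from-CCL"));

        // Simulate an RM arriving while the remote call is in flight.
        LockFreeRemoteLoad raced = new LockFreeRemoteLoad();
        System.out.println(raced.load(() -> {
            raced.applyReplication("from-RM");
            return "from-CCL";
        }));
    }
}
```

Because the lock is only held for the short check-and-store sections, an RM arriving during the remote call is never blocked, so the response is not stuck behind it in the queue.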


Bela Ban
Lead JGroups / JBoss Clustering team
JBoss - a division of Red Hat
