[jbosscache-dev] ClusteredCacheLoader deadlocks and JBCCACHE-1103

Mon Jun 18 06:48:39 EDT 2007

On 15 Jun 2007, at 13:09, Bela Ban wrote:

>
>
> Jason T. Greene wrote:
>> That should be ok though because the CCL will still time out on that
>> lock. The original problem was that the same thread with the CCL lock
>> was blocking on an FC lock, so the CCL lock would never be released
>> (since the FC lock was higher in the stack).
>
> See my previous reply. Yes, it blocked on FC.down() because it  
> didn't receive credits in up(). But up() wasn't called because  
> there was a replication message ahead of it in the queue that  
> blocked on the FQN held by the CCL.
>
> So to tackle this, my suggestions were, in this order:
> #1 Don't hold a lock while making a synchronous cluster method  
> call. That's a big no no, especially in pre-2.5 releases. We had  
> lots of bugs in the clustering code due to such code. Then Brian  
> cleaned up all of it... :-)
> #2 The timeout mechanism in JGroups which uses threads. Ugly, and a  
> hack, and only needed for 2.4. As I argued, this will avoid the  
> deadlock, but it will constantly time out (assuming some traffic).
>
> The root cause of this is #1
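
To make #1 concrete, here is a minimal sketch of the "don't hold a lock
across a synchronous call" pattern. The class and method names are
hypothetical and are not taken from JBoss Cache or JGroups; the blocking
cluster call is stood in for by a plain Function. The lock only guards
local state and is released before the blocking call is made.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical illustration only; not JBoss Cache or JGroups code.
class LockThenCall {
    private final ReentrantLock lock = new ReentrantLock();
    private String pendingFqn = "/a/b"; // local state guarded by the lock
    private String lastResult;          // local state guarded by the lock

    // blockingClusterCall stands in for a synchronous cluster method call.
    String loadRemotely(Function<String, String> blockingClusterCall) {
        String fqn;
        lock.lock();
        try {
            fqn = pendingFqn;           // read what we need under the lock
        } finally {
            lock.unlock();              // release BEFORE the blocking call
        }

        // No lock is held here, so a slow or blocked remote call cannot
        // keep other threads waiting on this lock.
        String result = blockingClusterCall.apply(fqn);

        lock.lock();
        try {
            lastResult = result;        // re-acquire only to record the result
        } finally {
            lock.unlock();
        }
        return result;
    }

    // True if the calling thread currently holds the lock.
    boolean lockHeldByCaller() {
        return lock.isHeldByCurrentThread();
    }
}
```

The trade-off is that local state may change between the two locked
sections, so the caller has to tolerate a result that is slightly stale.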

Let me look into why we had #1 anyway.  Originally the 1.2.x codebase
used a synchronized block on the CacheLoaderInterceptor for this,
which meant that only one thread could pass through the interceptor
at any given time.  I changed this to lock on the Fqn in question so
that, as long as the Fqns didn't overlap, multiple threads could go
through the interceptor concurrently.
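
A rough sketch of the difference, assuming Fqns are treated as opaque
strings (so parent/child overlap is ignored here). PerFqnLocks is a
hypothetical helper, not the actual interceptor code: instead of one
monitor for the whole interceptor, each Fqn gets its own lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical helper, not part of JBoss Cache: one lock per Fqn string.
class PerFqnLocks {
    private final ConcurrentMap<String, ReentrantLock> locks =
            new ConcurrentHashMap<>();

    // Runs the action while holding the lock for this Fqn only.
    // Threads working on different Fqns do not block each other.
    <T> T withLock(String fqn, Supplier<T> action) {
        ReentrantLock lock = locks.computeIfAbsent(fqn, k -> new ReentrantLock());
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }

    // True if some thread currently holds the lock for this Fqn.
    boolean isLocked(String fqn) {
        ReentrantLock lock = locks.get(fqn);
        return lock != null && lock.isLocked();
    }
}
```

Note that this sketch never removes entries from the map, and it still
holds the per-Fqn lock for the duration of the action, which is exactly
the problem when the action is a synchronous cluster call.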

The reason behind the lock seems to be that the CacheLoader impl then
does not have to deal with concurrent calls on the same node. Thinking
about it, though, I feel this is something that should be handled in
each CacheLoader impl, which should be thread safe.
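
As an illustration of what "thread safe in the impl" could look like,
here is a small in-memory sketch. ThreadSafeLoader and its methods are
made up for this example and do not implement the real CacheLoader
interface; the point is only that the loader protects its own state, so
the interceptor does not need to serialize calls on its behalf.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory loader; not the real CacheLoader interface.
class ThreadSafeLoader {
    // Fqn (as a string) -> attribute map for that node.
    private final ConcurrentMap<String, ConcurrentMap<String, String>> store =
            new ConcurrentHashMap<>();

    // Safe to call concurrently, including on the same Fqn.
    void put(String fqn, String key, String value) {
        store.computeIfAbsent(fqn, k -> new ConcurrentHashMap<>()).put(key, value);
    }

    // Returns a snapshot copy of the node's attributes, or null if absent.
    Map<String, String> get(String fqn) {
        ConcurrentMap<String, String> node = store.get(fqn);
        return node == null ? null : new HashMap<>(node);
    }
}
```

A loader backed by a file or a database would need its own strategy
(for example per-node locking or transactions), but the responsibility
would sit in the same place.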