[jbosscache-dev] ClusteredCacheLoader deadlocks and JBCCACHE-1103
Manik Surtani
manik at jboss.org
Mon Jun 18 06:48:39 EDT 2007
On 15 Jun 2007, at 13:09, Bela Ban wrote:
>
>
> Jason T. Greene wrote:
>> That should be OK though, because the CCL will still time out on that
>> lock. The original problem was that the same thread with the CCL lock
>> was blocking on an FC lock, so that the CCL lock would never be
>> released
>> (since the FC lock was higher in the stack).
>
> See my previous reply. Yes, it blocked on FC.down() because it
> didn't receive credits in up(). But up() wasn't called because
> there was a replication message ahead of it in the queue that
> blocked on the FQN held by the CCL.
>
> So to tackle this, my suggestions were, in this order:
> #1 Don't hold a lock while making a synchronous cluster method
> call. That's a big no-no, especially in pre-2.5 releases. We had
> lots of bugs in the clustering code due to this pattern. Then Brian
> cleaned up all of it... :-)
> #2 The timeout mechanism in JGroups, which uses threads. Ugly, and a
> hack, and only needed for 2.4. As I argued, this will avoid the
> deadlock, but it will constantly time out (assuming some traffic).
>
> The root cause of this is #1.
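A minimal Java sketch of suggestion #1, assuming a hypothetical `RemoteCall` interface that stands in for the synchronous cluster method call (it is not a JBoss Cache or JGroups API). The first method shows the problematic shape: the per-node lock is held across the blocking remote call, so a thread that needs the same lock to deliver the very message that would unblock the caller can never proceed. The second method checks local state under the lock, releases it for the remote call, then re-acquires it briefly to publish the result.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockOutsideRemoteCall {
    // Hypothetical stand-in for a synchronous cluster call, such as the
    // fetch a ClusteredCacheLoader performs; not a real library API.
    interface RemoteCall {
        String fetch(String fqn);
    }

    // Hypothetical per-node lock (package-private so callers can inspect it).
    final ReentrantLock nodeLock = new ReentrantLock();
    private String cachedValue;

    // Anti-pattern: the lock stays held while this thread blocks on the
    // network, so anyone else needing nodeLock waits until the call returns.
    String loadHoldingLock(RemoteCall remote, String fqn) {
        nodeLock.lock();
        try {
            if (cachedValue == null) {
                cachedValue = remote.fetch(fqn); // blocking call under the lock
            }
            return cachedValue;
        } finally {
            nodeLock.unlock();
        }
    }

    // Suggestion #1: no lock is held while the remote call is in flight.
    String loadWithoutHoldingLock(RemoteCall remote, String fqn) {
        nodeLock.lock();
        try {
            if (cachedValue != null) {
                return cachedValue; // already loaded, no remote call needed
            }
        } finally {
            nodeLock.unlock();
        }
        String fetched = remote.fetch(fqn); // lock released here
        nodeLock.lock();
        try {
            if (cachedValue == null) {
                cachedValue = fetched; // another thread may have won the race
            }
            return cachedValue;
        } finally {
            nodeLock.unlock();
        }
    }
}
```

The trade-off in the second version is that two threads can both miss the cache and both perform the remote call; the re-check under the lock keeps whichever result was published first. That duplicated work is usually far cheaper than a distributed deadlock.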
Let me look into why we had #1 anyway. Originally the 1.2.x codebase
used a synchronized block on the CacheLoaderInterceptor for this
which meant that only one thread could pass through this interceptor
at any given time. I changed this to lock on the Fqn in question so
that, at least when the Fqns didn't overlap, multiple threads could
pass through this interceptor concurrently.
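To illustrate the difference between the two approaches, here is a minimal sketch of per-Fqn locking in place of a single `synchronized` block. `FqnLockTable` and its methods are hypothetical names, Fqns are simplified to plain strings, and this is not the actual CacheLoaderInterceptor code. Callers working on different Fqns get different locks and can proceed in parallel; callers on the same Fqn are serialized.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical helper: one lock per Fqn (Fqns simplified to Strings).
class FqnLockTable {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // computeIfAbsent is atomic, so two threads asking for the same Fqn
    // always receive the same lock instance.
    ReentrantLock lockFor(String fqn) {
        return locks.computeIfAbsent(fqn, k -> new ReentrantLock());
    }

    // Runs the action while holding only the lock for this Fqn, instead of
    // synchronizing on the whole interceptor as the 1.2.x code did.
    <T> T withLock(String fqn, Supplier<T> action) {
        ReentrantLock lock = lockFor(fqn);
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```

One caveat of this simple version is that the map grows with every distinct Fqn ever locked. A production version would need to evict idle locks or use a fixed pool of striped locks, accepting that unrelated Fqns occasionally share a stripe.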
The reason seems to be that the CacheLoader impl then does not have
to deal with concurrent calls on the same node. Thinking about it,
though, I feel this is something each CacheLoader impl should handle
itself, since every CacheLoader impl should be thread safe anyway.
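As a rough illustration of a loader that handles concurrent calls on the same node internally, here is a minimal in-memory sketch. `SimpleLoader` is a hypothetical, heavily simplified contract (the real JBoss Cache `CacheLoader` interface has more methods and uses `Fqn` rather than `String`), and copy-on-write per node is just one assumed way to achieve thread safety, not what any shipped loader does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified loader contract for illustration only.
interface SimpleLoader {
    Map<Object, Object> get(String fqn);
    Object put(String fqn, Object key, Object value);
    void remove(String fqn);
}

// Thread safe without any external locking: each node's attributes are an
// immutable snapshot, replaced atomically via ConcurrentHashMap.compute.
class InMemoryLoader implements SimpleLoader {
    private final ConcurrentHashMap<String, Map<Object, Object>> store = new ConcurrentHashMap<>();

    @Override
    public Map<Object, Object> get(String fqn) {
        return store.get(fqn); // immutable snapshot (or null), safe to share
    }

    @Override
    public Object put(String fqn, Object key, Object value) {
        Object[] previous = new Object[1];
        // compute runs atomically per key, so concurrent puts on the same
        // node cannot lose each other's updates.
        store.compute(fqn, (k, node) -> {
            Map<Object, Object> updated = (node == null) ? new HashMap<>() : new HashMap<>(node);
            previous[0] = updated.put(key, value);
            return Collections.unmodifiableMap(updated);
        });
        return previous[0];
    }

    @Override
    public void remove(String fqn) {
        store.remove(fqn);
    }
}
```

Copying the attribute map on every write keeps reads lock-free at the cost of more expensive writes; a loader backed by a real store would more likely rely on that store's own transactional or locking guarantees.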