Re: [jbosscache-dev] ClusteredCacheLoader deadlocks and JBCCACHE-1103

Tuesday, 19 June 2007


+1
On 19 Jun 2007, at 01:14, Jason T. Greene wrote:

...
 Simply awesome!

 On Tue, 2007-06-19 at 00:08 +0100, Manik Surtani wrote:
> On 18 Jun 2007, at 11:48, Manik Surtani wrote:
>
>>
>> On 15 Jun 2007, at 13:09, Bela Ban wrote:
>>
>>>
>>>
>>> Jason T. Greene wrote:
>>>> That should be ok though because the CCL will still timeout on  
>>>> that
>>>> lock. The original problem was that the same thread with the CCL
>>>> lock
>>>> was blocking on an FC lock so that the CCL lock would never be
>>>> released
>>>> (since the FC lock was higher in the stack).
>>>
>>> See my previous reply. Yes, it blocked on FC.down() because it
>>> didn't receive credits in up(). But up() wasn't called because
>>> there was a replication message ahead of it in the queue that
>>> blocked on the FQN held by the CCL.
>>>
>>> So to tackle this, my suggestion were, in this order:
>>> #1 Don't hold a lock while making a synchronous cluster method
>>> call. That's a big no no, especially in pre-2.5 releases. We had
>>> lots of bugs in the clustering code due to such code. Then Brian
>>> cleaned up all of it... :-)
>>> #2 The timeout mechanism in JGroups which uses threads. Ugly, and
>>> a hack, and only needed for 2.4. As I argued, this will avoid the
>>> deadlock, but it will constantly time out (assuming some traffic).
>>>
>>> The root cause of this is #1
>>
>> Let me look into why we had #1 anyway.  Originally the 1.2.x
>> codebase used a synchronized block on the CacheLoaderInterceptor
>> for this which meant that only one thread could pass through this
>> interceptor at any given time.  I changed this to lock on the Fqn
>> in question so at least if the Fqns didn't overlap multiple threads
>> could go thru this interceptor.
>>
>> The reason behind it seems to be so that the CacheLoader impl does
>> not have to deal with concurrent calls on the same node, but
>> thinking about it, I feel this is something that should be handled
>> in each CacheLoader impl, which should be thread safe.
>
> Ok, I spent a bit of time investigating this, and the original sync
> blocks in the CacheLoaderInterceptor, superceded by the
> synchronization on an Fqn in subsequent versions are all hacks to get
> around the fact that some cache loader implementations themselves
> aren't thread safe.
>
> I wrote a test to check the thread safety of cache loader impls, and
> most of what we have fail.  I patched the DummyInMemoryCacheLoader to
> be thread safe (a lot of synchronization within the impl) and removed
> the unnecessary locks in the interceptors and things work fine.  I
> tried this with some of the few thread-safe cache loader impls we
> have (BDBJE, for example) and this worked well as well.
>
> So what I propose is this:
>
> 1) Patch CLI to NOT use the Fqn locks (in BaseCacheLoaderInterceptor)
> 2) Make sure we acquire locks (lock() call up the interceptor stack)
> *before* attempting to load node, but after creating temp node
> 3) Make sure cache loader impls are thread safe
> 4) Add thread safety test to CacheLoaderTestBase
> 5) Patch loaders that aren't threadsafe and fail test in (4) -
> FileCacheLoader, JDBCCacheLoader, etc.
>
> I'm going to do this on my local checkout of HEAD and run the
> regression tests overnight.  If this works, I suggest backporting
> this to 1.4.x and regressing there.
>
> WDYT?
>
> Cheers,
> --
> Manik Surtani
>
> Lead, JBoss Cache
> JBoss, a division of Red Hat
>
>
>
> _______________________________________________
> jbosscache-dev mailing list
> jbosscache-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/jbosscache-dev
 -- 
 Jason T. Greene
 Lead, POJO Cache
 JBoss, a division of Red Hat

 _______________________________________________
 jbosscache-dev mailing list
 jbosscache-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/jbosscache-dev

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

Re: [jbosscache-dev] ClusteredCacheLoader deadlocks and JBCCACHE-1103