Re: [jbosscache-dev] ClusteredCacheLoader deadlocks and JBCCACHE-1103

Friday, 15 June 2007

Yes, it would time out, but that is better than the deadlock that
currently occurs.  IMO a timeout is a valid response to such a call.
If the CCL cannot complete a remote call because of a remote lock, it
should time out.  When it does, it releases the lock on Fqn-A, and the
remotely originated update can proceed.
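
To make the timeout path concrete, here is a minimal sketch of the scenario from the quoted message below. It is not JBoss Cache or JGroups code: the class and method names are hypothetical, a single-threaded executor stands in for the single incoming request queue in 2.4.1, and the 200 ms timeout is an arbitrary value chosen for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the interleaving; names are illustrative only.
class SingleQueueSketch {

    // Returns true if the CCL call can only end via a timeout.
    static boolean cclCallTimesOut() throws Exception {
        ReentrantLock lockOnFqnA = new ReentrantLock();
        CompletableFuture<String> cclResult = new CompletableFuture<>();
        // One thread draining one queue: messages are handled strictly in order.
        ExecutorService incoming = Executors.newSingleThreadExecutor();

        lockOnFqnA.lock(); // the CCL holds the lock on Fqn-A during its remote call
        try {
            // RM-A arrives first and blocks until the lock on Fqn-A is free.
            incoming.execute(() -> {
                lockOnFqnA.lock();
                lockOnFqnA.unlock();
            });
            // The result for the CCL call is queued behind RM-A.
            incoming.execute(() -> {
                cclResult.complete("value");
            });
            try {
                cclResult.get(200, TimeUnit.MILLISECONDS);
                return false; // result arrived in time
            } catch (TimeoutException e) {
                return true;  // the only way out while the lock is held
            }
        } finally {
            lockOnFqnA.unlock(); // releasing the lock lets RM-A proceed
            incoming.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("CCL call timed out: " + cclCallTimesOut());
    }
}
```

In this model the result cannot be delivered while the lock is held, so the call always ends in the timeout branch, and only the release in the finally block unblocks RM-A.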

The problem with making the CCL call without a lock is that this
exposes concurrent loading and overwriting for all cache loaders (the
CacheLoaderInterceptor treats the CCL as a simple cache loader impl).
It also opens up race conditions with eviction.  Without locks in the
cache loader interceptor, the following could happen: Thread-1 does a
get(), goes through the cache loader interceptor, sees the requested
node in memory and does not load it.  The eviction thread gets a WL on
the same node and evicts it.  Thread-1 now gets to the
PessimisticLockInterceptor, cannot find the node and, since this is a
get() call, doesn't create the node but returns null.
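
The eviction race above can be replayed step by step. The sketch below is a hypothetical, single-threaded model, not JBoss Cache code: two plain maps stand in for the in-memory tree and the backing cache loader, and the three steps run in the order described so the outcome is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded replay of the interleaving; not JBoss Cache code.
class EvictionRaceSketch {

    // Replays the three steps in order and returns what Thread-1's get() sees.
    static String replayUnlockedGet() {
        Map<String, String> store = new HashMap<>();   // stands in for the cache loader
        Map<String, String> memory = new HashMap<>();  // stands in for in-memory nodes
        store.put("/a", "v1");
        memory.put("/a", "v1");

        // Thread-1, cache loader interceptor, no lock held:
        // the node is in memory, so it decides not to load.
        boolean needsLoad = !memory.containsKey("/a");

        // Eviction thread: takes the WL on the node and evicts it.
        memory.remove("/a");

        // Thread-1, lock interceptor onwards: acts on the now stale decision.
        if (needsLoad) {
            memory.put("/a", store.get("/a"));
        }
        return memory.get("/a");
    }

    public static void main(String[] args) {
        // The store still holds "v1", yet the get() comes back empty.
        System.out.println(replayUnlockedGet()); // prints "null"
    }
}
```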

On 15 Jun 2007, at 08:09, Bela Ban wrote:

...
 I looked at 1103 in a bit more detail and concluded that the change
 in JGroups (http://jira.jboss.com/jira/browse/JGRP-533) would not  
 help. The underlying issue is that (a) 2.4.1 has a single incoming  
 request queue and (b) the ClusteredCacheLoader holds a lock while  
 making a cluster-wide call. Let's look at an example:

   1. The CCL acquires a lock on Fqn-A and makes a cluster-wide call
   2. We get a replication message for A (RM-A), so someone made an
      update to A and is now trying to commit the change. RM-A tries to
      acquire the lock on A, but is blocked because the CCL holds it.
   3. Now a result for the CCL call arrives. It is not processed (single
      queue) until RM-A gets processed. However, that's not the case
      until the CCL call completes. In this case, the only way for the
      CCL call to complete is via a timeout, as it will never get its
      results.

 So even if I implemented 533, it wouldn't help, as the interleaving  
 between CCL calls and RM messages for the same FQNs would lead to  
 timeouts.

 Now, a possible solution to 1103 is that we make the CCL call  
 *without* holding a lock. When we get the result(s), only *then* do  
 we acquire a lock and update the FQN. We also need to check whether  
 FQN-A was updated in the meantime and then decide which value to  
 return (the value set by the RM or the one gotten from the CCL call).
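
 One way the proposal above could look, as a rough sketch only. The class and method names are hypothetical, and the version counter is an assumption introduced here to detect an update in the meantime. The message leaves open which value should win; this sketch arbitrarily keeps the value set by the RM.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical node wrapper; the version counter is an assumption of this sketch.
class UnlockedLoadSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private String value;   // guarded by lock
    private long version;   // bumped on every write, guarded by lock

    // Models a replication message (RM) updating the node.
    void applyReplication(String newValue) {
        lock.lock();
        try {
            value = newValue;
            version++;
        } finally {
            lock.unlock();
        }
    }

    private long readVersion() {
        lock.lock();
        try {
            return version;
        } finally {
            lock.unlock();
        }
    }

    // Makes the remote call without holding the lock, then locks and reconciles.
    String load(Supplier<String> remoteCall) {
        long before = readVersion();
        String remote = remoteCall.get(); // no lock held during the cluster-wide call
        lock.lock();
        try {
            if (version == before) {
                // Nothing changed in the meantime: store the remote result.
                value = remote;
                version++;
            }
            // Otherwise keep the RM's value (the policy assumed by this sketch).
            return value;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        UnlockedLoadSketch node = new UnlockedLoadSketch();
        System.out.println(node.load(() -> "fromCCL")); // prints "fromCCL"
    }
}
```

 Because no lock is held during the remote call, an RM for the same FQN can be applied while the call is in flight, which is what the version check then detects.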

 WDYT ?

 -- 
 Bela Ban
 Lead JGroups / JBoss Clustering team
 JBoss - a division of Red Hat 
--
Manik Surtani

Lead, JBoss Cache
JBoss, a division of Red Hat
