[jbosscache-dev] Problems with node locking semantics in JBC1.4.1.GA

Sun Jan 28 17:41:43 EST 2007

Final installment in a conversation with myself ;)

The REPEATABLE_DATA_READ thing has a funny smell, but the ease with
which it was implemented makes me realize it's fairly easy to separate
decisions about the data map from decisions about the child map. Perhaps
a cache config flag, separate from IsolationLevel, could control the
same behavior.  For want of a better name,

<attribute name="TreatChildInsertRemoveAsAWrite">true</attribute>

Even further, that could be treated as a cache wide default, used to set
a flag on each node. The PessimisticLockInterceptor could check that
flag to decide whether to get the WL.  What's nice about that is the
cache wide default could be true, but for certain nodes the application
could set it to false. E.g the root node for a webapp under which
sessions are stored, root node for an SFSB or entity under which
instances are stored, PojoCache's _JBossInternal_ node etc.  Simpler
alternative to lock striping on the child map.

Haven't thought about optimistic locking, as I'm not too familiar with
the details of handling child inserts/removes.

Brian Stansberry wrote:
> I played around with one way to deal with this.
> 
> Speaking in db isolation terms, let's say a
> non-repeatable-read means a tx reads a node's *data map*
> twice and gets different results. A phantom read means a tx
> reads a node's *child map* twice and gets different results.
> Conceptually, a JBC node represents both a "row" (data map)
> and a query result set (children map -- set of rows that
> share a common characteristic that the parent represents). (I
> know this analogy is inexact, but, ...).
> 
> In 1.4.0.SP1, REPEATABLE_READ meant you wouldn't get
> non-repeatable-reads, but could get phantom reads. With
> 1.4.1, you also don't get phantom reads (at least with
> respect to a node's immediate children). But, in many cases
> this behavior is unneeded and undesirable, and JBC no longer
> offers the old behavior.
> 
> I attempted to remedy this with a new IsolationLevel
> REPEATABLE_DATA_READ, which basically provides the 1.4.0
> semantics.  This was actually very easy to do.
> 
> 1) LockStrategyFactory creates the same LockStrategy for
> REPEATABLE_DATA_READ as for REPEATABLE_READ -- hence lock
> behavior at the node level is unaffected.
> 
> 2) PessimisticLockInterceptor uses the new level when
> deciding whether to WL a parent before adding/removing a
> node.  Manik nicely encapsulated that logic, so it  was
> simple to change.  Here's a diff to the head of
> Branch_JBossCache_1_4_0:
> 
> ### Eclipse Workspace Patch 1.0
> #P JBossCache
> Index:
> src/org/jboss/cache/interceptors/PessimisticLockInterceptor.java
> ===================================================================
> RCS file:
> /cvsroot/jboss/JBossCache/src/org/jboss/cache/interceptors/Pes
> simisticLockInterceptor.java,v 
> retrieving revision 1.20.2.8
> diff -u -r1.20.2.8 PessimisticLockInterceptor.java
> ---
> src/org/jboss/cache/interceptors/PessimisticLockIntercept
> or.java	14 Dec 2006 12:51:48 -0000	1.20.2.8
> +++
> src/org/jboss/cache/interceptors/PessimisticLockIntercept
> or.java	28 Jan 2007 21:08:55 -0000
> @@ -41,6 +41,8 @@
>  public class PessimisticLockInterceptor extends Interceptor  {
>     TransactionTable tx_table = null;
> +
> +   boolean writeLockOnChildInsertRemove = true;
> 
>     /**
>      * Map<Object, java.util.List>. Keys = threads, values =
> lists of locks held by that thread @@ -55,6 +57,7 @@
>        tx_table = cache.getTransactionTable();
>        lock_table = cache.getLockTable();
>        lock_acquisition_timeout = cache.getLockAcquisitionTimeout();
> +      writeLockOnChildInsertRemove =
> (cache.getIsolationLevelClass() !=
> + IsolationLevel.REPEATABLE_DATA_READ);
>     }
> 
> 
> @@ -323,12 +326,14 @@
> 
>     private boolean writeLockNeeded(int lock_type, int
> currentNodeIndex, int treeNodeSize, boolean
> isRemoveOperation, boolean createIfNotExists, Fqn targetFqn,
> Fqn currentFqn)
>     {
> -      if (isRemoveOperation && currentNodeIndex == treeNodeSize - 2)
> -         return true; // we're doing a remove and we've
> reached the PARENT node of the target to be removed. -
> -      if (!isTargetNode(currentNodeIndex, treeNodeSize) &&
> !cache.exists(new Fqn(currentFqn,
> targetFqn.get(currentNodeIndex + 1))))
> -         return createIfNotExists; // we're at a node in the
> tree, not yet at the target node, and we need to create the
> next node.  So we need a WL here.
> -
> +      if (writeLockOnChildInsertRemove)
> +      {
> +         if (isRemoveOperation && currentNodeIndex ==
> treeNodeSize - 2)
> +            return true; // we're doing a remove and we've
> reached the PARENT node of the target to be removed. +
> +         if (!isTargetNode(currentNodeIndex, treeNodeSize)
> && !cache.exists(new Fqn(currentFqn,
> targetFqn.get(currentNodeIndex + 1))))
> +            return createIfNotExists; // we're at a node in
> the tree, not yet at the target node, and we need to create
> the next node.  So we need a WL here.
> +      }
>        return lock_type == DataNode.LOCK_TYPE_WRITE &&
> isTargetNode(currentNodeIndex, treeNodeSize) &&
> (createIfNotExists || isRemoveOperation); //normal operation,
> write lock explicitly requested and this is the target to be
> written to.
>     }
> 
> 
> jbosscache-dev-bounces at lists.jboss.org wrote:
>> In 1.4.1.GA, we introduced node locking semantics that weren't there
>> in the previous few releases (and maybe never were there).  This is
>> causing significant integration issues.
>> 
>> Specifically, before inserting or removing a child node, we acquire a
>> WL on the parent.  Previously, only a RL was acquired.
>> 
>> I'm concerned about anomalies this might cause in apps that
>> intentionally or unintentionally depends on the old behavior.
>> For example, the AS http session repl code assumes it has freedom to
>> add/remove cache nodes that represent a session underneath the parent
>> that represents the webapp. Now these activities will block until any
>> activity on any other session is complete.  With FIELD granularity,
>> that's a big issue, as locks are held throughout the web request. It
>> will likely cause issues with other granularities as well. (We
>> already saw a minor issue in a support case when a customer tried to
>> upgrade 
>> 4.0.5 to 1.4.1).
>> 
>> This definitely breaks the Hibernate (and thus EJB3 entity bean
>> clustering) integration with 1.4.1.GA. Basically, assume there is an
>> entity Person, with a child collection "children". Hibernate caches
>> the entity, and later caches the child collection in a child node.
>> 
>> 1) Transaction tx = tm.begin();
>> 2) cache.get("/Person/Person#1/children", ITEM); // returns null --
>> not cached yet 
>> 
>> 3) read db to get the ids of the collection elements
>> 
>> 4) tx.suspend();
>> 5) cache.putFailFast("/Person/Person#1/children", ITEM, data); 6)
>> tx.resume(); 
>> 
>> This fails in step 5, because with the tx suspended, the
>> putFailFast() call cannot acquire the WL on /Person/Person#1 that it
>> needs to insert /Person/Person#1/children.
>> 
>> Hmm, an EJB3 unit test shows that problem occuring, but I assume even
>> simple entity caching will break:
>> 
>> 1) Transaction tx = tm.begin();
>> 2) cache.get("/Person/Person#2", ITEM); // returns null -- not
>> cached yet 
>> 
>> 3) read db to get the Person
>> 
>> 4) tx.suspend();
>> 5) cache.putFailFast("/Person/Person#1", ITEM, data); 6) tx.resume();
>> 
>> We need a quick workaround for this in the 1.4 branch, as it's a
>> blocker for EJB3 RC10 and AS 4.2.
>> 
>> Brian Stansberry
>> Lead, AS Clustering
>> JBoss, a division of Red Hat
>> Ph: 510-396-3864
>> skype: bstansberry
>> 
>> _______________________________________________
>> jbosscache-dev mailing list
>> jbosscache-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/jbosscache-dev