[jbosscache-dev] GravitateDataCommand finds invalid "phantom" nodes

Manik Surtani manik at jboss.org
Fri Aug 7 05:28:12 EDT 2009


On 7 Aug 2009, at 10:18, Manik Surtani wrote:

> Thanks for the patch; that makes a lot of sense.  Go ahead and check
> it in to trunk.

Never mind, I've added this myself so that I can put trunk through a  
test run and cut 3.2.0.BETA1.

Cheers
Manik

>
> Regarding the creation of nodes for deletion, well that's pessimistic
> locking for you.  Unfortunately the lock is encapsulated in the node
> so you can't acquire the lock until the node is created.  (Which is
> partially why the pessimistic lock design sucks).
>
> This is created in PessimisticNodeBasedLockManager (line 145).  At
> this stage we already know if we are just creating a dummy node for
> deletion, so perhaps we could pass this as a flag into the constructor
> of a PessimisticUnversionedNode, and this could acquire the lock on
> creation.  Hmm, need to think about this.
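
The lock-on-creation idea above can be sketched roughly as follows. This is a minimal illustration only, with hypothetical names (it is not the real PessimisticUnversionedNode API): a flag passed to the constructor marks the node as a removal placeholder, and the write lock is acquired before the constructor returns, so no other thread can ever observe the node unlocked.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of "acquire the lock on creation". When the node is a
// dummy created purely so it can be removed, its write lock is taken inside
// the constructor, eliminating the window between creation and locking.
class LockOnCreateNode {
   private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
   private final boolean createdForRemoval;

   LockOnCreateNode(boolean createdForRemoval) {
      this.createdForRemoval = createdForRemoval;
      if (createdForRemoval) {
         // locked before the constructor returns: there is no point in time
         // at which another thread can see this node in an unlocked state
         lock.writeLock().lock();
      }
   }

   boolean isWriteLocked() {
      return lock.isWriteLocked();
   }

   // to be called by the creating thread once the removal has completed
   // (ReentrantReadWriteLock write locks are owned by the acquiring thread)
   void releaseAfterRemoval() {
      if (createdForRemoval) {
         lock.writeLock().unlock();
      }
   }
}
```

Note the constraint this sketch makes explicit: the thread that creates the placeholder must also be the one that removes it and releases the lock.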
>
> Cheers
> Manik
>
> On 6 Aug 2009, at 18:03, Brian Stansberry wrote:
>
>> More fun with buddy replication. :-)
>>
>> Saw an error on one of our failover tests where:
>>
>> 1) Node D had left the group, so lots of gravitation was going on.
>> 2) Various nodes were sending DataGravitationCleanupCommands to the
>> cluster for /BUDDY_BACKUP/D_DEAD/1/JSESSION/st_localhost/xxx. As a
>> result, all nodes in the cluster are trying to remove various
>> /BUDDY_BACKUP/D_DEAD/1/JSESSION/st_localhost/xxx nodes. On node A
>> those nodes don't exist, so
>> PessimisticLockInterceptor.handleRemoveCommand is adding them and
>> removing them.
>> 3) Concurrently with #2, a GravitateDataCommand for
>> /JSESSION/st_localhost/123 comes in to node A. Session 123 was never
>> stored on node A, so this should result in a cache miss. But what
>> happened once was:
>>
>> [JBoss] 16:46:52,961 TRACE [org.jboss.cache.marshall.CommandAwareRpcDispatcher] (Incoming-13,10.34.32.153:14736) Problems invoking command.
>> [JBoss] org.jboss.cache.NodeNotValidException: Node /_BUDDY_BACKUP_/10.34.32.156_48822:DEAD/1/JSESSION/st_localhost/UvzutZkoESBMRSnjv0eTRA__ is not valid.  Perhaps it has been moved or removed.
>> [JBoss] 	at org.jboss.cache.invocation.NodeInvocationDelegate.assertValid(NodeInvocationDelegate.java:527)
>> [JBoss] 	at org.jboss.cache.invocation.NodeInvocationDelegate.getChildrenNames(NodeInvocationDelegate.java:292)
>> [JBoss] 	at org.jboss.cache.commands.read.GravitateDataCommand.perform(GravitateDataCommand.java:176)
>> ...
>>
>> It seems the command is seeing a non-existent node. Yep; looking at
>> the logs it's clear the above GravitateDataCommand was executed
>> concurrently with another DataGravitationCleanupCommand for the same
>> session. (I need to investigate why that happened.)
>>
>> Below is a possible patch to work around the issue.  This points to a
>> more general locking problem though -- should these "phantom nodes"
>> created for removal be visible to other threads? Shouldn't there be a
>> write lock on them from the moment they are created until after they
>> are removed?
>>
>> Hehe, answered my own question by writing it. The node is created by
>> PessimisticNodeBasedLockManager and then locked. There's a gap in
>> between where another thread could get a ref to it.
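
That create-then-lock gap can be made concrete with a small sketch. All names here are illustrative (this is not the actual PessimisticNodeBasedLockManager code): the node is published into the shared structure in one step and only write-locked in a later step, so a concurrent reader can obtain a reference to an unlocked, about-to-be-removed node in between.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the race window: creation (which makes the node
// reachable by every thread) and locking happen as two separate steps.
class CreateThenLockManager {
   static class Node {
      final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
   }

   private final Map<String, Node> tree = new ConcurrentHashMap<>();

   // step 1: the node becomes visible to all other threads here...
   Node createForRemoval(String fqn) {
      Node n = new Node();
      tree.put(fqn, n);
      return n;
   }

   // step 2: ...but is only write-locked here, after the gap
   void lockForRemoval(Node n) {
      n.lock.writeLock().lock();
   }

   Node lookup(String fqn) {
      return tree.get(fqn);
   }
}
```

Between steps 1 and 2 a lookup succeeds and the node reports itself unlocked, which is exactly the state the GravitateDataCommand stumbled over.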
>>
>> Anyway, the patch:
>>
>> ### Eclipse Workspace Patch 1.0
>> #P jbosscache-core
>> Index: src/main/java/org/jboss/cache/commands/read/GravitateDataCommand.java
>> ===================================================================
>> --- src/main/java/org/jboss/cache/commands/read/GravitateDataCommand.java	(revision 8163)
>> +++ src/main/java/org/jboss/cache/commands/read/GravitateDataCommand.java	(working copy)
>> @@ -29,6 +29,7 @@
>>  import org.jboss.cache.InternalNode;
>>  import org.jboss.cache.InvocationContext;
>>  import org.jboss.cache.Node;
>> +import org.jboss.cache.NodeNotValidException;
>>  import org.jboss.cache.NodeSPI;
>>  import org.jboss.cache.buddyreplication.BuddyFqnTransformer;
>>  import org.jboss.cache.buddyreplication.BuddyManager;
>> @@ -171,9 +172,18 @@
>>           else
>>           {
>>              // make sure we LOAD data for this node!!
>> -            actualNode.getData();
>> -            // and children!
>> -            actualNode.getChildrenNames();
>> +            try
>> +            {
>> +               actualNode.getData();
>> +               // and children!
>> +               actualNode.getChildrenNames();
>> +            }
>> +            catch (NodeNotValidException e)
>> +            {
>> +               if (trace)
>> +                  log.trace("Found node " + actualNode.getFqn() + " but it is not valid. Returning 'no data found'", e);
>> +               return GravitateResult.noDataFound();
>> +            }
>>           }
>> 
>>           if (backupNodeFqn == null && searchSubtrees)
>>
>> -- 
>> Brian Stansberry
>> Lead, AS Clustering
>> JBoss by Red Hat
>> _______________________________________________
>> jbosscache-dev mailing list
>> jbosscache-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/jbosscache-dev
>

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org