[jbosscache-dev] GravitateDataCommand finds invalid "phantom" nodes

Manik Surtani manik at jboss.org
Fri Aug 7 05:18:24 EDT 2009


Thanks for the patch; that makes a lot of sense.  Go ahead and check  
it in to trunk.

Regarding the creation of nodes for deletion, well, that's pessimistic
locking for you.  Unfortunately the lock is encapsulated in the node,
so you can't acquire the lock until the node is created.  (Which is
partially why the pessimistic lock design sucks.)

The node is created in PessimisticNodeBasedLockManager (line 145).  At
this stage we already know whether we are just creating a dummy node for
deletion, so perhaps we could pass this as a flag into the constructor
of a PessimisticUnversionedNode, which could then acquire the lock on
creation.  Hmm, need to think about this.
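
To make the idea concrete, here is a minimal sketch of lock-on-creation.
Everything in it is made up for illustration: SketchNode,
createForRemoval and the map of nodes are hypothetical stand-ins, not
the real PessimisticUnversionedNode or PessimisticNodeBasedLockManager
code, and a plain ReentrantReadWriteLock stands in for the cache's own
lock.  The only point is that the write lock is taken inside the
constructor, before the node is published, so no other thread can get a
reference to an unlocked phantom node.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class PhantomNodeSketch {

    // Hypothetical node that encapsulates its own lock.
    static final class SketchNode {
        final String fqn;
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // lockForRemoval is the "dummy node for deletion" flag.
        SketchNode(String fqn, boolean lockForRemoval) {
            this.fqn = fqn;
            if (lockForRemoval) {
                // Take the WL before anyone else can see this node.
                lock.writeLock().lock();
            }
        }
    }

    // Hypothetical stand-in for the cache's node structure.
    static final ConcurrentMap<String, SketchNode> nodes =
            new ConcurrentHashMap<>();

    // Returns a node for fqn that is write-locked by the calling thread.
    static SketchNode createForRemoval(String fqn) {
        SketchNode fresh = new SketchNode(fqn, true);
        SketchNode existing = nodes.putIfAbsent(fqn, fresh);
        if (existing == null) {
            return fresh; // published already locked, so there is no gap
        }
        // Another thread created the node first: discard ours, lock theirs.
        fresh.lock.writeLock().unlock();
        existing.lock.writeLock().lock();
        return existing;
    }

    // Removes the node and only then releases the WL.
    static void remove(SketchNode node) {
        nodes.remove(node.fqn, node);
        node.lock.writeLock().unlock();
    }

    public static void main(String[] args) {
        SketchNode phantom = createForRemoval("/some/fqn");
        System.out.println("write-locked on creation: "
                + phantom.lock.isWriteLockedByCurrentThread());
        remove(phantom);
        System.out.println("still present: " + nodes.containsKey("/some/fqn"));
    }
}
```

A reader that reaches the node between creation and removal would block
on the lock rather than see an invalid node.  Once it gets the lock it
would still have to cope with the node having been removed in the
meantime, so this narrows the gap but does not replace a validity check.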

Cheers
Manik

On 6 Aug 2009, at 18:03, Brian Stansberry wrote:

> More fun with buddy replication. :-)
>
> Saw an error on 1 of our failover tests where,
>
> 1) Node D had left the group, so lots of gravitation was going on.
> 2) Various nodes were sending DataGravitationCleanupCommands to the
> cluster for /BUDDY_BACKUP/D_DEAD/1/JSESSION/st_localhost/xxx. Result is
> all nodes in the cluster are trying to remove various
> /BUDDY_BACKUP/D_DEAD/1/JSESSION/st_localhost/xxx nodes. On node A those
> nodes don't exist, so PessimisticLockInterceptor.handleRemoveCommand is
> adding them and removing them.
> 3) Concurrent with #2, a GravitateDataCommand for
> /JSESSION/st_localhost/123 comes in to node A. Session 123 was never
> stored on node A, so this should result in a cache miss. But what
> happened once was:
>
> [JBoss] 16:46:52,961 TRACE [org.jboss.cache.marshall.CommandAwareRpcDispatcher] (Incoming-13,10.34.32.153:14736) Problems invoking command.
> [JBoss] org.jboss.cache.NodeNotValidException: Node /_BUDDY_BACKUP_/10.34.32.156_48822:DEAD/1/JSESSION/st_localhost/UvzutZkoESBMRSnjv0eTRA__ is not valid.  Perhaps it has been moved or removed.
> [JBoss] 	at org.jboss.cache.invocation.NodeInvocationDelegate.assertValid(NodeInvocationDelegate.java:527)
> [JBoss] 	at org.jboss.cache.invocation.NodeInvocationDelegate.getChildrenNames(NodeInvocationDelegate.java:292)
> [JBoss] 	at org.jboss.cache.commands.read.GravitateDataCommand.perform(GravitateDataCommand.java:176)
> ...
>
> It seems the command is seeing a non-existent node. Yep; looking at the
> logs it's clear the above GravitateDataCommand was executed concurrently
> with another DataGravitationCleanupCommand for the same session. (I need
> to investigate why that happened.)
>
> Below is a possible patch to work around the issue.  This points to a
> more general locking problem though -- should these "phantom nodes"
> created for removal be visible to other threads? Shouldn't there be a WL
> on them from the moment they are created until after they are removed?
>
> Hehe, answered my own question by writing it. The node is created by
> PessimisticNodeBasedLockManager and then locked. There's a gap in
> between where another thread could get a ref to it.
>
> Anyway, the patch:
>
> ### Eclipse Workspace Patch 1.0
> #P jbosscache-core
> Index: src/main/java/org/jboss/cache/commands/read/GravitateDataCommand.java
> ===================================================================
> --- src/main/java/org/jboss/cache/commands/read/GravitateDataCommand.java	(revision 8163)
> +++ src/main/java/org/jboss/cache/commands/read/GravitateDataCommand.java	(working copy)
> @@ -29,6 +29,7 @@
>  import org.jboss.cache.InternalNode;
>  import org.jboss.cache.InvocationContext;
>  import org.jboss.cache.Node;
> +import org.jboss.cache.NodeNotValidException;
>  import org.jboss.cache.NodeSPI;
>  import org.jboss.cache.buddyreplication.BuddyFqnTransformer;
>  import org.jboss.cache.buddyreplication.BuddyManager;
> @@ -171,9 +172,18 @@
>           else
>           {
>              // make sure we LOAD data for this node!!
> -            actualNode.getData();
> -            // and children!
> -            actualNode.getChildrenNames();
> +            try
> +            {
> +               actualNode.getData();
> +               // and children!
> +               actualNode.getChildrenNames();
> +            }
> +            catch (NodeNotValidException e)
> +            {
> +               if (trace)
> +                  log.trace("Found node " + actualNode.getFqn() + " but it is not valid. Returning 'no data found'", e);
> +               return GravitateResult.noDataFound();
> +            }
>           }
>
>           if (backupNodeFqn == null && searchSubtrees)
>
>
> -- 
> Brian Stansberry
> Lead, AS Clustering
> JBoss by Red Hat

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
