Thanks for the patch; that makes a lot of sense. Go ahead and check
it in to trunk.
Regarding the creation of nodes for deletion, well, that's pessimistic
locking for you. Unfortunately, the lock is encapsulated in the node,
so you can't acquire the lock until the node has been created (which is
partly why the pessimistic lock design sucks).
The node is created in PessimisticNodeBasedLockManager (line 145). At
this stage we already know whether we are just creating a dummy node for
deletion, so perhaps we could pass this as a flag into the constructor
of PessimisticUnversionedNode, and the constructor could acquire the lock
on creation. Hmm, need to think about this.
Cheers
Manik
On 6 Aug 2009, at 18:03, Brian Stansberry wrote:
More fun with buddy replication. :-)
Saw an error on one of our failover tests where:
1) Node D had left the group, so lots of gravitation was going on.
2) Various nodes were sending DataGravitationCleanupCommands to the
cluster for /BUDDY_BACKUP/D_DEAD/1/JSESSION/st_localhost/xxx. As a
result, all nodes in the cluster were trying to remove various
/BUDDY_BACKUP/D_DEAD/1/JSESSION/st_localhost/xxx nodes. On node A those
nodes don't exist, so PessimisticLockInterceptor.handleRemoveCommand
was adding them and then removing them.
3) Concurrently with #2, a GravitateDataCommand for
/JSESSION/st_localhost/123 came in to node A. Session 123 was never
stored on node A, so this should have resulted in a cache miss. But what
happened once was:
[JBoss] 16:46:52,961 TRACE [org.jboss.cache.marshall.CommandAwareRpcDispatcher] (Incoming-13,10.34.32.153:14736) Problems invoking command.
[JBoss] org.jboss.cache.NodeNotValidException: Node /_BUDDY_BACKUP_/10.34.32.156_48822:DEAD/1/JSESSION/st_localhost/UvzutZkoESBMRSnjv0eTRA__ is not valid. Perhaps it has been moved or removed.
[JBoss] at org.jboss.cache.invocation.NodeInvocationDelegate.assertValid(NodeInvocationDelegate.java:527)
[JBoss] at org.jboss.cache.invocation.NodeInvocationDelegate.getChildrenNames(NodeInvocationDelegate.java:292)
[JBoss] at org.jboss.cache.commands.read.GravitateDataCommand.perform(GravitateDataCommand.java:176)
...
It seems the command is seeing a non-existent node. Yep; looking at the
logs, it's clear the above GravitateDataCommand was executed concurrently
with another DataGravitationCleanupCommand for the same session. (I need
to investigate why that happened.)
Below is a possible patch to work around the issue. It points to a more
general locking problem, though: should these "phantom nodes" created
for removal be visible to other threads? Shouldn't there be a write lock
(WL) on them from the moment they are created until after they are
removed?
Hehe, answered my own question by writing it. The node is created by
PessimisticNodeBasedLockManager and then locked. There's a gap in
between where another thread could get a ref to it.
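The gap can be replayed deterministically in a single thread by executing the two threads' steps in the problematic order. This is a simplified model, not JBoss Cache code: PhantomNode, GapReplay and InvalidNodeException are hypothetical stand-ins, with InvalidNodeException playing the role of NodeNotValidException and the validity check loosely mirroring the assertValid() frame in the stack trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for org.jboss.cache.NodeNotValidException.
class InvalidNodeException extends RuntimeException {
    InvalidNodeException(String fqn) {
        super("Node " + fqn + " is not valid");
    }
}

// Hypothetical phantom node, created only so that it can be removed.
class PhantomNode {
    final String fqn;
    final ReentrantLock lock = new ReentrantLock();
    volatile boolean valid = true;

    PhantomNode(String fqn) {
        this.fqn = fqn;
    }

    List<String> getChildrenNames() {
        if (!valid) {
            throw new InvalidNodeException(fqn); // validity check before any read
        }
        return new ArrayList<>();
    }
}

class GapReplay {
    static final Map<String, PhantomNode> tree = new ConcurrentHashMap<>();

    // Replays the interleaving: remover creates, reader grabs a reference,
    // remover locks and removes, reader uses its stale reference.
    static String replay(String fqn) {
        PhantomNode phantom = new PhantomNode(fqn);
        tree.put(fqn, phantom);                   // remover: published, not yet locked

        PhantomNode seenByReader = tree.get(fqn); // reader: gets a ref in the gap

        phantom.lock.lock();                      // remover: lock, invalidate, remove
        try {
            phantom.valid = false;
            tree.remove(fqn);
        } finally {
            phantom.lock.unlock();
        }

        try {
            seenByReader.getChildrenNames();      // reader: stale reference
            return "ok";
        } catch (InvalidNodeException e) {
            return "invalid";
        }
    }
}
```

Locking in the constructor would close this window for any thread that locks before reading; a reader holding a plain reference still has to cope with an invalid node, which is what the patch below does by catching NodeNotValidException.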
Anyway, the patch:
### Eclipse Workspace Patch 1.0
#P jbosscache-core
Index: src/main/java/org/jboss/cache/commands/read/GravitateDataCommand.java
===================================================================
--- src/main/java/org/jboss/cache/commands/read/GravitateDataCommand.java (revision 8163)
+++ src/main/java/org/jboss/cache/commands/read/GravitateDataCommand.java (working copy)
@@ -29,6 +29,7 @@
import org.jboss.cache.InternalNode;
import org.jboss.cache.InvocationContext;
import org.jboss.cache.Node;
+import org.jboss.cache.NodeNotValidException;
import org.jboss.cache.NodeSPI;
import org.jboss.cache.buddyreplication.BuddyFqnTransformer;
import org.jboss.cache.buddyreplication.BuddyManager;
@@ -171,9 +172,18 @@
else
{
// make sure we LOAD data for this node!!
- actualNode.getData();
- // and children!
- actualNode.getChildrenNames();
+ try
+ {
+ actualNode.getData();
+ // and children!
+ actualNode.getChildrenNames();
+ }
+ catch (NodeNotValidException e)
+ {
+ if (trace)
+ log.trace("Found node " + actualNode.getFqn() + " but it is not valid. Returning 'no data found'", e);
+ return GravitateResult.noDataFound();
+ }
}
if (backupNodeFqn == null && searchSubtrees)
--
Brian Stansberry
Lead, AS Clustering
JBoss by Red Hat
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org