Thread for discussion of
http://jira.jboss.com/jira/browse/JBCACHE-1251.
Some further analysis of the problem; I don't really know the proper fix as I'm
not familiar enough with all the subtle cases this code is handling.
Analyzing this in debugger, it seems the problem is in inconsistencies in handling
deleted/invalidated nodes in PLU.acquireLocksWithTimeout() and PLI.lock():
In acquireLocksWithTimeout():
| do
| {
| // this is an additional check to make sure we don't try for too long.
| if (!firstTry && System.currentTimeMillis() > cutoffTime)
| {
| throw new TimeoutException("Unable to acquire lock on Fqn " +
fqn + " after " + timeout + " millis");
| }
| created = lock(ctx, fqn, lockType, createIfNotExists, timeout,
acquireLockOnParent, reverseRemoveCheck);
| firstTry = false;
| }
| while (createIfNotExists && cache.peek(fqn, true) == null);// keep
trying until we have the lock (fixes concurrent remove())
|
The cache.peek() call in the while is really a convenience version of cache.peek(fqn,
true, false). This last "false" param means the loop will continue as long as
the peek only finds an "invalid" tombstone.
This is what is happening. Question is why a new "valid" node isn't created
in the lock() call.
Looking at lock, a couple things pop out:
1) Creating a new node will only happen if the current node doesn't exist. In this
case the current node does exist, it's just invalid. I don't see any logic in the
loop to handle this.
2) There's logic to try to detect deleted/orphan nodes that seems odd:
| // make sure the lock we acquired isn't on a deleted node/is an orphan!!
| // look into invalidated nodes as well
| NodeSPI repeek = cache.peek(currentNode.getFqn(), true, true);
| if (currentNode != repeek)
| {
| if (log.isTraceEnabled())
| log.trace("Was waiting for and obtained a lock on a node that
doesn't exist anymore! Attempting lock acquisition again.");
| // we have an orphan!! Lose the unnecessary lock and re-acquire the lock
(and potentially recreate the node).
|
Idea here seems to be to relook up the current node from the cache but ignore
deleted/invalid nodes. If the node we have in hand isn't the one the cache finds,
there's an issue that we deal with.
Shouldn the cache.peek call be cache.peek(currentNode.getFqn(), false, false); -- i.e.
ignore deleted/invalid?
Hmm, maybe not, I think I see the idea. You're *only* checking for nodes that are
"floating in space". Nodes that are still part of the main cache structure, no
matter what their status, are OK.
In that case, I think issue #1 is the problem -- there's no mechanism to create a new
node if the existing one is invalid, or to somehow "resurrect" the invalid one.
OT: on the repeek thing -- that seems like quite a bit of overhead since it's repeated
on every node from the root down -- for every call. Could that check be limited to the
last node in the hierarchy?
Alternative (which requires SPI change) is to add a boolean "orphan" flag to a
node, or to make AbstractNode.deleted an enum with "FALSE, DELETING, DELETED" or
something. Some way to change the state of the node itself when it's been cut loose
from the tree.
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4115180#...
Reply to the post :
http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&a...