[jbosscache-dev] LockParentForChildInsertRemove and PessimisticLocking

Brian Stansberry brian.stansberry at redhat.com
Wed Aug 5 23:25:06 EDT 2009


Patch attached to the JIRA. I was wrong in thinking I could adapt DGTest 
to expose the issue. But anyway, the testsuite passes with the patch.

Brian Stansberry wrote:
> I think I may have the WL (write lock) issue figured out. Maybe. :)
> 
> The key thing is that DataGravitationCleanupCommand is sent to all nodes, 
> not just the buddies. The result is that caches which never stored the fqn 
> will attempt to remove a node that doesn't exist.
> 
> In this case PessimisticNodeBasedLockManager.lock(...) will attempt to 
> create the node to be removed, as well as all nodes leading to it, and 
> will acquire a WL on every node it creates. It sets a local variable 
> "created" to "true" when it creates a node, and once that variable is 
> "true" it will try to acquire a WL on any node it locks.
> 
> I suspect the problem comes here:
> 
> // make sure the lock we acquired isn't on a deleted node/is an orphan!!
> // look into invalidated nodes as well
> NodeSPI repeek = dataContainer.peek(currentNodeFqn, true, true);
> if (currentNode != repeek)
> {
>    // A bunch of "reset" logic
> }
> 
> The "reset logic" involves resetting variables so the loop will start 
> over higher in the tree. But it doesn't reset the "created" flag, so when 
> the loop starts over it tries to acquire WLs on nodes like / and 
> /BUDDY_MANAGER, where it only needs a RL.
> 
> This explains why the thread is trying to "upgrade". On an earlier pass 
> through the loop it got a RL, second pass it's trying for a WL.
> 
> I'm not exactly sure why the if (currentNode != repeek) test fails, but 
> it's happening when a bunch of threads are all doing data gravitation 
> cleanups. So there's a lot of concurrent creation of 
> nodes/parents/grandparents just so they can be deleted.
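The failure mode described above can be reduced to a toy model. Everything here (walk, retryAt, the single-retry rule) is illustrative and hypothetical, not the actual PessimisticNodeBasedLockManager code; the point is only that rewinding the loop without clearing "created" turns the ancestors' read locks into write locks:

```java
// Toy model of the suspected bug; names and structure are hypothetical,
// NOT the real JBoss Cache source.
import java.util.ArrayList;
import java.util.List;

public class CreatedFlagSketch {
    enum LockType { READ, WRITE }

    // Walks from the root to the target, recording which lock type would be
    // requested on each path element. Simulates one "repeek != currentNode"
    // retry at index retryAt, which restarts the walk from the root.
    static List<LockType> walk(List<String> path, int retryAt, boolean resetCreatedOnRetry) {
        List<LockType> acquired = new ArrayList<>();
        boolean created = false;
        boolean retried = false;
        for (int i = 0; i < path.size(); i++) {
            // Pretend only the leaf is missing and must be created.
            if (i == path.size() - 1) created = true;
            acquired.add(created ? LockType.WRITE : LockType.READ);
            if (!retried && i == retryAt) {
                retried = true;
                if (resetCreatedOnRetry) created = false; // the missing "reset"
                acquired.clear();
                i = -1; // start over higher in the tree (here: from the root)
            }
        }
        return acquired;
    }
}
```

Without the reset, the second pass write-locks every ancestor ([WRITE, WRITE, WRITE] for a three-element path); with it, only the created leaf gets a WL ([READ, READ, WRITE]), which is what the patch should restore, as I understand it.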
> 
> I'll see if I can patch this. I won't commit any fix; will just attach 
> to the JIRA. I played with your DGTest and couldn't produce a failure, 
> but I instrumented the code with logging and for sure see WL's being 
> acquired. Now that I understand this better I can probably make a 
> variant of DGTest that fails.
> 
> 
> Brian Stansberry wrote:
>> Just in case it matters, suggest you use a deeper Fqn:
>>
>> for (int i = 0; i < DATA; i++) c1.put("/JSESSION/st_localhost/" + i, 
>> "k", "v");
>>
>> Perhaps add a listener to cause a thread to block on reading 
>> /JSESSION/st_localhost/0 ?  Confirm that the others proceed?  That might 
>> help expose if/why a WL is being acquired.
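To pin down what "the others proceed" should look like: with only read locks on the ancestors, concurrent readers go through; a write lock would block them. A minimal analogy using a plain java.util.concurrent ReentrantReadWriteLock (deliberately not the cache's IdentityLock, so this is an illustration of the semantics, not the JBC lock implementation):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ParentLockDemo {
    // Returns true if a second thread can read-lock the "parent" while
    // this thread holds the given lock mode on it.
    static boolean readerProceeds(boolean holderTakesWriteLock) throws InterruptedException {
        ReentrantReadWriteLock parent = new ReentrantReadWriteLock();
        if (holderTakesWriteLock) parent.writeLock().lock();
        else parent.readLock().lock();
        final boolean[] ok = new boolean[1];
        Thread reader = new Thread(() -> {
            try {
                // Give up after a short wait rather than hang the demo.
                ok[0] = parent.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            } catch (InterruptedException ignored) { }
        });
        reader.start();
        reader.join();
        return ok[0];
    }
}
```

readerProceeds(false) is the behaviour we expect with LockParentForChildInsertRemove=false; readerProceeds(true) is the stall we're actually seeing.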
>>
>> Manik Surtani wrote:
>>> On 5 Aug 2009, at 16:42, Brian Stansberry wrote:
>>>
>>>> Sorry, nope. :(
>>>>
>>>> I might (30% chance) be able to create one later today.
>>> I've started on a test (attached to JIRA) but no luck in reproducing 
>>> this so far.  Care to cast your eye over it to validate the use case?
>>>
>>> Cheers
>>> Manik
>>>
>>>> Manik Surtani wrote:
>>>>> You wouldn't have this encapsulated as a unit test, would you?
>>>>> On 5 Aug 2009, at 14:42, Brian Stansberry wrote:
>>>>>> Paul and I have been poking through code trying to diagnose the issue
>>>>>> that led to this thread for quite a while yesterday. Time to throw it
>>>>>> out there and see if anyone hopefully spots something:
>>>>>>
>>>>>> This is web session replication with buddy replication. The
>>>>>> "LockParentForChildInsertRemove" property isn't configured on the
>>>>>> cache, so the default "false" should be in effect. A thread trying to
>>>>>> execute a remote DataGravitationCleanupCommand on a session node fails
>>>>>> while upgrading to a write lock on fqn /BUDDY_MANAGER.[1] We can't
>>>>>> understand why it would want a WL.
>>>>>>
>>>>>> Looking at the PessimisticNodeBasedLockManager code, the only reasons
>>>>>> I see why a WL would be requested here are:
>>>>>>
>>>>>> 1) The invocation created the previously non-existent /BUDDY_MANAGER
>>>>>> node in order to execute the remove of the
>>>>>> great-great-great-great-grandchild. Highly unlikely in general, and
>>>>>> logging shows another thread successfully doing a put on a different
>>>>>> great-great-great-great-grandchild a millisecond earlier. (Also, other
>>>>>> test runs with less verbose logging showed failures getting a WL on
>>>>>> fqn /, which of course exists.)
>>>>>>
>>>>>> 2) In the call to PNBLM.writeLockNeeded(...),
>>>>>> currentNode.isLockForChildInsertRemove() is returning "true", which
>>>>>> AFAICT it shouldn't be.
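For readers following along, the two conditions distil to roughly this shape. This is a hypothetical paraphrase with made-up parameter names, not the actual PNBLM.writeLockNeeded source:

```java
public class WriteLockNeededSketch {
    // Hypothetical distillation of the two WL triggers discussed above.
    static boolean writeLockNeeded(boolean createdByThisInvocation,
                                   boolean lockForChildInsertRemove) {
        // 1) this invocation created the node on its way to the target, or
        // 2) the node is configured to take a WL for child insert/remove.
        return createdByThisInvocation || lockForChildInsertRemove;
    }
}
```

With the default configuration the second operand should be false, which is why only the first (the "created" flag) can explain the WLs we see.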
>>>>>>
>>>>>> [1] The logging from the thread:
>>>>>>
>>>>>> [JBoss] 20:29:27,488 TRACE [org.jboss.cache.marshall.CommandAwareRpcDispatcher] (Incoming-5,10.34.32.154:46064) Executing command: DataGravitationCleanupCommand{fqn=/JSESSION/st_localhost/wUaU4FhUzWcuiDEjkJzDiQ__, backup=/_BUDDY_BACKUP_/10.34.32.155_41754:DEAD/1/JSESSION/st_localhost/wUaU4FhUzWcuiDEjkJzDiQ__} [sender=10.34.32.153:24595]
>>>>>> [JBoss] 20:29:27,489 TRACE [org.jboss.cache.marshall.CommandAwareRpcDispatcher] (Incoming-5,10.34.32.154:46064) This is a non-visitable command - so performing directly and not via the invoker.
>>>>>> [JBoss] 20:29:42,497 ERROR [org.jboss.cache.lock.ReadWriteLockWithUpgrade] (Incoming-5,10.34.32.154:46064) upgradeLock(): failed
>>>>>> [JBoss] 20:29:42,500 TRACE [org.jboss.cache.marshall.CommandAwareRpcDispatcher] (Incoming-5,10.34.32.154:46064) Problems invoking command.
>>>>>> [JBoss] org.jboss.cache.lock.UpgradeException: upgrade lock for /_BUDDY_BACKUP_ could not be acquired after 15000 ms. Lock map ownership
>>>>>> Read lock owners: [GlobalTransaction:<10.34.32.153:24595>:70, GlobalTransaction:<10.34.32.153:24595>:69]
>>>>>> [JBoss] Write lock owner: null
>>>>>> [JBoss]  (caller=Thread[Incoming-5,10.34.32.154:46064,5,Thread Pools], lock info: read owners=[GlobalTransaction:<10.34.32.153:24595>:70, GlobalTransaction:<10.34.32.153:24595>:69] (activeReaders=2, activeWriter=null, waitingReaders=1, waitingWriters=0, waitingUpgrader=0))
>>>>>> [JBoss]     at org.jboss.cache.lock.IdentityLock.acquireWriteLock0(IdentityLock.java:227)
>>>>>> [JBoss]     at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:176)
>>>>>> [JBoss]     at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:503)
>>>>>> [JBoss]     at org.jboss.cache.lock.PessimisticNodeBasedLockManager.acquireNodeLock(PessimisticNodeBasedLockManager.java:260)
>>>>>> [JBoss]     at org.jboss.cache.lock.PessimisticNodeBasedLockManager.lock(PessimisticNodeBasedLockManager.java:169)
>>>>>> [JBoss]     at org.jboss.cache.lock.PessimisticNodeBasedLockManager.lockPessimistically(PessimisticNodeBasedLockManager.java:97)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.PessimisticLockInterceptor.handleRemoveNodeCommand(PessimisticLockInterceptor.java:265)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.base.PrePostProcessingCommandInterceptor.visitRemoveNodeCommand(PrePostProcessingCommandInterceptor.java:126)
>>>>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:148)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.ReplicationInterceptor.visitRemoveNodeCommand(ReplicationInterceptor.java:125)
>>>>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
>>>>>> [JBoss]     at org.jboss.cache.commands.AbstractVisitor.visitRemoveNodeCommand(AbstractVisitor.java:75)
>>>>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.TxInterceptor.attachGtxAndPassUpChain(TxInterceptor.java:301)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.TxInterceptor.handleDefault(TxInterceptor.java:283)
>>>>>> [JBoss]     at org.jboss.cache.commands.AbstractVisitor.visitRemoveNodeCommand(AbstractVisitor.java:75)
>>>>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
>>>>>> [JBoss]     at org.jboss.cache.commands.AbstractVisitor.visitRemoveNodeCommand(AbstractVisitor.java:75)
>>>>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:178)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.InvocationContextInterceptor.visitRemoveNodeCommand(InvocationContextInterceptor.java:88)
>>>>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>>>>> [JBoss]     at org.jboss.cache.interceptors.InterceptorChain.invoke(InterceptorChain.java:287)
>>>>>> [JBoss]     at org.jboss.cache.commands.remote.DataGravitationCleanupCommand.executeRemove(DataGravitationCleanupCommand.java:151)
>>>>>> [JBoss]     at org.jboss.cache.commands.remote.DataGravitationCleanupCommand.perform(DataGravitationCleanupCommand.java:105)
>>>>>> [JBoss]     at org.jboss.cache.marshall.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:305)
>>>>>> [JBoss]     at org.jboss.cache.marshall.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:242)
>>>>>> [JBoss]     at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:637)
>>>>>> [JBoss]     at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:545)
>>>>>> [JBoss]     at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:368)
>>>>>> [JBoss]     at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:775)
>>>>>> [JBoss]     at org.jgroups.JChannel.up(JChannel.java:1339)
>>>>>> [JBoss]     at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:462)
>>>>>> [JBoss]     at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:432)
>>>>>> [JBoss]     at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:144)
>>>>>> [JBoss]     at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
>>>>>> [JBoss]     at org.jgroups.protocols.FC.up(FC.java:473)
>>>>>> [JBoss]     at org.jgroups.protocols.pbcast.GMS.up(GMS.java:824)
>>>>>> [JBoss]     at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)
>>>>>> [JBoss]     at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:616)
>>>>>> [JBoss]     at org.jgroups.protocols.UNICAST.up(UNICAST.java:282)
>>>>>> [JBoss]     at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:758)
>>>>>> [JBoss]     at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
>>>>>> [JBoss]     at org.jgroups.protocols.FD.up(FD.java:284)
>>>>>> [JBoss]     at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
>>>>>> [JBoss]     at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
>>>>>> [JBoss]     at org.jgroups.protocols.Discovery.up(Discovery.java:264)
>>>>>> [JBoss]     at org.jgroups.protocols.PING.up(PING.java:273)
>>>>>> [JBoss]     at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2327)
>>>>>> [JBoss]     at org.jgroups.protocols.TP.passMessageUp(TP.java:1261)
>>>>>> [JBoss]     at org.jgroups.protocols.TP.access$100(TP.java:49)
>>>>>> [JBoss]     at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1838)
>>>>>> [JBoss]     at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1817)
>>>>>> [JBoss]     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>> [JBoss]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>> [JBoss]     at java.lang.Thread.run(Thread.java:619)
>>>>>>
>>>>>> Manik Surtani wrote:
>>>>>>> On 4 Aug 2009, at 21:28, Brian Stansberry wrote:
>>>>>>>
>>>>>>>> Continuing dialogue with myself. :)
>>>>>>>>
>>>>>>>> I've found the logic in PessimisticNodeBasedLockManager where the
>>>>>>>> LockParentForChildInsertRemove configuration is meant to be coming
>>>>>>>> through via the Node.
>>>>>>> Yes, this is because you can override this on a per-node basis.
>>>>>>>
>>>>>>>> In our testing we're seeing cases where write
>>>>>>>> locks are being acquired on parent nodes, which isn't the intent. 
>>>>>>>> I'll
>>>>>>>> dig further.
>>>>>>>>
>>>>>>>> Brian Stansberry wrote:
>>>>>>>>> https://jira.jboss.org/jira/browse/JBCACHE-1527
>>>>>>>>>
>>>>>>>>> Brian Stansberry wrote:
>>>>>>>>>> From looking at the JBC 3 code, it seems the
>>>>>>>>>> LockParentForChildInsertRemove configuration is no longer 
>>>>>>>>>> respected for
>>>>>>>>>> pessimistic locking.  I can't trace any path from the property in
>>>>>>>>>> Configuration to code that uses it.
>>>>>>>>>> PessimisticLockInterceptor.handlePutCommand, handleMoveCommand and
>>>>>>>>>> handleRemoveNodeCommand all always tell the lock manager to lock
>>>>>>>>>> parents. handleEvictFqnCommand always tells the lock manager not to
>>>>>>>>>> lock parents.
>>>>>>>>>>
>>>>>>>>>> This is causing failures in buddy replication testing when nodes
>>>>>>>>>> join/leave clusters under load. There's a lot of data gravitation,
>>>>>>>>>> plus stuff like migrating defunct backup trees to "DEAD" regions.
>>>>>>>>>> Too much contention for parent-level locks.
>>>>>>>>>>
>>>>>>>>>> Plus locking on the structural parent to add/remove session 
>>>>>>>>>> nodes will
>>>>>>>>>> suck for the session caching use case.
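For anyone setting up a reproducer: the property under discussion can be pinned explicitly rather than relying on the default. This sketch assumes the JBC 3.x programmatic Configuration and its jars on the classpath; the setter name should be double-checked against the Configuration javadoc:

```java
// Config fragment only; requires the JBoss Cache 3.x jars.
import org.jboss.cache.config.Configuration;

public class LockParentConfigSketch {
    static Configuration baseConfig() {
        Configuration cfg = new Configuration();
        // The default under discussion: do NOT write-lock the parent
        // when inserting/removing a child.
        cfg.setLockParentForChildInsertRemove(false);
        return cfg;
    }
}
```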
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Brian Stansberry
>>>>>>>> Lead, AS Clustering
>>>>>>>> JBoss by Red Hat
>>>>>>>> _______________________________________________
>>>>>>>> jbosscache-dev mailing list
>>>>>>>> jbosscache-dev at lists.jboss.org
>>>>>>>> https://lists.jboss.org/mailman/listinfo/jbosscache-dev
>>>>>>> -- 
>>>>>>> Manik Surtani
>>>>>>> manik at jboss.org
>>>>>>> Lead, Infinispan
>>>>>>> Lead, JBoss Cache
>>>>>>> http://www.infinispan.org
>>>>>>> http://www.jbosscache.org
>>>>
> 
> 


-- 
Brian Stansberry
Lead, AS Clustering
JBoss by Red Hat

