[jbosscache-dev] LockParentForChildInsertRemove and PessimisticLocking

Manik Surtani manik at jboss.org
Wed Aug 5 13:29:55 EDT 2009


On 5 Aug 2009, at 16:42, Brian Stansberry wrote:

> Sorry, nope. :(
>
> I might (30% chance) be able to create one later today.

I've started on a test (attached to the JIRA) but have had no luck
reproducing this so far.  Care to cast your eye over it to validate the
use case?
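
For the archives, the shape of the test is roughly as follows -- a
simplified sketch, not the exact code attached to the JIRA; the fqns,
keys and LOCAL cache mode are illustrative stand-ins for the real BR
setup:

import org.jboss.cache.Cache;
import org.jboss.cache.DefaultCacheFactory;
import org.jboss.cache.Fqn;
import org.jboss.cache.config.Configuration;

public class ParentLockReproSketch
{
   public static void main(String[] args)
   {
      Configuration cfg = new Configuration();
      // Pessimistic locking; LockParentForChildInsertRemove is left
      // unset so the default ("false") applies, as in Brian's setup.
      cfg.setNodeLockingScheme(Configuration.NodeLockingScheme.PESSIMISTIC);
      cfg.setCacheMode(Configuration.CacheMode.LOCAL);

      Cache<String, String> cache =
            new DefaultCacheFactory<String, String>().createCache(cfg);
      try
      {
         Fqn deepChild = Fqn.fromString("/a/b/c/d/e/f");
         cache.put(deepChild, "k", "v");
         // Removing the deep child should only read-lock its ancestors;
         // the bug would show up as a write-lock attempt on a parent.
         cache.removeNode(deepChild);
      }
      finally
      {
         cache.stop();
      }
   }
}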

Cheers
Manik

>
> Manik Surtani wrote:
>> You wouldn't have this encapsulated as a unit test, would you?
>> On 5 Aug 2009, at 14:42, Brian Stansberry wrote:
>>> Paul and I spent quite a while yesterday poking through code trying
>>> to diagnose the issue that led to this thread. Time to throw it out
>>> there and see if anyone spots something:
>>>
>>> This is web session replication with buddy replication. The
>>> "LockParentForChildInsertRemove" property isn't configured on the
>>> cache, so the default "false" should be in effect. A thread trying
>>> to execute a remote DataGravitationCleanupCommand on a session node
>>> fails while upgrading to a write lock on fqn /_BUDDY_BACKUP_.[1] We
>>> can't understand why it would want a write lock at all.
>>>
>>> Looking at the PessimisticNodeBasedLockManager code, the only
>>> reasons I can see why a write lock would be requested here are:
>>>
>>> 1) The invocation created the previously non-existent /_BUDDY_BACKUP_
>>> node in order to execute the remove of the
>>> great-great-great-great-grandchild. Highly unlikely in general, and
>>> logging shows another thread successfully doing a put on a different
>>> great-great-great-great-grandchild a millisecond earlier. (Also,
>>> other test runs with less verbose logging showed failures getting a
>>> write lock on fqn "/", which of course exists.)
>>>
>>> 2) In the call to PNBLM.writeLockNeeded(...),
>>> currentNode.isLockForChildInsertRemove() is returning "true", which
>>> AFAICT it shouldn't be (see the sketch below).
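>>>
>>> To make case 2 concrete, here is roughly how I read the decision for
>>> a remove of a child under a parent node -- a paraphrase, not the
>>> actual writeLockNeeded(...) source, with names simplified:
>>>
>>> import org.jboss.cache.Node;
>>>
>>> // Paraphrase of the parent-lock decision for a child insert/remove.
>>> // Not the real PessimisticNodeBasedLockManager code.
>>> class ParentLockSketch
>>> {
>>>    static boolean parentWriteLockNeeded(Node<?, ?> parent,
>>>                                         boolean parentJustCreated)
>>>    {
>>>       // Case 1 above: a parent created by this invocation must be
>>>       // write-locked regardless of configuration.
>>>       if (parentJustCreated) return true;
>>>
>>>       // Case 2: otherwise the parent is write-locked only if the
>>>       // node opts in (per-node flag, defaulting to the cache-wide
>>>       // LockParentForChildInsertRemove setting, i.e. "false" here).
>>>       return parent.isLockForChildInsertRemove();
>>>    }
>>> }
>>>
>>> With the default in effect, both branches should be false for an
>>> existing parent, so only a read lock should be taken -- which is why
>>> the upgrade attempt in the log below is so odd.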
>>>
>>> [1] The logging from the thread:
>>>
>>> [JBoss] 20:29:27,488 TRACE [org.jboss.cache.marshall.CommandAwareRpcDispatcher] (Incoming-5,10.34.32.154:46064) Executing command: DataGravitationCleanupCommand{fqn=/JSESSION/st_localhost/wUaU4FhUzWcuiDEjkJzDiQ__, backup=/_BUDDY_BACKUP_/10.34.32.155_41754:DEAD/1/JSESSION/st_localhost/wUaU4FhUzWcuiDEjkJzDiQ__} [sender=10.34.32.153:24595]
>>> [JBoss] 20:29:27,489 TRACE [org.jboss.cache.marshall.CommandAwareRpcDispatcher] (Incoming-5,10.34.32.154:46064) This is a non-visitable command - so performing directly and not via the invoker.
>>> [JBoss] 20:29:42,497 ERROR [org.jboss.cache.lock.ReadWriteLockWithUpgrade] (Incoming-5,10.34.32.154:46064) upgradeLock(): failed
>>> [JBoss] 20:29:42,500 TRACE [org.jboss.cache.marshall.CommandAwareRpcDispatcher] (Incoming-5,10.34.32.154:46064) Problems invoking command.
>>> [JBoss] org.jboss.cache.lock.UpgradeException: upgrade lock for /_BUDDY_BACKUP_ could not be acquired after 15000 ms. Lock map ownership Read lock owners: [GlobalTransaction:<10.34.32.153:24595>:70, GlobalTransaction:<10.34.32.153:24595>:69]
>>> [JBoss] Write lock owner: null
>>> [JBoss]  (caller=Thread[Incoming-5,10.34.32.154:46064,5,Thread Pools], lock info: read owners=[GlobalTransaction:<10.34.32.153:24595>:70, GlobalTransaction:<10.34.32.153:24595>:69] (activeReaders=2, activeWriter=null, waitingReaders=1, waitingWriters=0, waitingUpgrader=0))
>>> [JBoss]     at org.jboss.cache.lock.IdentityLock.acquireWriteLock0(IdentityLock.java:227)
>>> [JBoss]     at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:176)
>>> [JBoss]     at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:503)
>>> [JBoss]     at org.jboss.cache.lock.PessimisticNodeBasedLockManager.acquireNodeLock(PessimisticNodeBasedLockManager.java:260)
>>> [JBoss]     at org.jboss.cache.lock.PessimisticNodeBasedLockManager.lock(PessimisticNodeBasedLockManager.java:169)
>>> [JBoss]     at org.jboss.cache.lock.PessimisticNodeBasedLockManager.lockPessimistically(PessimisticNodeBasedLockManager.java:97)
>>> [JBoss]     at org.jboss.cache.interceptors.PessimisticLockInterceptor.handleRemoveNodeCommand(PessimisticLockInterceptor.java:265)
>>> [JBoss]     at org.jboss.cache.interceptors.base.PrePostProcessingCommandInterceptor.visitRemoveNodeCommand(PrePostProcessingCommandInterceptor.java:126)
>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
>>> [JBoss]     at org.jboss.cache.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:148)
>>> [JBoss]     at org.jboss.cache.interceptors.ReplicationInterceptor.visitRemoveNodeCommand(ReplicationInterceptor.java:125)
>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
>>> [JBoss]     at org.jboss.cache.commands.AbstractVisitor.visitRemoveNodeCommand(AbstractVisitor.java:75)
>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
>>> [JBoss]     at org.jboss.cache.interceptors.TxInterceptor.attachGtxAndPassUpChain(TxInterceptor.java:301)
>>> [JBoss]     at org.jboss.cache.interceptors.TxInterceptor.handleDefault(TxInterceptor.java:283)
>>> [JBoss]     at org.jboss.cache.commands.AbstractVisitor.visitRemoveNodeCommand(AbstractVisitor.java:75)
>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
>>> [JBoss]     at org.jboss.cache.commands.AbstractVisitor.visitRemoveNodeCommand(AbstractVisitor.java:75)
>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>> [JBoss]     at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
>>> [JBoss]     at org.jboss.cache.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:178)
>>> [JBoss]     at org.jboss.cache.interceptors.InvocationContextInterceptor.visitRemoveNodeCommand(InvocationContextInterceptor.java:88)
>>> [JBoss]     at org.jboss.cache.commands.write.RemoveNodeCommand.acceptVisitor(RemoveNodeCommand.java:125)
>>> [JBoss]     at org.jboss.cache.interceptors.InterceptorChain.invoke(InterceptorChain.java:287)
>>> [JBoss]     at org.jboss.cache.commands.remote.DataGravitationCleanupCommand.executeRemove(DataGravitationCleanupCommand.java:151)
>>> [JBoss]     at org.jboss.cache.commands.remote.DataGravitationCleanupCommand.perform(DataGravitationCleanupCommand.java:105)
>>> [JBoss]     at org.jboss.cache.marshall.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:305)
>>> [JBoss]     at org.jboss.cache.marshall.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:242)
>>> [JBoss]     at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:637)
>>> [JBoss]     at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:545)
>>> [JBoss]     at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:368)
>>> [JBoss]     at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:775)
>>> [JBoss]     at org.jgroups.JChannel.up(JChannel.java:1339)
>>> [JBoss]     at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:462)
>>> [JBoss]     at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:432)
>>> [JBoss]     at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:144)
>>> [JBoss]     at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
>>> [JBoss]     at org.jgroups.protocols.FC.up(FC.java:473)
>>> [JBoss]     at org.jgroups.protocols.pbcast.GMS.up(GMS.java:824)
>>> [JBoss]     at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)
>>> [JBoss]     at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:616)
>>> [JBoss]     at org.jgroups.protocols.UNICAST.up(UNICAST.java:282)
>>> [JBoss]     at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:758)
>>> [JBoss]     at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
>>> [JBoss]     at org.jgroups.protocols.FD.up(FD.java:284)
>>> [JBoss]     at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
>>> [JBoss]     at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
>>> [JBoss]     at org.jgroups.protocols.Discovery.up(Discovery.java:264)
>>> [JBoss]     at org.jgroups.protocols.PING.up(PING.java:273)
>>> [JBoss]     at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2327)
>>> [JBoss]     at org.jgroups.protocols.TP.passMessageUp(TP.java:1261)
>>> [JBoss]     at org.jgroups.protocols.TP.access$100(TP.java:49)
>>> [JBoss]     at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1838)
>>> [JBoss]     at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1817)
>>> [JBoss]     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> [JBoss]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> [JBoss]     at java.lang.Thread.run(Thread.java:619)
>>>
>>> Manik Surtani wrote:
>>>>
>>>> On 4 Aug 2009, at 21:28, Brian Stansberry wrote:
>>>>
>>>>> Continuing dialogue with myself. :)
>>>>>
>>>>> I've found the logic in PessimisticNodeBasedLockManager where the
>>>>> LockParentForChildInsertRemove configuration is meant to be coming
>>>>> through via the Node.
>>>>
>>>> Yes, this is because you can override this on a per-node basis.
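>>>>
>>>> The override lives on the Node API itself.  A minimal sketch,
>>>> assuming a running cache named "cache":
>>>>
>>>> // Per-node override; this flag trumps the cache-wide default.
>>>> Node<String, String> parent =
>>>>       cache.getRoot().addChild(Fqn.fromString("/parent"));
>>>> parent.setLockForChildInsertRemove(true);
>>>> assert parent.isLockForChildInsertRemove();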
>>>>
>>>>> In our testing we're seeing cases where write
>>>>> locks are being acquired on parent nodes, which isn't the  
>>>>> intent. I'll
>>>>> dig further.
>>>>>
>>>>> Brian Stansberry wrote:
>>>>>> https://jira.jboss.org/jira/browse/JBCACHE-1527
>>>>>>
>>>>>> Brian Stansberry wrote:
>>>>>>> From looking at the JBC 3 code, it seems the
>>>>>>> LockParentForChildInsertRemove configuration is no longer  
>>>>>>> respected for
>>>>>>> pessimistic locking.  I can't trace any path from the property  
>>>>>>> in
>>>>>>> Configuration to code that uses it.
>>>>>>> PessimisticLockInterceptor.handlePutCommand, handleMoveCommand  
>>>>>>> and
>>>>>>> handleRemoveNodeCommand all always tell the lock manager to lock
>>>>>>> parents. handleEvictFqnCommand always tells the lock manager  
>>>>>>> not to
>>>>>>> lock
>>>>>>> parents.
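>>>>>>>
>>>>>>> For reference, the setting in question as it's meant to be
>>>>>>> applied programmatically -- a sketch of the intended usage, not
>>>>>>> proof that it's honoured:
>>>>>>>
>>>>>>> Configuration cfg = new Configuration();
>>>>>>> // Default is false: a child insert/remove should only read-lock
>>>>>>> // its parent, not write-lock it.
>>>>>>> cfg.setLockParentForChildInsertRemove(false);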
>>>>>>>
>>>>>>> This is causing failures in buddy replication testing when nodes
>>>>>>> join/leave clusters under load. There's a lot of data gravitation
>>>>>>> plus stuff like migrating defunct backup trees to "DEAD" regions.
>>>>>>> Too much contention for parent-level locks.
>>>>>>>
>>>>>>> Plus locking on the structural parent to add/remove session  
>>>>>>> nodes will
>>>>>>> suck for the session caching use case.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Brian Stansberry
>>>>> Lead, AS Clustering
>>>>> JBoss by Red Hat
>>>>> _______________________________________________
>>>>> jbosscache-dev mailing list
>>>>> jbosscache-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/jbosscache-dev
>>>>
>>>> -- 
>>>> Manik Surtani
>>>> manik at jboss.org
>>>> Lead, Infinispan
>>>> Lead, JBoss Cache
>>>> http://www.infinispan.org
>>>> http://www.jbosscache.org
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> -- 
>>> Brian Stansberry
>>> Lead, AS Clustering
>>> JBoss by Red Hat
>>> _______________________________________________
>>> jbosscache-dev mailing list
>>> jbosscache-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/jbosscache-dev
>> -- 
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
>
>
> -- 
> Brian Stansberry
> Lead, AS Clustering
> JBoss by Red Hat

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org





