[jboss-jira] [JBoss JIRA] (AS7-4813) Shutdown in REPL SYNC results in TimeoutException: Unable to acquire lock after

Richard Achmatowicz (JIRA) jira-events at lists.jboss.org
Tue May 15 14:47:17 EDT 2012


    [ https://issues.jboss.org/browse/AS7-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693058#comment-12693058 ] 

Richard Achmatowicz edited comment on AS7-4813 at 5/15/12 2:45 PM:
-------------------------------------------------------------------

A request is initiated by Runner 229 on perf17 which generates a session sgalNLPm8jvqZzLyQjwt4oZ3 which gets assigned jvmRoute perf19. It uses perf19 to process requests until perf19 fails and the jvmRoute is changed to perf18. The server logs show activity at perf18 and perf21: the perf18 exceptions come in groups of three

{noformat}
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,112 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (ajp-perf18/10.16.90.54:8009-97) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor [GlobalTransaction:<perf18/web>:3840:remote]! Lock held by [GlobalTransaction:<perf18/web>:3617:remote]
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,123 ERROR [org.infinispan.transaction.TransactionCoordinator] (ajp-perf18/10.16.90.54:8009-97) Error while processing prepare: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor [GlobalTransaction:<perf18/web>:3840:remote]! Lock held by [GlobalTransaction:<perf18/web>:3617:remote]
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,132 ERROR [org.infinispan.transaction.tm.DummyTransaction] (ajp-perf18/10.16.90.54:8009-97) ISPN000109: beforeCompletion() failed for SynchronizationAdapter{localTransaction=LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, transaction=DummyTransaction{xid=DummyXid{id=3842}, status=1}, lockedKeys=null, backupKeyLocks=[sgalNLPm8jvqZzLyQjwt4oZ3], viewId=7} org.infinispan.transaction.synchronization.SyncLocalTransaction at f00} org.infinispan.transaction.synchronization.SynchronizationAdapter at f1f: org.infinispan.CacheException: Could not prepare. 
{noformat}

representing an incoming prepare request to the assigned slave; the perf21 exceptions 
{noformat}
./lastSuccessful/perf21.log:[JBossINF] 11:46:54,106 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-1720,null) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor [GlobalTransaction:<perf18/web>:3840:remote]! Lock held by [GlobalTransaction:<perf18/web>:3617:remote]
{noformat}
are the corresponding remote prepare commands which are sent to all hosts participating in the prepare and which fail only on perf21. Those exceptions are then returned to perf18 and show up there.

These coupled pairs of exceptions occur every 19 (= 4 + 15) seconds where the client sends out requests every 4 seconds regularly, and the timeout on waiting for the locks is 15 seconds. Each coupled pair of exceptions represents a different incoming retry request by the client to do complete the HTTP request. These exceptions only arise after perf19 fails and they disappear when perf19 comes back.
   
Is it possible that perf21 was a remote participant in a transaction initiated on perf19 (before failure) and perf19 (the coordinator for the prepare) was shutdown during the prepare phase for a transaction involving the session in question. So perf21 ended up with locks on the session which let's assume were never released, in a different cache view. Then perf18 becomes the new jvmRoute for the session, and, in a new cache view, tries to prepare a transaction using the same session id. When it coordinates its prepare phase, the remote command goes out to perf21 which tries to take locks for that session locally and it times out waiting for the locks.

The way locks are taken on remote nodes as in AbstractLockingTxInterceptor.lockKeyAndCheckOwnership() is to:
- check if there any pending locks for that session id (i.e. locks on transactions from previous cache views which are pending - have not completed)
- if there are pending locks, wait for them to complete
The exceptions in this issue are occurring when we try to wait for locks for pending transactions (in earlier cache views) and we never get them time (in 15 seconds). So it may be that there are stale entries in the transaction tables.
   
                
      was (Author: rachmato):
    A request is initiated by Runner 229 on perf17 which generates a session sgalNLPm8jvqZzLyQjwt4oZ3 which gets assigned jvmRoute perf19. It uses perf19 to process requests until perf19 fails and the jvmRoute is changed to perf18. The server logs show activity at perf18 and perf21: the perf18 exceptions come in groups of three

{noformat}
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,112 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (ajp-perf18/10.16.90.54:8009-97) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor [GlobalTransaction:<perf18/web>:3840:remote]! Lock held by [GlobalTransaction:<perf18/web>:3617:remote]
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,123 ERROR [org.infinispan.transaction.TransactionCoordinator] (ajp-perf18/10.16.90.54:8009-97) Error while processing prepare: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor [GlobalTransaction:<perf18/web>:3840:remote]! Lock held by [GlobalTransaction:<perf18/web>:3617:remote]
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,132 ERROR [org.infinispan.transaction.tm.DummyTransaction] (ajp-perf18/10.16.90.54:8009-97) ISPN000109: beforeCompletion() failed for SynchronizationAdapter{localTransaction=LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, transaction=DummyTransaction{xid=DummyXid{id=3842}, status=1}, lockedKeys=null, backupKeyLocks=[sgalNLPm8jvqZzLyQjwt4oZ3], viewId=7} org.infinispan.transaction.synchronization.SyncLocalTransaction at f00} org.infinispan.transaction.synchronization.SynchronizationAdapter at f1f: org.infinispan.CacheException: Could not prepare. 
{noformat}

representing an incoming prepare request to the assigned slave; the perf21 exceptions 
{noformat}
./lastSuccessful/perf21.log:[JBossINF] 11:46:54,106 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-1720,null) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor [GlobalTransaction:<perf18/web>:3840:remote]! Lock held by [GlobalTransaction:<perf18/web>:3617:remote]
{noformat}
are the corresponding remote prepare commands which are sent to all hosts participating in the prepare and which fail only on perf21. Those exceptions are then returned to perf18 and show up there.

These coupled pairs of exceptions occur every 19 (= 4 + 15) seconds where the client sends out requests every 4 seconds regularly, and the timeout on waiting for the locks is 15 seconds. Each coupled pair of exceptions represents a different incoming retry request by the client to do complete the HTTP request. These exceptions only arise after perf19 fails and they disappear when perf19 comes back.
   
Is it possible that perf21 was a remote participant in a transaction initiated on perf19 (before failure) and perf19 (the coordinator for the prepare) was shutdown during the prepare phase for a transaction involving the session in question. So perf21 ended up with locks on the session which let's assume were never released, in a different cache view. Then perf18 becomes the new jvmRoute for the session, and tries to prepare a transaction using the same session id. When it coordinates its prepare phase, the remote command goes out to perf21 which tries to take locks for that session locally and it times out waiting for the locks.

The way locks are taken on remote nodes as in AbstractLockingTxInterceptor.lockKeyAndCheckOwnership() is to:
- check if there any pending locks for that session id (i.e. locks on transactions from previous cache views which are pending - have not completed)
- if there are pending locks, wait for them to complete
The exceptions in this issue are occurring when we try to wait for locks for pending transactions (in earlier cache views) and we never get them time (in 15 seconds). So it may be that there are stale entries in the transaction tables.
   
                  
> Shutdown in REPL SYNC results in TimeoutException: Unable to acquire lock after
> -------------------------------------------------------------------------------
>
>                 Key: AS7-4813
>                 URL: https://issues.jboss.org/browse/AS7-4813
>             Project: Application Server 7
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 7.1.2.Final (EAP)
>            Reporter: Radoslav Husar
>            Assignee: Paul Ferraro
>             Fix For: 7.1.3.Final (EAP)
>
>
> Similar to what we had before, but this time in failover tests. 
> Whole logs
> http://hudson.qa.jboss.com/hudson/view/EAP6/view/EAP6-Failover/job/eap-6x-failover-http-session-shutdown-repl-sync/35/
> {noformat}
> 11:46:54,106 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-1720,null) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor [GlobalTransaction:<perf18/web>:3840:remote]! Lock held by [GlobalTransaction:<perf18/web>:3617:remote]
> 	at org.infinispan.util.concurrent.locks.LockManagerImpl.lock(LockManagerImpl.java:216)
> 	at org.infinispan.util.concurrent.locks.LockManagerImpl.acquireLock(LockManagerImpl.java:189)
> 	at org.infinispan.util.concurrent.locks.LockManagerImpl.acquireLock(LockManagerImpl.java:179)
> 	at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockKeyAndCheckOwnership(AbstractTxLockingInterceptor.java:209)
> 	at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockAndRegisterBackupLock(AbstractTxLockingInterceptor.java:136)
> 	at org.infinispan.interceptors.locking.OptimisticLockingInterceptor$LockAcquisitionVisitor.visitPutKeyValueCommand(OptimisticLockingInterceptor.java:243)
> 	at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)
> 	at org.infinispan.interceptors.locking.OptimisticLockingInterceptor.acquireLocksVisitingCommands(OptimisticLockingInterceptor.java:318)
> 	at org.infinispan.interceptors.locking.OptimisticLockingInterceptor.visitPrepareCommand(OptimisticLockingInterceptor.java:119)
> 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
> 	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
> 	at org.infinispan.interceptors.NotificationInterceptor.visitPrepareCommand(NotificationInterceptor.java:58)
> 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
> 	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
> 	at org.infinispan.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:106)
> 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
> 	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
> 	at org.infinispan.interceptors.StateTransferLockInterceptor.handleWithRetries(StateTransferLockInterceptor.java:213)
> 	at org.infinispan.interceptors.StateTransferLockInterceptor.visitPrepareCommand(StateTransferLockInterceptor.java:85)
> 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
> 	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
> 	at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)
> 	at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:126)
> 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
> 	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
> 	at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:131)
> 	at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:90)
> 	at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:126)
> 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
> 	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
> 	at org.infinispan.interceptors.BatchingInterceptor.handleDefault(BatchingInterceptor.java:70)
> 	at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:126)
> 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
> 	at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)
> 	at org.infinispan.commands.tx.PrepareCommand.perform(PrepareCommand.java:128)
> 	at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:127)
> 	at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:136)
> 	at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithRetry(InboundInvocationHandlerImpl.java:162)
> 	at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:114)
> 	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:221)
> 	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:201)
> 	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:456) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:363) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:238) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:543) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.JChannel.up(JChannel.java:716) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1026) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.RSVP.up(RSVP.java:192) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.FRAG2.up(FRAG2.java:181) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:400) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:418) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:889) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:383) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:706) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:566) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.BARRIER.up(BARRIER.java:126) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.FD.up(FD.java:273) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.MERGE2.up(MERGE2.java:205) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.Discovery.up(Discovery.java:359) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.stack.Protocol.up(Protocol.java:363) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.TP.passMessageUp(TP.java:1180) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1728) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1710) [jgroups-3.0.9.Final.jar:3.0.9.Final]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_30]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_30]
> 	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_30]
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


More information about the jboss-jira mailing list