[
https://issues.jboss.org/browse/AS7-4813?page=com.atlassian.jira.plugin.s...
]
Richard Achmatowicz edited comment on AS7-4813 at 5/15/12 2:45 PM:
-------------------------------------------------------------------
A request is initiated by Runner 229 on perf17 which generates a session
sgalNLPm8jvqZzLyQjwt4oZ3 which gets assigned jvmRoute perf19. It uses perf19 to process
requests until perf19 fails and the jvmRoute is changed to perf18. The server logs show
activity at perf18 and perf21: the perf18 exceptions come in groups of three
{noformat}
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,112 ERROR
[org.infinispan.interceptors.InvocationContextInterceptor]
(ajp-perf18/10.16.90.54:8009-97) ISPN000136: Execution error:
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [15 seconds]
on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor
[GlobalTransaction:<perf18/web>:3840:remote]! Lock held by
[GlobalTransaction:<perf18/web>:3617:remote]
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,123 ERROR
[org.infinispan.transaction.TransactionCoordinator] (ajp-perf18/10.16.90.54:8009-97) Error
while processing prepare: org.infinispan.util.concurrent.TimeoutException: Unable to
acquire lock after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor
[GlobalTransaction:<perf18/web>:3840:remote]! Lock held by
[GlobalTransaction:<perf18/web>:3617:remote]
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,132 ERROR
[org.infinispan.transaction.tm.DummyTransaction] (ajp-perf18/10.16.90.54:8009-97)
ISPN000109: beforeCompletion() failed for
SynchronizationAdapter{localTransaction=LocalTransaction{remoteLockedNodes=null,
isMarkedForRollback=false, transaction=DummyTransaction{xid=DummyXid{id=3842}, status=1},
lockedKeys=null, backupKeyLocks=[sgalNLPm8jvqZzLyQjwt4oZ3], viewId=7}
org.infinispan.transaction.synchronization.SyncLocalTransaction@f00}
org.infinispan.transaction.synchronization.SynchronizationAdapter@f1f:
org.infinispan.CacheException: Could not prepare.
{noformat}
representing an incoming prepare request to the assigned slave; the perf21 exceptions
{noformat}
./lastSuccessful/perf21.log:[JBossINF] 11:46:54,106 ERROR
[org.infinispan.interceptors.InvocationContextInterceptor] (OOB-1720,null) ISPN000136:
Execution error: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock
after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor
[GlobalTransaction:<perf18/web>:3840:remote]! Lock held by
[GlobalTransaction:<perf18/web>:3617:remote]
{noformat}
are the corresponding remote prepare commands which are sent to all hosts participating in
the prepare and which fail only on perf21. Those exceptions are then returned to perf18
and show up there.
These coupled pairs of exceptions occur every 19 (= 4 + 15) seconds where the client sends
out requests every 4 seconds regularly, and the timeout on waiting for the locks is 15
seconds. Each coupled pair of exceptions represents a different incoming retry request by
the client to do complete the HTTP request. These exceptions only arise after perf19 fails
and they disappear when perf19 comes back.
Is it possible that perf21 was a remote participant in a transaction initiated on perf19
(before failure) and perf19 (the coordinator for the prepare) was shutdown during the
prepare phase for a transaction involving the session in question. So perf21 ended up with
locks on the session which let's assume were never released, in a different cache
view. Then perf18 becomes the new jvmRoute for the session, and, in a new cache view,
tries to prepare a transaction using the same session id. When it coordinates its prepare
phase, the remote command goes out to perf21 which tries to take locks for that session
locally and it times out waiting for the locks.
The way locks are taken on remote nodes as in
AbstractLockingTxInterceptor.lockKeyAndCheckOwnership() is to:
- check if there any pending locks for that session id (i.e. locks on transactions from
previous cache views which are pending - have not completed)
- if there are pending locks, wait for them to complete
The exceptions in this issue are occurring when we try to wait for locks for pending
transactions (in earlier cache views) and we never get them time (in 15 seconds). So it
may be that there are stale entries in the transaction tables.
was (Author: rachmato):
A request is initiated by Runner 229 on perf17 which generates a session
sgalNLPm8jvqZzLyQjwt4oZ3 which gets assigned jvmRoute perf19. It uses perf19 to process
requests until perf19 fails and the jvmRoute is changed to perf18. The server logs show
activity at perf18 and perf21: the perf18 exceptions come in groups of three
{noformat}
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,112 ERROR
[org.infinispan.interceptors.InvocationContextInterceptor]
(ajp-perf18/10.16.90.54:8009-97) ISPN000136: Execution error:
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [15 seconds]
on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor
[GlobalTransaction:<perf18/web>:3840:remote]! Lock held by
[GlobalTransaction:<perf18/web>:3617:remote]
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,123 ERROR
[org.infinispan.transaction.TransactionCoordinator] (ajp-perf18/10.16.90.54:8009-97) Error
while processing prepare: org.infinispan.util.concurrent.TimeoutException: Unable to
acquire lock after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor
[GlobalTransaction:<perf18/web>:3840:remote]! Lock held by
[GlobalTransaction:<perf18/web>:3617:remote]
./lastSuccessful/perf18.log:[JBossINF] 11:46:54,132 ERROR
[org.infinispan.transaction.tm.DummyTransaction] (ajp-perf18/10.16.90.54:8009-97)
ISPN000109: beforeCompletion() failed for
SynchronizationAdapter{localTransaction=LocalTransaction{remoteLockedNodes=null,
isMarkedForRollback=false, transaction=DummyTransaction{xid=DummyXid{id=3842}, status=1},
lockedKeys=null, backupKeyLocks=[sgalNLPm8jvqZzLyQjwt4oZ3], viewId=7}
org.infinispan.transaction.synchronization.SyncLocalTransaction@f00}
org.infinispan.transaction.synchronization.SynchronizationAdapter@f1f:
org.infinispan.CacheException: Could not prepare.
{noformat}
representing an incoming prepare request to the assigned slave; the perf21 exceptions
{noformat}
./lastSuccessful/perf21.log:[JBossINF] 11:46:54,106 ERROR
[org.infinispan.interceptors.InvocationContextInterceptor] (OOB-1720,null) ISPN000136:
Execution error: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock
after [15 seconds] on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor
[GlobalTransaction:<perf18/web>:3840:remote]! Lock held by
[GlobalTransaction:<perf18/web>:3617:remote]
{noformat}
are the corresponding remote prepare commands which are sent to all hosts participating in
the prepare and which fail only on perf21. Those exceptions are then returned to perf18
and show up there.
These coupled pairs of exceptions occur every 19 (= 4 + 15) seconds where the client sends
out requests every 4 seconds regularly, and the timeout on waiting for the locks is 15
seconds. Each coupled pair of exceptions represents a different incoming retry request by
the client to do complete the HTTP request. These exceptions only arise after perf19 fails
and they disappear when perf19 comes back.
Is it possible that perf21 was a remote participant in a transaction initiated on perf19
(before failure) and perf19 (the coordinator for the prepare) was shutdown during the
prepare phase for a transaction involving the session in question. So perf21 ended up with
locks on the session which let's assume were never released, in a different cache
view. Then perf18 becomes the new jvmRoute for the session, and tries to prepare a
transaction using the same session id. When it coordinates its prepare phase, the remote
command goes out to perf21 which tries to take locks for that session locally and it times
out waiting for the locks.
The way locks are taken on remote nodes as in
AbstractLockingTxInterceptor.lockKeyAndCheckOwnership() is to:
- check if there any pending locks for that session id (i.e. locks on transactions from
previous cache views which are pending - have not completed)
- if there are pending locks, wait for them to complete
The exceptions in this issue are occurring when we try to wait for locks for pending
transactions (in earlier cache views) and we never get them time (in 15 seconds). So it
may be that there are stale entries in the transaction tables.
Shutdown in REPL SYNC results in TimeoutException: Unable to acquire
lock after
-------------------------------------------------------------------------------
Key: AS7-4813
URL:
https://issues.jboss.org/browse/AS7-4813
Project: Application Server 7
Issue Type: Bug
Components: Clustering
Affects Versions: 7.1.2.Final (EAP)
Reporter: Radoslav Husar
Assignee: Paul Ferraro
Fix For: 7.1.3.Final (EAP)
Similar to what we had before, but this time in failover tests.
Whole logs
http://hudson.qa.jboss.com/hudson/view/EAP6/view/EAP6-Failover/job/eap-6x...
{noformat}
11:46:54,106 ERROR [org.infinispan.interceptors.InvocationContextInterceptor]
(OOB-1720,null) ISPN000136: Execution error:
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [15 seconds]
on key [sgalNLPm8jvqZzLyQjwt4oZ3] for requestor
[GlobalTransaction:<perf18/web>:3840:remote]! Lock held by
[GlobalTransaction:<perf18/web>:3617:remote]
at org.infinispan.util.concurrent.locks.LockManagerImpl.lock(LockManagerImpl.java:216)
at
org.infinispan.util.concurrent.locks.LockManagerImpl.acquireLock(LockManagerImpl.java:189)
at
org.infinispan.util.concurrent.locks.LockManagerImpl.acquireLock(LockManagerImpl.java:179)
at
org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockKeyAndCheckOwnership(AbstractTxLockingInterceptor.java:209)
at
org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockAndRegisterBackupLock(AbstractTxLockingInterceptor.java:136)
at
org.infinispan.interceptors.locking.OptimisticLockingInterceptor$LockAcquisitionVisitor.visitPutKeyValueCommand(OptimisticLockingInterceptor.java:243)
at
org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)
at
org.infinispan.interceptors.locking.OptimisticLockingInterceptor.acquireLocksVisitingCommands(OptimisticLockingInterceptor.java:318)
at
org.infinispan.interceptors.locking.OptimisticLockingInterceptor.visitPrepareCommand(OptimisticLockingInterceptor.java:119)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at
org.infinispan.interceptors.NotificationInterceptor.visitPrepareCommand(NotificationInterceptor.java:58)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at
org.infinispan.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:106)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at
org.infinispan.interceptors.StateTransferLockInterceptor.handleWithRetries(StateTransferLockInterceptor.java:213)
at
org.infinispan.interceptors.StateTransferLockInterceptor.visitPrepareCommand(StateTransferLockInterceptor.java:85)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at
org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)
at
org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:126)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at
org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:131)
at
org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:90)
at
org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:126)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at
org.infinispan.interceptors.BatchingInterceptor.handleDefault(BatchingInterceptor.java:70)
at
org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:126)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:133)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)
at org.infinispan.commands.tx.PrepareCommand.perform(PrepareCommand.java:128)
at
org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:127)
at
org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:136)
at
org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithRetry(InboundInvocationHandlerImpl.java:162)
at
org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:114)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:221)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:201)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:456)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:363)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:238)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:543)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.JChannel.up(JChannel.java:716) [jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1026)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.RSVP.up(RSVP.java:192) [jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.FRAG2.up(FRAG2.java:181) [jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:889)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:383)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:706)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:566)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.BARRIER.up(BARRIER.java:126)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.FD.up(FD.java:273) [jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.Discovery.up(Discovery.java:359)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.stack.Protocol.up(Protocol.java:363)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.TP.passMessageUp(TP.java:1180)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1728)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1710)
[jgroups-3.0.9.Final.jar:3.0.9.Final]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[rt.jar:1.6.0_30]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[rt.jar:1.6.0_30]
at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_30]
{noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see:
http://www.atlassian.com/software/jira