[JBoss JIRA] (ISPN-2588) Lock leak during state transfer (causing StaleLocksTransactionTest to fail)
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2588?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2588:
--------------------------------
Fix Version/s: 5.2.0.CR2
> Lock leak during state transfer (causing StaleLocksTransactionTest to fail)
> ---------------------------------------------------------------------------
>
> Key: ISPN-2588
> URL: https://issues.jboss.org/browse/ISPN-2588
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.Beta5
> Reporter: Mircea Markus
> Assignee: Adrian Nistor
> Priority: Blocker
> Fix For: 5.2.0.CR2, 5.2.0.Final
>
> Attachments: StaleLocksTransactionTest.zip
>
>
> numOwners=1, pessimistic cache (the same applies if A is the only node in the cluster):
> 1. tx1 runs on A with writes on k; lockOwner(k) == {A}
> 2. A.tx1.lock(k): this doesn't go remote, and control returns to the InterceptorStack
> 3. at this point B is started and lockOwner(k) == {B}
> 4. the StateTransferInterceptor forwards the command to B, which acquires the lock locally
> 5. this is followed by a tx.commit/rollback that is not sent to B, so the lock on B is left pending (it leaks).
> The logic that determines whether the message is sent remotely is in DistributionInterceptor.visitCommitCommand, which invokes:
> {code:java}
> protected boolean shouldInvokeRemoteTxCommand(TxInvocationContext ctx) {
>    return ctx.isOriginLocal() && (ctx.hasModifications() ||
>          !((LocalTxInvocationContext) ctx).getRemoteLocksAcquired().isEmpty());
> }
> {code}
> The problem here is that, when forwarding, we don't register the remote node as locked, so getRemoteLocksAcquired() stays empty and the commit/rollback stays local. I think a more generic solution would also work, e.g. if the viewId of the tx is different from the viewId of the cluster at commit time, always go remote.
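The routing decision and the two remedies discussed above can be modelled in a few lines. This is a minimal, self-contained sketch, not Infinispan code: `TxContext`, `CommitRouting`, `shouldGoRemote`, `onLockForwarded` and `viewIdAtStart` are hypothetical stand-ins for `LocalTxInvocationContext` and the cluster view. It shows (1) registering the forwarding target as a remote lock owner, so the existing condition sends the commit to B, and (2) the more generic alternative of always going remote when the view changed since the transaction started.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for LocalTxInvocationContext.
final class TxContext {
    final boolean originLocal;
    final int viewIdAtStart; // hypothetical: view id when the tx started
    boolean hasModifications;
    final Set<String> remoteLocksAcquired = new HashSet<>();

    TxContext(boolean originLocal, int viewIdAtStart) {
        this.originLocal = originLocal;
        this.viewIdAtStart = viewIdAtStart;
    }
}

final class CommitRouting {
    // Same condition as the quoted shouldInvokeRemoteTxCommand.
    static boolean shouldGoRemote(TxContext ctx) {
        return ctx.originLocal
                && (ctx.hasModifications || !ctx.remoteLocksAcquired.isEmpty());
    }

    // Remedy 1: when a lock command is forwarded to a new owner,
    // record that node so the later commit/rollback reaches it.
    static void onLockForwarded(TxContext ctx, String newOwner) {
        ctx.remoteLocksAcquired.add(newOwner);
    }

    // Remedy 2 (more generic): also go remote whenever the view
    // changed between the start of the transaction and commit time.
    static boolean shouldGoRemote(TxContext ctx, int currentViewId) {
        return shouldGoRemote(ctx)
                || (ctx.originLocal && ctx.viewIdAtStart != currentViewId);
    }
}
```

In this model, a transaction that has only locked k (no recorded modifications, no registered remote locks) keeps its commit local under the original condition; either remedy flips that decision so B receives the commit/rollback and releases the lock.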
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 3 months
[JBoss JIRA] (ISPN-2483) State transfer issue with the transactions for which the originator has crashed
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2483?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2483:
--------------------------------
Fix Version/s: 5.2.0.CR2
(was: 5.2.0.Final)
> State transfer issue with the transactions for which the originator has crashed
> -------------------------------------------------------------------------------
>
> Key: ISPN-2483
> URL: https://issues.jboss.org/browse/ISPN-2483
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer, Transactions
> Affects Versions: 5.1.8.Final, 5.2.0.Beta3
> Reporter: Mircea Markus
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.CR2
>
>
> State transfer migrates and prepares the transactions whose originator has left. On the receiving node, this results in the transaction being prepared and acquiring backup locks that are never released (barring manual intervention).
> This should behave as follows:
> - if recovery is not enabled, the state producer should drop such transactions instead of sending them
> - if recovery is enabled, these transactions should be sent across. They shouldn't be prepared or acquire backup locks, but should be placed in the recovery cache (see RecoveryManagerImpl.inDoubtTransactions)
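The intended behaviour boils down to one decision on the state producer and one on the state consumer. A minimal sketch under stated assumptions: `TxInfo`, `OrphanTxPolicy`, `txsToSend` and `applyReceived` are hypothetical names, cluster membership is a plain `Set<String>`, and the `inDoubt` set only stands in for what `RecoveryManagerImpl.inDoubtTransactions` would hold; the real state transfer code is not shown.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical summary of a transaction carried by state transfer.
final class TxInfo {
    final String gtx;        // global transaction id
    final String originator; // node that started the transaction

    TxInfo(String gtx, String originator) {
        this.gtx = gtx;
        this.originator = originator;
    }
}

final class OrphanTxPolicy {
    // State producer: without recovery, transactions whose originator has
    // left the cluster are dropped rather than sent to the new owner.
    static List<TxInfo> txsToSend(List<TxInfo> txs, Set<String> members,
                                  boolean recoveryEnabled) {
        List<TxInfo> toSend = new ArrayList<>();
        for (TxInfo tx : txs) {
            if (recoveryEnabled || members.contains(tx.originator)) {
                toSend.add(tx);
            }
        }
        return toSend;
    }

    // State consumer: orphaned transactions go to the in-doubt set and are
    // NOT prepared, so they acquire no backup locks. Returns only the
    // transactions that should be prepared normally.
    static List<TxInfo> applyReceived(List<TxInfo> received, Set<String> members,
                                      Set<String> inDoubt) {
        List<TxInfo> toPrepare = new ArrayList<>();
        for (TxInfo tx : received) {
            if (members.contains(tx.originator)) {
                toPrepare.add(tx);
            } else {
                inDoubt.add(tx.gtx);
            }
        }
        return toPrepare;
    }
}
```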
[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]
Mircea Markus commented on ISPN-2697:
-------------------------------------
[~galderz] yes, the solution seems to be to add an RSVP flag to the command that writes the entry to the cache. Or write the entry twice (not extremely nice, but practical).
> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2697
> URL: https://issues.jboss.org/browse/ISPN-2697
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta6
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 5.2.0.CR2
>
>
> When the HotRodServer starts, it inserts its record into __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put can fail very easily: the command is broadcast using the NAKACK2 protocol, so if the message is lost and no further broadcast message follows, it is not retransmitted and the put operation times out (Replication timeout). That fails the whole HotRodServer startup, all because of one lost UDP message.
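The second option in the comment above, writing the entry twice, amounts to retrying the topology-cache put when it times out: a follow-up broadcast repeats the write and also lets NAKACK2 receivers notice the gap left by the lost message. A minimal, generic sketch, assuming the put can be wrapped in a `Supplier` and that the timeout surfaces as an unchecked exception; `RetryingPut`, `withRetries` and `ReplicationTimeoutException` are hypothetical names, not part of HotRodServer or Infinispan.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the replication timeout thrown by the put.
final class ReplicationTimeoutException extends RuntimeException {
    ReplicationTimeoutException(String message) {
        super(message);
    }
}

final class RetryingPut {
    // Runs the put; if it fails with an exception of type retryOn, runs it
    // again, up to maxAttempts in total. Any other exception propagates
    // immediately, and the last timeout is rethrown once attempts run out.
    static <T> T withRetries(Supplier<T> put, int maxAttempts,
                             Class<? extends RuntimeException> retryOn) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return put.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // not a timeout: do not retry
                }
                last = e; // timed out: try the put again
            }
        }
        throw last;
    }
}
```

Retrying is only safe here because re-inserting the same topology record is idempotent; a non-idempotent write would need a different approach.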
[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2697:
--------------------------------
Assignee: Dan Berindei
[JBoss JIRA] (ISPN-2439) Deadlock in Map/Reduce tasks
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2439?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2439:
--------------------------------
Fix Version/s: 5.2.0.CR2
(was: 5.2.0.Final)
> Deadlock in Map/Reduce tasks
> ----------------------------
>
> Key: ISPN-2439
> URL: https://issues.jboss.org/browse/ISPN-2439
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.0.Beta2
> Reporter: Dan Berindei
> Assignee: Mircea Markus
> Priority: Blocker
> Fix For: 5.2.0.CR2
>
> Attachments: dfnmrt.log.gz
>
>
> It looks like the Map/Reduce intermediate caches use pessimistic transactions, but the transactions are not guaranteed to write to the keys in the same order. So it's possible for two tasks to get into a deadlock, ending with a TimeoutException:
> {noformat}
> 16:18:40,649 ERROR (testng-DistributedFourNodesMapReduceTest:) [UnitTestTestNGListener] Test testCombinerDoesNotChangeResult(org.infinispan.distexec.mapreduce.DistributedFourNodesMapReduceTest) failed.
> org.infinispan.CacheException: Could not invoke map phase of MapReduce task on remote nodes
> at org.infinispan.distexec.mapreduce.MapReduceTask.invokeEverywhere(MapReduceTask.java:562)
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhase(MapReduceTask.java:374)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:315)
> at org.infinispan.distexec.mapreduce.BaseWordCountMapReduceTest.testCombinerDoesNotChangeResult(BaseWordCountMapReduceTest.java:188)
> ...
> Caused by: org.infinispan.CacheException: org.infinispan.CacheException: Could not move intermediate keys/values for M/R task 04244b4b-08b1-4fc4-9755-ed02f3f35a3a
> at org.infinispan.distexec.mapreduce.MapReduceManagerImpl.mapAndCombineForDistributedReduction(MapReduceManagerImpl.java:97)
> at org.infinispan.commands.read.MapCombineCommand.perform(MapCombineCommand.java:89)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:95)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:110)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:82)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:244)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:217)
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
> ...
> Caused by: org.infinispan.CacheException: Could not move intermediate keys/values for M/R task 04244b4b-08b1-4fc4-9755-ed02f3f35a3a
> at org.infinispan.distexec.mapreduce.MapReduceManagerImpl.combine(MapReduceManagerImpl.java:281)
> at org.infinispan.distexec.mapreduce.MapReduceManagerImpl.mapAndCombineForDistributedReduction(MapReduceManagerImpl.java:95)
> ... 26 more
> Caused by: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [JBoss] for requestor [GlobalTransaction:<NodeD-56763>:10429:remote]! Lock held by [GlobalTransaction:<NodeB-55590>:10432:remote]
> at org.infinispan.util.concurrent.locks.LockManagerImpl.lock(LockManagerImpl.java:217)
> at org.infinispan.util.concurrent.locks.LockManagerImpl.acquireLock(LockManagerImpl.java:190)
> at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockKeyAndCheckOwnership(AbstractTxLockingInterceptor.java:190)
> at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockAndRegisterBackupLock(AbstractTxLockingInterceptor.java:125)
> at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:248)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
> at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
> at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:177)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
> at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:125)
> at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:174)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
> at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:212)
> at org.infinispan.statetransfer.StateTransferInterceptor.handleTxCommand(StateTransferInterceptor.java:187)
> at org.infinispan.statetransfer.StateTransferInterceptor.visitLockControlCommand(StateTransferInterceptor.java:131)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:129)
> at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:98)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:347)
> at org.infinispan.commands.control.LockControlCommand.perform(LockControlCommand.java:150)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:95)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:110)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:82)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:244)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:217)
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
> ...
> {noformat}
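Acquiring a transaction's locks in one global order is a standard way to rule out this kind of lock-order deadlock; whether the intermediate-cache writes should be ordered this way is an assumption here, not the fix recorded on this issue. The sketch shows the technique with plain `java.util.concurrent` locks in a hypothetical `OrderedLocker` (not an Infinispan class), assuming `String` keys: every caller locks its keys in sorted order, so no cycle of tasks each waiting for a lock the other holds can form.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-key lock table; not an Infinispan class.
final class OrderedLocker {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Locks every key in sorted (natural String) order and returns the keys
    // in the order they were locked; duplicates are collapsed by the TreeSet.
    // On timeout or interrupt, releases what was acquired and fails.
    List<String> lockAll(Collection<String> keys, long timeout, TimeUnit unit) {
        List<String> acquired = new ArrayList<>();
        for (String key : new TreeSet<>(keys)) {
            ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
            boolean locked;
            try {
                locked = lock.tryLock(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                locked = false;
            }
            if (!locked) {
                unlockAll(acquired);
                throw new IllegalStateException("Unable to acquire lock on key " + key);
            }
            acquired.add(key);
        }
        return acquired;
    }

    // Releases the given keys if the current thread holds their locks.
    void unlockAll(Collection<String> keys) {
        for (String key : keys) {
            ReentrantLock lock = locks.get(key);
            if (lock != null && lock.isHeldByCurrentThread()) {
                lock.unlock();
            }
        }
    }
}
```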