January 2013 - infinispan-issues - Jboss List Archives

[JBoss JIRA] (ISPN-2588) Lock leak during state transfer (causing StaleLocksTransactionTest to fail)

by Mircea Markus (JIRA)

[ https://issues.jboss.org/browse/ISPN-2588?page=com.atlassian.jira.plugin.... ]

Mircea Markus updated ISPN-2588:
--------------------------------
    Fix Version/s: 5.2.0.CR2

> Lock leak during state transfer (causing StaleLocksTransactionTest to fail)
> ---------------------------------------------------------------------------
>
> Key: ISPN-2588
> URL: https://issues.jboss.org/browse/ISPN-2588
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.Beta5
> Reporter: Mircea Markus
> Assignee: Adrian Nistor
> Priority: Blocker
> Fix For: 5.2.0.CR2, 5.2.0.Final
>
> Attachments: StaleLocksTransactionTest.zip
>
>
> numOwners=1, pessimistic cache (the same applies if A is the only node in the cluster)
> 1. tx1 is running on A with writes on k, lockOwner(k) == {A}
> 2. A.tx1.lock(k); this doesn't go remotely, and control returns in the InterceptorStack
> 3. at this point B is started and lockOwner(k) == {B}
> 4. the StateTransferInterceptor forwards the command to B, which acquires the lock locally
> 5. this is followed by a tx.commit/rollback that would not send the message to B, so the lock on B is left pending.
> The logic which determines whether the message is sent remotely or not is in DistributionInterceptor.visitCommitCommand, which invokes:
> {code:java}
> protected boolean shouldInvokeRemoteTxCommand(TxInvocationContext ctx) {
>    return ctx.isOriginLocal() && (ctx.hasModifications() ||
>          !((LocalTxInvocationContext) ctx).getRemoteLocksAcquired().isEmpty());
> }
> {code}
> The problem here is that, when forwarding, we don't register the remote node as locked. I think a more generic solution would also work, e.g. if the viewId of the tx is different from the viewId of the cluster at commit time, always go remotely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
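The decision quoted in ISPN-2588, and the "more generic" view-id check the reporter suggests, can be compared in a small standalone model. This is only a sketch: the class name, the plain boolean/int parameters and the use of a `Set<String>` for registered remote lock owners are assumptions made for illustration and are not Infinispan's actual API. It also assumes the scenario is a lock-only transaction with no modifications recorded at commit time.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical, simplified model of the commit-time "go remote?" decision. */
final class RemoteCommitDecision {
    private RemoteCommitDecision() {}

    /** Mirrors the condition quoted in the ticket. */
    static boolean currentRule(boolean originLocal,
                               boolean hasModifications,
                               Set<String> remoteLocksAcquired) {
        return originLocal && (hasModifications || !remoteLocksAcquired.isEmpty());
    }

    /**
     * Variant suggested in the ticket: also go remote whenever the view the
     * transaction started in differs from the cluster view at commit time.
     */
    static boolean proposedRule(boolean originLocal,
                                boolean hasModifications,
                                Set<String> remoteLocksAcquired,
                                int txViewId,
                                int currentViewId) {
        return originLocal
                && (hasModifications
                    || !remoteLocksAcquired.isEmpty()
                    || txViewId != currentViewId);
    }

    public static void main(String[] args) {
        // Steps 1-5 of the ticket: the lock was forwarded to B, but B was
        // never registered as a remote lock owner, and the view changed.
        Set<String> registered = Collections.emptySet();
        System.out.println(currentRule(true, false, registered));
        System.out.println(proposedRule(true, false, registered, 1, 2));
    }
}
```

Under these assumptions the quoted rule skips the remote commit/rollback in the leak scenario, while the view-id variant sends it. The alternative fix named in the ticket, registering B as a remote lock owner when the command is forwarded, would make the original condition true as well.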


[JBoss JIRA] (ISPN-2483) State transfer issue with the transactions for which the originator has crashed

by Mircea Markus (JIRA)

[ https://issues.jboss.org/browse/ISPN-2483?page=com.atlassian.jira.plugin.... ]

Mircea Markus updated ISPN-2483:
--------------------------------
    Fix Version/s: 5.2.0.CR2
                       (was: 5.2.0.Final)

> State transfer issue with the transactions for which the originator has crashed
> -------------------------------------------------------------------------------
>
> Key: ISPN-2483
> URL: https://issues.jboss.org/browse/ISPN-2483
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer, Transactions
> Affects Versions: 5.1.8.Final, 5.2.0.Beta3
> Reporter: Mircea Markus
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.CR2
>
>
> State transfer migrates and prepares the transactions for which the originator has left. On the receiving node, this results in the transaction being prepared and acquiring backup locks which are never released (unless there is manual intervention).
> This should behave as follows:
> - if recovery is not enabled, the state producer should not send such transactions but drop them
> - if recovery is enabled, these transactions should be sent across. They shouldn't be prepared or acquire backup locks, but should be placed in the recovery cache (see RecoveryManagerImpl.inDoubtTransactions)
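The two rules requested in ISPN-2483 amount to a small decision table. The sketch below is an illustration only: `OrphanedTxHandling`, its `Action` enum and the boolean parameters are hypothetical names, not Infinispan classes, and the "originator still alive" branch simply assumes the normal migrate-and-prepare path described in the ticket.

```java
/** Hypothetical decision table for transactions seen during state transfer. */
final class OrphanedTxHandling {
    private OrphanedTxHandling() {}

    enum Action {
        TRANSFER_AND_PREPARE,   // normal path: originator is still in the cluster
        DROP,                   // originator gone, recovery disabled
        PLACE_IN_RECOVERY_CACHE // originator gone, recovery enabled: no prepare, no backup locks
    }

    static Action decide(boolean originatorAlive, boolean recoveryEnabled) {
        if (originatorAlive) {
            return Action.TRANSFER_AND_PREPARE;
        }
        return recoveryEnabled ? Action.PLACE_IN_RECOVERY_CACHE : Action.DROP;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false));
        System.out.println(decide(false, true));
    }
}
```

In neither of the originator-gone branches does the receiving node acquire backup locks, which is the leak the ticket describes.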


[JBoss JIRA] (ISPN-2632) Uneven request balancing after node crash

by Mircea Markus (JIRA)

[ https://issues.jboss.org/browse/ISPN-2632?page=com.atlassian.jira.plugin.... ]

Mircea Markus updated ISPN-2632:
--------------------------------
    Priority: Critical
                  (was: Blocker)

> Uneven request balancing after node crash
> -----------------------------------------
>
> Key: ISPN-2632
> URL: https://issues.jboss.org/browse/ISPN-2632
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.CR1
> Reporter: Michal Linhard
> Assignee: Galder Zamarreño
> Priority: Critical
> Fix For: 5.2.0.CR2, 5.2.0.Final
>
>
> This is a new manifestation of ISPN-1995, but in this case it happens after killing only one node: the Hot Rod requests aren't well balanced.
> These runs also still manifest ISPN-2550, which may be the cause of this bug.
> The uneven balancing of requests can be seen here:
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/EDG-REPOR...


[JBoss JIRA] (ISPN-2507) @DataRehashed event is not triggered

by Mircea Markus (JIRA)

[ https://issues.jboss.org/browse/ISPN-2507?page=com.atlassian.jira.plugin.... ]

Mircea Markus updated ISPN-2507:
--------------------------------
    Priority: Critical
                  (was: Major)

> @DataRehashed event is not triggered
> ------------------------------------
>
> Key: ISPN-2507
> URL: https://issues.jboss.org/browse/ISPN-2507
> Project: Infinispan
> Issue Type: Bug
> Reporter: Anna Manukyan
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 5.2.0.CR2
>
>
> The case is the following:
> There is a 2-node cluster. The first node is up and running. The second node starts and puts data into the cache. The cache is configured with DIST_ASYNC clustering mode.
> The test checks that the data is rehashed, i.e. that state transfer has completed, but the @DataRehashed event was not triggered.
> You can find the failing test results here:
> http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/edg-60-jdbc-cache-sto...
> and here:
> http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/edg-60-jdbc-cache-sto...
> The test is located here:
> https://svn.devel.redhat.com/repos/jboss-qa/jdg/jdg-functional-tests/trun...
> Best regards,
> Anna.


[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache

by Mircea Markus (JIRA)

[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]

Mircea Markus commented on ISPN-2697:
-------------------------------------

[~galderz] yes, the solution seems to be to add an RSVP to the command that writes the entry to the cache. Or write the entry twice (not extremely nice, but practical).

> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2697
> URL: https://issues.jboss.org/browse/ISPN-2697
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta6
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 5.2.0.CR2
>
>
> When the HotRodServer starts, it inserts its record into __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put may very easily fail: the command is broadcast using the NAKACK2 protocol, so if the message gets lost and no further broadcast message follows, the message will not be retransmitted and the put operation times out (Replication timeout). This fails the whole HotRodServer startup, all because of one lost UDP message.


[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache

by Mircea Markus (JIRA)

[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]

Mircea Markus updated ISPN-2697:
--------------------------------
    Assignee: Dan Berindei

> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2697
> URL: https://issues.jboss.org/browse/ISPN-2697
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta6
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 5.2.0.CR2
>
>
> When the HotRodServer starts, it inserts its record into __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put may very easily fail: the command is broadcast using the NAKACK2 protocol, so if the message gets lost and no further broadcast message follows, the message will not be retransmitted and the put operation times out (Replication timeout). This fails the whole HotRodServer startup, all because of one lost UDP message.


[JBoss JIRA] (ISPN-2435) Cache#replace(key, old, new) method doesn't check old value on non-local invocations

by Mircea Markus (JIRA)

[ https://issues.jboss.org/browse/ISPN-2435?page=com.atlassian.jira.plugin.... ]

Mircea Markus updated ISPN-2435:
--------------------------------
    Fix Version/s: 5.2.0.CR2

> Cache#replace(key, old, new) method doesn't check old value on non-local invocations
> ------------------------------------------------------------------------------------
>
> Key: ISPN-2435
> URL: https://issues.jboss.org/browse/ISPN-2435
> Project: Infinispan
> Issue Type: Bug
> Components: Core API, Distributed Cache
> Affects Versions: 5.1.8.Final, 5.2.0.Beta2
> Reporter: Sanne Grinovero
> Assignee: Mircea Markus
> Priority: Critical
> Fix For: 5.2.0.CR2, 5.2.0.Final
>
>
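ISPN-2435 carries no description, so as a reminder of the contract its title refers to: `replace(key, old, new)` comes from `java.util.concurrent.ConcurrentMap` and must only swap the value when the current value equals `old`. The sketch below demonstrates that contract on a plain `ConcurrentHashMap`; it does not reproduce the bug, and the class and method names are made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Demonstrates the conditional-replace contract on a local ConcurrentHashMap. */
final class ConditionalReplaceContract {
    private ConditionalReplaceContract() {}

    /** Stores {@code current} under "k", attempts a conditional replace, returns the resulting value. */
    static String afterReplace(String current, String expectedOld, String newValue) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("k", current);
        // Must be a no-op unless the stored value equals expectedOld.
        cache.replace("k", expectedOld, newValue);
        return cache.get("k");
    }

    public static void main(String[] args) {
        System.out.println(afterReplace("v1", "stale", "v2")); // old value does not match
        System.out.println(afterReplace("v1", "v1", "v2"));    // old value matches
    }
}
```

According to the ticket title, the affected versions skipped this old-value check when the invocation was not local, so a call corresponding to the first case could overwrite the entry.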


[JBoss JIRA] (ISPN-2326) c3p0 not part of the release

by Mircea Markus (JIRA)

[ https://issues.jboss.org/browse/ISPN-2326?page=com.atlassian.jira.plugin.... ]

Mircea Markus updated ISPN-2326:
--------------------------------
    Fix Version/s: 5.2.0.CR2
                       (was: 5.2.0.Final)

> c3p0 not part of the release
> ----------------------------
>
> Key: ISPN-2326
> URL: https://issues.jboss.org/browse/ISPN-2326
> Project: Infinispan
> Issue Type: Bug
> Components: Build process
> Affects Versions: 5.2.0.Alpha4
> Reporter: Thomas Fromm
> Assignee: Adrian Nistor
> Priority: Blocker
> Fix For: 5.2.0.CR2
>
>
> According to infinispan-5.2.0.Alpha4-all/modules/cachestores/jdbc/runtime-classpath.txt, the c3p0 lib should be part of the release.


[JBoss JIRA] (ISPN-2439) Deadlock in Map/Reduce tasks

by Mircea Markus (JIRA)

[ https://issues.jboss.org/browse/ISPN-2439?page=com.atlassian.jira.plugin.... ]

Mircea Markus updated ISPN-2439:
--------------------------------
    Fix Version/s: 5.2.0.CR2
                       (was: 5.2.0.Final)

> Deadlock in Map/Reduce tasks
> ----------------------------
>
> Key: ISPN-2439
> URL: https://issues.jboss.org/browse/ISPN-2439
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.0.Beta2
> Reporter: Dan Berindei
> Assignee: Mircea Markus
> Priority: Blocker
> Fix For: 5.2.0.CR2
>
> Attachments: dfnmrt.log.gz
>
>
> It looks like the Map/Reduce intermediate caches use pessimistic transactions, but the transactions are not guaranteed to write to the keys in the same order. So it's possible for two tasks to get into a deadlock, ending with a TimeoutException:
> {noformat}
> 16:18:40,649 ERROR (testng-DistributedFourNodesMapReduceTest:) [UnitTestTestNGListener] Test testCombinerDoesNotChangeResult(org.infinispan.distexec.mapreduce.DistributedFourNodesMapReduceTest) failed.
> org.infinispan.CacheException: Could not invoke map phase of MapReduce task on remote nodes
> at org.infinispan.distexec.mapreduce.MapReduceTask.invokeEverywhere(MapReduceTask.java:562)
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhase(MapReduceTask.java:374)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:315)
> at org.infinispan.distexec.mapreduce.BaseWordCountMapReduceTest.testCombinerDoesNotChangeResult(BaseWordCountMapReduceTest.java:188)
> ...
> Caused by: org.infinispan.CacheException: org.infinispan.CacheException: Could not move intermediate keys/values for M/R task 04244b4b-08b1-4fc4-9755-ed02f3f35a3a
> at org.infinispan.distexec.mapreduce.MapReduceManagerImpl.mapAndCombineForDistributedReduction(MapReduceManagerImpl.java:97)
> at org.infinispan.commands.read.MapCombineCommand.perform(MapCombineCommand.java:89)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:95)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:110)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:82)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:244)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:217)
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
> ...
> Caused by: org.infinispan.CacheException: Could not move intermediate keys/values for M/R task 04244b4b-08b1-4fc4-9755-ed02f3f35a3a
> at org.infinispan.distexec.mapreduce.MapReduceManagerImpl.combine(MapReduceManagerImpl.java:281)
> at org.infinispan.distexec.mapreduce.MapReduceManagerImpl.mapAndCombineForDistributedReduction(MapReduceManagerImpl.java:95)
> ... 26 more
> Caused by: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [JBoss] for requestor [GlobalTransaction:<NodeD-56763>:10429:remote]! Lock held by [GlobalTransaction:<NodeB-55590>:10432:remote]
> at org.infinispan.util.concurrent.locks.LockManagerImpl.lock(LockManagerImpl.java:217)
> at org.infinispan.util.concurrent.locks.LockManagerImpl.acquireLock(LockManagerImpl.java:190)
> at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockKeyAndCheckOwnership(AbstractTxLockingInterceptor.java:190)
> at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockAndRegisterBackupLock(AbstractTxLockingInterceptor.java:125)
> at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:248)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
> at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
> at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:177)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
> at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:125)
> at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:174)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
> at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:212)
> at org.infinispan.statetransfer.StateTransferInterceptor.handleTxCommand(StateTransferInterceptor.java:187)
> at org.infinispan.statetransfer.StateTransferInterceptor.visitLockControlCommand(StateTransferInterceptor.java:131)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:129)
> at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:98)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:347)
> at org.infinispan.commands.control.LockControlCommand.perform(LockControlCommand.java:150)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:95)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:110)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:82)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:244)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:217)
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
> ...
> {noformat}
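The deadlock described in ISPN-2439 arises because two pessimistic transactions lock the same keys in different orders. One standard way to rule that out is to acquire locks in a single global order, for example sorted by key. The sketch below shows the technique with plain `java.util.concurrent` locks; it is not the fix applied in Infinispan (the ticket does not say what the fix was), and `OrderedKeyLocker` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper that always locks keys in their natural (sorted) order. */
final class OrderedKeyLocker {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Deduplicates and sorts the keys so every caller uses the same lock order. */
    static List<String> lockOrder(Collection<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    /** Locks all keys in sorted order, runs the action, then unlocks in reverse order. */
    void runLocked(Collection<String> keys, Runnable action) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String key : lockOrder(keys)) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            action.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }
}
```

With this discipline a task writing {"JBoss", "Infinispan"} and a task writing {"Infinispan", "JBoss"} both lock "Infinispan" first, so neither can hold one key while waiting for the other in the opposite order.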


[JBoss JIRA] (ISPN-2641) Use RpcManager#getMembers instead of Transport#getMembers if necessary

by Mircea Markus (JIRA)

[ https://issues.jboss.org/browse/ISPN-2641?page=com.atlassian.jira.plugin.... ]

Mircea Markus updated ISPN-2641:
--------------------------------
    Fix Version/s: 5.2.0.CR2
                       (was: 5.2.0.Final)

> Use RpcManager#getMembers instead of Transport#getMembers if necessary
> ----------------------------------------------------------------------
>
> Key: ISPN-2641
> URL: https://issues.jboss.org/browse/ISPN-2641
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.0.Beta5
> Reporter: Vladimir Blagojevic
> Assignee: Dan Berindei
> Fix For: 5.2.0.CR2
>
>
> Recently we added the RpcManager#getMembers method, which returns the cache members rather than the members of the entire cluster returned by Transport#getMembers. We should review the entire codebase very carefully and replace instances of Transport#getMembers with RpcManager#getMembers where necessary.

