[JBoss JIRA] (ISPN-3830) Conditional Operation can fail erroneously during ST
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3830?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3830:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1155688|https://bugzilla.redhat.com/show_bug.cgi?id=1155688] from MODIFIED to ON_QA
> Conditional Operation can fail erroneously during ST
> ----------------------------------------------------
>
> Key: ISPN-3830
> URL: https://issues.jboss.org/browse/ISPN-3830
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Reporter: William Burns
> Assignee: Dan Berindei
> Fix For: 7.0.0.CR2
>
>
> Currently the retry logic for commands is to only throw an OutdatedTopologyException when a command is successful.
> This, however, can cause the following issue.
> k1 owned by N1, N2
> # N3 updates k1 sending conditional command to N1
> # N1 receives command and starts running it (passes topology block)
> # ST occurs, causing N1 to no longer be an owner and to remove k1 from its data container.
> # N1 runs the conditional command and it fails and thus doesn't throw OutdatedTopologyException
> # N3 gets response that command failed, but it could have worked had it gone to a real owner.
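
The scenario above can be replayed in a minimal sketch, assuming a fix that checks the topology regardless of whether the conditional command succeeded. All class and method names here (NodeSketch, OutdatedTopologySketchException, replaceIfEquals, RetrySketch) are hypothetical stand-ins and not Infinispan's real API; the state transfer is simulated by a callback that runs while the command executes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Infinispan's OutdatedTopologyException.
class OutdatedTopologySketchException extends RuntimeException {
    OutdatedTopologySketchException(String message) {
        super(message);
    }
}

// Hypothetical, heavily simplified node; none of this is Infinispan's real API.
class NodeSketch {
    final Map<String, String> dataContainer = new HashMap<>();
    int topologyId = 1;

    // Conditional replace: applies newValue only if the stored value equals expected.
    // duringExecution simulates events (such as state transfer) that happen after
    // the command has passed the topology block.
    boolean replaceIfEquals(String key, String expected, String newValue,
                            int commandTopologyId, Runnable duringExecution) {
        duringExecution.run();
        boolean conditionHolds = expected.equals(dataContainer.get(key));
        // Assumed fix: check the topology whether or not the condition held,
        // so that a failure caused by lost ownership triggers a retry.
        if (topologyId != commandTopologyId) {
            throw new OutdatedTopologySketchException(
                    "topology changed from " + commandTopologyId + " to " + topologyId);
        }
        if (conditionHolds) {
            dataContainer.put(key, newValue);
        }
        return conditionHolds;
    }
}

class RetrySketch {
    // Replays the steps from the issue: N1 loses k1 mid-command, N3 retries on N2.
    static boolean demo() {
        NodeSketch n1 = new NodeSketch();
        NodeSketch n2 = new NodeSketch();
        n1.dataContainer.put("k1", "v1");
        n2.dataContainer.put("k1", "v1");
        try {
            return n1.replaceIfEquals("k1", "v1", "v2", 1, () -> {
                n1.dataContainer.remove("k1"); // state transfer removes k1 from N1
                n1.topologyId = 2;
                n2.topologyId = 2;
            });
        } catch (OutdatedTopologySketchException e) {
            // The originator retries on a node that is still an owner.
            return n2.replaceIfEquals("k1", "v1", "v2", 2, () -> { });
        }
    }
}
```

Without the topology check on the failure path, the first call would simply return false and the originator would never retry.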
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months
[JBoss JIRA] (ISPN-4776) The topology id for the merged cache topology is not always bigger than all the partition topology ids
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4776?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4776:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1155611|https://bugzilla.redhat.com/show_bug.cgi?id=1155611] from MODIFIED to ON_QA
> The topology id for the merged cache topology is not always bigger than all the partition topology ids
> ------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4776
> URL: https://issues.jboss.org/browse/ISPN-4776
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.0.0.Beta2
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.CR2, 7.0.0.Final
>
>
> With the ISPN-4574 fix, I changed the merge algorithm to pick the partition with the most members (both in the _stable_ topology and in the _current_ topology) instead of the partition with the highest topology id.
> However, the partition with the most members is not necessarily the partition with the highest topology id, so it's possible that some nodes will ignore the merged topology because they already have a higher topology installed. This happened once in ClusterTopologyManagerTest.testClusterRecoveryAfterThreeWaySplit:
> {noformat}
> 00:24:59,286 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterCacheStatus] Recovered 3 partition(s) for cache cache: [CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeL-25322: 60+0]}, pendingCH=null, unionCH=null}, CacheTopology{id=6, rebalanceId=3, currentCH=DefaultConsistentHash{ns = 60, owners = (2)[, NodeL-25322: 30+10, NodeN-6727: 30+10]}, pendingCH=DefaultConsistentHash{ns = 60, owners = (2)[, NodeL-25322: 30+30, NodeN-6727: 30+30]}, unionCH=null}, CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}]
> 00:24:59,287 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterCacheStatus] Updating topologies after merge for cache cache, current topology = CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}, stable topology = CacheTopology{id=4, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (3)[, NodeL-25322: 20+20, NodeM-12972: 20+20, NodeN-6727: 20+20]}, pendingCH=null, unionCH=null}, availability mode = null
> 00:24:59,287 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cache, topology = CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}, availability mode = null
> 00:24:59,288 TRACE (transport-thread-NodeL-p33097-t3:) [LocalTopologyManagerImpl] Ignoring consistent hash update for cache cache, current topology is 8: CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}
> {noformat}
> Failure logs here: http://ci.infinispan.org/viewLog.html?buildId=12364&buildTypeId=Infinispa...
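
One way to guarantee the property named in the issue title is to derive the merged topology id from the maximum of all partition topology ids rather than from the selected partition. The sketch below is only an illustration of that idea, not the actual fix; MergedTopologyIdSketch and both method names are hypothetical, and the exact comparison a node uses when ignoring an update is an assumption.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of Infinispan.
class MergedTopologyIdSketch {
    // The merged id is strictly greater than every partition's topology id,
    // no matter which partition's consistent hash was selected for the merge.
    static int mergedTopologyId(List<Integer> partitionTopologyIds) {
        return Collections.max(partitionTopologyIds) + 1;
    }

    // Assumption: a node ignores an update whose id is not higher than the
    // topology id it already has installed.
    static boolean wouldBeIgnored(int installedTopologyId, int receivedTopologyId) {
        return receivedTopologyId <= installedTopologyId;
    }
}
```

With the partition ids from the log excerpt (8, 6 and 5), mergedTopologyId returns 9, so the node that already has topology 8 installed would accept the merged topology instead of ignoring it.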
[JBoss JIRA] (ISPN-4027) TransactionTable.start() initialize the TxCleanupService thread pool even when the cache is NON_TRANSACTIONAL
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4027?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4027:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1152942|https://bugzilla.redhat.com/show_bug.cgi?id=1152942] from MODIFIED to ON_QA
> TransactionTable.start() initialize the TxCleanupService thread pool even when the cache is NON_TRANSACTIONAL
> -------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4027
> URL: https://issues.jboss.org/browse/ISPN-4027
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 6.0.1.Final
> Reporter: Guillermo GARCIA OCHOA
> Assignee: Takayoshi Kimura
> Labels: 630
> Fix For: 7.0.0.CR2
>
>
> In {{TransactionTable.start()}}, each cache creates a thread pool and schedules a job to clean up completed transactions.
> {code:java}
> private void start() {
>     ...
>     totalOrder = configuration.transaction().transactionProtocol().isTotalOrder();
>     if (!totalOrder) {
>         // Periodically run a task to cleanup the transaction table from completed transactions.
>         ThreadFactory tf = new ThreadFactory() {
>             @Override
>             public Thread newThread(Runnable r) {
>                 String address = rpcManager != null ? rpcManager.getTransport().getAddress().toString() : "local";
>                 Thread th = new Thread(r, "TxCleanupService," + cacheName + "," + address);
>                 th.setDaemon(true);
>                 return th;
>             }
>         };
>         executorService = Executors.newSingleThreadScheduledExecutor(tf);
>         long interval = configuration.transaction().reaperWakeUpInterval();
>         executorService.scheduleAtFixedRate(new Runnable() {
>             @Override
>             public void run() {
>                 cleanupCompletedTransactions();
>             }
>         }, interval, interval, TimeUnit.MILLISECONDS);
>     }
> }
> {code}
> As you can see in the code, even if the cache is {{NON_TRANSACTIONAL}} the job is scheduled, consuming resources to do nothing (the {{completedTransactions}} map is always empty).
> Maybe I'm missing something, but our application profiling shows that these threads do nothing yet consume precious resources, because we have more than 1000 {{NON_TRANSACTIONAL}} caches.
> (i) This can be considered when solving ISPN-3702 too.
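
A minimal sketch of the guard the report implies: create the cleanup executor only when the cache is transactional and not using total order. TxTableSketch and its fields are hypothetical; in particular, the boolean transactional flag stands in for whatever the real configuration exposes, and this is not the change that was actually committed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for TransactionTable.
class TxTableSketch {
    private final boolean transactional;   // assumption: derived from the cache configuration
    private final boolean totalOrder;
    private final long reaperIntervalMillis;
    private ScheduledExecutorService executorService;

    TxTableSketch(boolean transactional, boolean totalOrder, long reaperIntervalMillis) {
        this.transactional = transactional;
        this.totalOrder = totalOrder;
        this.reaperIntervalMillis = reaperIntervalMillis;
    }

    void start() {
        // Skip the thread pool entirely for non-transactional caches.
        if (transactional && !totalOrder) {
            executorService = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread th = new Thread(r, "TxCleanupService-sketch");
                th.setDaemon(true);
                return th;
            });
            executorService.scheduleAtFixedRate(this::cleanupCompletedTransactions,
                    reaperIntervalMillis, reaperIntervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    void stop() {
        if (executorService != null) {
            executorService.shutdownNow();
        }
    }

    boolean hasCleanupThread() {
        return executorService != null;
    }

    private void cleanupCompletedTransactions() {
        // Placeholder: the real method prunes the completed-transactions map.
    }
}
```

With more than 1000 non-transactional caches, as in the report, a guard of this kind would avoid creating one idle scheduler thread per cache.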
[JBoss JIRA] (ISPN-4840) LockControlCommand timeouts can cause orphaned locks
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4840?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4840:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1160638|https://bugzilla.redhat.com/show_bug.cgi?id=1160638] from MODIFIED to ON_QA
> LockControlCommand timeouts can cause orphaned locks
> ----------------------------------------------------
>
> Key: ISPN-4840
> URL: https://issues.jboss.org/browse/ISPN-4840
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Transactions
> Affects Versions: 7.0.0.CR1
> Reporter: Erik Salter
> Assignee: Dan Berindei
> Fix For: 7.0.0.Final
>
>
> If an originator times out on sending a pessimistic LockControlCommand, the receiver may still get the message. In this case, the originator will send a TxCompletionCommand. Because of this, the receiver will remove the registered remote transaction from its transaction table. If, however, another thread is processing the remote LCC command, it could acquire the lock(s) after the referencing remoteTx is removed. Thus, the affected keys will remain locked indefinitely.
> A simple solution would be for the LCC thread, when it verifies the remote transaction, to also check that the remote transaction is still registered. This would be in addition to checking if the transaction is completed.
> See the following TRACE messages:
> -- Local TX created
> 2014-10-10 11:27:28,899 TRACE [org.infinispan.transaction.TransactionTable] (OOB-1353,session-resource-cluster,240-east-dht2.comcast.net-46326(CMC-Denver-CO)) Created a new local transaction: LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, lockedKeys=null, backupKeyLocks=null, isFromStateTransfer=false} globalTx=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:local, topologyId=64, age(ms)=0
> -- Remote TX created
> 2014-10-10 11:27:28,525 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2850,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Created and registered remote transaction RemoteTransaction{modifications=[], lookedUpEntries={}, lockedKeys=null, backupKeyLocks=null, missingLookedUpEntries=false, isMarkedForRollback=false} globalTx=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:remote, topologyId=63, age(ms)=0
> -- Originator times out on LCC
> 2014-10-10 11:27:36,902 WARN [org.infinispan.remoting.rpc.RpcManagerImpl] (OOB-1353,session-resource-cluster,240-east-dht2.comcast.net-46326(CMC-Denver-CO)) ISPN000071: Caught exception when handling command LockControlCommand{cache=eigAllocation, keys=[EdgeResourceCacheKey[edgeDeviceId=2878,resourceId=11130]], flags=null, unlock=false}
> org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to 240-west-dht2.comcast.net-30190(CH2-Chicago-IL)
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:542)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:188)
> ...
> at org.infinispan.interceptors.distribution.TxDistributionInterceptor.visitLockControlCommand(TxDistributionInterceptor.java:204)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
> at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:134)
> at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:177)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
> at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:235)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
> at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:134)
> at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:177)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
> at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:116)
> ...
> at org.infinispan.CacheImpl.lock(CacheImpl.java:565)
> at org.infinispan.CacheImpl.lock(CacheImpl.java:548)
> ...
> Caused by: org.jgroups.TimeoutException: timeout sending message to 240-west-dht2.comcast.net-30190(CH2-Chicago-IL)
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:303)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:181)
> ... 91 more
> 2014-10-10 11:27:36,936 TRACE [org.infinispan.interceptors.TxInterceptor] (OOB-1353,session-resource-cluster,240-east-dht2.comcast.net-46326(CMC-Denver-CO)) invokeNextInterceptorAndVerifyTransaction :: originatorMissing=false, alreadyCompleted=false
> 2014-10-10 11:27:36,937 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-1353,session-resource-cluster,240-east-dht2.comcast.net-46326(CMC-Denver-CO)) ISPN000136: Execution error
> org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to 240-west-dht2.comcast.net-30190(CH2-Chicago-IL)
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:542)
> ...
> -- Same stack trace
> -- Remote TX removed -- must be from TxCompletionCommand since isMarkedForRollback == false
> 2014-10-10 11:27:36,530 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2963,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Removed remote transaction GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:local ? RemoteTransaction{modifications=[], lookedUpEntries={}, lockedKeys=null, backupKeyLocks=null, missingLookedUpEntries=false, isMarkedForRollback=false} globalTx=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:remote, topologyId=63, age(ms)=8004
> 2014-10-10 11:27:36,530 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2963,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Removed RemoteTransaction{modifications=[], lookedUpEntries={}, lockedKeys=null, backupKeyLocks=null, missingLookedUpEntries=false, isMarkedForRollback=false} globalTx=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:remote, topologyId=63, age(ms)=8004 from transaction table.
> -- Local TX removed
> 2014-10-10 11:27:36,937 TRACE [org.infinispan.transaction.TransactionTable] (OOB-1353,session-resource-cluster,240-east-dht2.comcast.net-46326(CMC-Denver-CO)) Removed LocalTransaction{remoteLockedNodes=[240-east-dht2.comcast.net-46326(CMC-Denver-CO), 240-west-dht2.comcast.net-30190(CH2-Chicago-IL)], isMarkedForRollback=false, lockedKeys=null, backupKeyLocks=null, isFromStateTransfer=false} globalTx=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:local, topologyId=64, age(ms)=8038 from transaction table.
> -- Lock acquisition completes!
> 2014-10-10 11:27:39,195 TRACE [org.infinispan.util.concurrent.locks.LockManagerImpl] (OOB-2850,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Attempting to lock EdgeResourceCacheKey[edgeDeviceId=2878,resourceId=11130] with acquisition timeout of 5000 millis
> 2014-10-10 11:27:39,198 TRACE [org.infinispan.util.concurrent.locks.LockManagerImpl] (OOB-2850,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Successfully acquired lock EdgeResourceCacheKey[edgeDeviceId=2878,resourceId=11130]!
> 2014-10-10 11:27:39,202 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2850,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Transaction=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:remote, nodeMaxPrunedTxId=858439
> -- Not in completion map since transaction was removed without commit or rollback. Lock never releases
> 2014-10-10 11:27:39,203 TRACE [org.infinispan.interceptors.TxInterceptor] (OOB-2850,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) invokeNextInterceptorAndVerifyTransaction :: originatorMissing=false, alreadyCompleted=false
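
The check suggested in the description can be sketched as follows: after acquiring the lock, the thread processing the LockControlCommand verifies that the remote transaction is still registered and releases the lock if it is not. RemoteTxTableSketch and all of its methods are hypothetical simplifications (transactions and keys are plain strings, and lock acquisition never blocks); this is not Infinispan's actual code.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified receiver-side state; not Infinispan's real classes.
class RemoteTxTableSketch {
    private final Set<String> remoteTransactions = ConcurrentHashMap.newKeySet();
    private final Map<String, String> lockOwners = new ConcurrentHashMap<>();

    void registerRemoteTx(String globalTx) {
        remoteTransactions.add(globalTx);
    }

    // Models the effect of the completion command: forget the transaction,
    // then release whatever locks it holds at that moment.
    void removeRemoteTx(String globalTx) {
        remoteTransactions.remove(globalTx);
        lockOwners.values().removeIf(globalTx::equals);
    }

    // Models the thread processing the remote LockControlCommand.
    boolean lockForRemoteTx(String globalTx, String key) {
        String current = lockOwners.putIfAbsent(key, globalTx);
        if (current != null && !current.equals(globalTx)) {
            return false; // held by another transaction
        }
        // Suggested extra check: the transaction may have been removed while
        // this thread was still acquiring the lock.
        if (!remoteTransactions.contains(globalTx)) {
            lockOwners.remove(key, globalTx); // release, otherwise the lock is orphaned
            return false;
        }
        return true;
    }

    boolean isLocked(String key) {
        return lockOwners.containsKey(key);
    }
}
```

Replaying the order seen in the trace (remote transaction registered, then removed, then the lock acquired) leaves the key unlocked in this sketch, because the lock is taken before the registration check and removal releases any lock already held.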