[JBoss JIRA] (ISPN-4568) DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-4568?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño updated ISPN-4568:
-----------------------------------
Fix Version/s: 7.1.0.Alpha1
(was: 7.0.0.Final)
> DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
> ------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4568
> URL: https://issues.jboss.org/browse/ISPN-4568
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.1.0.Alpha1
>
>
> Very likely related to ISPN-4564, as there seem to be 2 unjustified pauses ~ 3s and some log messages also appear to be delayed:
> {noformat}
> 08:23:48,443 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAN-p28720-t1:) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@e9a3538]
> 08:23:48,470 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAN-p28720-t1:) [JGroupsTransport] dests=[DistSyncL1RepeatableReadFuncTest-NodeAN-7764, DistSyncL1RepeatableReadFuncTest-NodeAM-739], command=SingleRpcCommand{cacheName='dist', command=PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}}, mode=SYNCHRONOUS, timeout=60000
> 08:23:50,953 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@62801f8c]
> 08:23:50,953 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [L1ManagerImpl] Invalidating keys [key-to-the-cache] on nodes [DistSyncL1RepeatableReadFuncTest-NodeAK-9309]. Use multicast? false
> 08:23:51,060 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [JGroupsTransport] dests=[DistSyncL1RepeatableReadFuncTest-NodeAK-9309], command=SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=60000
> 08:23:51,062 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAK-p28661-t5:) [BaseRpcInvokingCommand] Invoking command InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}, with originLocal flag set to false
> 08:23:50,972 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [CallInterceptor] Executing command: PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}.
> 08:23:51,786 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAK-p28661-t5:) [InboundInvocationHandlerImpl] About to send back response null for command SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}}
> 08:23:51,796 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [CommandAwareRpcDispatcher] Responses: [sender=DistSyncL1RepeatableReadFuncTest-NodeAK-9309, received=true, suspected=false]
> 08:23:54,561 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [RpcManagerImpl] Response(s) to SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}} is {}
> 08:23:56,955 ERROR (testng-DistSyncL1RepeatableReadFuncTest:) [UnitTestTestNGListener] Test testNoEntryInL1MultipleConcurrentGetsWithInvalidation(org.infinispan.distribution.DistSyncL1RepeatableReadFuncTest) failed.
> java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask.get(FutureTask.java:201)
> at org.infinispan.commons.util.concurrent.NotifyingFutureImpl.get(NotifyingFutureImpl.java:84)
> at org.infinispan.distribution.BaseDistSyncL1Test.testNoEntryInL1MultipleConcurrentGetsWithInvalidation(BaseDistSyncL1Test.java:217)
> 08:23:54,578 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [L1NonTxInterceptor] Allowing entry to commit as local node is owner
> 08:23:57,861 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [EntryWrappingInterceptor] About to commit entry RepeatableReadEntry(499752d9){key=key-to-the-cache, value=second-put, oldValue=first-put, isCreated=false, isChanged=true, isRemoved=false, isValid=true, skipRemoteGet=false, metadata=EmbeddedMetadata{version=null}}
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
10 years, 1 month
[JBoss JIRA] (ISPN-4566) ManualIndexingTest.testManualIndexing random failures
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-4566?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño updated ISPN-4566:
-----------------------------------
Fix Version/s: 7.1.0.Alpha1
(was: 7.0.0.Final)
> ManualIndexingTest.testManualIndexing random failures
> -----------------------------------------------------
>
> Key: ISPN-4566
> URL: https://issues.jboss.org/browse/ISPN-4566
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Query
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.1.0.Alpha1
>
>
> Random timeouts when TRACE logging is enabled:
> {noformat}
> 04:58:33,679 ERROR (testng-ManualIndexingTest:) [UnitTestTestNGListener] Test testManualIndexing(org.infinispan.query.api.ManualIndexingTest) failed.
> org.infinispan.commons.CacheException: java.util.concurrent.ExecutionException: Map phase executing at ManualIndexingTest-NodeA-44176 did not complete within 20 sec timeout
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeHelper(MapReduceTask.java:506)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:407)
> at org.infinispan.query.impl.massindex.MapReduceMassIndexer.start(MapReduceMassIndexer.java:25)
> at org.infinispan.query.api.ManualIndexingTest.testManualIndexing(ManualIndexingTest.java:52)
> {noformat}
> Trace log here: http://ci.infinispan.org/viewLog.html?buildId=9816&buildTypeId=Infinispan...
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
10 years, 1 month
[JBoss JIRA] (ISPN-4585) Prioritize commands in the remote executor
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-4585?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño updated ISPN-4585:
-----------------------------------
Fix Version/s: 7.1.0.Alpha1
(was: 7.0.0.Final)
> Prioritize commands in the remote executor
> ------------------------------------------
>
> Key: ISPN-4585
> URL: https://issues.jboss.org/browse/ISPN-4585
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Fix For: 7.1.0.Alpha1
>
>
> The remote executor currently has an unlimited queue of blocked task, but the underlying executor cannot use a queue. With a queue, we wouldn't need to overflow remote commands to the OOB threads, and the OOB threads would be free to process response messages.
> The problem is that {{ThreadPoolExecutor}} executes tasks in the order they are in the queue. If a node has a remote executor thread pool of 100 threads and receives a prepare(tx1, put(k, v1) comand, then 1000 prepare(tx_i, put(k, v_i)) commands, and finally a commit(tx1) command, the commit(tx1) command will block until all but 99 of the the prepare(tx_i, put(k, v_i)) commands have timed out.
> I think we could help this by using a {{PriorityBlockingQueue}} for the underlying executor, with commands ordered so that state transfer commands < commit/tx completion notification < prepare/lock. The commit command would still have to wait for one of the prepare commands currently running to time out, but it wouldn't have to wait for all of them.
> The current code, without a queue, would fill the remote executor and OOB thread pools, and it would discard the commit message (along with most of the prepare commands). The time it would take to process the commit successfully would depend on the timing of the retransmitted messages.
> Another possible improvement would be to keep track of the commands currently being executed, and always keep some threads free for commands with higher priority. But I'm not sure how easy it would be to do that on top of an injected {{ExecutorService}}.
> I believe there is also a problem with {{BlockingTaskAwareExecutorServiceImpl.checkForReadyTasks()}} after a topology change. Commands with the new topology id are all unblocked by submitting them to the underlying executor in FIFO order, on a single thread, so {{CallerRunsPolicy}} is not a valid rejection policy here.
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
10 years, 1 month
[JBoss JIRA] (ISPN-4572) StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusyNonTx random failures
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-4572?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño updated ISPN-4572:
-----------------------------------
Fix Version/s: 7.1.0.Alpha1
(was: 7.0.0.Final)
> StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusyNonTx random failures
> ------------------------------------------------------------------------------------------------
>
> Key: ISPN-4572
> URL: https://issues.jboss.org/browse/ISPN-4572
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer, Test Suite - Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.1.0.Alpha1
>
>
> {noformat}
> java.lang.AssertionError:
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
> at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24)
> at org.testng.AssertJUnit.assertNull(AssertJUnit.java:282)
> at org.testng.AssertJUnit.assertNull(AssertJUnit.java:274)
> at org.infinispan.statetransfer.StateTransferReplicationQueueTest.doWritingCacheTest(StateTransferReplicationQueueTest.java:144)
> at org.infinispan.statetransfer.StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusyNonTx(StateTransferReplicationQueueTest.java:88)
> {noformat}
> No trace log available for now.
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
10 years, 1 month
[JBoss JIRA] (ISPN-4586) Too many OutdatedTopologyExceptions in non-transactional caches
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-4586?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño updated ISPN-4586:
-----------------------------------
Fix Version/s: 7.1.0.Alpha1
(was: 7.0.0.Final)
> Too many OutdatedTopologyExceptions in non-transactional caches
> ---------------------------------------------------------------
>
> Key: ISPN-4586
> URL: https://issues.jboss.org/browse/ISPN-4586
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Labels: performance
> Fix For: 7.1.0.Alpha1
>
>
> In a non-tx cache, when the topology id is incremented, owners (both primary and backup) receiving a write command with a lower topology id throw an OutdatedTopologyException so that the originator retries the command on the new owners.
> But the originator needs to retry the command only if the owners of the key changed in any way. During a join or a leave, most of the keys should not change owners, so throwing an OutdatedTopologyException is not necessary most of the time.
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
10 years, 1 month
[JBoss JIRA] (ISPN-4587) Re-add old owners in the pending CH when a node leaves during rebalance
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-4587?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño updated ISPN-4587:
-----------------------------------
Fix Version/s: 7.1.0.Alpha1
(was: 7.0.0.Final)
> Re-add old owners in the pending CH when a node leaves during rebalance
> -----------------------------------------------------------------------
>
> Key: ISPN-4587
> URL: https://issues.jboss.org/browse/ISPN-4587
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Minor
> Fix For: 7.1.0.Alpha1
>
>
> Say we have a distributed cache \[A, B\] with {{numSegments = 1}} and {{numOwners = 2}}. The initial topology is _T_: currentCH = \{0: A B\}, pendingCH = null
> C joins, and A starts a rebalance. The topology is now _T + 1_: currentCH = \{0: A B\}, pendingCH = \{0: A C\}
> C now leaves, A updates the consistent hashes to remove it with a new topology _T + 2: currentCH = \{0: A B\}, pendingCH = \{0: A\}
> A doesn't need to receive any data, so the rebalance ends and the pending CH is installed as the current CH in topology _T + 3_: currentCH = \{0: A\}, pendingCH = null
> This algorithm is relatively easy to follow and implement, but it does result in reduced availability of the cache data. It would be better if topology _T + 2_ could re-add B as an owner in the pending CH.
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
10 years, 1 month
[JBoss JIRA] (ISPN-4610) Implement total order for non-transactional caches
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-4610?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño updated ISPN-4610:
-----------------------------------
Fix Version/s: 7.1.0.Alpha1
(was: 7.0.0.Final)
> Implement total order for non-transactional caches
> --------------------------------------------------
>
> Key: ISPN-4610
> URL: https://issues.jboss.org/browse/ISPN-4610
> Project: Infinispan
> Issue Type: Feature Request
> Components: Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Fix For: 7.1.0.Alpha1
>
>
> Current locking algorithm in non-transactional caches needs a remote thread on the primary owner to block while replicating the update to the backup owner. The thread is also holding the lock for the key, so it's blocking other threads that want to write to the same key. When there is a lot of contention, this can exhaust the remote executor thread pool and cause lock timeouts.
> TO was designed with high contention in mind, and it doesn't block threads to acquire locks. So it should handle this much better.
> An alternative solution would be the locking rework in ISPN-2849.
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
10 years, 1 month