[JBoss JIRA] (ISPN-4585) Prioritize commands in the remote executor
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4585?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4585:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> Prioritize commands in the remote executor
> ------------------------------------------
>
> Key: ISPN-4585
> URL: https://issues.jboss.org/browse/ISPN-4585
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Fix For: 7.1.0.Beta1
>
>
> The remote executor currently has an unlimited queue of blocked task, but the underlying executor cannot use a queue. With a queue, we wouldn't need to overflow remote commands to the OOB threads, and the OOB threads would be free to process response messages.
> The problem is that {{ThreadPoolExecutor}} executes tasks in the order they are in the queue. If a node has a remote executor thread pool of 100 threads and receives a prepare(tx1, put(k, v1) comand, then 1000 prepare(tx_i, put(k, v_i)) commands, and finally a commit(tx1) command, the commit(tx1) command will block until all but 99 of the the prepare(tx_i, put(k, v_i)) commands have timed out.
> I think we could help this by using a {{PriorityBlockingQueue}} for the underlying executor, with commands ordered so that state transfer commands < commit/tx completion notification < prepare/lock. The commit command would still have to wait for one of the prepare commands currently running to time out, but it wouldn't have to wait for all of them.
> The current code, without a queue, would fill the remote executor and OOB thread pools, and it would discard the commit message (along with most of the prepare commands). The time it would take to process the commit successfully would depend on the timing of the retransmitted messages.
> Another possible improvement would be to keep track of the commands currently being executed, and always keep some threads free for commands with higher priority. But I'm not sure how easy it would be to do that on top of an injected {{ExecutorService}}.
> I believe there is also a problem with {{BlockingTaskAwareExecutorServiceImpl.checkForReadyTasks()}} after a topology change. Commands with the new topology id are all unblocked by submitting them to the underlying executor in FIFO order, on a single thread, so {{CallerRunsPolicy}} is not a valid rejection policy here.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-4572) StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusyNonTx random failures
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4572?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4572:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusyNonTx random failures
> ------------------------------------------------------------------------------------------------
>
> Key: ISPN-4572
> URL: https://issues.jboss.org/browse/ISPN-4572
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer, Test Suite - Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.1.0.Beta1
>
>
> {noformat}
> java.lang.AssertionError:
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
> at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24)
> at org.testng.AssertJUnit.assertNull(AssertJUnit.java:282)
> at org.testng.AssertJUnit.assertNull(AssertJUnit.java:274)
> at org.infinispan.statetransfer.StateTransferReplicationQueueTest.doWritingCacheTest(StateTransferReplicationQueueTest.java:144)
> at org.infinispan.statetransfer.StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusyNonTx(StateTransferReplicationQueueTest.java:88)
> {noformat}
> No trace log available for now.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-4568) DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4568?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4568:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
> ------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4568
> URL: https://issues.jboss.org/browse/ISPN-4568
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.1.0.Beta1
>
>
> Very likely related to ISPN-4564, as there seem to be 2 unjustified pauses ~ 3s and some log messages also appear to be delayed:
> {noformat}
> 08:23:48,443 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAN-p28720-t1:) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@e9a3538]
> 08:23:48,470 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAN-p28720-t1:) [JGroupsTransport] dests=[DistSyncL1RepeatableReadFuncTest-NodeAN-7764, DistSyncL1RepeatableReadFuncTest-NodeAM-739], command=SingleRpcCommand{cacheName='dist', command=PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}}, mode=SYNCHRONOUS, timeout=60000
> 08:23:50,953 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@62801f8c]
> 08:23:50,953 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [L1ManagerImpl] Invalidating keys [key-to-the-cache] on nodes [DistSyncL1RepeatableReadFuncTest-NodeAK-9309]. Use multicast? false
> 08:23:51,060 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [JGroupsTransport] dests=[DistSyncL1RepeatableReadFuncTest-NodeAK-9309], command=SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=60000
> 08:23:51,062 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAK-p28661-t5:) [BaseRpcInvokingCommand] Invoking command InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}, with originLocal flag set to false
> 08:23:50,972 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [CallInterceptor] Executing command: PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}.
> 08:23:51,786 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAK-p28661-t5:) [InboundInvocationHandlerImpl] About to send back response null for command SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}}
> 08:23:51,796 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [CommandAwareRpcDispatcher] Responses: [sender=DistSyncL1RepeatableReadFuncTest-NodeAK-9309, received=true, suspected=false]
> 08:23:54,561 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [RpcManagerImpl] Response(s) to SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}} is {}
> 08:23:56,955 ERROR (testng-DistSyncL1RepeatableReadFuncTest:) [UnitTestTestNGListener] Test testNoEntryInL1MultipleConcurrentGetsWithInvalidation(org.infinispan.distribution.DistSyncL1RepeatableReadFuncTest) failed.
> java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask.get(FutureTask.java:201)
> at org.infinispan.commons.util.concurrent.NotifyingFutureImpl.get(NotifyingFutureImpl.java:84)
> at org.infinispan.distribution.BaseDistSyncL1Test.testNoEntryInL1MultipleConcurrentGetsWithInvalidation(BaseDistSyncL1Test.java:217)
> 08:23:54,578 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [L1NonTxInterceptor] Allowing entry to commit as local node is owner
> 08:23:57,861 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [EntryWrappingInterceptor] About to commit entry RepeatableReadEntry(499752d9){key=key-to-the-cache, value=second-put, oldValue=first-put, isCreated=false, isChanged=true, isRemoved=false, isValid=true, skipRemoteGet=false, metadata=EmbeddedMetadata{version=null}}
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-4566) ManualIndexingTest.testManualIndexing random failures
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4566?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4566:
------------------------------
Fix Version/s: 7.1.0.Beta1
(was: 7.1.0.Alpha1)
> ManualIndexingTest.testManualIndexing random failures
> -----------------------------------------------------
>
> Key: ISPN-4566
> URL: https://issues.jboss.org/browse/ISPN-4566
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Query
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.1.0.Beta1
>
>
> Random timeouts when TRACE logging is enabled:
> {noformat}
> 04:58:33,679 ERROR (testng-ManualIndexingTest:) [UnitTestTestNGListener] Test testManualIndexing(org.infinispan.query.api.ManualIndexingTest) failed.
> org.infinispan.commons.CacheException: java.util.concurrent.ExecutionException: Map phase executing at ManualIndexingTest-NodeA-44176 did not complete within 20 sec timeout
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeHelper(MapReduceTask.java:506)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:407)
> at org.infinispan.query.impl.massindex.MapReduceMassIndexer.start(MapReduceMassIndexer.java:25)
> at org.infinispan.query.api.ManualIndexingTest.testManualIndexing(ManualIndexingTest.java:52)
> {noformat}
> Trace log here: http://ci.infinispan.org/viewLog.html?buildId=9816&buildTypeId=Infinispan...
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months