[JBoss JIRA] (ISPN-6494) Investigate bundler performance
by Dan Berindei (Jira)
[ https://issues.jboss.org/browse/ISPN-6494?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-6494:
------------------------------------
I have started working on a new bundler implementation tentatively named {{BlockingBundler}}, with the goal of blocking the application threads whenever the bundler thread can't keep up as a form of back-pressure.
In the initial implementation, message serialization can be performed either by the bundler thread, when throughput is low, or by an application thread, when the bundler thread is falling behind. This means each bundle is still serialized in one go while other application threads are blocked, so I would like to serialize messages at the point where an application thread adds them instead. However, because a message has a different wire format on its own than inside a bundle, this change would require us to send single messages as 1-message bundles, which is somewhat expensive without optimizations on the receive side.
In a future iteration I would also like to set a limit on how much an application thread can be blocked in the bundler, and discard the message when that happens. I need to investigate how that would interact with {{TCP_NIO2}}, which also seems to drop messages when the channel is not writable. [~belaban] do you have any plans to implement some sort of back-pressure/blocking in {{TCP_NIO2}} for 5.0, so that the bundler doesn't serialize bundles if they're going to be discarded by {{TCP_NIO2}}?
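A minimal sketch of the back-pressure idea (names and sizes below are hypothetical, not the actual {{BlockingBundler}} code): application threads hand messages to a bounded queue, and {{put()}} blocks them whenever the bundler thread cannot keep up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch, not the actual BlockingBundler: a bounded queue whose
// put() blocks application threads whenever the bundler thread falls behind.
public class BlockingBundlerSketch {
    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(64);

    // Called by application threads; blocks when the queue is full (back-pressure).
    public void send(byte[] msg) throws InterruptedException {
        queue.put(msg);
    }

    // Number of messages waiting for the bundler thread.
    public int pending() {
        return queue.size();
    }

    // Bundler loop: wait for at least one message, then drain everything that
    // is ready and ship it as a single bundle.
    public void bundlerLoop() throws InterruptedException {
        List<byte[]> bundle = new ArrayList<>();
        while (true) {
            bundle.add(queue.take());
            queue.drainTo(bundle);
            sendBundle(bundle);
            bundle.clear();
        }
    }

    private void sendBundle(List<byte[]> bundle) {
        // serialization and transport would happen here
    }
}
```

The queue capacity is the back-pressure knob: a smaller bound blocks senders sooner, a larger one trades memory for burst tolerance.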
> Investigate bundler performance
> -------------------------------
>
> Key: ISPN-6494
> URL: https://issues.jboss.org/browse/ISPN-6494
> Project: Infinispan
> Issue Type: Task
> Components: Core
> Affects Versions: 9.0.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
>
> For ISPN-6027 we changed the default JGroups bundler to {{sender-sends-with-timer}}, because it was faster in some of the performance tests. However, IspnPerfTest shows {{transfer-queue-bundler}} to be consistently better, so we need to investigate the bundler choice again.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 5 months
[JBoss JIRA] (ISPN-9475) AbstractInfinispanTest mistakenly detects some test with params as duplicates
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-9475?page=com.atlassian.jira.plugin.... ]
Diego Lovison updated ISPN-9475:
--------------------------------
Labels: on-hold (was: )
> AbstractInfinispanTest mistakenly detects some test with params as duplicates
> -----------------------------------------------------------------------------
>
> Key: ISPN-9475
> URL: https://issues.jboss.org/browse/ISPN-9475
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.4.0.CR1
> Reporter: Adrian Nistor
> Assignee: Adrian Nistor
> Priority: Major
> Labels: on-hold
> Fix For: 9.4.0.CR3, 9.3.3.Final
>
>
> Tests with undeclared params do not have their test names properly generated, so AbstractInfinispanTest sees all tests created by a factory as having the same name and tries to mark them as failed. Unfortunately, the exception thrown in the method interceptor does not actually fail the test; it only causes the test to be ignored.
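> A toy illustration of the failure mode (the helper below is hypothetical, not the actual AbstractInfinispanTest code): when factory-created tests do not declare their parameters, every instance renders the same display name, so a duplicate check fires for all but the first.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of factory test naming: the display name is the method
// name plus the declared parameters. Undeclared (null) params render
// identically, so every factory instance after the first looks like a duplicate.
public class TestNameSketch {
    static String testName(String method, String declaredParams) {
        return declaredParams == null ? method : method + "[" + declaredParams + "]";
    }

    // Returns the number of factory instances flagged as duplicates.
    static int countDuplicates(String method, String[] paramsPerInstance) {
        Set<String> seen = new HashSet<>();
        int duplicates = 0;
        for (String p : paramsPerInstance) {
            if (!seen.add(testName(method, p))) {
                duplicates++;
            }
        }
        return duplicates;
    }
}
```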
[JBoss JIRA] (ISPN-9332) REPL local iteration optimization cannot be used when store has write behind
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-9332?page=com.atlassian.jira.plugin.... ]
Diego Lovison updated ISPN-9332:
--------------------------------
Tester: Diego Lovison
> REPL local iteration optimization cannot be used when store has write behind
> ----------------------------------------------------------------------------
>
> Key: ISPN-9332
> URL: https://issues.jboss.org/browse/ISPN-9332
> Project: Infinispan
> Issue Type: Bug
> Components: Loaders and Stores, Streams
> Affects Versions: 9.3.0.Final
> Reporter: William Burns
> Assignee: William Burns
> Priority: Major
> Fix For: 9.4.0.CR3
>
>
> When write behind is enabled, the pending write modification is stored only on the primary owner. The REPL local iteration optimization can therefore read non-owned data and miss those pending writes, causing an inconsistency.
> Thus, when write behind is enabled, distributed streams should always go remote on any node that is not the primary owner of all the requested segments.
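> The rule above can be sketched as a simple predicate (names are illustrative, not the actual Infinispan stream code):

```java
import java.util.Set;

// Illustrative predicate: with write-behind enabled, a local-only iteration is
// safe only if this node is the primary owner of every requested segment,
// because pending writes are visible only on the primary.
public class IterationDecisionSketch {
    static boolean mustGoRemote(boolean writeBehindEnabled,
                                Set<Integer> requestedSegments,
                                Set<Integer> primarilyOwnedSegments) {
        if (!writeBehindEnabled) {
            return false; // the local read optimization is safe
        }
        // Any segment we do not primarily own may have pending writes elsewhere.
        return !primarilyOwnedSegments.containsAll(requestedSegments);
    }
}
```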
[JBoss JIRA] (ISPN-9512) *TxPartitionAndMerge*Test tests hang during teardown
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-9512?page=com.atlassian.jira.plugin.... ]
Diego Lovison updated ISPN-9512:
--------------------------------
Labels: on-hold testsuite_stability (was: testsuite_stability)
> *TxPartitionAndMerge*Test tests hang during teardown
> ----------------------------------------------------
>
> Key: ISPN-9512
> URL: https://issues.jboss.org/browse/ISPN-9512
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: on-hold, testsuite_stability
> Fix For: 9.4.0.CR3
>
> Attachments: master_20180913-1119_PessimisticTxPartitionAndMergeDuringRollbackTest-infinispan-core.log.gz, threaddump-org_infinispan_partitionhandling_PessimisticTxPartitionAndMergeDuringRollbackTest_clearContent-2018-09-13-13828.log
>
>
> Not sure what changed recently, but the thread dumps show a state transfer executor thread blocked waiting for a clustered listeners response. The stack includes two instances of {{ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution()}}, which suggests that at some point all the state transfer executor threads (6) and async transport threads (4) were busy, and the transport thread pool queue (10) was also full.
> {noformat}
> "stateTransferExecutor-thread-PessimisticTxPartitionAndMergeDuringRollbackTest-NodeC-p57758-t1" #192601 daemon prio=5 os_prio=0 tid=0x00007f7094031800 nid=0x5b27 waiting on condition [0x00007f70190ce000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000d470b0f8> (a java.util.concurrent.CompletableFuture$Signaller)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1695)
> at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1775)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93)
> at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:262)
> at org.infinispan.statetransfer.StateConsumerImpl.getClusterListeners(StateConsumerImpl.java:895)
> at org.infinispan.statetransfer.StateConsumerImpl.fetchClusterListeners(StateConsumerImpl.java:453)
> at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:309)
> at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:197)
> at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:54)
> at org.infinispan.statetransfer.StateTransferManagerImpl$1.rebalance(StateTransferManagerImpl.java:117)
> at org.infinispan.topology.LocalTopologyManagerImpl.doHandleRebalance(LocalTopologyManagerImpl.java:517)
> - locked <0x00000000cc304f88> (a org.infinispan.topology.LocalCacheStatus)
> at org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleRebalance$3(LocalTopologyManagerImpl.java:475)
> at org.infinispan.topology.LocalTopologyManagerImpl$$Lambda$429/1368424830.run(Unknown Source)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:2038)
> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> at org.infinispan.executors.LazyInitializingExecutorService.execute(LazyInitializingExecutorService.java:121)
> at org.infinispan.executors.LimitedExecutor.tryExecute(LimitedExecutor.java:151)
> at org.infinispan.executors.LimitedExecutor.executeInternal(LimitedExecutor.java:118)
> at org.infinispan.executors.LimitedExecutor.execute(LimitedExecutor.java:108)
> at org.infinispan.topology.LocalTopologyManagerImpl.handleRebalance(LocalTopologyManagerImpl.java:473)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:199)
> at org.infinispan.topology.CacheTopologyControlCommand.invokeAsync(CacheTopologyControlCommand.java:160)
> at org.infinispan.commands.ReplicableCommand.invoke(ReplicableCommand.java:44)
> at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$executeOnClusterAsync$5(ClusterTopologyManagerImpl.java:600)
> at org.infinispan.topology.ClusterTopologyManagerImpl$$Lambda$304/909965247.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:2038)
> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
> at org.infinispan.executors.LazyInitializingExecutorService.submit(LazyInitializingExecutorService.java:91)
> at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterAsync(ClusterTopologyManagerImpl.java:596)
> at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart(ClusterTopologyManagerImpl.java:437)
> at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:903)
> - locked <0x00000000cc305138> (a org.infinispan.topology.ClusterCacheStatus)
> at org.infinispan.topology.ClusterCacheStatus.queueRebalance(ClusterCacheStatus.java:140)
> - locked <0x00000000cc305138> (a org.infinispan.topology.ClusterCacheStatus)
> at org.infinispan.partitionhandling.impl.PreferConsistencyStrategy.updateMembersAndRebalance(PreferConsistencyStrategy.java:299)
> at org.infinispan.partitionhandling.impl.PreferConsistencyStrategy.onPartitionMerge(PreferConsistencyStrategy.java:245)
> at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:642)
> - locked <0x00000000cc305138> (a org.infinispan.topology.ClusterCacheStatus)
> at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$recoverClusterStatus$4(ClusterTopologyManagerImpl.java:494)
> at org.infinispan.topology.ClusterTopologyManagerImpl$$Lambda$578/46555845.run(Unknown Source)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> All partition and merge tests seem to be affected: PessimisticTxPartitionAndMergeDuringPrepareTest, PessimisticTxPartitionAndMergeDuringRollbackTest, PessimisticTxPartitionAndMergeDuringRuntimeTest, OptimisticTxPartitionAndMergeDuringCommitTest, OptimisticTxPartitionAndMergeDuringPrepareTest, and OptimisticTxPartitionAndMergeDuringRollbackTest.
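> The double {{CallerRunsPolicy}} frames can be reproduced with a plain JDK executor (the pool and queue sizes below are illustrative): once the pool and its queue are both full, {{execute()}} runs the rejected task synchronously on the submitting thread, which is how a state transfer thread ends up executing, and blocking inside, someone else's task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Demonstrates CallerRunsPolicy: with a single worker and a queue of 1, the
// third task is rejected and runs directly on the submitting thread.
public class CallerRunsSketch {
    public static String submittingThreadRanTask() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<String> thirdTaskThread = new AtomicReference<>();
        Runnable blocker = () -> {
            try { release.await(); } catch (InterruptedException ignored) { }
        };
        pool.execute(blocker);                      // occupies the single worker
        pool.execute(blocker);                      // fills the queue
        pool.execute(() -> thirdTaskThread.set(     // rejected: runs on the caller,
                Thread.currentThread().getName())); // blocking it until done
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return thirdTaskThread.get();
    }
}
```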
[JBoss JIRA] (ISPN-9453) Document FORCE_WRITE_LOCK not working in non-tx caches
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-9453?page=com.atlassian.jira.plugin.... ]
Diego Lovison closed ISPN-9453.
-------------------------------
> Document FORCE_WRITE_LOCK not working in non-tx caches
> ------------------------------------------------------
>
> Key: ISPN-9453
> URL: https://issues.jboss.org/browse/ISPN-9453
> Project: Infinispan
> Issue Type: Task
> Components: Documentation-Core
> Affects Versions: 9.3.1.Final
> Reporter: Radim Vansa
> Assignee: Don Naro
> Priority: Major
> Fix For: 9.4.0.CR3
>
>
> The user guide mentions {{Flag.FORCE_WRITE_LOCK}} in the context of pessimistic transactions but it does not say what this does in non-transactional caches.
> In a non-tx cache the flag does not work: when the read hits a backup owner, the {{get()}} simply reads the local value and does not go to the primary to lock there. When it hits a non-owner, it reads from some other owner, but not necessarily from the primary. The flag never forces the command to acquire locks remotely.
> In other words, it works only sporadically, which in practice means it does not work.
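> A toy model of the behavior described above (plain-Java sketch, not the Infinispan API): only the primary owner keeps a lock table, so a "force write lock" hint on a read is silently ignored whenever the read is served off-primary.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model: every node holds a copy of the data, but only the primary owner
// keeps the lock table. A get() with forceWriteLock therefore locks the key
// only when the read happens to be served by the primary.
public class ForceWriteLockSketch {
    final Map<String, String> data = new HashMap<>();
    final Set<String> locks = new HashSet<>();   // meaningful only on the primary
    final boolean isPrimary;

    ForceWriteLockSketch(boolean isPrimary) {
        this.isPrimary = isPrimary;
    }

    String get(String key, boolean forceWriteLock) {
        if (forceWriteLock && isPrimary) {
            locks.add(key);        // lock acquired, but only here
        }
        return data.get(key);      // a backup just reads its local copy
    }
}
```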
[JBoss JIRA] (ISPN-6494) Investigate bundler performance
by Dan Berindei (Jira)
[ https://issues.jboss.org/browse/ISPN-6494?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-6494:
------------------------------------
I have run a few PerfAck jobs with different bundlers and the trend is clear: tests with 100 threads are significantly faster, but tests with 10 threads are slightly slower. The slowdown is modest with 10KB values (throughput is down 2-6% in the 10-thread tests), but much more pronounced with 1KB values (throughput is down 7-15% in the 10-thread tests).
In fairness, the reference benchmark is pretty old, so there may be other factors: http://perfrepo.mw.lab.eng.bos.redhat.com/reports/tableComparisonReport/1...
> Investigate bundler performance
> -------------------------------
>
> Key: ISPN-6494
> URL: https://issues.jboss.org/browse/ISPN-6494
> Project: Infinispan
> Issue Type: Task
> Components: Core
> Affects Versions: 9.0.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
>
> For ISPN-6027 we changed the default JGroups bundler to {{sender-sends-with-timer}}, because it was faster in some of the performance tests. However, IspnPerfTest shows {{transfer-queue-bundler}} to be consistently better, so we need to investigate the bundler choice again.
[JBoss JIRA] (ISPN-9453) Document FORCE_WRITE_LOCK not working in non-tx caches
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-9453?page=com.atlassian.jira.plugin.... ]
Diego Lovison updated ISPN-9453:
--------------------------------
Tester: Diego Lovison
> Document FORCE_WRITE_LOCK not working in non-tx caches
> ------------------------------------------------------
>
> Key: ISPN-9453
> URL: https://issues.jboss.org/browse/ISPN-9453
> Project: Infinispan
> Issue Type: Task
> Components: Documentation-Core
> Affects Versions: 9.3.1.Final
> Reporter: Radim Vansa
> Assignee: Don Naro
> Priority: Major
> Fix For: 9.4.0.CR3
>
>
> The user guide mentions {{Flag.FORCE_WRITE_LOCK}} in the context of pessimistic transactions but it does not say what this does in non-transactional caches.
> In a non-tx cache the flag does not work: when the read hits a backup owner, the {{get()}} simply reads the local value and does not go to the primary to lock there. When it hits a non-owner, it reads from some other owner, but not necessarily from the primary. The flag never forces the command to acquire locks remotely.
> In other words, it works only sporadically, which in practice means it does not work.