[infinispan-issues] [JBoss JIRA] (ISPN-9512) *TxPartitionAndMerge*Test tests hang during teardown
Dan Berindei (JIRA)
issues at jboss.org
Thu Sep 13 06:57:00 EDT 2018
[ https://issues.jboss.org/browse/ISPN-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632864#comment-13632864 ]
Dan Berindei commented on ISPN-9512:
------------------------------------
Oops, I reduced the number of transport threads and the size of the queue in ISPN-9067 (https://github.com/infinispan/infinispan/commit/ef9735d778a9e687f272b8db324c7760ccf655a6), and that's what's causing the tests to hang.
I see 2 problems that we should tackle here:
# {{ClusterTopologyManagerImpl.executeOnClusterAsync()}} invokes the command on the local node before sending it to the other nodes. This probably made sense back when the remote invocation could only be synchronous, but now it would be better to send the command first, then execute on the local node, and then wait for responses.
# {{ClusterCacheStatus.doMergePartitions()}} should not hold the {{ClusterCacheStatus}} monitor while broadcasting the rebalance command, because there's always a chance that the local invocation will happen on the caller thread.
Going further we need to move away from blocking RPCs both in {{ClusterTopologyManagerImpl}} (ISPN-8955) and in {{StateConsumerImpl}}, and also think about unifying the thread pools but using {{LimitedExecutor}} e.g. to apply state.
> *TxPartitionAndMerge*Test tests hang during teardown
> ----------------------------------------------------
>
> Key: ISPN-9512
> URL: https://issues.jboss.org/browse/ISPN-9512
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 9.4.0.CR3
>
> Attachments: master_20180913-1119_PessimisticTxPartitionAndMergeDuringRollbackTest-infinispan-core.log.gz, threaddump-org_infinispan_partitionhandling_PessimisticTxPartitionAndMergeDuringRollbackTest_clearContent-2018-09-13-13828.log
>
>
> Not sure what changed recently, but the thread dumps show a state transfer executor thread blocked waiting for a clustered listeners response. The stack includes two instances of {{ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution()}}, which suggests that at some point all the state transfer executor threads (6) and async transport threads (4) were busy, and the transport thread pool queue (10) was also full.
> {noformat}
> "stateTransferExecutor-thread-PessimisticTxPartitionAndMergeDuringRollbackTest-NodeC-p57758-t1" #192601 daemon prio=5 os_prio=0 tid=0x00007f7094031800 nid=0x5b27 waiting on condition [0x00007f70190ce000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000d470b0f8> (a java.util.concurrent.CompletableFuture$Signaller)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1695)
> at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1775)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93)
> at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:262)
> at org.infinispan.statetransfer.StateConsumerImpl.getClusterListeners(StateConsumerImpl.java:895)
> at org.infinispan.statetransfer.StateConsumerImpl.fetchClusterListeners(StateConsumerImpl.java:453)
> at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:309)
> at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:197)
> at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:54)
> at org.infinispan.statetransfer.StateTransferManagerImpl$1.rebalance(StateTransferManagerImpl.java:117)
> at org.infinispan.topology.LocalTopologyManagerImpl.doHandleRebalance(LocalTopologyManagerImpl.java:517)
> - locked <0x00000000cc304f88> (a org.infinispan.topology.LocalCacheStatus)
> at org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleRebalance$3(LocalTopologyManagerImpl.java:475)
> at org.infinispan.topology.LocalTopologyManagerImpl$$Lambda$429/1368424830.run(Unknown Source)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:2038)
> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> at org.infinispan.executors.LazyInitializingExecutorService.execute(LazyInitializingExecutorService.java:121)
> at org.infinispan.executors.LimitedExecutor.tryExecute(LimitedExecutor.java:151)
> at org.infinispan.executors.LimitedExecutor.executeInternal(LimitedExecutor.java:118)
> at org.infinispan.executors.LimitedExecutor.execute(LimitedExecutor.java:108)
> at org.infinispan.topology.LocalTopologyManagerImpl.handleRebalance(LocalTopologyManagerImpl.java:473)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:199)
> at org.infinispan.topology.CacheTopologyControlCommand.invokeAsync(CacheTopologyControlCommand.java:160)
> at org.infinispan.commands.ReplicableCommand.invoke(ReplicableCommand.java:44)
> at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$executeOnClusterAsync$5(ClusterTopologyManagerImpl.java:600)
> at org.infinispan.topology.ClusterTopologyManagerImpl$$Lambda$304/909965247.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:2038)
> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
> at org.infinispan.executors.LazyInitializingExecutorService.submit(LazyInitializingExecutorService.java:91)
> at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterAsync(ClusterTopologyManagerImpl.java:596)
> at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart(ClusterTopologyManagerImpl.java:437)
> at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:903)
> - locked <0x00000000cc305138> (a org.infinispan.topology.ClusterCacheStatus)
> at org.infinispan.topology.ClusterCacheStatus.queueRebalance(ClusterCacheStatus.java:140)
> - locked <0x00000000cc305138> (a org.infinispan.topology.ClusterCacheStatus)
> at org.infinispan.partitionhandling.impl.PreferConsistencyStrategy.updateMembersAndRebalance(PreferConsistencyStrategy.java:299)
> at org.infinispan.partitionhandling.impl.PreferConsistencyStrategy.onPartitionMerge(PreferConsistencyStrategy.java:245)
> at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:642)
> - locked <0x00000000cc305138> (a org.infinispan.topology.ClusterCacheStatus)
> at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$recoverClusterStatus$4(ClusterTopologyManagerImpl.java:494)
> at org.infinispan.topology.ClusterTopologyManagerImpl$$Lambda$578/46555845.run(Unknown Source)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> All partition and merge tests seem to be affected: PessimisticTxPartitionAndMergeDuringPrepareTest, PessimisticTxPartitionAndMergeDuringRollbackTest, PessimisticTxPartitionAndMergeDuringRuntimeTest, OptimisticTxPartitionAndMergeDuringCommitTest, OptimisticTxPartitionAndMergeDuringPrepareTest, and OptimisticTxPartitionAndMergeDuringRollbackTest.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
More information about the infinispan-issues
mailing list