[JBoss JIRA] (ISPN-9852) Invocations waiting for a new topology should resume in parallel
by Dan Berindei (Jira)
Dan Berindei created ISPN-9852:
----------------------------------
Summary: Invocations waiting for a new topology should resume in parallel
Key: ISPN-9852
URL: https://issues.jboss.org/browse/ISPN-9852
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.4.5.Final, 10.0.0.Alpha2
Reporter: Dan Berindei
Fix For: 10.0.0.Beta2
When a command requires a topology newer than the current topology, it uses {{StateTransferLock.transactionDataFuture()}} to wait for the newer topology. When the tx data is received for the newer topology, some (most?) of the waiting operations do not use a separate executor, instead they are all resumed on the thread that received the tx data. If some operations block (e.g. because of a store), it could take a very long time to finish all the blocked operations.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 12 months
[JBoss JIRA] (ISPN-9846) Ensure Backwards Compatibility with Persistence SPI changes
by Dan Berindei (Jira)
[ https://issues.jboss.org/browse/ISPN-9846?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9846:
-------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Ensure Backwards Compatibility with Persistence SPI changes
> -----------------------------------------------------------
>
> Key: ISPN-9846
> URL: https://issues.jboss.org/browse/ISPN-9846
> Project: Infinispan
> Issue Type: Sub-task
> Components: Loaders and Stores
> Affects Versions: 10.0.0.Alpha2
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
> Fix For: 10.0.0.Beta1
>
>
> ISPN-9693 deprecated the {{org.infinispan.marshall.core.MarshalledEntry}} interface in favour of {{org.infinispan.persistence.spi.MarshalledEntry}}, however this breaks compatibility with 9.4.x stores as the method signatures of the various store/loader interfaces changed.
> Therefore, to aid with enabling backwards compatibility it is necessary to change {{spi.MarshalledEntry}} to {{spi.MarshallableEntry}} as well as the corresponding factory. This is necessary as the InitializationContext previously exposed {{marshall.core.MarshalledEntry}} and the naming change provides us an easy way to differentiate the context methods as we can't have two getMarshalledEntryFactory methods due to type erasure.
> Due to type erasure it is also necessary to introduce several new methods to the writer/loader methods. Deprecated/Replacement method list below:
> * CacheLoader MarshalledEntry load(Object) -> MarshallableEntry loadEntry(Object)
> * AdvancedCacheLoader Publisher<MarshalledEntry<K, V>> publishEntries(..) -> Publisher<MarshallableEntry<K, V>> entryPublisher(...)
> * CacheWriter writeBatch(Iterable<MarshalledEntry>) -> writeBulk(Publisher)- InitializationContextgetMarshalledEntryFactory->getMarshallableEntryFactory`
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 12 months
[JBoss JIRA] (ISPN-9198) Node X left the cluster - SuspectException: ISPN000400: Node X was suspected
by Dan Berindei (Jira)
[ https://issues.jboss.org/browse/ISPN-9198?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-9198:
------------------------------------
I still don't know why it happens as the JFR recordings don't show any GC or other kind of pause when the nodes are suspected.
I have a plan to add more logging to the FD_ALL and VERIFY_SUSPECT protocols in JGroups so that we can run the test with the default logging configuration (WARN only) and still get some usable information when a node is suspected.
> Node X left the cluster - SuspectException: ISPN000400: Node X was suspected
> ----------------------------------------------------------------------------
>
> Key: ISPN-9198
> URL: https://issues.jboss.org/browse/ISPN-9198
> Project: Infinispan
> Issue Type: Bug
> Reporter: Diego Lovison
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: Screenshot from 2018-05-30 13-21-32.png, Screenshot from 2018-05-30 13-25-09.png, round1.zip, round2.zip
>
>
> After the commit df9ffb5ba46752d2509aa3a08c59519469cc929a in Infinispan, the tests regression-cs-hotrod-dist-reads and regression-cs-hotrod-repl-reads are failing.
> I ran 3 times the same test with the commit df9ffb5ba46752d2509aa3a08c59519469cc929a and they are working.
> If we run with master for "regression-cs-hotrod-dist-reads" it will because of:
> {noformat}
> 11:18:04,874 INFO [org.radargun.RemoteMasterConnection] (sc-main) Message successfully sent to the master
> 11:22:06,470 INFO [org.radargun.Slave] (sc-main) Stage 'BasicOperationsTest' should not be executed
> 11:22:06,472 INFO [org.radargun.RemoteMasterConnection] (sc-main) Message successfully sent to the master
> [0m[0m11:22:34,182 INFO [org.infinispan.CLUSTER] (jgroups-112,slave0) ISPN000094: Received new cluster view for channel cluster: [slave3|8] (7) [slave3, slave6, slave0, slave5, slave4, slave2, slave1]
> [0m[0m11:22:34,190 INFO [org.infinispan.CLUSTER] (jgroups-112,slave0) ISPN100001: Node slave7 left the cluster
> [0m[0m11:22:48,182 INFO [org.infinispan.CLUSTER] (jgroups-115,slave0) ISPN000094: Received new cluster view for channel cluster: [slave3|9] (6) [slave3, slave6, slave0, slave5, slave4, slave1]
> [0m[0m11:22:48,191 INFO [org.infinispan.CLUSTER] (jgroups-115,slave0) ISPN100001: Node slave2 left the cluster
> [0m[0m11:23:09,176 INFO [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN000094: Received new cluster view for channel cluster: [slave3|10] (5) [slave3, slave6, slave0, slave5, slave1]
> [0m[0m11:23:09,179 INFO [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN100001: Node slave4 left the cluster
> [0m[0m11:23:20,173 INFO [org.infinispan.CLUSTER] (jgroups-121,slave0) ISPN000094: Received new cluster view for channel cluster: [slave6|11] (4) [slave6, slave0, slave5, slave1]
> [0m[0m11:23:20,178 INFO [org.infinispan.CLUSTER] (jgroups-121,slave0) ISPN100001: Node slave3 left the cluster
> [0m[33m11:23:20,199 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t60) ISPN000210: Failed to request state of cache memcachedCache from node slave3, segments {114 184 190-191}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
> at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
> at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
> at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
> at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
> at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> ... 3 more
> Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> ... 3 more
> [CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected]
> [0m[33m11:23:23,916 WARN [org.jgroups.protocols.pbcast.NAKACK2] (jgroups-117,slave0) JGRP000011: slave0: dropped message 319143 from non-member slave3 (view=[slave6|11] (4) [slave6, slave0, slave5, slave1])
> [0m[0m11:23:34,142 INFO [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN000094: Received new cluster view for channel cluster: [slave6|12] (3) [slave6, slave0, slave5]
> [0m[0m11:23:34,145 INFO [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN100001: Node slave1 left the cluster
> [0m[33m11:23:34,154 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t61) ISPN000210: Failed to request state of cache hotrodDist from node slave1, segments {59-60 71-74 78 81 146 180-181 185 192 217}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
> at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
> at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
> at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
> at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
> at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> ... 3 more
> Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> ... 3 more
> [CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected]
> [0m[33m11:23:34,183 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t58) ISPN000210: Failed to request state of cache rest from node slave1, segments {59-60 71-74 78 81 146 180-181 185 192 217}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
> at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
> at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
> at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
> at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
> at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> ... 3 more
> Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> ... 3 more
> [CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected]
> [0m[33m11:23:34,177 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t54) ISPN000210: Failed to request state of cache memcachedCache from node slave1, segments {59-60 71-74 78 81 146 180-181 185 192 217}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
> at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
> at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
> at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
> at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
> at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> ... 3 more
> Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> ... 3 more
> [CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected]
> [0m[31m11:23:34,203 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p4-t5) ISPN000208: No live owners found for segments {59-60 71 73-74 78 81 146 180-181 185 192 217} of cache rest. Excluded owners: []
> [0m[31m11:23:34,318 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p4-t11) ISPN000208: No live owners found for segments {59-60 71 73-74 78 81 146 180-181 185 192 217} of cache memcachedCache. Excluded owners: []
> [0m[33m11:23:34,332 WARN [org.infinispan.statetransfer.StateConsumerImpl] (stateTransferExecutor-thread--p5-t61) Discarding received cache entries for segment 72 of cache memcachedCache because they do not belong to this node.
> [0m[33m11:23:34,396 WARN [org.infinispan.statetransfer.StateConsumerImpl] (stateTransferExecutor-thread--p5-t57) Discarding received cache entries for segment 72 of cache memcachedCache because they do not belong to this node.
> [0m[31m11:23:34,398 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p4-t1) ISPN000208: No live owners found for segments {71 74 146} of cache hotrodDist. Excluded owners: []
> [0m[33m11:23:34,521 WARN [org.jgroups.protocols.pbcast.NAKACK2] (jgroups-123,slave0) JGRP000011: slave0: dropped message 338000 from non-member slave1 (view=[slave6|12] (3) [slave6, slave0, slave5])
> [0m[33m11:23:51,210 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-124,slave0) slave0: not member of view [slave6|13]; discarding it
> [0m[33m11:24:02,223 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-119,slave0) slave0: failed to create view from delta-view; dropping view: java.lang.IllegalStateException: the view-id of the delta view ([slave6|13]) doesn't match the current view-id ([slave6|12]); discarding delta view [slave6|14], ref-view=[slave6|13], left=[slave5]
> [0m[33m11:24:02,231 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-119,slave0) slave0: not member of view [slave6|14]; discarding it
> [0m[33m11:24:11,932 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-119,slave0) slave0: not member of view [slave5|15]; discarding it
> [0m[0m11:24:12,485 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-129,slave0) ISPN000094: Received new cluster view for channel cluster: [slave0|16] (2) [slave0, slave5]
> [0m[0m11:24:12,488 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-129,slave0) ISPN100001: Node slave6 left the cluster
> [0m[33m11:24:14,492 WARN [org.jgroups.protocols.pbcast.GMS] (VERIFY_SUSPECT.TimerThread-129,slave0) slave0: failed to collect all ACKs (expected=1) for view [slave0|16] after 2000ms, missing 1 ACKs from (1) slave5
> [0m[33m11:24:34,209 WARN [org.jgroups.protocols.pbcast.NAKACK2] (jgroups-128,slave0) JGRP000011: slave0: dropped message 319152 from non-member slave3 (view=[slave0|16] (2) [slave0, slave5]) (received 11 identical messages from slave3 in the last 70294 ms)
> [0m[0m11:25:10,121 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN000093: Received new, MERGED cluster view for channel cluster: MergeView::[slave3|17] (7) [slave3, slave5, slave2, slave0, slave4, slave1, slave7], 6 subgroups: [slave5|15] (1) [slave5], [slave3|15] (2) [slave3, slave0], [slave0|16] (2) [slave0, slave5], [slave3|7] (8) [slave3, slave6, slave0, slave5, slave4, slave2, slave7, slave1], [slave3|8] (7) [slave3, slave6, slave0, slave5, slave4, slave2, slave1], [slave3|9] (6) [slave3, slave6, slave0, slave5, slave4, slave1]
> [0m[0m11:25:10,124 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave3 joined the cluster
> [0m[0m11:25:10,127 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave2 joined the cluster
> [0m[0m11:25:10,128 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave4 joined the cluster
> [0m[0m11:25:10,129 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave1 joined the cluster
> [0m[0m11:25:10,130 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave7 joined the cluster
> [0m[33m11:25:10,362 WARN [org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy] (stateTransferExecutor-thread--p5-t58) ISPN000517: Ignoring cache topology from [slave0] during merge: CacheTopology{id=49, phase=NO_REBALANCE, rebalanceId=14, currentCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 86+75, slave5: 82+89, slave0: 88+92]}, pendingCH=null, unionCH=null, actualMembers=[slave6, slave5, slave0], persistentUUIDs=[c1c2227d-2656-431e-a5b5-721459759a7f, 30123e75-2b33-46bc-a2e2-15f882782719, 2d550811-e842-4382-b8ae-3f36973f49f9]}
> [0m[0m11:25:10,374 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=rest]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 57
> [0m[0m11:25:10,374 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=__global_tx_table__]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
> [0m[0m11:25:10,374 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=___hotRodTopologyCache]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
> [0m[0m11:25:10,374 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=___protobuf_metadata]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
> [0m[0m11:25:10,374 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t72) [Context=org.infinispan.CONFIG]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
> [0m[0m11:25:10,394 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=rest]ISPN100008: Updating cache members list [slave5], topology id 58
> [0m[0m11:25:10,415 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=___hotRodTopologyCache]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
> [0m[0m11:25:10,415 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=___protobuf_metadata]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
> [0m[0m11:25:10,416 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=__global_tx_table__]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
> [0m[0m11:25:10,419 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=hotrodRepl]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
> [0m[0m11:25:10,420 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=rest]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 59
> [0m[0m11:25:10,420 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=___script_cache]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 42
> [0m[33m11:25:10,419 WARN [org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy] (stateTransferExecutor-thread--p5-t70) ISPN000517: Ignoring cache topology from [slave0] during merge: CacheTopology{id=47, phase=READ_ALL_WRITE_ALL, rebalanceId=14, currentCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 83+30, slave5: 88+37, slave0: 85+35]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 86+75, slave5: 82+89, slave0: 88+92]}, unionCH=null, actualMembers=[slave6, slave5, slave0], persistentUUIDs=[c1c2227d-2656-431e-a5b5-721459759a7f, 30123e75-2b33-46bc-a2e2-15f882782719, 2d550811-e842-4382-b8ae-3f36973f49f9]}
> [0m[0m11:25:10,422 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=memcachedCache]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 52
> [0m[0m11:25:10,424 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t72) [Context=org.infinispan.CONFIG]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
> [0m[33m11:25:10,423 WARN [org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy] (stateTransferExecutor-thread--p5-t58) ISPN000517: Ignoring cache topology from [slave0] during merge: CacheTopology{id=44, phase=READ_OLD_WRITE_ALL, rebalanceId=13, currentCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 83+30, slave5: 88+37, slave0: 85+35]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 86+75, slave5: 82+89, slave0: 88+92]}, unionCH=null, actualMembers=[slave6, slave5, slave0], persistentUUIDs=[c1c2227d-2656-431e-a5b5-721459759a7f, 30123e75-2b33-46bc-a2e2-15f882782719, 2d550811-e842-4382-b8ae-3f36973f49f9]}
> [0m[0m11:25:10,426 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=memcachedCache]ISPN100008: Updating cache members list [slave5], topology id 53
> [0m[0m11:25:10,426 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=hotrodDist]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 49
> [0m[0m11:25:10,430 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=hotrodDist]ISPN100008: Updating cache members list [slave5], topology id 50
> [0m[0m11:25:10,431 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=hotrodRepl]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
> [0m[0m11:25:10,431 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=___script_cache]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 43
> [0m[0m11:25:10,435 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=memcachedCache]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 54
> [0m[0m11:25:10,438 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=hotrodDist]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 51
> [0m[33m11:25:12,134 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-128,slave0) slave0: failed to collect all ACKs (expected=1) for view [slave3|17] after 2000ms, missing 1 ACKs from (1) slave5
> 11:26:11,452 INFO [org.radargun.Slave] (sc-main) Starting stage ClusterSplitVerify
> 11:26:11,453 ERROR [org.radargun.stages.monitor.ClusterSplitVerifyStage] (sc-main) Cluster size at the beginning of the test was 8 but changed to 7 during the test! Perhaps a split occured, or a new node joined?
> 11:26:11,454 INFO [org.radargun.Slave] (sc-main) Finished stage ClusterSplitVerify
> 11:26:11,455 INFO [org.radargun.RemoteMasterConnection] (sc-main) Message successfully sent to the master
> 11:26:11,571 INFO [org.radargun.Slave] (sc-main) Starting stage ScenarioDestroy
> 11:26:11,573 INFO [org.radargun.stages.ScenarioDestroyStage] (sc-main) Scenario finished, destroying...
> 11:26:11,575 INFO [org.radargun.stages.ScenarioDestroyStage] (sc-main) Memory before cleanup:
> Runtime free: 1,484,671 kb
> Runtime max:27,960,320 kb
> Runtime total:1,974,784 kb
> {noformat}
> If we run with master for "regression-cs-hotrod-dist-writes" it will because of:
> {noformat}
> 21:39:52,651 INFO [org.radargun.RemoteSlaveConnection] (main) Master started and listening for connection on: /172.18.1.18:2103
> 21:39:52,651 INFO [org.radargun.RemoteSlaveConnection] (main) Waiting 5 seconds for server socket to open completely
> 21:39:57,655 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 16 slaves.
> 21:39:57,666 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 15 slaves.
> 21:39:57,667 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 14 slaves.
> 21:39:57,668 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 13 slaves.
> 21:39:57,669 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 12 slaves.
> 21:39:57,670 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 11 slaves.
> 21:39:57,671 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 10 slaves.
> 21:39:57,672 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 9 slaves.
> 21:39:57,674 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 8 slaves.
> 21:39:57,675 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 7 slaves.
> 21:39:57,676 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 6 slaves.
> 21:39:57,677 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 5 slaves.
> 21:39:57,678 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 4 slaves.
> 21:39:57,708 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 3 slaves.
> 21:39:57,710 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 2 slaves.
> 21:39:57,711 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 1 slaves.
> 21:44:57,668 ERROR [org.radargun.Master] (main) Exception in Master.run:
> java.io.IOException: 1 slaves haven't connected within timeout!
> at org.radargun.RemoteSlaveConnection.establish(RemoteSlaveConnection.java:112) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.Master.run(Master.java:59) [radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.LaunchMaster.main(LaunchMaster.java:34) [radargun-core-3.0.0-SNAPSHOT.jar:?]
> 21:44:57,697 WARN [org.radargun.RemoteSlaveConnection] (main) Failed to send termination to slaves.
> java.lang.NullPointerException: null
> at org.radargun.RemoteSlaveConnection$SlaveRecord.access$100(RemoteSlaveConnection.java:63) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.RemoteSlaveConnection.mcastBuffer(RemoteSlaveConnection.java:201) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.RemoteSlaveConnection.mcastObject(RemoteSlaveConnection.java:211) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.RemoteSlaveConnection.release(RemoteSlaveConnection.java:357) [radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.Master.run(Master.java:155) [radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.LaunchMaster.main(LaunchMaster.java:34) [radargun-core-3.0.0-SNAPSHOT.jar:?]
> 21:44:57,703 INFO [org.radargun.ShutDownHook] (Thread-1) Master process is being shutdown
> Master 16071 finished with value 127
> kill: sending signal to 16071 failed: No such process
> kill: sending signal to 16071 failed: No such process
> {noformat}
> The tests are related with HotRod during the reads operation in a replicated and distributed cache.
> The scenario is:
> 8 - Servers
> 8 - Slaves
> Commits:
> 13/May - df9ffb5ba46752d2509aa3a08c59519469cc929a - Passed - Executed 3 times
> 14/May - df9ffb5ba46752d2509aa3a08c59519469cc929a - Passed - Executed 3 times (I didn't executed this because it is the same of the above, just to keep the history)
> 15/May - 26ba1aeb1d66cf65fb5c410ec98629093c29ab0b - Need to double check (It passed)
> 16/May - 92a5e4f62c39d63221aed2ed5763081b626874e6 - Need to double check (It start failing here)
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 12 months