[JBoss JIRA] (ISPN-8861) DIST Iteration shouldn't go remote when a shared cache store is present
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-8861?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-8861:
----------------------------------
Sprint: Sprint 9.4.0.CR1, Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32 (was: Sprint 9.4.0.CR1, Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31)
> DIST Iteration shouldn't go remote when a shared cache store is present
> -----------------------------------------------------------------------
>
> Key: ISPN-8861
> URL: https://issues.jboss.org/browse/ISPN-8861
> Project: Infinispan
> Issue Type: Enhancement
> Components: Loaders and Stores
> Reporter: Will Burns
> Assignee: Will Burns
> Priority: Major
> Fix For: 10.0.0.Final
>
>
> When a user runs a distributed iteration operation on a cache with a shared cache store, the operation shouldn't go remote, because the local node already has all the data available through the shared store. This is similar to the existing optimization where we don't go remote when the current node is an owner of all required segments.
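> For illustration only, a minimal sketch of the intended check, assuming the shared flag can be read from the persistence configuration (the class name and the publisher suppliers are hypothetical, not Infinispan's actual iteration code):
> {noformat}
> import java.util.function.Supplier;
>
> import org.infinispan.Cache;
> import org.infinispan.configuration.cache.StoreConfiguration;
> import org.reactivestreams.Publisher;
>
> // Hypothetical sketch: prefer a purely local iteration when any configured
> // store is shared, because the shared store already holds every entry.
> final class SharedStoreIterationSketch {
>    static boolean hasSharedStore(Cache<?, ?> cache) {
>       return cache.getCacheConfiguration().persistence().stores()
>                   .stream().anyMatch(StoreConfiguration::shared);
>    }
>
>    static <T> Publisher<T> choosePublisher(Cache<?, ?> cache,
>                                            Supplier<Publisher<T>> local,
>                                            Supplier<Publisher<T>> clustered) {
>       // Same spirit as the "local node owns all segments" optimization:
>       // skip the remote requests entirely.
>       return hasSharedStore(cache) ? local.get() : clustered.get();
>    }
> }
> {noformat}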
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-9036) Parameterized branding for C++
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9036?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9036:
----------------------------------
Sprint: Sprint 9.3.0.Alpha1, Sprint 9.3.0.Beta1, Sprint 9.3.0.CR1, Sprint 9.3.0.Final, Sprint 9.4.0.Beta1, Sprint 9.4.0.CR1, Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32 (was: Sprint 9.3.0.Alpha1, Sprint 9.3.0.Beta1, Sprint 9.3.0.CR1, Sprint 9.3.0.Final, Sprint 9.4.0.Beta1, Sprint 9.4.0.CR1, Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31)
> Parameterized branding for C++
> ------------------------------
>
> Key: ISPN-9036
> URL: https://issues.jboss.org/browse/ISPN-9036
> Project: Infinispan
> Issue Type: Enhancement
> Reporter: Tristan Tarrant
> Assignee: Vittorio Rigamonti
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-9291) BasePartitionHandlingTest.Partition.installMergeView() doesn't compute the merge digest
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9291?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9291:
----------------------------------
Sprint: Sprint 9.4.0.Beta1, Sprint 9.4.0.CR1, Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32 (was: Sprint 9.4.0.Beta1, Sprint 9.4.0.CR1, Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31)
> BasePartitionHandlingTest.Partition.installMergeView() doesn't compute the merge digest
> ---------------------------------------------------------------------------------------
>
> Key: ISPN-9291
> URL: https://issues.jboss.org/browse/ISPN-9291
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.3.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
> Labels: testsuite_stability
> Fix For: 10.0.0.Beta4, 9.4.16.Final
>
>
> The partition handling tests use {{BasePartitionHandlingTest.Partition.installMergeView(view1, view2)}} to install the merge view without waiting for {{MERGE3}} to run, which makes them much faster. Unfortunately, the implementation is incorrect: {{GMS.installView(view)}} only works for regular views; merge views need to be installed with {{GMS.installView(mergeView, digest)}}.
> The result is that the nodes that got isolated from the coordinator request the retransmission of all the {{NAKACK2}} messages (including view updates) since the cluster first started. The isolated nodes cannot install the merge view until they deliver all the older messages (even without knowing whether they're OOB or not). But if {{STABLE}} ran and cleared a range of messages already, the retransmission request cannot be satisfied, so the view updates will never be delivered.
> This is easily reproducible in {{CrashedNodeDuringConflictResolutionTest}} if we add a delay before updating the topology in {{StateConsumerImpl}}. The test installs the merge view manually, but then kills NodeC and expects the cluster to install the new view automatically. NodeD can't install the new view because it's waiting for earlier messages from NodeA:
> {noformat}
> 18:27:13,054 INFO (testng-test:[]) [TestSuiteProgress] Test starting: org.infinispan.conflict.impl.CrashedNodeDuringConflictResolutionTest.testPartitionMergePolicy[DIST_SYNC]
> 18:27:13,640 DEBUG (testng-test:[]) [GMS] test-NodeA-39513: installing view MergeView::[test-NodeA-39513|10] (4) [test-NodeA-39513, test-NodeB-9439, test-NodeC-43706, test-NodeD-59078], 2 subgroups: [test-NodeA-39513|8] (2) [test-NodeA-39513, test-NodeB-9439], [test-NodeC-43706|9] (2) [test-NodeC-43706, test-NodeD-59078]
> 18:27:13,674 DEBUG (testng-test:[]) [GMS] test-NodeD-59078: installing view MergeView::[test-NodeA-39513|10] (4) [test-NodeA-39513, test-NodeB-9439, test-NodeC-43706, test-NodeD-59078], 2 subgroups: [test-NodeA-39513|8] (2) [test-NodeA-39513, test-NodeB-9439], [test-NodeC-43706|9] (2) [test-NodeC-43706, test-NodeD-59078]
> 18:27:13,828 TRACE (jgroups-7,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((1): {50}) to test-NodeA-39513
> 18:27:13,966 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((49): {1-49}) to test-NodeA-39513
> 18:27:14,067 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:27:14,504 DEBUG (testng-test:[]) [DefaultCacheManager] Stopping cache manager ISPN on test-NodeC-43706
> 18:27:18,642 TRACE (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: joiners=[], suspected=[test-NodeC-43706], leaving=[], new view: [test-NodeA-39513|11] (3) [test-NodeA-39513, test-NodeB-9439, test-NodeD-59078]
> 18:27:18,643 TRACE (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: mcasting view [test-NodeA-39513|11] (3) [test-NodeA-39513, test-NodeB-9439, test-NodeD-59078]
> 18:27:18,646 DEBUG (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: installing view [test-NodeA-39513|11] (3) [test-NodeA-39513, test-NodeB-9439, test-NodeD-59078]
> 18:27:18,652 TRACE (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [TCP_NIO2] test-NodeA-39513: sending msg to null, src=test-NodeA-39513, headers are GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=63], TP: [cluster_name=ISPN]
> 18:27:18,656 TRACE (jgroups-20,test-NodeA-39513:[]) [TCP_NIO2] test-NodeA-39513: received [dst: test-NodeA-39513, src: test-NodeB-9439 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=100, TP: [cluster_name=ISPN]
> 18:27:20,554 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:27:20,653 WARN (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: failed to collect all ACKs (expected=2) for view [test-NodeA-39513|11] after 2000ms, missing 1 ACKs from (1) test-NodeD-59078
> 18:27:20,656 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:27:20,756 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> ...
> 18:28:14,412 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:28:14,513 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:28:14,589 ERROR (testng-test:[]) [TestSuiteProgress] Test failed: org.infinispan.conflict.impl.CrashedNodeDuringConflictResolutionTest.testPartitionMergePolicy[DIST_SYNC]
> java.lang.RuntimeException: Cache ___defaultcache timed out waiting for rebalancing to complete on node test-NodeA-39513, current topology is CacheTopology{id=21, phase=CONFLICT_RESOLUTION, rebalanceId=7, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (3)[test-NodeD-59078: 256+0, test-NodeA-39513: 0+256, test-NodeB-9439: 0+256]}, pendingCH=null, unionCH=null, actualMembers=[test-NodeD-59078, test-NodeA-39513, test-NodeB-9439], persistentUUIDs=[828108c4-4251-49fc-9481-ff6392bea9fb, 1d4b6f07-b71b-41a1-adfb-abbe68944a9f, 3a1ece05-c282-433e-9eb5-7b3e0f1932aa]}. rebalanceInProgress=true, currentChIsBalanced=true
> at org.infinispan.test.TestingUtil.waitForNoRebalance(TestingUtil.java:392) ~[test-classes/:?]
> at org.infinispan.conflict.impl.CrashedNodeDuringConflictResolutionTest.performMerge(CrashedNodeDuringConflictResolutionTest.java:113) ~[test-classes/:?]
> at org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:137) ~[test-classes/:?]
> {noformat}
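> As a rough illustration of the digest-aware install described above, a test helper could look roughly like this. This is a sketch only, assuming JGroups' {{MutableDigest}} and {{NAKACK2.getDigest(Address)}} can be used to assemble the merge digest; it is not the actual fix:
> {noformat}
> import java.util.List;
>
> import org.jgroups.JChannel;
> import org.jgroups.MergeView;
> import org.jgroups.protocols.pbcast.GMS;
> import org.jgroups.protocols.pbcast.NAKACK2;
> import org.jgroups.util.MutableDigest;
>
> // Hypothetical test helper: install a MergeView together with a digest,
> // instead of GMS.installView(view), which only handles regular views.
> final class MergeViewInstaller {
>    static void installMergeView(List<JChannel> channels, MergeView mergeView) {
>       // Assumption: the merge digest can be assembled from each member's NAKACK2 digest.
>       MutableDigest digest = new MutableDigest(mergeView.getMembersRaw());
>       for (JChannel ch : channels) {
>          NAKACK2 nakack = ch.getProtocolStack().findProtocol(NAKACK2.class);
>          digest.merge(nakack.getDigest(ch.getAddress()));
>       }
>       for (JChannel ch : channels) {
>          GMS gms = ch.getProtocolStack().findProtocol(GMS.class);
>          gms.installView(mergeView, digest);  // the digest-aware variant mentioned above
>       }
>    }
> }
> {noformat}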
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-9555) ComponentRegistry should enforce that registered components implement the interface/contract
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9555?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9555:
----------------------------------
Sprint: Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32 (was: Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31)
> ComponentRegistry should enforce that registered components implement the interface/contract
> --------------------------------------------------------------------------------------------
>
> Key: ISPN-9555
> URL: https://issues.jboss.org/browse/ISPN-9555
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Reporter: Nistor Adrian
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 10.0.0.Final
>
>
> ComponentRegistry.registerComponent(impl, interface) does not currently check that the impl really is assignable to the interface.
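> A minimal sketch of the missing check (class and field names are illustrative, not the actual ComponentRegistry code):
> {noformat}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> // Illustrative only: reject implementations that are not assignable to the
> // declared interface/contract at registration time.
> final class ComponentRegistrySketch {
>    private final Map<String, Object> components = new ConcurrentHashMap<>();
>
>    void registerComponent(Object impl, Class<?> contract) {
>       if (!contract.isInstance(impl)) {
>          throw new IllegalArgumentException(impl.getClass().getName() +
>                " is not assignable to " + contract.getName());
>       }
>       components.put(contract.getName(), impl);
>    }
> }
> {noformat}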
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-7889) BaseDistributionInterceptor.remoteGet may cause concurrency issues
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-7889?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-7889:
----------------------------------
Sprint: Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32 (was: Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31)
> BaseDistributionInterceptor.remoteGet may cause concurrency issues
> ------------------------------------------------------------------
>
> Key: ISPN-7889
> URL: https://issues.jboss.org/browse/ISPN-7889
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.1.0.Alpha1
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 10.0.0.Alpha3, 9.4.6.Final
>
>
> {{BaseDistributionInterceptor.remoteGet}}, or any call that accesses the invocation context from an async future handler that can run multiple times in parallel, may lead to concurrent modifications of the context.
> These calls are usually coordinated with {{CompletableFuture.allOf()}} or a CompletableFuture with a counter, but if one of the calls results in exceptional completion of the composed future, the processing continues (e.g. with a retry). The other parallel operation handlers are not stopped, though.
> {{BaseDistributionInterceptor.remoteGet}} shouldn't be called in parallel, because it doesn't synchronize even regular successful invocations.
> A problem like this caused failures in {{GetAllCommandStressTest}}, and the issue was addressed for {{GetAllCommand}} in ISPN-7884.
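> A self-contained illustration of the hazard (not Infinispan code): several futures complete on different threads, their handlers all mutate the same non-thread-safe context map, and {{CompletableFuture.allOf()}} neither serializes the handlers nor stops them when one future completes exceptionally.
> {noformat}
> import java.util.HashMap;
> import java.util.Map;
> import java.util.concurrent.CompletableFuture;
>
> final class ParallelRemoteGetSketch {
>    private final Map<Object, Object> lookedUpEntries = new HashMap<>(); // not thread-safe
>
>    CompletableFuture<Void> fetchAll(Object... keys) {
>       CompletableFuture<?>[] remoteGets = new CompletableFuture<?>[keys.length];
>       for (int i = 0; i < keys.length; i++) {
>          Object key = keys[i];
>          // Each simulated "remote get" completes on its own thread; the handler
>          // writes into lookedUpEntries concurrently with the others -> data race.
>          remoteGets[i] = CompletableFuture.supplyAsync(() -> "value-of-" + key)
>                                           .thenAccept(v -> lookedUpEntries.put(key, v));
>       }
>       // allOf() only waits for completion; it does not serialize the handlers,
>       // and an exceptional completion does not stop the other handlers either.
>       return CompletableFuture.allOf(remoteGets);
>    }
> }
> {noformat}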
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-8889) Data race in NonTxInvocationContext
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-8889?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-8889:
----------------------------------
Sprint: Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32 (was: Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31)
> Data race in NonTxInvocationContext
> -----------------------------------
>
> Key: ISPN-8889
> URL: https://issues.jboss.org/browse/ISPN-8889
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.1.6.Final, 9.2.0.CR2
> Environment: Java 8 (Oracle JDK 8), Solaris SPARC
> Reporter: Peter Levart
> Assignee: Dan Berindei
> Priority: Major
> Labels: data-race
> Fix For: 10.0.0.Alpha3, 9.4.6.Final
>
> Attachments: DataRacer.java, DataRacer.java
>
>
> Got the following exceptions starting up an Infinispan node while joining the cluster:
> {noformat}
> 17:10:59.012 [remote-thread--p2-t8] ERROR org.infinispan.interceptors.impl.InvocationContextInterceptor - ISPN000136: Error executing command PutMapCommand, writing keys [11906696627, 11906696626, 11906696625, 11906696624, 11906696631, 11906696630, 11906696629, 11906696628...<9992 other elements>]
> java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
> at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1832) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.treeify(HashMap.java:1949) ~[?:1.8.0_162]
> at java.util.HashMap$TreeNode.split(HashMap.java:2175) ~[?:1.8.0_162]
> at java.util.HashMap.resize(HashMap.java:714) ~[?:1.8.0_162]
> at java.util.HashMap.putVal(HashMap.java:663) ~[?:1.8.0_162]
> at java.util.HashMap.put(HashMap.java:612) ~[?:1.8.0_162]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:48) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> I immediately suspected that a plain HashMap is being modified from multiple threads without proper synchronization. But since I'm new to Infinispan, I can't tell whether the use of a plain HashMap in NonTxInvocationContext is intentional and the Infinispan code relies on some external synchronization performed by the other parts of the system that call NonTxInvocationContext.[lookupEntry|removeLookedUpEntry|putLookedUpEntry|getLookedUpEntries]. In any case, if such external synchronization is supposed to take place, it has a bug, because it allows these methods to be called concurrently without proper synchronization.
> I did some experiments to prove that and to obtain stack traces of at least two of the involved threads, but the trivial instrumentation I tried on those methods all introduced some kind of synchronization and made the symptom disappear. I had to be careful to obtain the stack traces of the two threads without introducing inter-thread synchronization, so that the data race could still be detected. Here's how I instrumented the NonTxInvocationContext code:
> {noformat}
> Index: core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> --- core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (revision 6014e751c20daba1f00e23168281e02afb209905)
> +++ core/src/main/java/org/infinispan/context/impl/NonTxInvocationContext.java (date 1519744397000)
> @@ -22,6 +22,7 @@
> private Set<Object> lockedKeys;
> private Object lockOwner;
>
> + private final DataRacer dataRacer = new DataRacer();
>
> public NonTxInvocationContext(int numEntries, Address origin) {
> super(origin);
> @@ -35,22 +36,26 @@
>
> @Override
> public CacheEntry lookupEntry(Object k) {
> + dataRacer.detect();
> return lookedUpEntries.get(k);
> }
>
> @Override
> public void removeLookedUpEntry(Object key) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.remove(key);
> }
>
> @Override
> public void putLookedUpEntry(Object key, CacheEntry e) {
> + dataRacer.detectAndWrite();
> lookedUpEntries.put(key, e);
> }
>
> @Override
> @SuppressWarnings("unchecked")
> public Map<Object, CacheEntry> getLookedUpEntries() {
> + dataRacer.detect();
> return (Map<Object, CacheEntry>)
> (lookedUpEntries == null ?
> Collections.emptyMap() : lookedUpEntries);
> {noformat}
> Attached, you will find the DataRacer source. While running with such patched Infinispan, I got some of the following:
> {noformat}
> java.lang.IllegalThreadStateException: Thread[remote-thread--p2-t4,5,main]: data race detected with writer of state: 271f7814a414 - see suppressed for writer thread stack trace
> at org.infinispan.context.impl.DataRacer.detect(DataRacer.java:37) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.lookupEntry(NonTxInvocationContext.java:39) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.addRemoteGet(NonTxDistributionInterceptor.java:405) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleRemoteReadWriteManyCommand(NonTxDistributionInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.handleReadWriteManyCommand(NonTxDistributionInterceptor.java:303) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutMapCommand(NonTxDistributionInterceptor.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenAccept(BaseAsyncInterceptor.java:98) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForManyEntriesCommand(EntryWrappingInterceptor.java:614) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.visitPutMapCommand(EntryWrappingInterceptor.java:385) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.handleWriteManyCommand(NonTransactionalLockingInterceptor.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutMapCommand(AbstractLockingInterceptor.java:179) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:306) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:252) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutMapCommand(StateTransferInterceptor.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.CacheMgmtInterceptor.visitPutMapCommand(CacheMgmtInterceptor.java:154) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitPutMapCommand(DDAsyncInterceptor.java:90) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:80) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommandAsync(BaseRpcInvokingCommand.java:63) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.commands.remote.SingleRpcCommand.invokeAsync(SingleRpcCommand.java:57) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> Suppressed: org.infinispan.context.impl.DataRacer$StackTrace: Thread[jgroups-13,tbd2-40757,5,main], state: 271f7814a414
> at org.infinispan.context.impl.DataRacer.write(DataRacer.java:59) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.DataRacer.detectAndWrite(DataRacer.java:49) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.context.impl.NonTxInvocationContext.putLookedUpEntry(NonTxInvocationContext.java:51) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.container.EntryFactoryImpl.wrapExternalEntry(EntryFactoryImpl.java:143) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.wrapRemoteEntry(BaseDistributionInterceptor.java:222) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.lambda$remoteGet$1(BaseDistributionInterceptor.java:192) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_162]
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_162]
> at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:66) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.StaggeredRequest.onResponse(StaggeredRequest.java:50) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1302) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1205) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1340) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.jgroups.JChannel.up(JChannel.java:819) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.Discovery.up(Discovery.java:267) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) ~[jgroups-4.0.10.Final.jar:4.0.10.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> {noformat}
> So here you have it. Either using a plain HashMap in NonTxInvocationContext is wrong and a ConcurrentHashMap should be used instead, or the external synchronization has a bug and is inadequate. I hope this helps to fix it.
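> For illustration, the first alternative suggested above would amount to something like this sketch (not necessarily the fix that was actually applied):
> {noformat}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> // Sketch only: back the looked-up entries with a thread-safe map so concurrent
> // putLookedUpEntry/lookupEntry/removeLookedUpEntry calls cannot corrupt it.
> final class NonTxContextSketch {
>    private final Map<Object, Object> lookedUpEntries = new ConcurrentHashMap<>();
>
>    Object lookupEntry(Object key)                  { return lookedUpEntries.get(key); }
>    void putLookedUpEntry(Object key, Object entry) { lookedUpEntries.put(key, entry); }
>    void removeLookedUpEntry(Object key)            { lookedUpEntries.remove(key); }
> }
> {noformat}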
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-9198) Node X left the cluster - SuspectException: ISPN000400: Node X was suspected
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9198?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9198:
----------------------------------
Sprint: Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32 (was: Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31)
> Node X left the cluster - SuspectException: ISPN000400: Node X was suspected
> ----------------------------------------------------------------------------
>
> Key: ISPN-9198
> URL: https://issues.jboss.org/browse/ISPN-9198
> Project: Infinispan
> Issue Type: Bug
> Reporter: Diego Lovison
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: Screenshot from 2018-05-30 13-21-32.png, Screenshot from 2018-05-30 13-25-09.png, round1.zip, round2.zip
>
>
> After commit df9ffb5ba46752d2509aa3a08c59519469cc929a in Infinispan, the tests regression-cs-hotrod-dist-reads and regression-cs-hotrod-repl-reads are failing.
> I ran the same tests 3 times at commit df9ffb5ba46752d2509aa3a08c59519469cc929a and they pass.
> If we run "regression-cs-hotrod-dist-reads" with master, it fails because of:
> {noformat}
> 11:18:04,874 INFO [org.radargun.RemoteMasterConnection] (sc-main) Message successfully sent to the master
> 11:22:06,470 INFO [org.radargun.Slave] (sc-main) Stage 'BasicOperationsTest' should not be executed
> 11:22:06,472 INFO [org.radargun.RemoteMasterConnection] (sc-main) Message successfully sent to the master
> 11:22:34,182 INFO [org.infinispan.CLUSTER] (jgroups-112,slave0) ISPN000094: Received new cluster view for channel cluster: [slave3|8] (7) [slave3, slave6, slave0, slave5, slave4, slave2, slave1]
> 11:22:34,190 INFO [org.infinispan.CLUSTER] (jgroups-112,slave0) ISPN100001: Node slave7 left the cluster
> 11:22:48,182 INFO [org.infinispan.CLUSTER] (jgroups-115,slave0) ISPN000094: Received new cluster view for channel cluster: [slave3|9] (6) [slave3, slave6, slave0, slave5, slave4, slave1]
> 11:22:48,191 INFO [org.infinispan.CLUSTER] (jgroups-115,slave0) ISPN100001: Node slave2 left the cluster
> 11:23:09,176 INFO [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN000094: Received new cluster view for channel cluster: [slave3|10] (5) [slave3, slave6, slave0, slave5, slave1]
> 11:23:09,179 INFO [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN100001: Node slave4 left the cluster
> 11:23:20,173 INFO [org.infinispan.CLUSTER] (jgroups-121,slave0) ISPN000094: Received new cluster view for channel cluster: [slave6|11] (4) [slave6, slave0, slave5, slave1]
> 11:23:20,178 INFO [org.infinispan.CLUSTER] (jgroups-121,slave0) ISPN100001: Node slave3 left the cluster
> 11:23:20,199 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t60) ISPN000210: Failed to request state of cache memcachedCache from node slave3, segments {114 184 190-191}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
> at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
> at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
> at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
> at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
> at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> ... 3 more
> Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> ... 3 more
> [CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected]
> 11:23:23,916 WARN [org.jgroups.protocols.pbcast.NAKACK2] (jgroups-117,slave0) JGRP000011: slave0: dropped message 319143 from non-member slave3 (view=[slave6|11] (4) [slave6, slave0, slave5, slave1])
> 11:23:34,142 INFO [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN000094: Received new cluster view for channel cluster: [slave6|12] (3) [slave6, slave0, slave5]
> 11:23:34,145 INFO [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN100001: Node slave1 left the cluster
> 11:23:34,154 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t61) ISPN000210: Failed to request state of cache hotrodDist from node slave1, segments {59-60 71-74 78 81 146 180-181 185 192 217}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
> at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
> at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
> at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
> at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
> at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> ... 3 more
> Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> ... 3 more
> [CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected]
> 11:23:34,183 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t58) ISPN000210: Failed to request state of cache rest from node slave1, segments {59-60 71-74 78 81 146 180-181 185 192 217}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
> at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
> at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
> at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
> at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
> at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> ... 3 more
> Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> ... 3 more
> [CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected]
> 11:23:34,177 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t54) ISPN000210: Failed to request state of cache memcachedCache from node slave1, segments {59-60 71-74 78 81 146 180-181 185 192 217}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
> at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
> at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
> at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
> at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
> at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> ... 3 more
> Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
> at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
> at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
> at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
> at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
> at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
> at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
> at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
> ... 3 more
> [CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected]
> 11:23:34,203 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p4-t5) ISPN000208: No live owners found for segments {59-60 71 73-74 78 81 146 180-181 185 192 217} of cache rest. Excluded owners: []
> 11:23:34,318 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p4-t11) ISPN000208: No live owners found for segments {59-60 71 73-74 78 81 146 180-181 185 192 217} of cache memcachedCache. Excluded owners: []
> 11:23:34,332 WARN [org.infinispan.statetransfer.StateConsumerImpl] (stateTransferExecutor-thread--p5-t61) Discarding received cache entries for segment 72 of cache memcachedCache because they do not belong to this node.
> 11:23:34,396 WARN [org.infinispan.statetransfer.StateConsumerImpl] (stateTransferExecutor-thread--p5-t57) Discarding received cache entries for segment 72 of cache memcachedCache because they do not belong to this node.
> 11:23:34,398 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p4-t1) ISPN000208: No live owners found for segments {71 74 146} of cache hotrodDist. Excluded owners: []
> 11:23:34,521 WARN [org.jgroups.protocols.pbcast.NAKACK2] (jgroups-123,slave0) JGRP000011: slave0: dropped message 338000 from non-member slave1 (view=[slave6|12] (3) [slave6, slave0, slave5])
> 11:23:51,210 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-124,slave0) slave0: not member of view [slave6|13]; discarding it
> 11:24:02,223 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-119,slave0) slave0: failed to create view from delta-view; dropping view: java.lang.IllegalStateException: the view-id of the delta view ([slave6|13]) doesn't match the current view-id ([slave6|12]); discarding delta view [slave6|14], ref-view=[slave6|13], left=[slave5]
> 11:24:02,231 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-119,slave0) slave0: not member of view [slave6|14]; discarding it
> 11:24:11,932 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-119,slave0) slave0: not member of view [slave5|15]; discarding it
> 11:24:12,485 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-129,slave0) ISPN000094: Received new cluster view for channel cluster: [slave0|16] (2) [slave0, slave5]
> 11:24:12,488 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-129,slave0) ISPN100001: Node slave6 left the cluster
> 11:24:14,492 WARN [org.jgroups.protocols.pbcast.GMS] (VERIFY_SUSPECT.TimerThread-129,slave0) slave0: failed to collect all ACKs (expected=1) for view [slave0|16] after 2000ms, missing 1 ACKs from (1) slave5
> 11:24:34,209 WARN [org.jgroups.protocols.pbcast.NAKACK2] (jgroups-128,slave0) JGRP000011: slave0: dropped message 319152 from non-member slave3 (view=[slave0|16] (2) [slave0, slave5]) (received 11 identical messages from slave3 in the last 70294 ms)
> 11:25:10,121 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN000093: Received new, MERGED cluster view for channel cluster: MergeView::[slave3|17] (7) [slave3, slave5, slave2, slave0, slave4, slave1, slave7], 6 subgroups: [slave5|15] (1) [slave5], [slave3|15] (2) [slave3, slave0], [slave0|16] (2) [slave0, slave5], [slave3|7] (8) [slave3, slave6, slave0, slave5, slave4, slave2, slave7, slave1], [slave3|8] (7) [slave3, slave6, slave0, slave5, slave4, slave2, slave1], [slave3|9] (6) [slave3, slave6, slave0, slave5, slave4, slave1]
> 11:25:10,124 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave3 joined the cluster
> 11:25:10,127 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave2 joined the cluster
> 11:25:10,128 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave4 joined the cluster
> 11:25:10,129 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave1 joined the cluster
> 11:25:10,130 INFO [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave7 joined the cluster
> 11:25:10,362 WARN [org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy] (stateTransferExecutor-thread--p5-t58) ISPN000517: Ignoring cache topology from [slave0] during merge: CacheTopology{id=49, phase=NO_REBALANCE, rebalanceId=14, currentCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 86+75, slave5: 82+89, slave0: 88+92]}, pendingCH=null, unionCH=null, actualMembers=[slave6, slave5, slave0], persistentUUIDs=[c1c2227d-2656-431e-a5b5-721459759a7f, 30123e75-2b33-46bc-a2e2-15f882782719, 2d550811-e842-4382-b8ae-3f36973f49f9]}
> 11:25:10,374 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=rest]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 57
> 11:25:10,374 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=__global_tx_table__]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
> 11:25:10,374 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=___hotRodTopologyCache]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
> 11:25:10,374 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=___protobuf_metadata]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
> 11:25:10,374 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t72) [Context=org.infinispan.CONFIG]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
> 11:25:10,394 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=rest]ISPN100008: Updating cache members list [slave5], topology id 58
> [0m[0m11:25:10,415 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=___hotRodTopologyCache]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
> [0m[0m11:25:10,415 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=___protobuf_metadata]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
> [0m[0m11:25:10,416 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=__global_tx_table__]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
> [0m[0m11:25:10,419 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=hotrodRepl]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
> [0m[0m11:25:10,420 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=rest]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 59
> [0m[0m11:25:10,420 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=___script_cache]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 42
> [0m[33m11:25:10,419 WARN [org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy] (stateTransferExecutor-thread--p5-t70) ISPN000517: Ignoring cache topology from [slave0] during merge: CacheTopology{id=47, phase=READ_ALL_WRITE_ALL, rebalanceId=14, currentCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 83+30, slave5: 88+37, slave0: 85+35]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 86+75, slave5: 82+89, slave0: 88+92]}, unionCH=null, actualMembers=[slave6, slave5, slave0], persistentUUIDs=[c1c2227d-2656-431e-a5b5-721459759a7f, 30123e75-2b33-46bc-a2e2-15f882782719, 2d550811-e842-4382-b8ae-3f36973f49f9]}
> [0m[0m11:25:10,422 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=memcachedCache]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 52
> [0m[0m11:25:10,424 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t72) [Context=org.infinispan.CONFIG]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
> [0m[33m11:25:10,423 WARN [org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy] (stateTransferExecutor-thread--p5-t58) ISPN000517: Ignoring cache topology from [slave0] during merge: CacheTopology{id=44, phase=READ_OLD_WRITE_ALL, rebalanceId=13, currentCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 83+30, slave5: 88+37, slave0: 85+35]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 86+75, slave5: 82+89, slave0: 88+92]}, unionCH=null, actualMembers=[slave6, slave5, slave0], persistentUUIDs=[c1c2227d-2656-431e-a5b5-721459759a7f, 30123e75-2b33-46bc-a2e2-15f882782719, 2d550811-e842-4382-b8ae-3f36973f49f9]}
> [0m[0m11:25:10,426 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=memcachedCache]ISPN100008: Updating cache members list [slave5], topology id 53
> [0m[0m11:25:10,426 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=hotrodDist]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 49
> [0m[0m11:25:10,430 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=hotrodDist]ISPN100008: Updating cache members list [slave5], topology id 50
> [0m[0m11:25:10,431 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=hotrodRepl]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
> [0m[0m11:25:10,431 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=___script_cache]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 43
> [0m[0m11:25:10,435 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=memcachedCache]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 54
> [0m[0m11:25:10,438 INFO [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=hotrodDist]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 51
> [0m[33m11:25:12,134 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-128,slave0) slave0: failed to collect all ACKs (expected=1) for view [slave3|17] after 2000ms, missing 1 ACKs from (1) slave5
> 11:26:11,452 INFO [org.radargun.Slave] (sc-main) Starting stage ClusterSplitVerify
> 11:26:11,453 ERROR [org.radargun.stages.monitor.ClusterSplitVerifyStage] (sc-main) Cluster size at the beginning of the test was 8 but changed to 7 during the test! Perhaps a split occured, or a new node joined?
> 11:26:11,454 INFO [org.radargun.Slave] (sc-main) Finished stage ClusterSplitVerify
> 11:26:11,455 INFO [org.radargun.RemoteMasterConnection] (sc-main) Message successfully sent to the master
> 11:26:11,571 INFO [org.radargun.Slave] (sc-main) Starting stage ScenarioDestroy
> 11:26:11,573 INFO [org.radargun.stages.ScenarioDestroyStage] (sc-main) Scenario finished, destroying...
> 11:26:11,575 INFO [org.radargun.stages.ScenarioDestroyStage] (sc-main) Memory before cleanup:
> Runtime free: 1,484,671 kb
> Runtime max:27,960,320 kb
> Runtime total:1,974,784 kb
> {noformat}
> If we run "regression-cs-hotrod-dist-writes" with master, it fails because of:
> {noformat}
> 21:39:52,651 INFO [org.radargun.RemoteSlaveConnection] (main) Master started and listening for connection on: /172.18.1.18:2103
> 21:39:52,651 INFO [org.radargun.RemoteSlaveConnection] (main) Waiting 5 seconds for server socket to open completely
> 21:39:57,655 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 16 slaves.
> 21:39:57,666 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 15 slaves.
> 21:39:57,667 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 14 slaves.
> 21:39:57,668 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 13 slaves.
> 21:39:57,669 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 12 slaves.
> 21:39:57,670 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 11 slaves.
> 21:39:57,671 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 10 slaves.
> 21:39:57,672 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 9 slaves.
> 21:39:57,674 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 8 slaves.
> 21:39:57,675 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 7 slaves.
> 21:39:57,676 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 6 slaves.
> 21:39:57,677 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 5 slaves.
> 21:39:57,678 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 4 slaves.
> 21:39:57,708 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 3 slaves.
> 21:39:57,710 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 2 slaves.
> 21:39:57,711 INFO [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 1 slaves.
> 21:44:57,668 ERROR [org.radargun.Master] (main) Exception in Master.run:
> java.io.IOException: 1 slaves haven't connected within timeout!
> at org.radargun.RemoteSlaveConnection.establish(RemoteSlaveConnection.java:112) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.Master.run(Master.java:59) [radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.LaunchMaster.main(LaunchMaster.java:34) [radargun-core-3.0.0-SNAPSHOT.jar:?]
> 21:44:57,697 WARN [org.radargun.RemoteSlaveConnection] (main) Failed to send termination to slaves.
> java.lang.NullPointerException: null
> at org.radargun.RemoteSlaveConnection$SlaveRecord.access$100(RemoteSlaveConnection.java:63) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.RemoteSlaveConnection.mcastBuffer(RemoteSlaveConnection.java:201) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.RemoteSlaveConnection.mcastObject(RemoteSlaveConnection.java:211) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.RemoteSlaveConnection.release(RemoteSlaveConnection.java:357) [radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.Master.run(Master.java:155) [radargun-core-3.0.0-SNAPSHOT.jar:?]
> at org.radargun.LaunchMaster.main(LaunchMaster.java:34) [radargun-core-3.0.0-SNAPSHOT.jar:?]
> 21:44:57,703 INFO [org.radargun.ShutDownHook] (Thread-1) Master process is being shutdown
> Master 16071 finished with value 127
> kill: sending signal to 16071 failed: No such process
> kill: sending signal to 16071 failed: No such process
> {noformat}
> The affected tests exercise HotRod read operations on a replicated and a distributed cache.
> The scenario is:
> 8 - Servers
> 8 - Slaves
> Commits:
> 13/May - df9ffb5ba46752d2509aa3a08c59519469cc929a - Passed - Executed 3 times
> 14/May - df9ffb5ba46752d2509aa3a08c59519469cc929a - Passed - Executed 3 times (I didn't re-execute this because it is the same commit as above; kept just for the history)
> 15/May - 26ba1aeb1d66cf65fb5c410ec98629093c29ab0b - Needs double-checking (it passed)
> 16/May - 92a5e4f62c39d63221aed2ed5763081b626874e6 - Needs double-checking (it started failing here)
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-9270) ClusteredLockWith2NodesTest.testTryLockAndKillLocking random failures
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9270?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9270:
----------------------------------
Sprint: Sprint 9.3.0.Final, Sprint 9.4.0.Beta1, Sprint 9.4.0.CR1, Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32 (was: Sprint 9.3.0.Final, Sprint 9.4.0.Beta1, Sprint 9.4.0.CR1, Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31)
> ClusteredLockWith2NodesTest.testTryLockAndKillLocking random failures
> ---------------------------------------------------------------------
>
> Key: ISPN-9270
> URL: https://issues.jboss.org/browse/ISPN-9270
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.3.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
> Fix For: 10.0.0.Final
>
>
> The test stops the coordinator A of a 2-node cluster \[A, B\], and then verifies that B stays available.
> Unfortunately that doesn't always happen: A tries to install topology \[B\] before shutting down, but B might receive it too late and reject it. If B expects members \[A, B\] and sees only B, the cache will enter degraded mode:
> {noformat}
> 19:04:05,863 INFO (jgroups-8,Test-NodeB-21374:[org.infinispan.LOCKS]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN100008: Updating cache members list [Test-NodeD-51051], topology id 7
> 19:04:05,863 TRACE (jgroups-4,Test-NodeD-51051:[]) [JGroupsTransport] Test-NodeD-51051 received command from Test-NodeB-21374: CacheTopologyControlCommand{cache=org.infinispan.LOCKS, type=CH_UPDATE, sender=Test-NodeB-21374, joinInfo=null, topologyId=7, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[Test-NodeD-51051: 256]}, pendingCH=null, availabilityMode=AVAILABLE, phase=NO_REBALANCE, actualMembers=[Test-NodeD-51051], throwable=null, viewId=1}
> 19:04:05,880 INFO (jgroups-4,Test-NodeD-51051:[]) [CLUSTER] ISPN000094: Received new cluster view for channel ISPN: [Test-NodeD-51051|2] (1) [Test-NodeD-51051]
> 19:04:05,884 DEBUG (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [ClusterCacheStatus] Recovered 1 partition(s) for cache org.infinispan.LOCKS: [CacheTopology{id=6, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeB-21374: 124, Test-NodeD-51051: 132]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD-51051], persistentUUIDs=[6ac3a341-5af4-4aa2-ae7b-a4183d02d959]}]
> 19:04:05,884 WARN (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN000320: After merge (or coordinator change), cache still hasn't recovered a majority of members and must stay in degraded mode. Current members are [Test-NodeD-51051], lost members are [Test-NodeB-21374], stable members are [Test-NodeB-21374, Test-NodeD-51051]
> 19:04:05,885 INFO (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN100011: Entering availability mode DEGRADED_MODE, topology id 8
> 19:04:05,941 DEBUG (transport-thread-Test-NodeD-p100-t2:[Topology-org.infinispan.LOCKS]) [LocalTopologyManagerImpl] Ignoring topology 7 for cache org.infinispan.LOCKS from old coordinator Test-NodeB-21374
> 19:04:06,024 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.lock.ClusteredLockWith2NodesTest.testTryLockAndKillLocking
> java.lang.Error: java.util.concurrent.ExecutionException: java.lang.Error: java.util.concurrent.ExecutionException: org.infinispan.lock.exception.ClusteredLockException: org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'ClusteredLockKey{name=Test}' is not available. Not all owners are in this partition
> at org.infinispan.functional.FunctionalTestUtils.await(FunctionalTestUtils.java:47) ~[infinispan-core-9.3.0-SNAPSHOT-tests.jar:9.3.0-SNAPSHOT]
> at org.infinispan.lock.ClusteredLockWith2NodesTest.testTryLockAndKillLocking(ClusteredLockWith2NodesTest.java:49) ~[test-classes/:?]
> {noformat}
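> The log shows why the survivor cannot stay available: with only two stable members, a lone node can never satisfy a strict-majority check, so the partition is forced into degraded mode. A minimal sketch of that rule (class and method names are made up for illustration; this is not the actual Infinispan availability-strategy code):
> {noformat}
> import java.util.List;
> import java.util.Set;
>
> final class MajorityCheck {
>
>    enum AvailabilityMode { AVAILABLE, DEGRADED_MODE }
>
>    // A partition stays AVAILABLE only if it still sees a strict majority
>    // of the last stable membership; exactly half is not enough.
>    static AvailabilityMode recompute(List<String> stableMembers, Set<String> actualMembers) {
>       long present = stableMembers.stream().filter(actualMembers::contains).count();
>       return 2 * present > stableMembers.size() ? AvailabilityMode.AVAILABLE
>                                                 : AvailabilityMode.DEGRADED_MODE;
>    }
>
>    public static void main(String[] args) {
>       // Stable view [A, B]; only B is left after the coordinator is killed.
>       System.out.println(recompute(List.of("A", "B"), Set.of("B"))); // prints DEGRADED_MODE
>    }
> }
> {noformat}
> Under that rule a crashed coordinator is harmless in a cluster of three or more stable members, because the survivors still form a majority; it is only the two-node case that degrades.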
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-9541) Module initialization is not thread-safe
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9541?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9541:
----------------------------------
Sprint: Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32 (was: Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31)
> Module initialization is not thread-safe
> ----------------------------------------
>
> Key: ISPN-9541
> URL: https://issues.jboss.org/browse/ISPN-9541
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Server
> Affects Versions: 9.4.0.CR3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 10.0.0.Final
>
>
> In my ISPN-9127 fix I created a {{BasicComponentRegistry}} interface that represents a mostly-read-only collection of components. It has a {{replaceComponent()}} method and a {{rewire()}} method for testing purposes, but it turns out modules were also relying on the ability to replace existing components in order to work.
> Replacing global components is normally safe during {{ModuleLifecycle.cacheManagerStarting()}}, because none of the components have started yet, so when a component starts later we can still start its dependencies first. But because some modules start global components during that callback, e.g. by calling {{manager.getCache(name)}}, that assumption breaks.
> The {{infinispan-server-event-logger}} module is a bit sneakier: it doesn't replace a component; instead it replaces the actual implementation of the event logger inside the {{EventLogManager}} component. Events that happen before the module's {{cacheManagerStarting()}} or after {{cacheManagerStopping()}} will be silently dropped from the persistent event log.
> I am investigating making the module a factory of factories. Instead of having a monolithic {{cacheManagerStarting()}} method, it could define a set of components that it can create, and a set of components that should be started before the cache manager is "running". We probably need a way to depend on other modules as well, maybe reusing the {{@Inject}} and {{@ComponentName}} annotations.
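> A rough sketch of what such a module contract could look like (every name below is hypothetical and only illustrates the "factory of factories" idea; the {{@Inject}}/{{@ComponentName}} annotations mentioned in the comments are the existing ones):
> {noformat}
> import java.util.Set;
>
> /**
>  * Hypothetical "factory of factories" module contract, sketched from the
>  * description above; not a final API proposal. Instead of mutating the
>  * registry from cacheManagerStarting(), a module declares up front which
>  * components it can build and which of them must be running before the
>  * cache manager is considered "running".
>  */
> public interface ModuleComponentSource {
>
>    /** Component types this module knows how to create. */
>    Set<Class<?>> providedComponentTypes();
>
>    /**
>     * Create one of the declared components. Its dependencies would be wired
>     * by the registry, e.g. via @Inject / @ComponentName, so start ordering
>     * no longer depends on when cacheManagerStarting() happens to run.
>     */
>    Object createComponent(Class<?> componentType);
>
>    /** Components that must be started before user caches can start. */
>    Set<Class<?>> eagerComponentTypes();
> }
> {noformat}
> With something like this, the event-logger module could declare its persistent logger as an eager component instead of swapping the implementation inside {{EventLogManager}}, so events emitted outside the {{cacheManagerStarting()}}/{{cacheManagerStopping()}} window would no longer be dropped.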
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-9703) CI should test that distribution build works
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9703?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9703:
----------------------------------
Sprint: Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32 (was: Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 10.0.0.Beta1, DataGrid Sprint #31)
> CI should test that distribution build works
> --------------------------------------------
>
> Key: ISPN-9703
> URL: https://issues.jboss.org/browse/ISPN-9703
> Project: Infinispan
> Issue Type: Enhancement
> Affects Versions: 9.4.1.Final
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
> Fix For: 10.0.0.Alpha3
>
>
> Currently the CI does not test whether the build succeeds with the `-Pdistribution` profile. As we want master to always be releasable, we should test this on PRs.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)