[JBoss JIRA] (ISPN-8859) DistTopologyChangeUnderLoadSingleOwnerTest takes too long
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-8859?page=com.atlassian.jira.plugin.... ]
Radim Vansa updated ISPN-8859:
------------------------------
Status: Open (was: Pull Request Sent)
> DistTopologyChangeUnderLoadSingleOwnerTest takes too long
> ---------------------------------------------------------
>
> Key: ISPN-8859
> URL: https://issues.jboss.org/browse/ISPN-8859
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.0.CR2
> Reporter: Radim Vansa
> Assignee: Radim Vansa
>
> The test starts two nodes (A and B), kills A, and starts C. Test cleanup stops C and gets stuck in DCM.undefineConfiguration: the method iterates through the caches and finds that {{___counter_configuration}} has not started yet because it is still waiting for the initial state transfer.
> The node waits for the initial state transfer (ST) because, by the time C starts, the cache is degraded on B and the initial transfer cannot continue.
> The proper fix would be to let the cache start and have its operations throw AvailabilityException; a workaround would be to keep wired but unstarted caches in a separate collection, solely for the purpose of undefineConfiguration.
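The workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class, its fields and its methods are hypothetical and are not Infinispan's DefaultCacheManager internals, and the "configuration still in use" check is an assumption about what the iteration over caches is for. The point is that the lookup consults a separate map filled at wiring time, so it never has to wait for a cache to finish starting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheRegistrySketch {
    // Hypothetical bookkeeping, not Infinispan's actual fields.
    // configuration name -> settings
    private final Map<String, String> configurations = new ConcurrentHashMap<>();
    // cache name -> configuration name, recorded as soon as the cache is wired,
    // i.e. before its (possibly blocking) start has completed.
    private final Map<String, String> wiredCaches = new ConcurrentHashMap<>();

    public void defineConfiguration(String name, String settings) {
        configurations.put(name, settings);
    }

    public void wireCache(String cacheName, String configurationName) {
        if (!configurations.containsKey(configurationName)) {
            throw new IllegalArgumentException("Unknown configuration " + configurationName);
        }
        wiredCaches.put(cacheName, configurationName);
    }

    public void removeCache(String cacheName) {
        wiredCaches.remove(cacheName);
    }

    // Only consults wiredCaches, so it never blocks on a cache that is
    // still waiting for its initial state transfer.
    public void undefineConfiguration(String configurationName) {
        if (wiredCaches.containsValue(configurationName)) {
            throw new IllegalStateException(
                    "Configuration " + configurationName + " is still in use");
        }
        configurations.remove(configurationName);
    }

    public boolean isDefined(String configurationName) {
        return configurations.containsKey(configurationName);
    }
}
```

With this layout, a cache that is wired but stuck in startup is still visible to undefineConfiguration, and the call returns immediately instead of joining on the cache start.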
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 2 months
[JBoss JIRA] (ISPN-8859) DistTopologyChangeUnderLoadSingleOwnerTest takes too long
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-8859?page=com.atlassian.jira.plugin.... ]
Radim Vansa closed ISPN-8859.
-----------------------------
Resolution: Out of Date
[JBoss JIRA] (ISPN-9276) FunctionalEncodingTypeTest.testDistReturnViewFromReadWriteEvalOnNonOwner[tx=true] always fails
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-9276?page=com.atlassian.jira.plugin.... ]
Radim Vansa closed ISPN-9276.
-----------------------------
> FunctionalEncodingTypeTest.testDistReturnViewFromReadWriteEvalOnNonOwner[tx=true] always fails
> ----------------------------------------------------------------------------------------------
>
> Key: ISPN-9276
> URL: https://issues.jboss.org/browse/ISPN-9276
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.3.0.CR1
> Reporter: Dan Berindei
> Assignee: Radim Vansa
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 9.3.0.Final
>
>
> The test is not currently running during the build because of ISPN-9149, but fails when run manually:
> {noformat}
> java.lang.Error: java.util.concurrent.ExecutionException: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from FunctionalEncodingTypeTest[tx=true]-NodeB-35039, see cause for remote stack trace
> at org.infinispan.functional.FunctionalTestUtils.await(FunctionalTestUtils.java:47)
> at org.infinispan.functional.FunctionalMapTest.doReturnViewFromReadWriteEval(FunctionalMapTest.java:595)
> at org.infinispan.functional.FunctionalMapTest.testDistReturnViewFromReadWriteEvalOnNonOwner(FunctionalMapTest.java:584)
> Caused by: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from FunctionalEncodingTypeTest[tx=true]-NodeB-35039, see cause for remote stack trace
> at org.infinispan.remoting.transport.ResponseCollectors.wrapRemoteException(ResponseCollectors.java:27)
> at org.infinispan.remoting.transport.RemoteGetResponseCollector.addResponse(RemoteGetResponseCollector.java:26)
> at org.infinispan.remoting.transport.RemoteGetResponseCollector.addResponse(RemoteGetResponseCollector.java:17)
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:91)
> at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:52)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1364)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1267)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$300(JGroupsTransport.java:125)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1412)
> at org.jgroups.JChannel.up(JChannel.java:816)
> Caused by: org.infinispan.commons.marshall.NotSerializableException: org.infinispan.functional.impl.EntryViews$EntryBackedReadWriteView
> Caused by: an exception which occurred:
> in object org.infinispan.functional.impl.EntryViews$EntryBackedReadWriteView@72d6a840
> -> toString = EntryBackedReadWriteView{entry=VersionedRepeatableReadEntry(39ed5af7){key=TestKey#MagicKey{778/3FB8CBF3/178@FunctionalEncodingTypeTest[tx=true]-NodeB-35039}, value=TestValue#one, isCreated=true, isChanged=true, isRemoved=false, isExpired=false, skipLookup=true, metadata=MetaParamsInternalMetadata{params=MetaParams{length=1, metas=[MetaEntryVersion=SimpleClusteredVersion{topologyId=0, version=0}]}}}}
> {noformat}
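The root cause in the trace above is a NotSerializableException for the view object itself. As a rough illustration of that failure mode, the sketch below uses plain java.io serialization rather than Infinispan's marshaller, and EntryView is a hypothetical stand-in for the entry-backed view: the wrapper cannot be written to a stream, while the value it exposes can.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class ViewSerializationSketch {
    // Hypothetical stand-in for an entry-backed view; deliberately not Serializable.
    static final class EntryView {
        private final String value;

        EntryView(String value) {
            this.value = value;
        }

        String get() {
            return value;
        }
    }

    // Returns true if plain Java serialization accepts the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        EntryView view = new EntryView("one");
        System.out.println("view serializable:  " + canSerialize(view));
        System.out.println("value serializable: " + canSerialize(view.get()));
    }
}
```

Whether the right fix is to make the view marshallable or to return a snapshot of the value is not stated in the issue; the sketch only shows why shipping the view across nodes fails.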
[JBoss JIRA] (ISPN-9276) FunctionalEncodingTypeTest.testDistReturnViewFromReadWriteEvalOnNonOwner[tx=true] always fails
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-9276?page=com.atlassian.jira.plugin.... ]
Radim Vansa resolved ISPN-9276.
-------------------------------
Resolution: Done
[JBoss JIRA] (ISPN-9384) Integration test suite fails - Not yet supported on UNIX/WINDOWS favour without working ps/lsof/jps/netstat
by Diego Lovison (JIRA)
Diego Lovison created ISPN-9384:
-----------------------------------
Summary: Integration test suite fails - Not yet supported on UNIX/WINDOWS favour without working ps/lsof/jps/netstat
Key: ISPN-9384
URL: https://issues.jboss.org/browse/ISPN-9384
Project: Infinispan
Issue Type: Bug
Affects Versions: 7.2.1.Final
Environment: ibm18,RHEL6
Reporter: Diego Lovison
Priority: Critical
The AS Client Module Integration Tests fail with the following error:
{noformat}
[INFO] Finished at: 2018-07-17T08:19:45-04:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.wildfly.plugins:wildfly-maven-plugin:1.1.0.Alpha5:start (start-server) on project infinispan-jcache-tck-runner: The server failed to start: Managed server was not started within [60] s -> [Help 1]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (infinispan-server-shutdown) on project infinispan-as-module-client-integrationtests: An Ant BuildException has occured: The following error occurred while executing this line:
[ERROR] /home/jenkins/workspace/jdg-func-ispn-testsuite-rhel-os/d1735843/infinispan/server/integration/src/main/ant/infinispan-server.xml:111: Not yet supported on UNIX/WINDOWS favour without working ps/lsof/jps/netstat
[ERROR] around Ant part ...<ant antfile="../../server/integration/src/main/ant/infinispan-server.xml" target="kill-server">... @ 4:99 in /home/jenkins/workspace/jdg-func-ispn-testsuite-rhel-os/d1735843/infinispan/integrationtests/as-integration-client/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :infinispan-jcache-tck-runner
+ TESTSUITE_RESULT=1
{noformat}
The ps, lsof and netstat commands work.
The jps command was not found:
{noformat}
-bash-4.1$ cd /qa/tools/opt/x86_64/ibm-java-80
-bash-4.1$ cd bin/
-bash-4.1$ ll
total 640
-rwxrwxr-x. 1 root root 7885 Jun 26 23:36 appletviewer
-rwxrwxr-x. 1 root root 6262 Jun 26 23:36 ControlPanel
-rwxrwxr-x. 1 root root 7613 Jun 26 23:36 extcheck
-rwxrwxr-x. 1 root root 7605 Jun 26 23:36 idlj
-rwxrwxr-x. 1 root root 7589 Jun 26 23:36 jar
-rwxrwxr-x. 1 root root 7613 Jun 26 23:36 jarsigner
-rwxr-xr-x. 1 root root 7374 Jun 26 23:26 java
-rwxrwxr-x. 1 root root 7589 Jun 26 23:36 javac
-rwxrwxr-x. 1 root root 7613 Jun 26 23:36 javadoc
-rwxrwxr-x. 1 root root 7589 Jun 26 23:36 javah
-rwxrwxr-x. 1 root root 7589 Jun 26 23:36 javap
-rwxrwxr-x. 1 root root 2280 Jun 26 23:36 java-rmi.cgi
-rwxr-xr-x. 1 root root 7374 Jun 26 23:26 javaw
-rwxrwxr-x. 1 root root 131799 Jun 26 23:36 javaws
-rwxrwxr-x. 1 root root 7645 Jun 26 23:36 jconsole
-rwxrwxr-x. 1 root root 6262 Jun 26 23:36 jcontrol
-rwxrwxr-x. 1 root root 7605 Jun 26 23:36 jdb
-rwxrwxr-x. 1 root root 7589 Jun 26 23:36 jdeps
-rwxrwxr-x. 1 root root 7677 Jun 26 23:36 jdmpview
-rwxrwxr-x. 1 root root 7589 Jun 26 23:36 jjs
-rwxrwxr-x. 1 root root 7613 Jun 26 23:36 jrunscript
-rwxr-xr-x. 1 root root 7613 Jun 26 23:26 keytool
-rwxrwxr-x. 1 root root 7613 Jun 26 23:36 native2ascii
-rwxrwxr-x. 1 root root 7613 Jun 26 23:36 pack200
-rwxr-xr-x. 1 root root 7917 Jun 26 23:26 policytool
-rwxrwxr-x. 1 root root 7589 Jun 26 23:36 rmic
-rwxr-xr-x. 1 root root 7589 Jun 26 23:26 rmid
-rwxr-xr-x. 1 root root 7613 Jun 26 23:26 rmiregistry
-rwxrwxr-x. 1 root root 7629 Jun 26 23:36 schemagen
-rwxrwxr-x. 1 root root 7613 Jun 26 23:36 serialver
-rwxr-xr-x. 1 root root 7613 Jun 26 23:26 tnameserv
-rwxrwxr-x. 1 root root 235164 Jun 26 23:36 unpack200
-rwxrwxr-x. 1 root root 7605 Jun 26 23:36 wsgen
-rwxrwxr-x. 1 root root 7613 Jun 26 23:36 wsimport
-rwxrwxr-x. 1 root root 7605 Jun 26 23:36 xjc
{noformat}
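A quick way to confirm which JDK tools a given installation ships is to probe its bin directory. The following is a minimal sketch; the class and method names are made up for illustration, and it assumes the tool of interest lives directly under the JDK's bin directory, as in the listing above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JdkToolCheck {
    // Returns true if <jdkHome>/bin/<tool> (or <tool>.exe on Windows) is executable.
    static boolean hasTool(Path jdkHome, String tool) {
        Path bin = jdkHome.resolve("bin");
        return Files.isExecutable(bin.resolve(tool))
                || Files.isExecutable(bin.resolve(tool + ".exe"));
    }

    public static void main(String[] args) {
        // Defaults to the running JVM's home; pass a JDK path to check another install.
        Path home = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.home"));
        for (String tool : new String[] {"java", "jps"}) {
            System.out.println(tool + ": " + (hasTool(home, tool) ? "found" : "missing"));
        }
    }
}
```

It could be run against the installation shown above by passing /qa/tools/opt/x86_64/ibm-java-80 as the first argument.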
[JBoss JIRA] (ISPN-9276) FunctionalEncodingTypeTest.testDistReturnViewFromReadWriteEvalOnNonOwner[tx=true] always fails
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-9276?page=com.atlassian.jira.plugin.... ]
Radim Vansa updated ISPN-9276:
------------------------------
Fix Version/s: 9.3.0.Final
(was: 9.4.0.Final)
[JBoss JIRA] (ISPN-9383) TestNG Reporter will generate a new testcase in Polarion when toString point to hashCode
by Diego Lovison (JIRA)
Diego Lovison created ISPN-9383:
-----------------------------------
Summary: TestNG Reporter will generate a new testcase in Polarion when toString point to hashCode
Key: ISPN-9383
URL: https://issues.jboss.org/browse/ISPN-9383
Project: Infinispan
Issue Type: Bug
Reporter: Diego Lovison
We use the test method name and its arguments, in the form name(arguments), to store the information in Polarion.
Here are some examples:
{noformat}
org.infinispan.persistence.APINonTxPersistenceTest,testLockedStreamActuallyLocks([org.infinispan.api.APINonTxTest$$Lambda$2157/681959835@68a0620f, false])
org.infinispan.persistence.APINonTxPersistenceTest,testLockedStreamActuallyLocks([org.infinispan.api.APINonTxTest$$Lambda$2152/1823898408@b34c679, false])
org.infinispan.persistence.APINonTxPersistenceTest,testLockedStreamActuallyLocks([org.infinispan.api.APINonTxTest$$Lambda$2158/2043784776@18572695, false])
org.infinispan.persistence.APINonTxPersistenceTest,testLockedStreamActuallyLocks([org.infinispan.api.APINonTxTest$$Lambda$2157/681959835@68a0620f, true])
org.infinispan.persistence.APINonTxPersistenceTest,testLockedStreamActuallyLocks([org.infinispan.api.APINonTxTest$$Lambda$2154/349749176@1a859768, true])
{noformat}
In this case, every time we run the test, the toString of the lambda expression changes and Polarion generates a new test case.
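One possible way to get stable identifiers is to wrap lambda-valued parameters in an object with a fixed toString. The sketch below is only an illustration: Named and testCaseId are hypothetical names, not part of TestNG or of the Infinispan test suite, and the identifier format simply mirrors the examples above.

```java
import java.util.Arrays;
import java.util.function.Supplier;

public class StableParamName {
    // Wraps a lambda-valued test parameter and gives it a fixed, readable name.
    static final class Named<T> implements Supplier<T> {
        private final String name;
        private final Supplier<T> delegate;

        Named(String name, Supplier<T> delegate) {
            this.name = name;
            this.delegate = delegate;
        }

        @Override
        public T get() {
            return delegate.get();
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Builds an identifier in the "Class,method([args])" shape shown above.
    static String testCaseId(String className, String method, Object... args) {
        return className + "," + method + "(" + Arrays.toString(args) + ")";
    }

    public static void main(String[] args) {
        Supplier<String> raw = () -> "x";
        Supplier<String> named = new Named<>("constantX", raw);
        // The raw lambda prints a JVM-generated name that differs between runs.
        System.out.println(testCaseId("SomeTest", "someMethod", raw, false));
        // The wrapped parameter prints the same name on every run.
        System.out.println(testCaseId("SomeTest", "someMethod", named, false));
    }
}
```

The wrapper keeps the lambda's behaviour and only changes how the parameter is rendered in the reported name.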
[JBoss JIRA] (ISPN-9014) Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
by M S (JIRA)
[ https://issues.jboss.org/browse/ISPN-9014?page=com.atlassian.jira.plugin.... ]
M S commented on ISPN-9014:
---------------------------
Sorry, I misplaced the version number in my original post, but I think you got it right (I have already edited it).
Anyway, thanks for the hint; we'll give it a try with 9.3.1.Final.
> Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
> ----------------------------------------------------------------------------------------------------
>
> Key: ISPN-9014
> URL: https://issues.jboss.org/browse/ISPN-9014
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.1.Final
> Reporter: Dan Berindei
> Assignee: Ryan Emerson
> Labels: testsuite_stability
> Fix For: 9.3.0.Beta1
>
> Attachments: 15nodes-merge-issue.zip
>
>
> Conflict resolution fails when trying to read entries from nodes that are not in the JGroups cluster view, and this causes random failures in {{ClusterListenerDistTest.testClusterListenerNodeGoesDown random}}.
> # NodeA leaves the cluster, but still manages to start a rebalance with [NodeB, NodeC] (topology id 11)
> # One node doesn't receive topology 11, so NodeB becomes coordinator and starts conflict resolution with all 3 nodes in the pending CH (topology 12)
> # Conflict resolution fails because NodeB and NodeC can't read the entries from NodeA
> # {{onPartitionMerge}} also queued a rebalance, so NodeB starts a new rebalance without canceling the previous rebalance first (topology 13)
> # Because there is no reset topology, NodeB thinks it already requested all the segments from NodeC in topology 11, so it doesn't add any new inbound transfer
> # NodeC's state response arrives on NodeB with topology 11, NodeB discards it, and state transfer hangs.
> {noformat}
> 14:52:52,426 INFO (testng-Test:[cluster-listener]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,479 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterCacheStatus] Recovered 2 partition(s) for cache cluster-listener: [CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-57087: 78+79, Test-NodeB-45145: 90+88, Test-NodeC-20831: 88+89]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeA-57087, Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[48e3ddc7-ee97-42d8-a57d-283e8d28ec25, 301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}]
> 14:52:52,484 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cluster-listener, topology = CacheTopology{id=12, phase=CONFLICT_RESOLUTION, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, unionCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, actualMembers=[Test-NodeB-45145, Test-NodeA-57087, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, 48e3ddc7-ee97-42d8-a57d-283e8d28ec25, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, availability mode = null
> 14:52:52,488 ERROR (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [DefaultConflictManager] Cache cluster-listener encountered exception whilst trying to resolve conflicts on merge: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node Test-NodeA-57087 was suspected
> 14:52:52,532 INFO (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=13, phase=READ_OLD_WRITE_ALL, rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,577 TRACE (stateTransferExecutor-thread-Test-NodeB-p23774-t3:[StateRequest-cluster-listener]) [StateConsumerImpl] Waiting for inbound transfer to finish: InboundTransferTask{segments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, finishedSegments={}, unfinishedSegments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, source=Test-NodeC-20831, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@110952e8[Not completed], topologyId=11, timeout=240000, cacheName=cluster-listener}
> 14:52:52,584 DEBUG (remote-thread-Test-NodeB-p23771-t4:[cluster-listener]) [StateConsumerImpl] Discarding state response with old topology id 11 for cache cluster-listener, state transfer request topology was true
> 14:52:52,584 TRACE (remote-thread-Test-NodeB-p23771-t4:[]) [JGroupsTransport] Test-NodeB-45145 sending response for request 13 to Test-NodeC-20831: null
> {noformat}
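The idea in the issue title, restricting the consistent hash used for conflict resolution to nodes that are in the merged cluster view, can be sketched as a simple filter. This is not Infinispan's actual fix: the class and method names are hypothetical, and plain strings stand in for node addresses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConflictResolutionMembers {
    // Keeps only the hash members that are present in the merged cluster view,
    // preserving the original member order.
    static List<String> restrictToView(List<String> hashMembers, Collection<String> viewMembers) {
        Set<String> view = new HashSet<>(viewMembers);
        List<String> result = new ArrayList<>();
        for (String member : hashMembers) {
            if (view.contains(member)) {
                result.add(member);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Members of the topology 12 pending CH from the log above.
        List<String> pending = Arrays.asList(
                "Test-NodeB-45145", "Test-NodeC-20831", "Test-NodeA-57087");
        // NodeA has left, so the merged view only contains NodeB and NodeC.
        List<String> view = Arrays.asList("Test-NodeB-45145", "Test-NodeC-20831");
        System.out.println(restrictToView(pending, view));
    }
}
```

With NodeA filtered out, conflict resolution would never try to read entries from a node that the transport already suspects.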
[JBoss JIRA] (ISPN-9014) Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
by M S (JIRA)
[ https://issues.jboss.org/browse/ISPN-9014?page=com.atlassian.jira.plugin.... ]
M S edited comment on ISPN-9014 at 7/17/18 6:37 AM:
----------------------------------------------------
Hi.
In one of our environments, based on Infinispan version 9.3.0 with 15 nodes in the cloud, we reached a point where a similar issue occurs on node 22 (Cache zones encountered exception whilst trying to resolve conflicts on merge: java.util.concurrent.CompletionException: org.infinispan.commons.CacheException).
We reproduced it with 15 nodes in the cloud by unplugging node11 and then plugging it back in.
I'm attaching the Infinispan logs from the failed controllers and our cluster config.
Please check whether this is really the same issue, in which case the fix from the beta version is not sufficient, or whether a new issue must be created.
Thx
was (Author: staho):
Hi.
In one of our environments, based on Infinispan version 9.1.3 with 15 nodes in the cloud, we reached a point where a similar issue occurs on node 22 (Cache zones encountered exception whilst trying to resolve conflicts on merge: java.util.concurrent.CompletionException: org.infinispan.commons.CacheException).
We reproduced it with 15 nodes in the cloud by unplugging node11 and then plugging it back in.
I'm attaching the Infinispan logs from the failed controllers and our cluster config.
Please check whether this is really the same issue, in which case the fix from the beta version is not sufficient, or whether a new issue must be created.
Thx
[JBoss JIRA] (ISPN-9014) Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
by Ryan Emerson (JIRA)
[ https://issues.jboss.org/browse/ISPN-9014?page=com.atlassian.jira.plugin.... ]
Ryan Emerson commented on ISPN-9014:
------------------------------------
[~staho] I recommend upgrading to 9.3.1.Final; many changes and fixes related to partition handling went into Infinispan 9.3.
> Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
> ----------------------------------------------------------------------------------------------------
>
> Key: ISPN-9014
> URL: https://issues.jboss.org/browse/ISPN-9014
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.1.Final
> Reporter: Dan Berindei
> Assignee: Ryan Emerson
> Labels: testsuite_stability
> Fix For: 9.3.0.Beta1
>
> Attachments: 15nodes-merge-issue.zip
>
>
> Conflict resolution fails when trying to read entries from nodes that are not in the JGroups cluster view, and this causes random failures in {{ClusterListenerDistTest.testClusterListenerNodeGoesDown}}.
> # NodeA leaves the cluster, but still manages to start a rebalance with [NodeB, NodeC] (topology id 11)
> # One node doesn't receive topology 11, so NodeB becomes coordinator and starts conflict resolution with all 3 nodes in the pending CH (topology 12)
> # Conflict resolution fails because NodeB and NodeC can't read the entries from NodeA
> # {{onPartitionMerge}} also queued a rebalance, so NodeB starts a new rebalance without canceling the previous rebalance first (topology 13)
> # Because there is no reset topology, NodeB thinks it already requested all of NodeC's segments in topology 11, so it doesn't add any new inbound transfers
> # NodeC's state response arrives on NodeB with topology 11, NodeB discards it, and state transfer hangs.
> {noformat}
> 14:52:52,426 INFO (testng-Test:[cluster-listener]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,479 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterCacheStatus] Recovered 2 partition(s) for cache cluster-listener: [CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-57087: 78+79, Test-NodeB-45145: 90+88, Test-NodeC-20831: 88+89]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeA-57087, Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[48e3ddc7-ee97-42d8-a57d-283e8d28ec25, 301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}]
> 14:52:52,484 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cluster-listener, topology = CacheTopology{id=12, phase=CONFLICT_RESOLUTION, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, unionCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, actualMembers=[Test-NodeB-45145, Test-NodeA-57087, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, 48e3ddc7-ee97-42d8-a57d-283e8d28ec25, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, availability mode = null
> 14:52:52,488 ERROR (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [DefaultConflictManager] Cache cluster-listener encountered exception whilst trying to resolve conflicts on merge: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node Test-NodeA-57087 was suspected
> 14:52:52,532 INFO (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=13, phase=READ_OLD_WRITE_ALL, rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,577 TRACE (stateTransferExecutor-thread-Test-NodeB-p23774-t3:[StateRequest-cluster-listener]) [StateConsumerImpl] Waiting for inbound transfer to finish: InboundTransferTask{segments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, finishedSegments={}, unfinishedSegments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, source=Test-NodeC-20831, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@110952e8[Not completed], topologyId=11, timeout=240000, cacheName=cluster-listener}
> 14:52:52,584 DEBUG (remote-thread-Test-NodeB-p23771-t4:[cluster-listener]) [StateConsumerImpl] Discarding state response with old topology id 11 for cache cluster-listener, state transfer request topology was true
> 14:52:52,584 TRACE (remote-thread-Test-NodeB-p23771-t4:[]) [JGroupsTransport] Test-NodeB-45145 sending response for request 13 to Test-NodeC-20831: null
> {noformat}
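
The last two steps of the sequence above (no new inbound transfer in topology 13, then the topology 11 response being discarded) can be illustrated with a small toy model. This is only a sketch: the class StaleTransferSketch and all of its methods are hypothetical and are not Infinispan's StateConsumerImpl API, and the resetFirst flag merely stands in for the missing reset mentioned in step 5, not for the fix that was actually applied.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: names are hypothetical, not Infinispan's API.
class StaleTransferSketch {
    // Segments this node believes it has already requested, keyed by source node.
    private final Map<String, Set<Integer>> requested = new HashMap<>();
    // Segments for which state has actually been applied.
    private final Set<Integer> received = new HashSet<>();
    private int currentTopologyId;

    /** Installs a rebalance topology and returns the segments newly requested from source. */
    Set<Integer> onRebalance(int topologyId, String source, Set<Integer> segments, boolean resetFirst) {
        currentTopologyId = topologyId;
        if (resetFirst) {
            // Forget the requests that belonged to the previous rebalance.
            requested.remove(source);
        }
        Set<Integer> already = requested.computeIfAbsent(source, s -> new HashSet<>());
        Set<Integer> toRequest = new HashSet<>(segments);
        toRequest.removeAll(already); // skip anything considered "already requested"
        already.addAll(toRequest);
        return toRequest;
    }

    /** Applies a state response; returns false when it is discarded as stale. */
    boolean onStateResponse(int responseTopologyId, Set<Integer> segments) {
        if (responseTopologyId < currentTopologyId) {
            return false; // old topology id, response is dropped
        }
        received.addAll(segments);
        return true;
    }

    boolean isComplete(Set<Integer> segments) {
        return received.containsAll(segments);
    }

    /** Replays topologies 11 and 13 on "NodeB"; returns whether the transfer completes. */
    static boolean simulate(boolean resetBeforeRebalance) {
        StaleTransferSketch nodeB = new StaleTransferSketch();
        Set<Integer> segments = new HashSet<>(Arrays.asList(19, 20, 21));
        nodeB.onRebalance(11, "NodeC", segments, false);
        Set<Integer> reRequested = nodeB.onRebalance(13, "NodeC", segments, resetBeforeRebalance);
        nodeB.onStateResponse(11, segments); // late response for topology 11 is discarded
        if (!reRequested.isEmpty()) {
            nodeB.onStateResponse(13, reRequested); // only sent if NodeB asked again
        }
        return nodeB.isComplete(segments);
    }
}
```

In this model, simulate(false) never completes because nothing is re-requested in topology 13 and the only response carries topology id 11, while simulate(true) completes because the segments are requested again under the new topology id.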