[JBoss JIRA] (ISPN-9291) BasePartitionHandlingTest.Partition.installMergeView() doesn't compute the merge digest
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9291?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9291:
----------------------------------
Fix Version/s: 9.4.2.Final
(was: 9.4.1.Final)
> BasePartitionHandlingTest.Partition.installMergeView() doesn't compute the merge digest
> ---------------------------------------------------------------------------------------
>
> Key: ISPN-9291
> URL: https://issues.jboss.org/browse/ISPN-9291
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.3.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
> Labels: testsuite_stability
> Fix For: 9.4.2.Final
>
>
> The partition handling tests use {{BasePartitionHandlingTest.Partition.installMergeView(view1, view2)}} to install the merge view without waiting for {{MERGE3}} to run, making them much faster. Unfortunately, the implementation is incorrect: {{GMS.installView(view)}} only works for regular views, merge views need to be installed with {{GMS.installView(mergeView, digest)}}.
> The result is that the nodes that got isolated from the coordinator request the retransmission of all the {{NAKACK2}} messages (including view updates) since the cluster first started. The isolated nodes cannot install the merge view until they deliver all the older messages (even without knowing whether they're OOB or not). But if {{STABLE}} ran and cleared a range of messages already, the retransmission request cannot be satisfied, so the view updates will never be delivered.
> This is easily reproducible in {{CrashedNodeDuringConflictResolutionTest}} if we add a delay before updating the topology in {{StateConsumerImpl}}. The test installs the merge view manually, but then kills NodeC and expects the cluster to install the new view automatically. NodeD can't install the new view because it's waiting for earlier messages from NodeA:
> {noformat}
> 18:27:13,054 INFO (testng-test:[]) [TestSuiteProgress] Test starting: org.infinispan.conflict.impl.CrashedNodeDuringConflictResolutionTest.testPartitionMergePolicy[DIST_SYNC]
> 18:27:13,640 DEBUG (testng-test:[]) [GMS] test-NodeA-39513: installing view MergeView::[test-NodeA-39513|10] (4) [test-NodeA-39513, test-NodeB-9439, test-NodeC-43706, test-NodeD-59078], 2 subgroups: [test-NodeA-39513|8] (2) [test-NodeA-39513, test-NodeB-9439], [test-NodeC-43706|9] (2) [test-NodeC-43706, test-NodeD-59078]
> 18:27:13,674 DEBUG (testng-test:[]) [GMS] test-NodeD-59078: installing view MergeView::[test-NodeA-39513|10] (4) [test-NodeA-39513, test-NodeB-9439, test-NodeC-43706, test-NodeD-59078], 2 subgroups: [test-NodeA-39513|8] (2) [test-NodeA-39513, test-NodeB-9439], [test-NodeC-43706|9] (2) [test-NodeC-43706, test-NodeD-59078]
> 18:27:13,828 TRACE (jgroups-7,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((1): {50}) to test-NodeA-39513
> 18:27:13,966 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((49): {1-49}) to test-NodeA-39513
> 18:27:14,067 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:27:14,504 DEBUG (testng-test:[]) [DefaultCacheManager] Stopping cache manager ISPN on test-NodeC-43706
> 18:27:18,642 TRACE (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: joiners=[], suspected=[test-NodeC-43706], leaving=[], new view: [test-NodeA-39513|11] (3) [test-NodeA-39513, test-NodeB-9439, test-NodeD-59078]
> 18:27:18,643 TRACE (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: mcasting view [test-NodeA-39513|11] (3) [test-NodeA-39513, test-NodeB-9439, test-NodeD-59078]
> 18:27:18,646 DEBUG (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: installing view [test-NodeA-39513|11] (3) [test-NodeA-39513, test-NodeB-9439, test-NodeD-59078]
> 18:27:18,652 TRACE (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [TCP_NIO2] test-NodeA-39513: sending msg to null, src=test-NodeA-39513, headers are GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=63], TP: [cluster_name=ISPN]
> 18:27:18,656 TRACE (jgroups-20,test-NodeA-39513:[]) [TCP_NIO2] test-NodeA-39513: received [dst: test-NodeA-39513, src: test-NodeB-9439 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=100, TP: [cluster_name=ISPN]
> 18:27:20,554 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:27:20,653 WARN (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: failed to collect all ACKs (expected=2) for view [test-NodeA-39513|11] after 2000ms, missing 1 ACKs from (1) test-NodeD-59078
> 18:27:20,656 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:27:20,756 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> ...
> 18:28:14,412 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:28:14,513 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:28:14,589 ERROR (testng-test:[]) [TestSuiteProgress] Test failed: org.infinispan.conflict.impl.CrashedNodeDuringConflictResolutionTest.testPartitionMergePolicy[DIST_SYNC]
> java.lang.RuntimeException: Cache ___defaultcache timed out waiting for rebalancing to complete on node test-NodeA-39513, current topology is CacheTopology{id=21, phase=CONFLICT_RESOLUTION, rebalanceId=7, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (3)[test-NodeD-59078: 256+0, test-NodeA-39513: 0+256, test-NodeB-9439: 0+256]}, pendingCH=null, unionCH=null, actualMembers=[test-NodeD-59078, test-NodeA-39513, test-NodeB-9439], persistentUUIDs=[828108c4-4251-49fc-9481-ff6392bea9fb, 1d4b6f07-b71b-41a1-adfb-abbe68944a9f, 3a1ece05-c282-433e-9eb5-7b3e0f1932aa]}. rebalanceInProgress=true, currentChIsBalanced=true
> at org.infinispan.test.TestingUtil.waitForNoRebalance(TestingUtil.java:392) ~[test-classes/:?]
> at org.infinispan.conflict.impl.CrashedNodeDuringConflictResolutionTest.performMerge(CrashedNodeDuringConflictResolutionTest.java:113) ~[test-classes/:?]
> at org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:137) ~[test-classes/:?]
> {noformat}
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month
[JBoss JIRA] (ISPN-9270) ClusteredLockWith2NodesTest.testTryLockAndKillLocking random failures
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9270?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9270:
----------------------------------
Fix Version/s: 9.4.2.Final
(was: 9.4.1.Final)
> ClusteredLockWith2NodesTest.testTryLockAndKillLocking random failures
> ---------------------------------------------------------------------
>
> Key: ISPN-9270
> URL: https://issues.jboss.org/browse/ISPN-9270
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.3.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
> Fix For: 9.4.2.Final
>
>
> The test stops the coordinator A of a 2-node cluster \[A, B\], and then verifies that B stays available.
> Unfortunately that doesn't always happen: A tries to install topology \[B\] before shutting down, but B might receive it too late and reject it. If B expects members \[A, B\] and sees only B, the cache will enter degraded mode:
> {noformat}
> 19:04:05,863 INFO (jgroups-8,Test-NodeB-21374:[org.infinispan.LOCKS]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN100008: Updating cache members list [Test-NodeD-51051], topology id 7
> 19:04:05,863 TRACE (jgroups-4,Test-NodeD-51051:[]) [JGroupsTransport] Test-NodeD-51051 received command from Test-NodeB-21374: CacheTopologyControlCommand{cache=org.infinispan.LOCKS, type=CH_UPDATE, sender=Test-NodeB-21374, joinInfo=null, topologyId=7, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[Test-NodeD-51051: 256]}, pendingCH=null, availabilityMode=AVAILABLE, phase=NO_REBALANCE, actualMembers=[Test-NodeD-51051], throwable=null, viewId=1}
> 19:04:05,880 INFO (jgroups-4,Test-NodeD-51051:[]) [CLUSTER] ISPN000094: Received new cluster view for channel ISPN: [Test-NodeD-51051|2] (1) [Test-NodeD-51051]
> 19:04:05,884 DEBUG (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [ClusterCacheStatus] Recovered 1 partition(s) for cache org.infinispan.LOCKS: [CacheTopology{id=6, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeB-21374: 124, Test-NodeD-51051: 132]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD-51051], persistentUUIDs=[6ac3a341-5af4-4aa2-ae7b-a4183d02d959]}]
> 19:04:05,884 WARN (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN000320: After merge (or coordinator change), cache still hasn't recovered a majority of members and must stay in degraded mode. Current members are [Test-NodeD-51051], lost members are [Test-NodeB-21374], stable members are [Test-NodeB-21374, Test-NodeD-51051]
> 19:04:05,885 INFO (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN100011: Entering availability mode DEGRADED_MODE, topology id 8
> 19:04:05,941 DEBUG (transport-thread-Test-NodeD-p100-t2:[Topology-org.infinispan.LOCKS]) [LocalTopologyManagerImpl] Ignoring topology 7 for cache org.infinispan.LOCKS from old coordinator Test-NodeB-21374
> 19:04:06,024 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.lock.ClusteredLockWith2NodesTest.testTryLockAndKillLocking
> java.lang.Error: java.util.concurrent.ExecutionException: java.lang.Error: java.util.concurrent.ExecutionException: org.infinispan.lock.exception.ClusteredLockException: org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'ClusteredLockKey{name=Test}' is not available. Not all owners are in this partition
> at org.infinispan.functional.FunctionalTestUtils.await(FunctionalTestUtils.java:47) ~[infinispan-core-9.3.0-SNAPSHOT-tests.jar:9.3.0-SNAPSHOT]
> at org.infinispan.lock.ClusteredLockWith2NodesTest.testTryLockAndKillLocking(ClusteredLockWith2NodesTest.java:49) ~[test-classes/:?]
> {noformat}
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month
[JBoss JIRA] (ISPN-9257) ClustertopologyManagerTest.testAbruptLeaveAfterGetStatus2[SCATTERED_SYNC, tx=false] random failures
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9257?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9257:
----------------------------------
Fix Version/s: 9.4.2.Final
(was: 9.4.1.Final)
> ClustertopologyManagerTest.testAbruptLeaveAfterGetStatus2[SCATTERED_SYNC, tx=false] random failures
> ---------------------------------------------------------------------------------------------------
>
> Key: ISPN-9257
> URL: https://issues.jboss.org/browse/ISPN-9257
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.3.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
> Labels: testsuite_stability
> Fix For: 9.4.2.Final
>
> Attachments: ISPN-8731_wrong_topology_2018-05-18_ClusterTopologyManagerTest-infinispan-core.log.gz
>
>
> The test kills the coordinator NodeA, then while NodeB is trying to recover the caches it also kills NodeC. It expects NodeB to start a rebalance with 2 nodes and discards it, in order to test that it can process the 1-node rebalance first:
> {noformat}
> 00:34:06,582 DEBUG (transport-thread-test-NodeB-p12-t6:[testCache]) [ClusterTopologyManagerTest] Discarding rebalance command CacheTopology{id=8, phase=TRANSITORY, rebalanceId=5, currentCH=ScatteredConsistentHash{ns=256, rebalanced=false, owners = (2)[test-NodeB-49590: 85, test-NodeC-58596: 85]}, pendingCH=ScatteredConsistentHash{ns=256, rebalanced=true, owners = (2)[test-NodeB-49590: 128, test-NodeC-58596: 128]}, unionCH=null, actualMembers=[test-NodeB-49590, test-NodeC-58596], persistentUUIDs=[6b96414e-15d8-4350-aa3c-4fb4fc34e888, d47dc4a9-2a95-4bb1-a83b-bb8a27c9999f]}
> 00:34:06,609 DEBUG (transport-thread-test-NodeB-p12-t2:[Topology-testCache]) [LocalTopologyManagerImpl] Updating local topology for cache testCache: CacheTopology{id=9, phase=TRANSITORY, rebalanceId=5, currentCH=ScatteredConsistentHash{ns=256, rebalanced=false, owners = (1)[test-NodeB-49590: 85]}, pendingCH=ScatteredConsistentHash{ns=256, rebalanced=false, owners = (1)[test-NodeB-49590: 128]}, unionCH=null, actualMembers=[test-NodeB-49590], persistentUUIDs=[6b96414e-15d8-4350-aa3c-4fb4fc34e888]}
> 00:34:06,609 DEBUG (transport-thread-test-NodeB-p12-t2:[Topology-testCache]) [LocalTopologyManagerImpl] Installing fake cache topology CacheTopology{id=8, phase=NO_REBALANCE, rebalanceId=4, currentCH=ScatteredConsistentHash{ns=256, rebalanced=false, owners = (1)[test-NodeB-49590: 85]}, pendingCH=null, unionCH=null, actualMembers=[test-NodeB-49590], persistentUUIDs=[6b96414e-15d8-4350-aa3c-4fb4fc34e888]} for cache testCache
> {noformat}
> Unfortunately {{PreferAvailabilityStrategy}} has changed a bit and the rebalance ids don't always match the expectations of the test, so that the 1-node rebalance is discarded instead:
> {noformat}
> 09:46:10,530 DEBUG (transport-thread-Test-NodeB-p54539-t3:[testCache]) [Test] Discarding rebalance command CacheTopology{id=9, phase=TRANSITORY, rebalanceId=5, currentCH=ScatteredConsistentHash{ns=256, rebalanced=false, owners = (1)[Test-NodeB-62039: 85]}, pendingCH=ScatteredConsistentHash{ns=256, rebalanced=true, owners = (1)[Test-NodeB-62039: 256]}, unionCH=null, actualMembers=[Test-NodeB-62039], persistentUUIDs=[0ed7be74-4485-489b-baee-28c461c9e5de]}
> {noformat}
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month
[JBoss JIRA] (ISPN-9173) Availability mode should be updated atomically with the actual members
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9173?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9173:
----------------------------------
Fix Version/s: 9.4.2.Final
(was: 9.4.1.Final)
> Availability mode should be updated atomically with the actual members
> ----------------------------------------------------------------------
>
> Key: ISPN-9173
> URL: https://issues.jboss.org/browse/ISPN-9173
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.3.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
> Fix For: 9.4.2.Final
>
>
> This is a follow-up on ISPN-7682, which asks for the topology itself to be updated atomically.
> {{LocalTopologyManagerImpl}} has additional logic to update the availability mode first when the cache becomes degraded and to update it last when the cache becomes available, which means any delay between the updates cannot cause data inconsistencies.
> But that logic doesn't really belong in {{LocalTopologyManagerImpl}}, and it's easy to forget it's there (and in fact we had a bug there related to the new rebalance phases).
> In addition, tests that want to check the cache behaviour in degraded mode and wait only for the availability mode change will fail if there's a big delay between the availability mode change. I actually hit this while testing my ISPN-8731/ISPN-7682 changes, and I had added a random delay in {{StateConsumerImpl}} before {{distributionManager.setCacheTopology()}}.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month
[JBoss JIRA] (ISPN-9177) Add RBAC to admin console pages
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-9177?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9177:
----------------------------------
Fix Version/s: 9.4.2.Final
(was: 9.4.1.Final)
> Add RBAC to admin console pages
> -------------------------------
>
> Key: ISPN-9177
> URL: https://issues.jboss.org/browse/ISPN-9177
> Project: Infinispan
> Issue Type: Feature Request
> Components: Console
> Affects Versions: 9.2.3.Final, 9.3.0.Beta1
> Reporter: Vladimir Blagojevic
> Assignee: Vladimir Blagojevic
> Priority: Major
> Fix For: 9.4.2.Final
>
>
> Up until now, we have assumed that admin user is able to invoke all operations, access and mutate all data from the admin console. With the advent of data manipulation capabilities in the console (ISPN-8759) and taking into account all existing management operations (some are destructive and irreversible), we should introduce RBAC to admin console page.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 1 month