[JBoss JIRA] (ISPN-9032) OperationsDuringMergeConflictTest.testPartitionMergePolicy random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9032?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9032:
-------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> OperationsDuringMergeConflictTest.testPartitionMergePolicy random failures
> --------------------------------------------------------------------------
>
> Key: ISPN-9032
> URL: https://issues.jboss.org/browse/ISPN-9032
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.1.Final
> Reporter: Dan Berindei
> Assignee: Ryan Emerson
> Labels: testsuite_stability
> Fix For: 9.3.0.Beta1
>
>
> The test modifies a key in both partitions and checks that each partition still sees its own value for a while after receiving the [merged cluster view|https://github.com/infinispan/infinispan/blob/ee87349f0b134b53b3090f...].
> It installs a {{BlockStateResponseCommandHandler}} to block the conflict resolution responses and to keep the pre-merge values in the non-preferred topology owners. However, it does not block the installation of the {{CONFLICT_RESOLUTION}} cache topology, and during conflict resolution the read CH is the preferred topology's read CH, so non-preferred topology owners will ask the primary owner for the value:
> {noformat}
> 17:12:07,088 INFO (testng-Test:[]) [JGroupsTransport] ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[Test-NodeI-48950|10] (4) [Test-NodeI-48950, Test-NodeJ-6577, Test-NodeK-2872, Test-NodeL-23686], 2 subgroups: [Test-NodeI-48950|8] (2) [Test-NodeI-48950, Test-NodeJ-6577], [Test-NodeK-2872|9] (2) [Test-NodeK-2872, Test-NodeL-23686]
> 17:12:07,134 TRACE (stateTransferExecutor-thread-Test-NodeI-p50536-t2:[Merge-10]) [DefaultConflictManager] Cache ___defaultcache attempting to receive all replicas for segment 0 with topology CacheTopology{id=20, rebalanceId=7, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134]}, pendingCH=DefaultConsistentHash{ns=256, owners = (4)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134, Test-NodeI-48950: 0+256, Test-NodeJ-6577: 0+256]}, unionCH=DefaultConsistentHash{ns=256, owners = (4)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134, Test-NodeI-48950: 0+256, Test-NodeJ-6577: 0+256]}, phase=CONFLICT_RESOLUTION, actualMembers=[Test-NodeK-2872, Test-NodeL-23686, Test-NodeI-48950, Test-NodeJ-6577], persistentUUIDs=[40b58191-37f1-4fa4-8e4b-f1a7c17bfa59, 208d742d-ecc9-4792-9dd5-f74d5666f5a2, d8914096-a5a4-4e1e-95a0-27c50fee9999, 7464e58b-5e23-4c1b-909e-4e53055f12ca]}
> 17:12:07,212 TRACE (testng-Test:[]) [BaseDistributionInterceptor] Perform remote get for key MagicKey{13A4/DDFB0E77/56@Test-NodeI-48950}. currentTopologyId=20, owners=[Test-NodeK-2872, Test-NodeL-23686]
> 17:12:07,212 TRACE (testng-Test:[]) [RpcManagerImpl] Test-NodeI-48950 invoking ClusteredGetCommand{key=MagicKey{13A4/DDFB0E77/56@Test-NodeI-48950}, flags=[]} to recipient list [Test-NodeK-2872, Test-NodeL-23686] with options RpcOptions{timeout=15000, unit=MILLISECONDS, deliverOrder=NONE, responseFilter=org.infinispan.interceptors.impl.BaseRpcInterceptor$1@241e783a, responseMode=WAIT_FOR_VALID_RESPONSE}
> 17:12:07,225 TRACE (jgroups-4,Test-NodeI-48950:[]) [CommandAwareRpcDispatcher] Got acceptable response: Responses{
> Test-NodeK-2872: value=SuccessfulResponse{responseValue=ImmortalCacheValue {value=B}} , received=true, suspected=false
> 17:12:07,226 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.conflict.impl.OperationsDuringMergeConflictTest.testPartitionMergePolicy
> java.lang.AssertionError: Key=MagicKey{13A4/DDFB0E77/56@Test-NodeI-48950}, Value=A, Cache Index=0, Topology=CacheTopology{id=20, rebalanceId=7, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134]}, pendingCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (4)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134, Test-NodeI-48950: 0+256, Test-NodeJ-6577: 0+256]}, unionCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (4)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134, Test-NodeI-48950: 0+256, Test-NodeJ-6577: 0+256]}, phase=CONFLICT_RESOLUTION, actualMembers=[Test-NodeK-2872, Test-NodeL-23686, Test-NodeI-48950, Test-NodeJ-6577], persistentUUIDs=[40b58191-37f1-4fa4-8e4b-f1a7c17bfa59, 208d742d-ecc9-4792-9dd5-f74d5666f5a2, d8914096-a5a4-4e1e-95a0-27c50fee9999, 7464e58b-5e23-4c1b-909e-4e53055f12ca]} expected:<A> but was:<B>
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.8.8.jar:?]
> at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364) ~[testng-6.8.8.jar:?]
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80) ~[testng-6.8.8.jar:?]
> at org.infinispan.conflict.impl.BaseMergePolicyTest.assertCacheGet(BaseMergePolicyTest.java:160) ~[test-classes/:?]
> at org.infinispan.conflict.impl.OperationsDuringMergeConflictTest.performMerge(OperationsDuringMergeConflictTest.java:121) ~[test-classes/:?]
> at org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:116) ~[test-classes/:?]
> {noformat}
> https://ci.infinispan.org/job/Infinispan/job/master/543/testReport/org.in...
> Maybe we should use the union CH for reading during conflict resolution instead?
> Speaking of which, {{PreferAvailabilityStrategy}} sets the {{unionCH}} field in the conflict resolution {{CacheTopology}}, but it is ignored because {{CacheTopologyControlCommand}} doesn't have a field for it. We should set {{unionCH=null}} in {{PreferAvailabilityStrategy}} to make things clearer. We should also try to reduce the number of times we log the cache topology during conflict resolution (even if it's only at trace level).
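The read-CH question raised in ISPN-9032 can be illustrated with a toy model. This is only a sketch and not Infinispan code: the class {{ReadOwnerSketch}} and its methods are hypothetical, nodes are plain strings, and the first read owner stands in for the primary owner. It assumes that a node which is itself a read owner answers from its local data, which is presumably what reading through the union CH would achieve.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Infinispan code: nodes are strings and each node holds one value.
final class ReadOwnerSketch {
    // Value stored on each node for the contested key before conflict resolution ends.
    private final Map<String, String> localValues = new HashMap<>();

    void put(String node, String value) {
        localValues.put(node, value);
    }

    // A reader that is itself a read owner answers locally; otherwise it asks
    // the first read owner, which stands in for the primary owner here.
    String read(String reader, List<String> readOwners) {
        String target = readOwners.contains(reader) ? reader : readOwners.get(0);
        return localValues.get(target);
    }

    public static void main(String[] args) {
        ReadOwnerSketch cache = new ReadOwnerSketch();
        cache.put("NodeI", "A");
        cache.put("NodeJ", "A");
        cache.put("NodeK", "B");
        cache.put("NodeL", "B");
        List<String> preferredCH = Arrays.asList("NodeK", "NodeL");
        List<String> unionCH = Arrays.asList("NodeK", "NodeL", "NodeI", "NodeJ");
        System.out.println(cache.read("NodeI", preferredCH)); // remote read from NodeK
        System.out.println(cache.read("NodeI", unionCH));     // local read on NodeI
    }
}
```

With the preferred topology's owners as read owners, NodeI sees NodeK's value B, as in the failing assertion. With the union of owners, NodeI still sees its own pre-merge value A.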
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 8 months
[JBoss JIRA] (ISPN-9014) Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9014?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9014:
-------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
> ----------------------------------------------------------------------------------------------------
>
> Key: ISPN-9014
> URL: https://issues.jboss.org/browse/ISPN-9014
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.1.Final
> Reporter: Dan Berindei
> Assignee: Ryan Emerson
> Labels: testsuite_stability
> Fix For: 9.3.0.Beta1
>
>
> Conflict resolution fails when trying to read entries from nodes that are not in the JGroups cluster view, and this causes random failures in {{ClusterListenerDistTest.testClusterListenerNodeGoesDown}}.
> # NodeA leaves the cluster, but still manages to start a rebalance with [NodeB, NodeC] (topology id 11)
> # One node doesn't receive topology 11, so NodeB becomes coordinator and starts conflict resolution with all 3 nodes in the pending CH (topology 12)
> # Conflict resolution fails because NodeB and NodeC can't read the entries from NodeA
> # {{onPartitionMerge}} also queued a rebalance, so NodeB starts a new rebalance without canceling the previous rebalance first (topology 13)
> # Because there is no reset topology, NodeB thinks it already requested all the segments from NodeC in topology 11, so it doesn't add any new inbound transfers
> # NodeC's state response arrives on NodeB with topology 11, NodeB discards it, and state transfer hangs.
> {noformat}
> 14:52:52,426 INFO (testng-Test:[cluster-listener]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,479 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterCacheStatus] Recovered 2 partition(s) for cache cluster-listener: [CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-57087: 78+79, Test-NodeB-45145: 90+88, Test-NodeC-20831: 88+89]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeA-57087, Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[48e3ddc7-ee97-42d8-a57d-283e8d28ec25, 301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}]
> 14:52:52,484 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cluster-listener, topology = CacheTopology{id=12, phase=CONFLICT_RESOLUTION, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, unionCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, actualMembers=[Test-NodeB-45145, Test-NodeA-57087, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, 48e3ddc7-ee97-42d8-a57d-283e8d28ec25, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, availability mode = null
> 14:52:52,488 ERROR (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [DefaultConflictManager] Cache cluster-listener encountered exception whilst trying to resolve conflicts on merge: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node Test-NodeA-57087 was suspected
> 14:52:52,532 INFO (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=13, phase=READ_OLD_WRITE_ALL, rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,577 TRACE (stateTransferExecutor-thread-Test-NodeB-p23774-t3:[StateRequest-cluster-listener]) [StateConsumerImpl] Waiting for inbound transfer to finish: InboundTransferTask{segments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, finishedSegments={}, unfinishedSegments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, source=Test-NodeC-20831, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@110952e8[Not completed], topologyId=11, timeout=240000, cacheName=cluster-listener}
> 14:52:52,584 DEBUG (remote-thread-Test-NodeB-p23771-t4:[cluster-listener]) [StateConsumerImpl] Discarding state response with old topology id 11 for cache cluster-listener, state transfer request topology was true
> 14:52:52,584 TRACE (remote-thread-Test-NodeB-p23771-t4:[]) [JGroupsTransport] Test-NodeB-45145 sending response for request 13 to Test-NodeC-20831: null
> {noformat}
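The idea in the title of ISPN-9014 can be sketched as a simple filter: take the union of the members of all recovered partition topologies, then keep only the nodes that are still in the merged cluster view. This is a toy model under stated assumptions, not the actual fix. The class and method names are hypothetical and nodes are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model, not Infinispan code: node names are plain strings.
final class ConflictResolutionMembersSketch {
    // Union of the members of all recovered partition topologies, keeping only
    // nodes that are still present in the merged cluster view.
    static List<String> membersForConflictResolution(List<List<String>> partitionMembers,
                                                     List<String> mergedView) {
        Set<String> union = new LinkedHashSet<>();
        for (List<String> members : partitionMembers) {
            union.addAll(members);
        }
        union.retainAll(mergedView); // drop nodes that already left the view
        return new ArrayList<>(union);
    }

    public static void main(String[] args) {
        // Members of the two recovered topologies from the log (ids 11 and 9).
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("NodeB", "NodeC"),
                Arrays.asList("NodeA", "NodeB", "NodeC"));
        List<String> mergedView = Arrays.asList("NodeB", "NodeC");
        System.out.println(membersForConflictResolution(partitions, mergedView));
    }
}
```

In the scenario above, NodeA is filtered out, so the conflict resolution topology would never ask a suspected node for its entries.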
[JBoss JIRA] (ISPN-9089) Conflict Resolution phase should be non-blocking and restart on node failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9089?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9089:
-------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Conflict Resolution phase should be non-blocking and restart on node failures
> -----------------------------------------------------------------------------
>
> Key: ISPN-9089
> URL: https://issues.jboss.org/browse/ISPN-9089
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 9.2.1.Final
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Fix For: 9.3.0.Beta1
>
>
> Currently, if a node crashes during the conflict resolution (CR) phase, all topology updates and rebalancing are blocked until CR times out. As the CR timeout is the state transfer timeout, the cluster is in limbo for 4 minutes by default. This blocking behaviour occurs because the entire CR phase is currently executed in the merge thread. Instead, it should run asynchronously, in a similar manner to a cache rebalance, with CR restarting when nodes leave.
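A minimal sketch of the behaviour requested in ISPN-9089, assuming a plain {{java.util.concurrent}} design rather than Infinispan's actual implementation: the merge thread only schedules the work and returns, and a membership change restarts the work with the surviving members instead of waiting for a timeout. The class {{AsyncConflictResolutionSketch}}, its methods, and the {{resolver}} callback are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Toy model, not Infinispan code: all names here are hypothetical.
final class AsyncConflictResolutionSketch {
    private final Executor executor;
    private final Consumer<List<String>> resolver; // resolves conflicts among the given members
    private CompletableFuture<Void> current = CompletableFuture.completedFuture(null);
    private int attempts;

    AsyncConflictResolutionSketch(Executor executor, Consumer<List<String>> resolver) {
        this.executor = executor;
        this.resolver = resolver;
    }

    // Called from the merge thread: schedules the work on the executor
    // instead of running it inline.
    synchronized CompletableFuture<Void> start(List<String> members) {
        // cancel() only marks the old future as cancelled; a real implementation
        // would also have to make the running task notice and stop.
        current.cancel(false);
        attempts++;
        List<String> snapshot = new ArrayList<>(members);
        current = CompletableFuture.runAsync(() -> resolver.accept(snapshot), executor);
        return current;
    }

    // Called when a node leaves: restart with the surviving members.
    synchronized CompletableFuture<Void> onMembersChanged(List<String> newMembers) {
        return start(newMembers);
    }

    synchronized int attempts() {
        return attempts;
    }

    public static void main(String[] args) {
        AsyncConflictResolutionSketch cr = new AsyncConflictResolutionSketch(
                Runnable::run, members -> System.out.println("resolving with " + members));
        cr.start(Arrays.asList("NodeA", "NodeB", "NodeC"));
        cr.onMembersChanged(Arrays.asList("NodeB", "NodeC")).join();
    }
}
```

The same-thread executor in {{main}} is only there to keep the example deterministic. With a real thread pool, {{start}} returns before the resolver finishes, which is the point of the change.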
[JBoss JIRA] (ISPN-9104) Majority partition nodes can process minority topology updates after merge
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9104?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9104:
-------------------------------
Status: Open (was: New)
> Majority partition nodes can process minority topology updates after merge
> --------------------------------------------------------------------------
>
> Key: ISPN-9104
> URL: https://issues.jboss.org/browse/ISPN-9104
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.1.Final, 9.3.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 9.3.0.Beta1
>
>
> After a merge, NAKACK2 resends some broadcast messages that were originally sent in a partition to the members of the merged cluster that weren't in that partition.
> We have a check in LocalTopologyManagerImpl to ignore topology updates from the wrong coordinator, but unfortunately the check only runs after the call to resetLocalTopologyBeforeRebalance(). If the update's topology id is higher than the current topology id, that call can install a "reset" topology to prepare for the rebalance.
> The reset topology has all the segments owned by the minority partition nodes, so the majority partition nodes installing this topology will invalidate all their entries before conflict resolution even starts.
> This causes random failures in the conflict resolution tests, e.g. {{MergePolicyPreferredAlwaysTest}}:
> {noformat}
> 19:19:28,448 DEBUG (testng-Test:[]) [GMS] Test-NodeA-50368: installing view MergeView::[Test-NodeA-50368|10] (5) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368, Test-NodeD-49504, Test-NodeE-55304], 2 subgroups: [Test-NodeA-50368|8] (3) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368], [Test-NodeD-49504|9] (2) [Test-NodeD-49504, Test-NodeE-55304]
> 19:19:28,740 TRACE (jgroups-10,Test-NodeA-50368:[]) [GlobalInboundInvocationHandler] Attempting to execute non-CacheRpcCommand: CacheTopologyControlCommand{cache=___defaultcache, type=CH_UPDATE, sender=Test-NodeD-49504, joinInfo=null, topologyId=21, rebalanceId=7, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 134, Test-NodeE-55304: 122]}, availabilityMode=null, phase=READ_ALL_WRITE_ALL, actualMembers=[Test-NodeD-49504, Test-NodeE-55304], throwable=null, viewId=7} [sender=Test-NodeD-49504]
> 19:19:28,741 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Installing fake cache topology CacheTopology{id=20, phase=NO_REBALANCE, rebalanceId=6, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD-49504, Test-NodeE-55304], persistentUUIDs=[6f22a4be-bf94-42a7-9ea1-4128944351a2, 59c315d5-c7d2-4121-b939-01d62ba9af4f]} for cache ___defaultcache
> 19:19:28,742 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] On cache ___defaultcache we have: new segments: []; old segments: RangeSet(256)
> 19:19:28,744 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] On cache ___defaultcache we have: added segments: {}; removed segments: {0-255}
> 19:19:28,745 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] Removing no longer owned entries for cache ___defaultcache
> 19:19:28,745 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [InvocationContextInterceptor] Invoked with command InvalidateCommand{keys=[MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}]} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@48a2a9d5]
> 19:19:30,152 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [JGroupsTransport] Test-NodeA-50368 sending request 232 to all: SingleRpcCommand{cacheName='___defaultcache', command=RemoveCommand{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=null, metadata=null, flags=[SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER, IGNORE_RETURN_VALUES], commandInvocationId=CommandInvocation:Test-NodeA-50368:109014, valueMatcher=MATCH_ALWAYS, topologyId=24}}
> 19:19:28,748 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [DefaultDataContainer] Removed ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=DURING SPLIT} from container
> 19:19:30,096 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache conflict detected {Test-NodeA-50368=NullCacheEntry{}, Test-NodeE-55304=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, Test-NodeC-9368=NullCacheEntry{}, Test-NodeD-49504=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, Test-NodeB-27290=NullCacheEntry{}}
> 19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache applying EntryMergePolicy org.infinispan.conflict.MergePolicy to PreferredEntry NullCacheEntry{}, otherEntries [ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, NullCacheEntry{}, ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, NullCacheEntry{}]
> 19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache executing remove on conflict: key MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}
> 19:19:35,274 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.conflict.impl.MergePolicyPreferredAlwaysTest.testPartitionMergePolicy[REPL_SYNC, 5N]
> java.lang.AssertionError: Key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}. VersionMap: {Test-NodeA-50368=null, Test-NodeE-55304=null, Test-NodeC-9368=null, Test-NodeD-49504=null, Test-NodeB-27290=null}
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.9.9.jar:?]
> at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.9.9.jar:?]
> at org.testng.AssertJUnit.assertNotNull(AssertJUnit.java:267) ~[testng-6.9.9.jar:?]
> at org.infinispan.conflict.impl.BaseMergePolicyTest.afterConflictResolutionAndMerge(BaseMergePolicyTest.java:113) ~[test-classes/:?]
> at org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:138) ~[test-classes/:?]
> {noformat}
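The ordering problem described in ISPN-9104 can be sketched as follows. This is a toy model and not the code in LocalTopologyManagerImpl: the class {{TopologyUpdateGuardSketch}} and its fields are hypothetical, and a boolean flag stands in for installing the "reset" topology. The only point it illustrates is that the sender check runs before anything with side effects.

```java
// Toy model, not Infinispan code: all names here are hypothetical.
final class TopologyUpdateGuardSketch {
    private final String coordinator; // coordinator of the merged cluster view
    private int currentTopologyId;
    private boolean resetInstalled;   // stand-in for installing a "reset" topology

    TopologyUpdateGuardSketch(String coordinator, int currentTopologyId) {
        this.coordinator = coordinator;
        this.currentTopologyId = currentTopologyId;
    }

    // Returns true if the update was applied, false if it was ignored.
    boolean handleTopologyUpdate(String sender, int topologyId) {
        // Check the sender first, so a stale update has no side effects at all.
        if (!coordinator.equals(sender)) {
            return false;
        }
        if (topologyId > currentTopologyId) {
            resetInstalled = true;
            currentTopologyId = topologyId;
        }
        return true;
    }

    boolean resetInstalled() {
        return resetInstalled;
    }

    int currentTopologyId() {
        return currentTopologyId;
    }

    public static void main(String[] args) {
        TopologyUpdateGuardSketch guard = new TopologyUpdateGuardSketch("NodeA", 19);
        // Resent update from the old minority coordinator is ignored.
        System.out.println(guard.handleTopologyUpdate("NodeD", 21));
        System.out.println(guard.resetInstalled());
    }
}
```

With this ordering, the resent {{CH_UPDATE}} from NodeD in the log above would be dropped before any entries are invalidated on NodeA.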
[JBoss JIRA] (ISPN-9104) Majority partition nodes can process minority topology updates after merge
by Dan Berindei (JIRA)
Dan Berindei created ISPN-9104:
----------------------------------
Summary: Majority partition nodes can process minority topology updates after merge
Key: ISPN-9104
URL: https://issues.jboss.org/browse/ISPN-9104
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.3.0.Alpha1, 9.2.1.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.3.0.Beta1
After a merge, NAKACK2 resends some broadcast messages that were originally sent in a partition to the members of the merged cluster that weren't in that partition.
We have a check in LocalTopologyManagerImpl to ignore topology updates from the wrong coordinator, but unfortunately the check only runs after the call to resetLocalTopologyBeforeRebalance(). If the update's topology id is higher than the current topology id, that call can install a "reset" topology to prepare for the rebalance.
The reset topology has all the segments owned by the minority partition nodes, so the majority partition nodes installing this topology will invalidate all their entries before conflict resolution even starts.
This causes random failures in the conflict resolution tests, e.g. {{MergePolicyPreferredAlwaysTest}}:
{noformat}
19:19:28,448 DEBUG (testng-Test:[]) [GMS] Test-NodeA-50368: installing view MergeView::[Test-NodeA-50368|10] (5) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368, Test-NodeD-49504, Test-NodeE-55304], 2 subgroups: [Test-NodeA-50368|8] (3) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368], [Test-NodeD-49504|9] (2) [Test-NodeD-49504, Test-NodeE-55304]
19:19:28,740 TRACE (jgroups-10,Test-NodeA-50368:[]) [GlobalInboundInvocationHandler] Attempting to execute non-CacheRpcCommand: CacheTopologyControlCommand{cache=___defaultcache, type=CH_UPDATE, sender=Test-NodeD-49504, joinInfo=null, topologyId=21, rebalanceId=7, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 134, Test-NodeE-55304: 122]}, availabilityMode=null, phase=READ_ALL_WRITE_ALL, actualMembers=[Test-NodeD-49504, Test-NodeE-55304], throwable=null, viewId=7} [sender=Test-NodeD-49504]
19:19:28,741 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Installing fake cache topology CacheTopology{id=20, phase=NO_REBALANCE, rebalanceId=6, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD-49504, Test-NodeE-55304], persistentUUIDs=[6f22a4be-bf94-42a7-9ea1-4128944351a2, 59c315d5-c7d2-4121-b939-01d62ba9af4f]} for cache ___defaultcache
19:19:28,742 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] On cache ___defaultcache we have: new segments: []; old segments: RangeSet(256)
19:19:28,744 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] On cache ___defaultcache we have: added segments: {}; removed segments: {0-255}
19:19:28,745 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] Removing no longer owned entries for cache ___defaultcache
19:19:28,745 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [InvocationContextInterceptor] Invoked with command InvalidateCommand{keys=[MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}]} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@48a2a9d5]
19:19:30,152 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [JGroupsTransport] Test-NodeA-50368 sending request 232 to all: SingleRpcCommand{cacheName='___defaultcache', command=RemoveCommand{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=null, metadata=null, flags=[SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER, IGNORE_RETURN_VALUES], commandInvocationId=CommandInvocation:Test-NodeA-50368:109014, valueMatcher=MATCH_ALWAYS, topologyId=24}}
19:19:28,748 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [DefaultDataContainer] Removed ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=DURING SPLIT} from container
19:19:30,096 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache conflict detected {Test-NodeA-50368=NullCacheEntry{}, Test-NodeE-55304=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, Test-NodeC-9368=NullCacheEntry{}, Test-NodeD-49504=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, Test-NodeB-27290=NullCacheEntry{}}
19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache applying EntryMergePolicy org.infinispan.conflict.MergePolicy to PreferredEntry NullCacheEntry{}, otherEntries [ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, NullCacheEntry{}, ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, NullCacheEntry{}]
19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache executing remove on conflict: key MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}
19:19:35,274 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.conflict.impl.MergePolicyPreferredAlwaysTest.testPartitionMergePolicy[REPL_SYNC, 5N]
java.lang.AssertionError: Key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}. VersionMap: {Test-NodeA-50368=null, Test-NodeE-55304=null, Test-NodeC-9368=null, Test-NodeD-49504=null, Test-NodeB-27290=null}
at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.9.9.jar:?]
at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.9.9.jar:?]
at org.testng.AssertJUnit.assertNotNull(AssertJUnit.java:267) ~[testng-6.9.9.jar:?]
at org.infinispan.conflict.impl.BaseMergePolicyTest.afterConflictResolutionAndMerge(BaseMergePolicyTest.java:113) ~[test-classes/:?]
at org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:138) ~[test-classes/:?]
{noformat}
[JBoss JIRA] (ISPN-9103) Default merge-policy should be NONE instead of PREFERRED_ALWAYS
by Ryan Emerson (JIRA)
[ https://issues.jboss.org/browse/ISPN-9103?page=com.atlassian.jira.plugin.... ]
Ryan Emerson updated ISPN-9103:
-------------------------------
Sprint: Sprint 9.3.0.Beta1
> Default merge-policy should be NONE instead of PREFERRED_ALWAYS
> ---------------------------------------------------------------
>
> Key: ISPN-9103
> URL: https://issues.jboss.org/browse/ISPN-9103
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 9.3.0.Alpha1
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Fix For: 9.3.0.Beta1
>
>
> Due to the changes introduced in ISPN-8962, there is no longer a real difference between the PREFERRED_ALWAYS policy and using no conflict resolution (CR). In fact, the latter provides the same behaviour without the added performance cost of CR on partition merges, so the default merge-policy should be NONE.
[JBoss JIRA] (ISPN-9103) Default merge-policy should be NONE instead of PREFERRED_ALWAYS
by Ryan Emerson (JIRA)
[ https://issues.jboss.org/browse/ISPN-9103?page=com.atlassian.jira.plugin.... ]
Ryan Emerson updated ISPN-9103:
-------------------------------
Status: Open (was: New)
> Default merge-policy should be NONE instead of PREFERRED_ALWAYS
> ---------------------------------------------------------------
>
> Key: ISPN-9103
> URL: https://issues.jboss.org/browse/ISPN-9103
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 9.3.0.Alpha1
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Fix For: 9.3.0.Beta1
>
>
> Due to the changes introduced in ISPN-8962, there is no longer a real difference between the PREFERRED_ALWAYS policy and using no conflict resolution (CR). In fact, the latter provides the same behaviour without the added performance cost of CR on partition merges, so the default merge-policy should be NONE.
[JBoss JIRA] (ISPN-9103) Default merge-policy should be NONE instead of PREFERRED_ALWAYS
by Ryan Emerson (JIRA)
Ryan Emerson created ISPN-9103:
----------------------------------
Summary: Default merge-policy should be NONE instead of PREFERRED_ALWAYS
Key: ISPN-9103
URL: https://issues.jboss.org/browse/ISPN-9103
Project: Infinispan
Issue Type: Enhancement
Components: Core
Affects Versions: 9.3.0.Alpha1
Reporter: Ryan Emerson
Assignee: Ryan Emerson
Fix For: 9.3.0.Beta1
Due to the changes introduced in ISPN-8962, there is no longer a real difference between the PREFERRED_ALWAYS policy and using no conflict resolution (CR). In fact, the latter provides the same behaviour without the added performance cost of CR on partition merges, so the default merge-policy should be NONE.