Dan Berindei created ISPN-9104:
----------------------------------
Summary: Majority partition nodes can process minority topology updates after merge
Key: ISPN-9104
URL: https://issues.jboss.org/browse/ISPN-9104
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.3.0.Alpha1, 9.2.1.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.3.0.Beta1
After a merge, NAKACK2 resends some broadcast messages that were originally sent in a
partition to the members of the merged cluster that weren't in that partition.
We have a check in LocalTopologyManagerImpl to ignore topology updates from the wrong
coordinator, but unfortunately it only runs after the call to
resetLocalTopologyBeforeRebalance(). If the update's topology id is higher than the current
topology id, that call can install a "reset" topology to prepare for the rebalance.
The reset topology lists only the minority partition nodes as owners, so the
majority partition nodes that install it invalidate all their entries before
conflict resolution even starts.
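The ordering problem described above can be sketched in a few lines of Java. This is a simplified model, not the Infinispan code: NodeState, Update, resetBeforeRebalance, handleResetFirst and handleCheckFirst are hypothetical names, the starting topology id 15 is made up, and the "update id minus one" rule for the reset topology is only inferred from the log in this issue (update id 21, fake topology id 20). The second handler shows one possible ordering that rejects the stale update before touching local state. It is not necessarily the fix that was applied.

```java
import java.util.Arrays;
import java.util.List;

public class TopologyUpdateOrdering {

    /** Hypothetical, simplified view of one node's state for a single cache. */
    static final class NodeState {
        final String coordinator; // coordinator this node accepts updates from
        int topologyId;           // id of the currently installed topology
        List<String> owners;      // owners in the currently installed topology

        NodeState(String coordinator, int topologyId, List<String> owners) {
            this.coordinator = coordinator;
            this.topologyId = topologyId;
            this.owners = owners;
        }
    }

    /** Hypothetical stand-in for a topology update command. */
    static final class Update {
        final String sender;
        final int topologyId;
        final List<String> owners;

        Update(String sender, int topologyId, List<String> owners) {
            this.sender = sender;
            this.topologyId = topologyId;
            this.owners = owners;
        }
    }

    /** Assumed model of the reset: install a placeholder topology just below the update's id. */
    static void resetBeforeRebalance(NodeState node, Update update) {
        if (update.topologyId - 1 > node.topologyId) {
            node.topologyId = update.topologyId - 1;
            node.owners = update.owners;
        }
    }

    /** Order described in the issue: reset first, sender check second. */
    static boolean handleResetFirst(NodeState node, Update update) {
        resetBeforeRebalance(node, update);
        if (!update.sender.equals(node.coordinator)) {
            return false; // update ignored, but the reset topology is already installed
        }
        node.topologyId = update.topologyId;
        node.owners = update.owners;
        return true;
    }

    /** Alternative order: reject the wrong coordinator before changing local state. */
    static boolean handleCheckFirst(NodeState node, Update update) {
        if (!update.sender.equals(node.coordinator)) {
            return false; // nothing was modified
        }
        resetBeforeRebalance(node, update);
        node.topologyId = update.topologyId;
        node.owners = update.owners;
        return true;
    }

    public static void main(String[] args) {
        List<String> majority = Arrays.asList("NodeA", "NodeB", "NodeC");
        List<String> minority = Arrays.asList("NodeD", "NodeE");
        // Stale update resent after the merge by the old minority coordinator.
        Update stale = new Update("NodeD", 21, minority);

        NodeState resetFirst = new NodeState("NodeA", 15, majority);
        boolean applied1 = handleResetFirst(resetFirst, stale);
        // prints: false 20 [NodeD, NodeE]
        System.out.println(applied1 + " " + resetFirst.topologyId + " " + resetFirst.owners);

        NodeState checkFirst = new NodeState("NodeA", 15, majority);
        boolean applied2 = handleCheckFirst(checkFirst, stale);
        // prints: false 15 [NodeA, NodeB, NodeC]
        System.out.println(applied2 + " " + checkFirst.topologyId + " " + checkFirst.owners);
    }
}
```

Both handlers report the update as ignored. Only the reset-first variant leaves the majority node with the minority partition's owner list.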
Losing those entries causes random failures in the conflict resolution tests, e.g.
{{MergePolicyPreferredAlwaysTest}}:
{noformat}
19:19:28,448 DEBUG (testng-Test:[]) [GMS] Test-NodeA-50368: installing view
MergeView::[Test-NodeA-50368|10] (5) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368,
Test-NodeD-49504, Test-NodeE-55304], 2 subgroups: [Test-NodeA-50368|8] (3)
[Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368], [Test-NodeD-49504|9] (2)
[Test-NodeD-49504, Test-NodeE-55304]
19:19:28,740 TRACE (jgroups-10,Test-NodeA-50368:[]) [GlobalInboundInvocationHandler]
Attempting to execute non-CacheRpcCommand:
CacheTopologyControlCommand{cache=___defaultcache, type=CH_UPDATE,
sender=Test-NodeD-49504, joinInfo=null, topologyId=21, rebalanceId=7,
currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132,
Test-NodeE-55304: 124]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners =
(2)[Test-NodeD-49504: 134, Test-NodeE-55304: 122]}, availabilityMode=null,
phase=READ_ALL_WRITE_ALL, actualMembers=[Test-NodeD-49504, Test-NodeE-55304],
throwable=null, viewId=7} [sender=Test-NodeD-49504]
19:19:28,741 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache])
[LocalTopologyManagerImpl] Installing fake cache topology CacheTopology{id=20,
phase=NO_REBALANCE, rebalanceId=6, currentCH=ReplicatedConsistentHash{ns = 256, owners =
(2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=null, unionCH=null,
actualMembers=[Test-NodeD-49504, Test-NodeE-55304],
persistentUUIDs=[6f22a4be-bf94-42a7-9ea1-4128944351a2,
59c315d5-c7d2-4121-b939-01d62ba9af4f]} for cache ___defaultcache
19:19:28,742 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache])
[StateConsumerImpl] On cache ___defaultcache we have: new segments: []; old segments:
RangeSet(256)
19:19:28,744 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache])
[StateConsumerImpl] On cache ___defaultcache we have: added segments: {}; removed
segments: {0-255}
19:19:28,745 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache])
[StateConsumerImpl] Removing no longer owned entries for cache ___defaultcache
19:19:28,745 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache])
[InvocationContextInterceptor] Invoked with command
InvalidateCommand{keys=[MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}]} and
InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@48a2a9d5]
19:19:30,152 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[])
[JGroupsTransport] Test-NodeA-50368 sending request 232 to all:
SingleRpcCommand{cacheName='___defaultcache',
command=RemoveCommand{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=null,
metadata=null, flags=[SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER, IGNORE_RETURN_VALUES],
commandInvocationId=CommandInvocation:Test-NodeA-50368:109014, valueMatcher=MATCH_ALWAYS,
topologyId=24}}
19:19:28,748 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache])
[DefaultDataContainer] Removed
ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=DURING SPLIT}
from container
19:19:30,096 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[])
[DefaultConflictManager] Cache ___defaultcache conflict detected
{Test-NodeA-50368=NullCacheEntry{},
Test-NodeE-55304=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368},
value=BEFORE SPLIT}, Test-NodeC-9368=NullCacheEntry{},
Test-NodeD-49504=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368},
value=BEFORE SPLIT}, Test-NodeB-27290=NullCacheEntry{}}
19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[])
[DefaultConflictManager] Cache ___defaultcache applying EntryMergePolicy
org.infinispan.conflict.MergePolicy to PreferredEntry NullCacheEntry{}, otherEntries
[ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT},
NullCacheEntry{}, ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368},
value=BEFORE SPLIT}, NullCacheEntry{}]
19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[])
[DefaultConflictManager] Cache ___defaultcache executing remove on conflict: key
MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}
19:19:35,274 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed:
org.infinispan.conflict.impl.MergePolicyPreferredAlwaysTest.testPartitionMergePolicy[REPL_SYNC,
5N]
java.lang.AssertionError: Key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}. VersionMap:
{Test-NodeA-50368=null, Test-NodeE-55304=null, Test-NodeC-9368=null,
Test-NodeD-49504=null, Test-NodeB-27290=null}
at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.9.9.jar:?]
at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.9.9.jar:?]
at org.testng.AssertJUnit.assertNotNull(AssertJUnit.java:267) ~[testng-6.9.9.jar:?]
at
org.infinispan.conflict.impl.BaseMergePolicyTest.afterConflictResolutionAndMerge(BaseMergePolicyTest.java:113)
~[test-classes/:?]
at
org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:138)
~[test-classes/:?]
{noformat}
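The StateConsumerImpl lines in the log ("new segments: []; old segments: RangeSet(256)" and "removed segments: {0-255}") amount to a set difference between the segments NodeA owned before and after the fake topology. A minimal sketch of that arithmetic follows. It assumes that every owner in a replicated consistent hash owns every segment. The class and method names are hypothetical and do not come from Infinispan.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class RemovedSegments {

    /** Assumption: in a replicated cache, each listed owner owns all segments. */
    static TreeSet<Integer> ownedSegments(String node, List<String> owners, int numSegments) {
        TreeSet<Integer> segments = new TreeSet<>();
        if (owners.contains(node)) {
            for (int s = 0; s < numSegments; s++) {
                segments.add(s);
            }
        }
        return segments;
    }

    /** Segments owned before the topology change but not after it. */
    static TreeSet<Integer> removedSegments(TreeSet<Integer> before, TreeSet<Integer> after) {
        TreeSet<Integer> removed = new TreeSet<>(before);
        removed.removeAll(after);
        return removed;
    }

    public static void main(String[] args) {
        int numSegments = 256; // "ns = 256" in the logged consistent hash
        List<String> majorityOwners = Arrays.asList("NodeA", "NodeB", "NodeC");
        List<String> fakeTopologyOwners = Arrays.asList("NodeD", "NodeE");

        TreeSet<Integer> before = ownedSegments("NodeA", majorityOwners, numSegments);
        TreeSet<Integer> after = ownedSegments("NodeA", fakeTopologyOwners, numSegments);
        TreeSet<Integer> removed = removedSegments(before, after);

        // prints: 256 0 255
        System.out.println(removed.size() + " " + removed.first() + " " + removed.last());
    }
}
```

Because NodeA is absent from the fake topology's owner list, it owns no segments afterwards, and entries in all 256 segments become candidates for removal.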