[infinispan-issues] [JBoss JIRA] (ISPN-9105) Majority partition nodes can process minority topology updates after merge

Dan Berindei (JIRA) issues at jboss.org
Thu Apr 26 03:30:00 EDT 2018


Dan Berindei created ISPN-9105:
----------------------------------

             Summary: Majority partition nodes can process minority topology updates after merge
                 Key: ISPN-9105
                 URL: https://issues.jboss.org/browse/ISPN-9105
             Project: Infinispan
          Issue Type: Bug
          Components: Core
    Affects Versions: 9.2.1.Final, 9.3.0.Alpha1
            Reporter: Dan Berindei
            Assignee: Dan Berindei
             Fix For: 9.3.0.Beta1


After a merge, NAKACK2 resends some broadcast messages that were originally sent within one partition to the members of the merged cluster that weren't in that partition.

We have a check in LocalTopologyManagerImpl to ignore topology updates from the wrong coordinator, but unfortunately the check only runs after resetLocalTopologyBeforeRebalance() has been called. If the update's topology id is higher than the current topology id, resetLocalTopologyBeforeRebalance() can install a "reset" topology to prepare for rebalance.

The reset topology lists only the minority partition nodes as owners, so the majority partition nodes that install it invalidate all their entries before conflict resolution even starts.
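
The ordering problem can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the Update and Node classes, their fields and both handler methods are hypothetical and are not Infinispan's API, the "sender must be the current coordinator" rule is a simplification of the real check, and the starting topology id 18 is made up. The only details taken from the log below are the stale update's topology id (21), its owners (NodeD, NodeE), and the reset topology id being one lower (20).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model; these classes and methods are NOT Infinispan's API.
public class TopologyGuardSketch {

    // Stand-in for a CH_UPDATE command received from another node.
    static final class Update {
        final String sender;
        final int topologyId;
        final List<String> owners;

        Update(String sender, int topologyId, List<String> owners) {
            this.sender = sender;
            this.topologyId = topologyId;
            this.owners = owners;
        }
    }

    // Stand-in for the local topology state of one node.
    static final class Node {
        final String coordinator; // coordinator of the current (merged) view
        int topologyId;
        List<String> owners;

        Node(String coordinator, int topologyId, List<String> owners) {
            this.coordinator = coordinator;
            this.topologyId = topologyId;
            this.owners = new ArrayList<>(owners);
        }

        // Assumed validity rule: only the current coordinator may send updates.
        boolean fromCurrentCoordinator(Update u) {
            return coordinator.equals(u.sender);
        }

        // Mirrors the "fake" topology in the log: id is one below the update's id.
        void resetBeforeRebalance(Update u) {
            if (u.topologyId > topologyId) {
                topologyId = u.topologyId - 1;
                owners = new ArrayList<>(u.owners);
            }
        }

        // Ordering described in this report: reset first, validate afterwards.
        boolean resetThenCheck(Update u) {
            resetBeforeRebalance(u);
            return fromCurrentCoordinator(u);
        }

        // One possible alternative: validate first, so a stale update is a no-op.
        boolean checkThenReset(Update u) {
            if (!fromCurrentCoordinator(u)) {
                return false;
            }
            resetBeforeRebalance(u);
            return true;
        }
    }

    public static void main(String[] args) {
        Update stale = new Update("NodeD", 21, Arrays.asList("NodeD", "NodeE"));
        List<String> majority = Arrays.asList("NodeA", "NodeB", "NodeC");

        Node first = new Node("NodeA", 18, majority); // 18 is an illustrative id
        System.out.println(first.resetThenCheck(stale) + " " + first.topologyId + " " + first.owners);

        Node second = new Node("NodeA", 18, majority);
        System.out.println(second.checkThenReset(stale) + " " + second.topologyId + " " + second.owners);
    }
}
```

In this model both handlers reject the stale update, but only the reset-then-check ordering leaves the node with the minority owners installed. Whether moving the check is the right fix for LocalTopologyManagerImpl is not something the sketch can establish.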

This causes random failures in the conflict resolution tests, e.g. {{MergePolicyPreferredAlwaysTest}}:

{noformat}
19:19:28,448 DEBUG (testng-Test:[]) [GMS] Test-NodeA-50368: installing view MergeView::[Test-NodeA-50368|10] (5) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368, Test-NodeD-49504, Test-NodeE-55304], 2 subgroups: [Test-NodeA-50368|8] (3) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368], [Test-NodeD-49504|9] (2) [Test-NodeD-49504, Test-NodeE-55304]
19:19:28,740 TRACE (jgroups-10,Test-NodeA-50368:[]) [GlobalInboundInvocationHandler] Attempting to execute non-CacheRpcCommand: CacheTopologyControlCommand{cache=___defaultcache, type=CH_UPDATE, sender=Test-NodeD-49504, joinInfo=null, topologyId=21, rebalanceId=7, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 134, Test-NodeE-55304: 122]}, availabilityMode=null, phase=READ_ALL_WRITE_ALL, actualMembers=[Test-NodeD-49504, Test-NodeE-55304], throwable=null, viewId=7} [sender=Test-NodeD-49504]
19:19:28,741 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Installing fake cache topology CacheTopology{id=20, phase=NO_REBALANCE, rebalanceId=6, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD-49504, Test-NodeE-55304], persistentUUIDs=[6f22a4be-bf94-42a7-9ea1-4128944351a2, 59c315d5-c7d2-4121-b939-01d62ba9af4f]} for cache ___defaultcache
19:19:28,742 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] On cache ___defaultcache we have: new segments: []; old segments: RangeSet(256)
19:19:28,744 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] On cache ___defaultcache we have: added segments: {}; removed segments: {0-255}
19:19:28,745 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] Removing no longer owned entries for cache ___defaultcache
19:19:28,745 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [InvocationContextInterceptor] Invoked with command InvalidateCommand{keys=[MagicKey{1AD3/F92B3173/51 at Test-NodeA-50368}]} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext at 48a2a9d5]
19:19:30,152 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [JGroupsTransport] Test-NodeA-50368 sending request 232 to all: SingleRpcCommand{cacheName='___defaultcache', command=RemoveCommand{key=MagicKey{1AD3/F92B3173/51 at Test-NodeA-50368}, value=null, metadata=null, flags=[SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER, IGNORE_RETURN_VALUES], commandInvocationId=CommandInvocation:Test-NodeA-50368:109014, valueMatcher=MATCH_ALWAYS, topologyId=24}}
19:19:28,748 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [DefaultDataContainer] Removed ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51 at Test-NodeA-50368}, value=DURING SPLIT} from container

19:19:30,096 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache conflict detected {Test-NodeA-50368=NullCacheEntry{}, Test-NodeE-55304=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51 at Test-NodeA-50368}, value=BEFORE SPLIT}, Test-NodeC-9368=NullCacheEntry{}, Test-NodeD-49504=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51 at Test-NodeA-50368}, value=BEFORE SPLIT}, Test-NodeB-27290=NullCacheEntry{}}
19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache applying EntryMergePolicy org.infinispan.conflict.MergePolicy to PreferredEntry NullCacheEntry{}, otherEntries [ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51 at Test-NodeA-50368}, value=BEFORE SPLIT}, NullCacheEntry{}, ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51 at Test-NodeA-50368}, value=BEFORE SPLIT}, NullCacheEntry{}]
19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache executing remove on conflict: key MagicKey{1AD3/F92B3173/51 at Test-NodeA-50368}

19:19:35,274 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.conflict.impl.MergePolicyPreferredAlwaysTest.testPartitionMergePolicy[REPL_SYNC, 5N]
java.lang.AssertionError: Key=MagicKey{1AD3/F92B3173/51 at Test-NodeA-50368}. VersionMap: {Test-NodeA-50368=null, Test-NodeE-55304=null, Test-NodeC-9368=null, Test-NodeD-49504=null, Test-NodeB-27290=null}
	at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.9.9.jar:?]
	at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.9.9.jar:?]
	at org.testng.AssertJUnit.assertNotNull(AssertJUnit.java:267) ~[testng-6.9.9.jar:?]
	at org.infinispan.conflict.impl.BaseMergePolicyTest.afterConflictResolutionAndMerge(BaseMergePolicyTest.java:113) ~[test-classes/:?]
	at org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:138) ~[test-classes/:?]
{noformat}


