Tristan Tarrant updated ISPN-9014:
----------------------------------
Sprint: Sprint 9.3.0.Alpha1, Sprint 9.3.0.Beta1 (was: Sprint 9.3.0.Alpha1)
Conflict resolution consistent hash should not include nodes that are
not in the merged cluster view
----------------------------------------------------------------------------------------------------
Key: ISPN-9014
URL: https://issues.jboss.org/browse/ISPN-9014
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Affects Versions: 9.2.1.Final
Reporter: Dan Berindei
Assignee: Ryan Emerson
Labels: testsuite_stability
Fix For: 9.3.0.Alpha1
Conflict resolution fails when trying to read entries from nodes that are not in the
JGroups cluster view, and this causes random failures in
{{ClusterListenerDistTest.testClusterListenerNodeGoesDown}}.
# NodeA leaves the cluster, but still manages to start a rebalance with [NodeB, NodeC]
(topology id 11)
# One node doesn't receive topology 11, so NodeB becomes coordinator and starts
conflict resolution with all 3 nodes in the pending CH (topology 12)
# Conflict resolution fails because NodeB and NodeC can't read the entries from
NodeA
# {{onPartitionMerge}} also queued a rebalance, so NodeB starts a new rebalance without
canceling the previous rebalance first (topology 13)
# Because there is no reset topology, NodeB thinks it already requested all the segments
from NodeC in topology 11, so it doesn't add any new inbound transfer
# NodeC's state response arrives on NodeB with topology 11, NodeB discards it, and
state transfer hangs.
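The fix suggested by the issue title can be sketched as follows: before building the conflict resolution consistent hash, drop every candidate member that is not in the merged cluster view. This is only an illustration under stated assumptions; {{MergedViewFilter}} and {{restrictToView}} are hypothetical names, nodes are modelled as plain strings, and none of this is Infinispan's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Infinispan.
class MergedViewFilter {

    // Returns the candidate members that are also present in the merged
    // cluster view, preserving the order of the candidates.
    static List<String> restrictToView(List<String> candidateMembers, Set<String> mergedView) {
        List<String> result = new ArrayList<>();
        for (String member : candidateMembers) {
            if (mergedView.contains(member)) {
                result.add(member);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Members recovered from the partitions' topologies (NodeA has already left).
        List<String> candidates = Arrays.asList("NodeB", "NodeC", "NodeA");
        // Members of the merged cluster view.
        Set<String> view = new HashSet<>(Arrays.asList("NodeB", "NodeC"));
        System.out.println(restrictToView(candidates, view)); // prints [NodeB, NodeC]
    }
}
```

With NodeA filtered out, the pending CH in step 2 would only contain NodeB and NodeC, so conflict resolution would not try to read entries from a node that has left.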
{noformat}
14:52:52,426 INFO (testng-Test:[cluster-listener]) [CLUSTER] ISPN000310: Starting
cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=11,
phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners =
(2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]},
pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125,
Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145,
Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7,
ae95a681-2ba1-4e04-bfe5-05aa59425149]}
14:52:52,479 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3])
[ClusterCacheStatus] Recovered 2 partition(s) for cache cluster-listener:
[CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4,
currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50,
Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners =
(2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null,
actualMembers=[Test-NodeB-45145, Test-NodeC-20831],
persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7,
ae95a681-2ba1-4e04-bfe5-05aa59425149]}, CacheTopology{id=9, phase=NO_REBALANCE,
rebalanceId=3, currentCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-57087:
78+79, Test-NodeB-45145: 90+88, Test-NodeC-20831: 88+89]}, pendingCH=null, unionCH=null,
actualMembers=[Test-NodeA-57087, Test-NodeB-45145, Test-NodeC-20831],
persistentUUIDs=[48e3ddc7-ee97-42d8-a57d-283e8d28ec25,
301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}]
14:52:52,484 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3])
[ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache
cluster-listener, topology = CacheTopology{id=12, phase=CONFLICT_RESOLUTION,
rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145:
128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners =
(3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]},
unionCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50,
Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, actualMembers=[Test-NodeB-45145,
Test-NodeA-57087, Test-NodeC-20831],
persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7,
48e3ddc7-ee97-42d8-a57d-283e8d28ec25, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, availability
mode = null
14:52:52,488 ERROR (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3])
[DefaultConflictManager] Cache cluster-listener encountered exception whilst trying to
resolve conflicts on merge: org.infinispan.remoting.transport.jgroups.SuspectException:
ISPN000400: Node Test-NodeA-57087 was suspected
14:52:52,532 INFO (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3])
[CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology
CacheTopology{id=13, phase=READ_OLD_WRITE_ALL, rebalanceId=6,
currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50,
Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners =
(2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null,
actualMembers=[Test-NodeB-45145, Test-NodeC-20831],
persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7,
ae95a681-2ba1-4e04-bfe5-05aa59425149]}
14:52:52,577 TRACE
(stateTransferExecutor-thread-Test-NodeB-p23774-t3:[StateRequest-cluster-listener])
[StateConsumerImpl] Waiting for inbound transfer to finish:
InboundTransferTask{segments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113
116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254},
finishedSegments={}, unfinishedSegments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101
104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245
249-254}, source=Test-NodeC-20831, isCancelled=false,
completionFuture=java.util.concurrent.CompletableFuture@110952e8[Not completed],
topologyId=11, timeout=240000, cacheName=cluster-listener}
14:52:52,584 DEBUG (remote-thread-Test-NodeB-p23771-t4:[cluster-listener])
[StateConsumerImpl] Discarding state response with old topology id 11 for cache
cluster-listener, state transfer request topology was true
14:52:52,584 TRACE (remote-thread-Test-NodeB-p23771-t4:[]) [JGroupsTransport]
Test-NodeB-45145 sending response for request 13 to Test-NodeC-20831: null
{noformat}
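The hang described in the last two steps, and visible in the final log lines above, can be reproduced with a deliberately simplified model of the requesting node. Everything here is an assumption made for illustration: {{InboundTransferModel}}, its methods and the {{resetPending}} flag are hypothetical and do not mirror {{StateConsumerImpl}}.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the requesting side of state transfer.
class InboundTransferModel {
    private int topologyId;
    // Source node -> topology id in which its segments were requested.
    private final Map<String, Integer> pending = new HashMap<>();

    // Installs a topology. Returns true if a new request to `source` was issued.
    boolean onTopology(int newTopologyId, String source, boolean resetPending) {
        topologyId = newTopologyId;
        if (resetPending) {
            pending.clear(); // forget requests made in earlier topologies
        }
        if (pending.containsKey(source)) {
            return false; // segments believed to be already requested
        }
        pending.put(source, newTopologyId);
        return true;
    }

    // Returns true if the response was applied; stale responses are discarded.
    boolean onResponse(String source, int responseTopologyId) {
        if (responseTopologyId < topologyId) {
            return false;
        }
        pending.remove(source);
        return true;
    }

    boolean isWaiting() {
        return !pending.isEmpty();
    }

    public static void main(String[] args) {
        InboundTransferModel nodeB = new InboundTransferModel();
        nodeB.onTopology(11, "NodeC", false);  // request issued in topology 11
        nodeB.onTopology(13, "NodeC", false);  // no reset: no new request is added
        System.out.println(nodeB.onResponse("NodeC", 11)); // prints false
        System.out.println(nodeB.isWaiting());             // prints true
    }
}
```

In this model, passing {{true}} for {{resetPending}} when topology 13 is installed makes the node issue a fresh request, and a response tagged with topology 13 then completes the transfer.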