]
Dan Berindei commented on ISPN-10366:
-------------------------------------
I think the problem is that {{StateConsumerImpl.onTopologyUpdate}} doesn't know about
{{chunkCounter}} and calls {{notifyEndOfStateTransferIfNeeded}}, which calls
{{stopApplyingState()}} and changes the state of all the segments to {{OWNED}}. It helped
that the {{StateResponseCommand}} with the keys had already arrived, and was just waiting
for {{stateTransferLock.transactionDataReceived()}} call to run and remove the inbound
transfer.
ScatteredStateConsumerImpl sets segment state to OWNED before
applying values
-----------------------------------------------------------------------------
Key: ISPN-10366
URL:
https://issues.jboss.org/browse/ISPN-10366
Project: Infinispan
Issue Type: Bug
Components: Core, Test Suite - Core
Affects Versions: 10.0.0.Beta3, 9.4.15.Final
Reporter: Dan Berindei
Assignee: Radim Vansa
Priority: Major
Labels: testsuite_stability
Fix For: 10.0.0.Beta4
Attachments:
ISPN-10363_LazyInitializingExecutorService_94x_20190627-2010_PrefetchTest-infinispan-core.log.gz
{{ScatteredStateConsumerImpl}} uses {{InboundTransferTask}}s only to request keys, then
after it received all the keys of a segment it changes the segment state to
{{VALUE_TRANSFER}} and starts an asynchronous request to fetch the values and replace the
{{RemoteMetadata}} entries with real entries.
{{ScatteredStateConsumerImpl.chunkCounter}} is supposed to delay the state transfer end
and the segment state change to {{OWNED}}, but in rare occasions this doesn't happen.
This happened in {{PrefetchTest.testPrefetch12}} while running the test suite with
{{taskset -c 1-2}}:
{noformat}
21:54:43,304 TRACE (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache])
[StateConsumerImpl] Received new topology for cache ___defaultcache, isRebalance = true,
isMember = true, topology = CacheTopology{id=9, phase=TRANSITORY, rebalanceId=5,
currentCH=PartitionerConsistentHash:ScatteredConsistentHash{ns=1, rebalanced=false, owners
= (2)[Test-NodeA-39104: 1, Test-NodeC-3746: 0]},
pendingCH=PartitionerConsistentHash:ScatteredConsistentHash{ns=1, rebalanced=true, owners
= (2)[Test-NodeA-39104: 0, Test-NodeC-3746: 1]},
unionCH=PartitionerConsistentHash:ScatteredConsistentHash{ns=1, rebalanced=false, owners =
(2)[Test-NodeA-39104: 0, Test-NodeC-3746: 1]}, actualMembers=[Test-NodeA-39104,
Test-NodeC-3746], persistentUUIDs=[f58e0a9a-dd4e-429a-8464-da64bf001d4e,
1471096f-c59a-4dc9-8f4d-31fbf399a2aa]}
21:54:43,305 TRACE
(stateTransferExecutor-thread-Test-NodeC-p69908-t6:[StateRequest-___defaultcache])
[ScatteredStateConsumerImpl] Requesting keys for segments {0} from Test-NodeA-39104
21:54:43,313 TRACE (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache])
[StateConsumerImpl] Topology update processed, stateTransferTopologyId = 9, startRebalance
= true, pending CH = PartitionerConsistentHash:ScatteredConsistentHash{ns=1,
rebalanced=true, owners = (2)[Test-NodeA-39104: 0, Test-NodeC-3746: 1]}
21:54:43,313 TRACE (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache])
[StateTransferLockImpl] Signalling transaction data received for topology 9
21:54:43,313 TRACE (remote-thread-Test-NodeC-p69905-t2:[])
[TrianglePerCacheInboundInvocationHandler] Calling perform() on
StateResponseCommand{cache=___defaultcache, pushTransfer=false,
stateChunks=[StateChunk{segmentId=0, cacheEntries=1, isLastChunk=true}],
origin=Test-NodeA-39104, topologyId=9, applyState=true}
21:54:43,313 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[StateConsumerImpl] Applying new state chunk for segment 0 of cache ___defaultcache from
node Test-NodeA-39104: received 1 cache entries
21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[ScatteredVersionManagerImpl] Finished transfer for segment 0 = KEY_TRANSFER ->
VALUE_TRANSFER
21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[ScatteredVersionManagerImpl] Node Test-NodeC-3746, segment 0 has all keys in, expects
value transfer
21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[ScatteredStateConsumerImpl] Requesting values from segments {0}, for in-memory keys
21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[ScatteredStateConsumerImpl] Retrieving values, chunk counter is 1
21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[JGroupsTransport] Test-NodeC-3746 sending request 11 to Test-NodeA-39104:
ClusteredGetAllCommand{keys=[key], flags=[SKIP_OWNERSHIP_CHECK], topologyId=9}
21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[ScatteredStateConsumerImpl] Invalidating versions on Test-NodeC-3746, chunk counter
incremented to 2
21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[ScatteredStateConsumerImpl] Versions invalidated on Test-NodeC-3746, chunk counter
decremented to 1
21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[StateConsumerImpl] Removing inbound transfers from node {0} for segments
Test-NodeA-39104
21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[ScatteredStateConsumerImpl] Inbound transfer removed, chunk counter is 1
21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[])
[StateConsumerImpl] Latch 0
21:54:43,315 TRACE (jgroups-7,Test-NodeC-3746:[]) [JGroupsTransport] Test-NodeC-3746
received response for request 11 from Test-NodeA-39104:
SuccessfulResponse([MetadataImmortalCacheValue {value=v0,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1,
version=SimpleClusteredVersion{topologyId=7, version=1}}}])
21:54:43,316 TRACE (jgroups-7,Test-NodeC-3746:[]) [BlockingInterceptor] Command blocking
before completion of PutKeyValueCommand{key=key, value=v0, flags=[CACHE_MODE_LOCAL,
SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER, SKIP_SHARED_CACHE_STORE, SKIP_OWNERSHIP_CHECK,
IGNORE_RETURN_VALUES, SKIP_XSITE_BACKUP],
commandInvocationId=CommandInvocation:Test-NodeC-3746:121294, putIfAbsent=false,
valueMatcher=MATCH_ALWAYS,
metadata=InternalMetadataImpl{actual=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1,
version=SimpleClusteredVersion{topologyId=7, version=1}}, created=-1, lastUsed=-1},
successful=true, topologyId=-1}
21:54:43,316 TRACE (remote-thread-Test-NodeC-p69905-t2:[___defaultcache])
[StateConsumerImpl] After applying the received state the data container of cache
___defaultcache has 1 keys
21:54:43,316 TRACE (remote-thread-Test-NodeC-p69905-t2:[___defaultcache])
[StateConsumerImpl] Segments not received yet for cache ___defaultcache: {}
21:54:43,316 DEBUG (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache])
[StateConsumerImpl] Finished receiving of segments for cache ___defaultcache for topology
9.
21:54:43,316 DEBUG (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache])
[ScatteredVersionManagerImpl] Node Test-NodeC-3746 received values for all segments in
topology 9
21:54:43,316 TRACE (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache])
[StateConsumerImpl] Stop keeping track of changed keys for state transfer in topology 9
{noformat}
The test then starts a put operation and expects it to prefetch the previous value, but
because the segment is {{OWNED}}, the {{RemoteMetadata}} is ignored:
{noformat}
21:54:43,316 TRACE (ForkThread-1,Test:[]) [InvocationContextInterceptor] Invoked with
command PutKeyValueCommand{key=key, value=v1, flags=[],
commandInvocationId=CommandInvocation:Test-NodeC-3746:121295, putIfAbsent=false,
valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1,
version=null}, successful=true, topologyId=-1} and InvocationContext
[SingleKeyNonTxInvocationContext{isLocked=false, key=null, cacheEntry=null, origin=null,
lockOwner=CommandInvocation:Test-NodeC-3746:121295}]
21:54:43,316 TRACE (ForkThread-1,Test:[]) [EntryFactoryImpl] Retrieved from container
MetadataImmortalCacheEntry{key=key, value=null,
metadata=RemoteMetadata{address=Test-NodeA-39104, version=1}}
21:54:43,316 TRACE (ForkThread-1,Test:[]) [ScatteredDistributionInterceptor] Committing
entry RepeatableReadEntry(108d175b){key=key, value=v1, isCreated=false, isChanged=true,
isRemoved=false, isExpired=false, skipLookup=true,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1,
version=SimpleClusteredVersion{topologyId=9, version=1}}}, replaced
MetadataImmortalCacheEntry{key=key, value=null,
metadata=RemoteMetadata{address=Test-NodeA-39104, version=1}}
21:54:53,316 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed:
org.infinispan.scattered.statetransfer.PrefetchTest.testPrefetch12
org.infinispan.test.TestException: java.util.concurrent.TimeoutException
at org.infinispan.util.ControlledRpcManager.uncheckedGet(ControlledRpcManager.java:259)
~[test-classes/:?]
at org.infinispan.util.ControlledRpcManager.expectCommand(ControlledRpcManager.java:124)
~[test-classes/:?]
at
org.infinispan.scattered.statetransfer.PrefetchTest.testPrefetch(PrefetchTest.java:110)
~[test-classes/:?]
at
org.infinispan.scattered.statetransfer.PrefetchTest.testPrefetch12(PrefetchTest.java:67)
~[test-classes/:?]
{noformat}
On a related note, {{StateConsumerImpl.applyState(pushTransfer=true)}} initializes a
{{CountDownLatch(stateChunks.size())}}, but doesn't actually count down if
{{stateChunk.getCacheEntries() == null}}, potentially hanging state transfer until it
times out.