Tristan Tarrant updated ISPN-6341:
----------------------------------
Fix Version/s: 8.1.4.Final
StateTransferManager should be the first component to stop
----------------------------------------------------------
Key: ISPN-6341
URL: https://issues.jboss.org/browse/ISPN-6341
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 8.2.0.CR1
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 8.2.1.Final, 9.0.0.Alpha1, 8.1.4.Final
When a cache stops, it first removes the component registry from the
{{GlobalComponentsRegistry}}'s {{namedComponents}} map, which means the node
(let's call it {{A}}) will reply with a {{CacheNotFoundResponse}} to any remote
command.
Another node {{B}} trying to execute a write/transactional command will receive the
{{CacheNotFoundResponse}}, assume that a new cache topology with id
{{current topology id + 1}} is coming soon, and wait for that new topology before retrying.
Normally this is not a problem, because {{StateTransferManagerImpl.stop()}} sends a
{{CacheTopologyControlCommand(LEAVE)}} to the coordinator quickly enough, then {{B}}
receives the {{current topology id + 1}} topology and retries the command.
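The wait on {{B}} can be modelled roughly as below. This is a minimal sketch, not Infinispan code: {{TopologyWaitSketch}}, {{installTopology}}, {{awaitTopology}}, the topology id {{5}} and the 100 ms timeout are all hypothetical names and values chosen for illustration. It only shows that the retry is gated on the {{current topology id + 1}} topology arriving, so a late {{LEAVE}} from {{A}} leaves {{B}} waiting until the timeout.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of node B's wait; these are not Infinispan's actual classes.
class TopologyWaitSketch {
    private int currentTopologyId;

    TopologyWaitSketch(int initialTopologyId) {
        this.currentTopologyId = initialTopologyId;
    }

    synchronized int currentTopologyId() {
        return currentTopologyId;
    }

    // Called when a newer topology arrives (e.g. after the coordinator handles LEAVE).
    synchronized void installTopology(int topologyId) {
        if (topologyId > currentTopologyId) {
            currentTopologyId = topologyId;
            notifyAll();
        }
    }

    // Blocks until the expected topology is installed; returns false on timeout or interrupt.
    synchronized boolean awaitTopology(int expectedTopologyId, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (currentTopologyId < expectedTopologyId) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            try {
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TopologyWaitSketch nodeB = new TopologyWaitSketch(5);
        int expected = nodeB.currentTopologyId() + 1;

        // LEAVE has not reached the coordinator yet: B waits and times out.
        System.out.println("retry allowed: " + nodeB.awaitTopology(expected, 100));

        // The new topology arrives: B can retry the command immediately.
        nodeB.installTopology(expected);
        System.out.println("retry allowed: " + nodeB.awaitTopology(expected, 100));
    }
}
```

In this model the first {{awaitTopology}} call returns {{false}} because no new topology was installed, and the second returns {{true}} once {{installTopology}} has run.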
But in some cases, the cache components that stop before {{StateTransferManagerImpl}} can
take a long time to do so. In particular, because of {{ISPN-5507}}, {{TransactionTable}}
can block for {{cacheStopTimeout}} if there are remote transactions in progress, even
though the cache can no longer process remote commands.
We should give {{StateTransferManagerImpl.stop()}} a priority of {{0}}, so that the
{{CacheTopologyControlCommand(LEAVE)}} is sent as soon as possible.
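The intended effect of the priority change can be illustrated with a small sketch. It assumes that components are stopped in ascending order of their stop priority, which is what the proposal above implies. {{StopOrderSketch}}, {{Component}} and the priority values {{10}} and {{20}} are hypothetical and do not reflect Infinispan's real component registry or its actual priorities.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of priority-ordered shutdown; not Infinispan's component registry.
class StopOrderSketch {

    // Stand-in for a cache component with a stop priority (lower value stops earlier).
    static final class Component {
        final String name;
        final int stopPriority;

        Component(String name, int stopPriority) {
            this.name = name;
            this.stopPriority = stopPriority;
        }
    }

    // Returns component names in the order their stop methods would run.
    static List<String> stopOrder(List<Component> components) {
        List<Component> sorted = new ArrayList<>(components);
        sorted.sort(Comparator.comparingInt((Component c) -> c.stopPriority));
        List<String> names = new ArrayList<>();
        for (Component c : sorted) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Component> components = List.of(
            new Component("TransactionTable", 10),          // assumed priority, may block on stop
            new Component("StateTransferManagerImpl", 0),   // proposed priority from this issue
            new Component("SomeOtherComponent", 20));       // hypothetical component

        // With priority 0, StateTransferManagerImpl stops first and can send LEAVE
        // before TransactionTable gets a chance to block.
        System.out.println(stopOrder(components));
    }
}
```

With these values {{StateTransferManagerImpl}} comes first in the stop order, so the {{LEAVE}} command would be sent before any slower component starts stopping.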