]
Walter Pongratz closed ISPN-8713.
---------------------------------
Resolution: Done
Just retested it - in version 9.4.4.Final this issue no longer occurs.
"Initial state transfer timed out" in border case
-------------------------------------------------
Key: ISPN-8713
URL: https://issues.jboss.org/browse/ISPN-8713
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.1.4.Final
Reporter: Walter Pongratz
Priority: Critical
There is a bug in Infinispan 9.1.4 in clusters where not every cache is started on each
node. Detailed steps to reproduce are given below; in short: if a cache is not started
on the coordinator and the last node running that cache does not leave gracefully, the
cache cannot be started again until the coordinator changes.
Known aspects:
* In Infinispan 8.2.6 the same steps do NOT lead to an issue
Likely cause: On a non-graceful exit of the last node running a given cache, the cache is
not removed from the cacheStatusMap in ClusterTopologyManagerImpl on the coordinator; the
ClusterCacheStatus is only modified - see the TODO in
ClusterCacheStatus.updateCurrentTopology(). On a graceful exit the removal does happen - see
ClusterTopologyManagerImpl.handleLeave().
As a result, a new node starting the cache in question apparently waits for the initial
state transfer - which never happens because no other node has this cache. In
Infinispan 8.2.6 this did not seem to be a problem, but in 9.1.4 it is. The fix would
be either to resolve this TODO and remove the ClusterCacheStatus from the map, or to
ensure that the new node does not wait for initial state transfer in this case.
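
The suspected mechanism can be illustrated with a small, self-contained toy model. This is
NOT Infinispan code: the class StaleCacheStatusSketch, its methods, and the rule "a joiner
waits for state transfer whenever a status entry already exists" are all assumptions made
only to mirror the hypothesis above. The flag removeEmptyStatusOnCrash stands in for the
first of the two proposed fixes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the coordinator's per-cache bookkeeping; hypothetical, not Infinispan's implementation. */
public class StaleCacheStatusSketch {
    // cache name -> nodes the coordinator believes are running that cache
    private final Map<String, Set<String>> cacheStatusMap = new HashMap<>();
    // models the proposed fix: drop the status entry when a crash leaves it empty
    private final boolean removeEmptyStatusOnCrash;

    public StaleCacheStatusSketch(boolean removeEmptyStatusOnCrash) {
        this.removeEmptyStatusOnCrash = removeEmptyStatusOnCrash;
    }

    /** Returns true if the joiner would wait for an initial state transfer (assumed rule). */
    public boolean join(String cache, String node) {
        Set<String> members = cacheStatusMap.get(cache);
        boolean statusExisted = members != null;
        if (members == null) {
            members = new HashSet<>();
            cacheStatusMap.put(cache, members);
        }
        members.add(node);
        // Assumption: an existing status means "someone else already has state".
        return statusExisted;
    }

    /** Graceful leave: the entry is removed once the last member is gone. */
    public void leaveGracefully(String cache, String node) {
        Set<String> members = cacheStatusMap.get(cache);
        if (members == null) {
            return;
        }
        members.remove(node);
        if (members.isEmpty()) {
            cacheStatusMap.remove(cache);
        }
    }

    /** Non-graceful exit: membership is updated, but the entry survives unless the fix is on. */
    public void crash(String node) {
        cacheStatusMap.entrySet().removeIf(e -> {
            e.getValue().remove(node);
            return removeEmptyStatusOnCrash && e.getValue().isEmpty();
        });
    }

    public boolean hasStatus(String cache) {
        return cacheStatusMap.containsKey(cache);
    }
}
```

In this model, with the flag off, a node that joins after the only member crashed finds a
stale, empty status entry and is told to wait for a transfer that nobody can provide. With
the flag on, or after a graceful leave, the joiner is treated as the first member.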
I set the priority to Critical because this is difficult to fix in a production
environment: once the problem occurs, the cache cannot be started ON ANY NODE OF THE
CLUSTER until the coordinator changes. If a new node does not start, operations personnel
would assume a problem with that node and try to restart it.