[JBoss JIRA] (ISPN-8713) "Initial state transfer timed out" in border case

Thursday, 13 December 2018

     [
https://issues.jboss.org/browse/ISPN-8713?page=com.atlassian.jira.plugin....
]

Walter Pongratz closed ISPN-8713.
---------------------------------
    Resolution: Done

Just retested it: in version 9.4.4.Final this issue no longer occurs.

...
 "Initial state transfer timed out" in border case
 -------------------------------------------------

                 Key: ISPN-8713
                 URL: https://issues.jboss.org/browse/ISPN-8713
             Project: Infinispan
          Issue Type: Bug
          Components: Core
    Affects Versions: 9.1.4.Final
            Reporter: Walter Pongratz
            Priority: Critical

 There is a bug in Infinispan 9.1.4 in a cluster where not every cache is started on every
node. For detailed steps to reproduce, see below. In short: if a cache is not started on the
coordinator and the last node running that cache does not leave gracefully, the cache cannot
be started again until the coordinator changes.
 Known aspects:
 * In Infinispan 8.2.6 the same steps do NOT lead to an issue
 Likely cause: on a non-graceful exit of the last node running a given cache, the coordinator
does not remove that cache from the cacheStatusMap in ClusterTopologyManagerImpl; the
ClusterCacheStatus is only updated (see the TODO in ClusterCacheStatus.updateCurrentTopology()).
On a graceful exit the entry is removed (see ClusterTopologyManagerImpl.handleLeave()).
 A new node starting the cache in question then somehow waits for the initial state
transfer, which never happens because no other node has this cache. In Infinispan 8.2.6
this did not seem to be a problem, but in 9.1.4 it is. The fix would therefore be either to
resolve this TODO and remove the ClusterCacheStatus from the map, OR to make the new node
skip waiting for the initial state transfer in this case.
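The suspected mechanism can be illustrated with a small toy model. This is a hypothetical Java
sketch, not Infinispan code: ToyTopologyCoordinator, CacheStatus, JoinOutcome, join(),
gracefulLeave(), crash() and the removeOnCrash flag are all invented for illustration, and the
real ClusterTopologyManagerImpl/ClusterCacheStatus logic is far more involved. It only encodes
the assumption stated above: a graceful leave removes the per-cache entry once the last member
is gone, a crash only updates it, and a joiner that finds an entry without members waits for
state that nobody can send. Setting removeOnCrash models the first proposed fix.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the coordinator bookkeeping described in this report.
// None of these names are Infinispan API; it mirrors the reasoning only, not the real code.
class ToyTopologyCoordinator {

    enum JoinOutcome { STARTED_AS_FIRST_MEMBER, RECEIVES_STATE, WAITS_FOREVER }

    // Loose stand-in for ClusterCacheStatus: just the current members of one cache.
    static final class CacheStatus {
        final Set<String> members = new HashSet<>();
    }

    // Loose stand-in for cacheStatusMap: cache name -> status, kept on the coordinator.
    private final Map<String, CacheStatus> cacheStatusMap = new HashMap<>();

    // true = also drop the entry when the last member crashes (first fix option in the report).
    private final boolean removeOnCrash;

    ToyTopologyCoordinator(boolean removeOnCrash) {
        this.removeOnCrash = removeOnCrash;
    }

    JoinOutcome join(String cacheName, String node) {
        CacheStatus status = cacheStatusMap.get(cacheName);
        if (status == null) {
            // No entry: the joiner becomes the first member and starts with empty state.
            status = new CacheStatus();
            status.members.add(node);
            cacheStatusMap.put(cacheName, status);
            return JoinOutcome.STARTED_AS_FIRST_MEMBER;
        }
        if (status.members.isEmpty()) {
            // Stale entry: the joiner expects initial state but nobody is left to send it.
            // The joiner is not added, so every later joiner gets stuck the same way.
            return JoinOutcome.WAITS_FOREVER;
        }
        status.members.add(node);
        return JoinOutcome.RECEIVES_STATE;
    }

    // Graceful leave: the entry is removed once the last member is gone (cf. handleLeave()).
    void gracefulLeave(String cacheName, String node) {
        CacheStatus status = cacheStatusMap.get(cacheName);
        if (status == null) {
            return;
        }
        status.members.remove(node);
        if (status.members.isEmpty()) {
            cacheStatusMap.remove(cacheName);
        }
    }

    // Crash: the node is dropped from every cache, but empty entries survive unless removeOnCrash is set.
    void crash(String node) {
        Iterator<Map.Entry<String, CacheStatus>> it = cacheStatusMap.entrySet().iterator();
        while (it.hasNext()) {
            CacheStatus status = it.next().getValue();
            status.members.remove(node);
            if (removeOnCrash && status.members.isEmpty()) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            ToyTopologyCoordinator coordinator = new ToyTopologyCoordinator(fix);
            coordinator.join("myCache", "nodeB"); // the coordinator itself never starts myCache
            coordinator.crash("nodeB");           // last member of myCache exits non-gracefully
            System.out.println("removeOnCrash=" + fix + " -> " + coordinator.join("myCache", "nodeC"));
        }
    }
}
```

Running main prints WAITS_FOREVER for the behavior described in this report and
STARTED_AS_FIRST_MEMBER with removeOnCrash enabled. In this toy model, the second proposed
fix (letting the joiner skip the initial state transfer when no member holds the cache) would
amount to adding the joiner and returning STARTED_AS_FIRST_MEMBER in the stale-entry branch.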
 I set the priority to Critical because this is difficult to fix in a production
environment: once the problem happens, the cache cannot be started ON ANY NODE OF THE
CLUSTER until the coordinator changes. If a new node fails to start, operations personnel
would assume a problem with that node and try to restart it.

--
This message was sent by Atlassian Jira
(v7.12.1#712002)
