[infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown
Pedro Ruivo
pedro at infinispan.org
Tue Mar 27 05:08:21 EDT 2018
On 27-03-2018 09:03, Sebastian Laskawiec wrote:
> At the moment, the cluster health status checker enumerates all caches
> in the cache manager [1] and checks whether those caches are running and
> not in degraded mode [2].
>
> I'm not sure how counter caches have been implemented. One thing is for
> sure - they should be taken into account in this loop [3].
The private caches aren't listed by CacheManager.getCacheNames(). We
have to check them via InternalCacheRegistry.getInternalCacheNames().
I'll open a JIRA if you don't mind :)
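For illustration, a minimal sketch (not the actual ClusterHealthImpl code) of how the loop could also cover the private caches; it assumes the InternalCacheRegistry component can be looked up through the cache manager's global component registry in 9.x:

import java.util.HashSet;
import java.util.Set;

import org.infinispan.manager.EmbeddedCacheManager;
import org.infinispan.registry.InternalCacheRegistry;

public class CacheNamesForHealthCheck {

   // Sketch only: collect the public cache names plus the internal/private
   // ones (e.g. ___counter_configuration, org.infinispan.LOCKS) so the
   // health status loop can inspect all of them.
   static Set<String> cachesToCheck(EmbeddedCacheManager cacheManager) {
      Set<String> names = new HashSet<>(cacheManager.getCacheNames());
      InternalCacheRegistry internalCaches = cacheManager.getGlobalComponentRegistry()
            .getComponent(InternalCacheRegistry.class);
      if (internalCaches != null) {
         names.addAll(internalCaches.getInternalCacheNames());
      }
      return names;
   }
}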
>
> [1]
> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
> [2]
> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
> [3]
> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24
>
> On Mon, Mar 26, 2018 at 1:59 PM Thomas SEGISMONT <tsegismont at gmail.com> wrote:
>
>     2018-03-26 13:16 GMT+02:00 Pedro Ruivo <pedro at infinispan.org>:
>
>
>
> On 23-03-2018 15:06, Thomas SEGISMONT wrote:
> > Hi Pedro,
> >
> > 2018-03-23 13:25 GMT+01:00 Pedro Ruivo <pedro at infinispan.org>:
> >
> > Hi Thomas,
> >
> > Is the test in question using any counter/lock?
> >
> >
> > I have seen the problem on a test for counters, on another one for
> > locks, as well as on tests using caches only.
> > But Vert.x starts the ClusteredLockManager and the CounterManager in all
> > cases (even if no lock/counter is created/used).
> >
> >
> > I did see similar behavior with the counters in our server test suite.
> > The partition handling makes the cache degraded because nodes are
> > starting and stopping concurrently.
> >
> >
> > As for me, I was able to observe the problem even when stopping nodes one
> > after the other and waiting for the cluster to go back to HEALTHY status.
> > Is it possible that the status of the counter and lock caches is not
> > taken into account in the cluster health?
>
> The counter and lock caches are private. So, they aren't included in the
> cluster health, nor are their names returned by the getCacheNames() method.
>
>
> Thanks for the details.
>
> I'm not concerned with these internal caches not being listed when
> calling getCacheNames.
>
> However, the cluster health status should include their status as well.
> Cluster status testing is the recommended way to implement readiness
> checks on Kubernetes, for example.
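(For reference, such a readiness check with the 9.x Health API could look roughly like the sketch below; the helper name and polling/timeout handling are purely illustrative.)

import org.infinispan.health.HealthStatus;
import org.infinispan.manager.EmbeddedCacheManager;

public final class ReadinessProbe {

   // Sketch: block until the cache manager reports a HEALTHY cluster, or
   // fail after the given timeout. Used between node starts/stops.
   static void awaitHealthy(EmbeddedCacheManager cacheManager, long timeoutMillis)
         throws InterruptedException {
      long deadline = System.currentTimeMillis() + timeoutMillis;
      while (cacheManager.getHealth().getClusterHealth().getHealthStatus() != HealthStatus.HEALTHY) {
         if (System.currentTimeMillis() > deadline) {
            throw new IllegalStateException("cluster did not become HEALTHY in time");
         }
         Thread.sleep(100);
      }
   }
}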
>
> What do you think Sebastian?
>
>
> >
> >
> > I'm not sure if there is any JIRA tracking this. Ryan, Dan, do you know?
> > If there is none, it should be created.
> >
> > I improved the counters by making the cache start lazily when you first
> > get or define a counter [1]. This workaround solved the issue for us.
> >
> > As a workaround for your test suite, I suggest making sure the caches
> > (___counter_configuration and org.infinispan.LOCKS) have finished their
> > state transfer before stopping the cache managers, by invoking
> > DefaultCacheManager.getCache(cache-name) in all the cache managers.
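(In code, that workaround could look roughly like this sketch; the class and method names are illustrative, and the cache names are the ones mentioned above.)

import org.infinispan.manager.DefaultCacheManager;

public final class ShutdownWorkaround {

   // Touch the counter and lock caches on every cache manager so their state
   // transfer has completed before the managers are stopped.
   static void warmInternalCaches(Iterable<DefaultCacheManager> cacheManagers) {
      for (DefaultCacheManager cacheManager : cacheManagers) {
         cacheManager.getCache("___counter_configuration");
         cacheManager.getCache("org.infinispan.LOCKS");
      }
   }
}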
> >
> > Sorry for the inconvenience and the delay in replying.
> >
> >
> > No problem.
> >
> >
> > Cheers,
> > Pedro
> >
> > [1] https://issues.jboss.org/browse/ISPN-8860
> >
> > On 21-03-2018 16:16, Thomas SEGISMONT wrote:
> > > Hi everyone,
> > >
> > > I am working on integrating Infinispan 9.2.Final in vertx-infinispan.
> > > Before merging I wanted to make sure the test suite passed, but it
> > > doesn't. It's not always the same test involved.
> > >
> > > In the logs, I see a lot of messages like "After merge (or coordinator
> > > change), cache still hasn't recovered a majority of members and must
> > > stay in degraded mode."
> > > The caches involved are "___counter_configuration" and
> > > "org.infinispan.LOCKS".
> > >
> > > Most often it's harmless but, sometimes, I also see this exception:
> > > "ISPN000210: Failed to request state of cache".
> > > Again the cache involved is either "___counter_configuration" or
> > > "org.infinispan.LOCKS".
> > > After this exception, the cache manager is unable to stop. It blocks in
> > > method "terminate" (join on cache future).
> > >
> > > I thought the test suite was too rough (we stop all nodes at the same
> > > time). So I changed it to make sure that:
> > > - nodes start one after the other
> > > - a new node is started only when the previous one indicates HEALTHY status
> > > - nodes stop one after the other
> > > - a node is stopped only when it indicates HEALTHY status
> > > Pretty much what we do on Kubernetes for the readiness check, actually.
> > > But it didn't get any better.
> > >
> > > Attached are the logs of such a failing test.
> > >
> > > Note that the Vert.x test itself does not fail, it's only when closing
> > > nodes that we have issues.
> > >
> > > Here's our XML config:
> > >
> > > https://github.com/vert-x3/vertx-infinispan/blob/ispn92/src/main/resources/default-infinispan.xml
> > >
> > > Does that ring a bell? Do you need more info?
> > >
> > > Regards,
> > > Thomas