[infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Sebastian Laskawiec slaskawi at redhat.com
Tue Mar 27 04:03:08 EDT 2018


At the moment, the cluster health status checker enumerates all caches in
the cache manager [1] and checks whether those caches are running and not
in degraded mode [2].

I'm not sure how the counter caches have been implemented. One thing is for
sure - they should be taken into account in this loop [3] (a simplified
sketch of the loop follows the links below).

[1]
https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
[2]
https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
[3]
https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24
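
For illustration, here is a simplified sketch of that loop. It is not the
actual ClusterHealthImpl/CacheHealthImpl code; the class name is made up,
and it only relies on the public cache status/availability calls:

import org.infinispan.AdvancedCache;
import org.infinispan.manager.EmbeddedCacheManager;
import org.infinispan.partitionhandling.AvailabilityMode;

// Simplified, illustrative sketch of the health check loop described above.
public class NaiveClusterHealthCheck {

   private final EmbeddedCacheManager cacheManager;

   public NaiveClusterHealthCheck(EmbeddedCacheManager cacheManager) {
      this.cacheManager = cacheManager;
   }

   public boolean isHealthy() {
      // Enumerate the caches known to the cache manager [1] ...
      for (String cacheName : cacheManager.getCacheNames()) {
         AdvancedCache<?, ?> cache = cacheManager.getCache(cacheName).getAdvancedCache();
         // ... and check that each one is running and not in degraded mode [2]
         boolean running = cache.getStatus().allowInvocations();
         boolean degraded = cache.getAvailability() == AvailabilityMode.DEGRADED_MODE;
         if (!running || degraded) {
            return false;
         }
      }
      // Internal caches such as ___counter_configuration and org.infinispan.LOCKS
      // are not returned by getCacheNames(), so they never enter this loop [3]
      return true;
   }
}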

On Mon, Mar 26, 2018 at 1:59 PM Thomas SEGISMONT <tsegismont at gmail.com>
wrote:

> 2018-03-26 13:16 GMT+02:00 Pedro Ruivo <pedro at infinispan.org>:
>
>>
>>
>> On 23-03-2018 15:06, Thomas SEGISMONT wrote:
>> > Hi Pedro,
>> >
>> > 2018-03-23 13:25 GMT+01:00 Pedro Ruivo <pedro at infinispan.org>:
>> >
>> >     Hi Thomas,
>> >
>> >     Is the test in question using any counter/lock?
>> >
>> >
>> > I have seen the problem on a test for counters, on another one for
>> > locks, as well as on tests using caches only.
>> > But Vert.x starts the ClusteredLockManager and the CounterManager in all
>> > cases (even if no lock/counter is created/used).
>> >
>> >
>> >     I did see similar behavior with the counters in our server test
>> >     suite. The partition handling makes the cache degraded because
>> >     nodes are starting and stopping concurrently.
>> >
>> >
>> > As for me, I was able to observe the problem even when stopping nodes
>> > one after the other and waiting for the cluster to go back to HEALTHY
>> > status.
>> > Is it possible that the status of the counter and lock caches is not
>> > taken into account in the cluster health?
>>
>> The counter and lock caches are private. So, they aren't included in the
>> cluster health, nor are their names returned by the getCacheNames() method.
>>
>
> Thanks for the details.
>
> I'm not concerned with these internal caches not being listed when calling
> getCacheNames.
>
> However, the cluster health status should include their status as well.
> Checking the cluster health is the recommended way to implement readiness
> checks on Kubernetes, for example (see the polling sketch below).
>
> What do you think Sebastian?
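
To make the readiness check concrete: a minimal polling sketch, assuming
the embedded health API (EmbeddedCacheManager.getHealth()); the ReadinessCheck
class and the waitUntilHealthy helper are made-up names for illustration:

import java.util.concurrent.TimeUnit;

import org.infinispan.health.HealthStatus;
import org.infinispan.manager.EmbeddedCacheManager;

// Illustrative readiness helper: poll the embedded health API until HEALTHY.
public final class ReadinessCheck {

   private ReadinessCheck() {
   }

   // Returns true once the cluster reports HEALTHY, false if the timeout expires first.
   public static boolean waitUntilHealthy(EmbeddedCacheManager cacheManager, long timeoutMillis)
         throws InterruptedException {
      long deadline = System.currentTimeMillis() + timeoutMillis;
      while (System.currentTimeMillis() < deadline) {
         HealthStatus status = cacheManager.getHealth().getClusterHealth().getHealthStatus();
         if (status == HealthStatus.HEALTHY) {
            return true;
         }
         TimeUnit.MILLISECONDS.sleep(100);
      }
      return false;
   }
}
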
>
>
>>
>> >
>> >
>> >     I'm not sure if there is any JIRA tracking this. Ryan, Dan, do you
>> >     know? If there is none, it should be created.
>> >
>> >     I improved the counters by making the cache start lazily when you
>> >     first get or define a counter [1]. This workaround solved the issue
>> >     for us.
>> >
>> >     As a workaround for your test suite, I suggest making sure the
>> >     caches (___counter_configuration and org.infinispan.LOCKS) have
>> >     finished their state transfer before stopping the cache managers,
>> >     by invoking DefaultCacheManager.getCache(*cache-name*) on all the
>> >     cache managers.
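
In code, the suggested workaround could look roughly like this; whether
getCache() is permitted on these internal caches depends on the Infinispan
version, and the ShutdownHelper/stopAfterStateTransfer names are made up
for illustration (the cache names come from this thread):

import org.infinispan.manager.EmbeddedCacheManager;

// Sketch of the workaround above: touch the internal caches so their state
// transfer has completed before the cache manager is stopped.
public final class ShutdownHelper {

   private ShutdownHelper() {
   }

   public static void stopAfterStateTransfer(EmbeddedCacheManager cacheManager) {
      // Ensure the counter and lock caches are started on this node
      cacheManager.getCache("___counter_configuration");
      cacheManager.getCache("org.infinispan.LOCKS");
      cacheManager.stop();
   }
}
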
>> >
>> >     Sorry for the inconvenience and the delay in replying.
>> >
>> >
>> > No problem.
>> >
>> >
>> >     Cheers,
>> >     Pedro
>> >
>> >     [1] https://issues.jboss.org/browse/ISPN-8860
>> >
>> >     On 21-03-2018 16:16, Thomas SEGISMONT wrote:
>> >      > Hi everyone,
>> >      >
>> >      > I am working on integrating Infinispan 9.2.Final in
>> >      > vertx-infinispan. Before merging I wanted to make sure the test
>> >      > suite passed, but it doesn't. It's not always the same test that
>> >      > is involved.
>> >      >
>> >      > In the logs, I see a lot of messages like "After merge (or
>> >      > coordinator change), cache still hasn't recovered a majority of
>> >      > members and must stay in degraded mode."
>> >      > The caches involved are "___counter_configuration" and
>> >      > "org.infinispan.LOCKS".
>> >      >
>> >      > Most often it's harmless but, sometimes, I also see this
>> >      > exception: "ISPN000210: Failed to request state of cache"
>> >      > Again the cache involved is either "___counter_configuration" or
>> >      > "org.infinispan.LOCKS"
>> >      > After this exception, the cache manager is unable to stop. It
>> >      > blocks in method "terminate" (join on cache future).
>> >      >
>> >      > I thought the test suite was too rough (we stop all nodes at the
>> >      > same time). So I changed it to make sure that:
>> >      > - nodes start one after the other
>> >      > - a new node is started only when the previous one indicates
>> >      >   HEALTHY status
>> >      > - nodes stop one after the other
>> >      > - a node is stopped only when it indicates HEALTHY status
>> >      > Pretty much what we do on Kubernetes for the readiness check,
>> >      > actually (see the sketch right below). But it didn't get any
>> >      > better.
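
A rough sketch of that start/stop sequence, assuming the embedded health
API (EmbeddedCacheManager.getHealth()); the class and the startNode()
placeholder are made up for illustration, and the real test suite builds
its nodes from the vertx-infinispan XML configuration:

import java.util.ArrayList;
import java.util.List;

import org.infinispan.health.HealthStatus;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class RollingLifecycleSketch {

   public void rollingStartAndStop(int nodeCount) throws Exception {
      List<EmbeddedCacheManager> nodes = new ArrayList<>();

      // Start nodes one after the other, waiting for HEALTHY between each start
      for (int i = 0; i < nodeCount; i++) {
         EmbeddedCacheManager node = startNode();
         nodes.add(node);
         awaitHealthy(node, 30_000);
      }

      // ... run the actual test against the cluster here ...

      // Stop nodes one after the other, only when the cluster reports HEALTHY
      for (EmbeddedCacheManager node : nodes) {
         awaitHealthy(node, 30_000);
         node.stop();
      }
   }

   // Placeholder: the real test loads the vertx-infinispan XML configuration
   private EmbeddedCacheManager startNode() throws Exception {
      return new DefaultCacheManager("default-infinispan.xml");
   }

   // Poll the embedded health API until the cluster is HEALTHY or the timeout expires
   private static void awaitHealthy(EmbeddedCacheManager node, long timeoutMillis)
         throws InterruptedException {
      long deadline = System.currentTimeMillis() + timeoutMillis;
      while (node.getHealth().getClusterHealth().getHealthStatus() != HealthStatus.HEALTHY) {
         if (System.currentTimeMillis() > deadline) {
            throw new IllegalStateException("Cluster did not reach HEALTHY within " + timeoutMillis + " ms");
         }
         Thread.sleep(100);
      }
   }
}
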
>> >      >
>> >      > Attached are the logs of such a failing test.
>> >      >
>> >      > Note that the Vert.x test itself does not fail; it's only when
>> >      > closing nodes that we have issues.
>> >      >
>> >      > Here's our XML config:
>> >      >
>> >      > https://github.com/vert-x3/vertx-infinispan/blob/ispn92/src/main/resources/default-infinispan.xml
>> >      >
>> >      > Does that ring a bell? Do you need more info?
>> >      >
>> >      > Regards,
>> >      > Thomas
>> >      >
>> >      >
>> >      >