[infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT tsegismont at gmail.com
Tue Mar 27 04:16:44 EDT 2018


Thanks Sebastian. Is there a JIRA for this already?

2018-03-27 10:03 GMT+02:00 Sebastian Laskawiec <slaskawi at redhat.com>:

> At the moment, the cluster health status checker enumerates all caches in
> the cache manager [1] and checks whether those caches are running and not
> in degraded mode [2].
>
> I'm not sure how counter caches have been implemented. One thing is for
> sure - they should be taken into account in this loop [3].
>
> [1] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
> [2] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
> [3] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24
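>
> For reference, the check boils down to something like this (a simplified
> sketch of the code linked above, not the exact implementation):
>
>     import org.infinispan.AdvancedCache;
>     import org.infinispan.health.HealthStatus;
>     import org.infinispan.manager.EmbeddedCacheManager;
>     import org.infinispan.partitionhandling.AvailabilityMode;
>
>     static HealthStatus clusterStatus(EmbeddedCacheManager cacheManager) {
>        for (String name : cacheManager.getCacheNames()) {           // [3]
>           AdvancedCache<?, ?> cache =
>                 cacheManager.getCache(name).getAdvancedCache();
>           // a cache that is not running, or is in degraded mode, makes
>           // the whole cluster unhealthy [2]
>           if (!cache.getStatus().allowInvocations()
>                 || cache.getAvailability() == AvailabilityMode.DEGRADED_MODE) {
>              return HealthStatus.UNHEALTHY;
>           }
>        }
>        return HealthStatus.HEALTHY;
>     }
>
> Because the loop is driven by getCacheNames(), anything that is not
> returned there is invisible to the health check.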
>
> On Mon, Mar 26, 2018 at 1:59 PM Thomas SEGISMONT <tsegismont at gmail.com>
> wrote:
>
>> 2018-03-26 13:16 GMT+02:00 Pedro Ruivo <pedro at infinispan.org>:
>>
>>>
>>>
>>> On 23-03-2018 15:06, Thomas SEGISMONT wrote:
>>> > Hi Pedro,
>>> >
>>> > 2018-03-23 13:25 GMT+01:00 Pedro Ruivo <pedro at infinispan.org>:
>>> >
>>> >     Hi Thomas,
>>> >
>>> >     Is the test in question using any counter/lock?
>>> >
>>> >
>>> > I have seen the problem on a test for counters, on another one for
>>> > locks, as well as on tests using caches only.
>>> > But Vert.x starts the ClusteredLockManager and the CounterManager in
>>> > all cases (even if no lock/counter is created/used).
>>> >
>>> >
>>> >     I did see similar behavior with the counters in our server test
>>> >     suite.
>>> >     The partition handling makes the cache degraded because nodes are
>>> >     starting and stopping concurrently.
>>> >
>>> >
>>> > As for me, I was able to observe the problem even when stopping nodes
>>> > one after the other and waiting for the cluster to go back to HEALTHY
>>> > status.
>>> > Is it possible that the status of the counter and lock caches is not
>>> > taken into account in cluster health?
>>>
>>> The counter and lock caches are private, so they aren't included in the
>>> cluster health, nor are their names returned by the getCacheNames() method.
>>>
>>
>> Thanks for the details.
>>
>> I'm not concerned with these internal caches not being listed when
>> calling getCacheNames.
>>
>> However, the cluster health status should include their status as well.
>> Checking the cluster status is the recommended way to implement readiness
>> checks on Kubernetes, for example.
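>>
>> For reference, such a readiness check boils down to something like this
>> (a simplified sketch, not the exact Vert.x code):
>>
>>     import org.infinispan.health.HealthStatus;
>>     import org.infinispan.manager.EmbeddedCacheManager;
>>
>>     static boolean isReady(EmbeddedCacheManager cacheManager) {
>>        // HEALTHY only covers the caches the health API knows about;
>>        // the private counter/lock caches are currently invisible to it,
>>        // which is exactly the gap described above
>>        return cacheManager.getHealth().getClusterHealth().getHealthStatus()
>>              == HealthStatus.HEALTHY;
>>     }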
>>
>> What do you think Sebastian?
>>
>>
>>>
>>> >
>>> >
>>> >     I'm not sure if there is any JIRA tracking this. Ryan, Dan, do you
>>> >     know? If there is none, it should be created.
>>> >
>>> >     I improved the counters by making the cache start lazily when you
>>> >     first get or define a counter [1]. This workaround solved the issue
>>> >     for us.
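>>> >
>>> >     With [1], the counter cache only starts at the point where a
>>> >     counter is first defined or retrieved, e.g. (sketch; the counter
>>> >     name and configuration below are just examples):
>>> >
>>> >        import org.infinispan.counter.EmbeddedCounterManagerFactory;
>>> >        import org.infinispan.counter.api.CounterConfiguration;
>>> >        import org.infinispan.counter.api.CounterManager;
>>> >        import org.infinispan.counter.api.CounterType;
>>> >
>>> >        // cacheManager: the already running EmbeddedCacheManager
>>> >        CounterManager counterManager =
>>> >              EmbeddedCounterManagerFactory.asCounterManager(cacheManager);
>>> >        // this call (or getStrongCounter/getWeakCounter) is what
>>> >        // triggers the lazy start of the counter cache
>>> >        counterManager.defineCounter("my-counter",
>>> >              CounterConfiguration.builder(CounterType.UNBOUNDED_STRONG).build());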
>>> >
>>> >     As a workaround for your test suite, I suggest making sure the
>>> >     caches (___counter_configuration and org.infinispan.LOCKS) have
>>> >     finished their state transfer before stopping the cache managers,
>>> >     by invoking DefaultCacheManager.getCache(*cache-name*) on all the
>>> >     cache managers.
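>>> >
>>> >     In code, the workaround would look roughly like this (sketch; how
>>> >     you hold the list of managers is up to your test suite):
>>> >
>>> >        import java.util.List;
>>> >        import org.infinispan.manager.DefaultCacheManager;
>>> >
>>> >        static void stopAll(List<DefaultCacheManager> cacheManagers) {
>>> >           for (DefaultCacheManager cm : cacheManagers) {
>>> >              // getCache() blocks until the internal cache is running,
>>> >              // i.e. its initial state transfer has finished
>>> >              cm.getCache("___counter_configuration");
>>> >              cm.getCache("org.infinispan.LOCKS");
>>> >           }
>>> >           cacheManagers.forEach(DefaultCacheManager::stop);
>>> >        }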
>>> >
>>> >     Sorry for the inconvenience and the delay in replying.
>>> >
>>> >
>>> > No problem.
>>> >
>>> >
>>> >     Cheers,
>>> >     Pedro
>>> >
>>> >     [1] https://issues.jboss.org/browse/ISPN-8860
>>> >
>>> >     On 21-03-2018 16:16, Thomas SEGISMONT wrote:
>>> >      > Hi everyone,
>>> >      >
>>> >      > I am working on integrating Infinispan 9.2.Final in
>>> >      > vertx-infinispan. Before merging, I wanted to make sure the test
>>> >      > suite passed, but it doesn't. It's not always the same test that
>>> >      > is involved.
>>> >      >
>>> >      > In the logs, I see a lot of messages like "After merge (or
>>> >      > coordinator change), cache still hasn't recovered a majority of
>>> >      > members and must stay in degraded mode."
>>> >      > The caches involved are "___counter_configuration" and
>>> >      > "org.infinispan.LOCKS".
>>> >      >
>>> >      > Most often it's harmless but, sometimes, I also see this
>>> >      > exception: "ISPN000210: Failed to request state of cache".
>>> >      > Again, the cache involved is either "___counter_configuration" or
>>> >      > "org.infinispan.LOCKS".
>>> >      > After this exception, the cache manager is unable to stop. It
>>> >      > blocks in the "terminate" method (join on cache future).
>>> >      >
>>> >      > I thought the test suite was too rough (we stop all nodes at the
>>> >      > same time). So I changed it to make sure that:
>>> >      > - nodes start one after the other
>>> >      > - a new node is started only when the previous one indicates
>>> >      >   HEALTHY status
>>> >      > - nodes stop one after the other
>>> >      > - a node is stopped only when it indicates HEALTHY status
>>> >      > Pretty much what we do on Kubernetes for the readiness check,
>>> >      > actually. But it didn't get any better.
>>> >      >
>>> >      > Attached are the logs of such a failing test.
>>> >      >
>>> >      > Note that the Vert.x test itself does not fail; it's only when
>>> >      > closing nodes that we have issues.
>>> >      >
>>> >      > Here's our XML config:
>>> >      >
>>> >      > https://github.com/vert-x3/vertx-infinispan/blob/ispn92/src/main/resources/default-infinispan.xml
>>> >      >
>>> >      > Does that ring a bell? Do you need more info?
>>> >      >
>>> >      > Regards,
>>> >      > Thomas
>>> >      >
>>> >      >
>>> >      >