[infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown
Thomas SEGISMONT
tsegismont at gmail.com
Tue Mar 27 04:16:44 EDT 2018
Thanks Sebastian. Is there a JIRA for this already?
2018-03-27 10:03 GMT+02:00 Sebastian Laskawiec <slaskawi at redhat.com>:
> At the moment, the cluster health status checker enumerates all caches in
> the cache manager [1] and checks whether those caches are running and not
> in degraded mode [2].
>
> I'm not sure how counter caches have been implemented. One thing is for
> sure - they should be taken into account in this loop [3] (sketched below).
>
> [1] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
> [2] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
> [3] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24
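>
> Roughly, the check amounts to this (a paraphrased sketch, not the exact
> source of [1]-[3]; a cache counts as healthy when it is running and its
> partition is not in degraded mode):
>
>     import org.infinispan.lifecycle.ComponentStatus;
>     import org.infinispan.manager.EmbeddedCacheManager;
>     import org.infinispan.partitionhandling.AvailabilityMode;
>
>     static boolean clusterHealthy(EmbeddedCacheManager cacheManager) {
>         // Only caches visible through getCacheNames() enter the loop,
>         // so private caches such as ___counter_configuration and
>         // org.infinispan.LOCKS are never checked.
>         return cacheManager.getCacheNames().stream()
>                 .map(cacheManager::getCache)
>                 .allMatch(cache -> cache.getStatus() == ComponentStatus.RUNNING
>                         && cache.getAdvancedCache().getAvailability()
>                                 == AvailabilityMode.AVAILABLE);
>     }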
>
> On Mon, Mar 26, 2018 at 1:59 PM Thomas SEGISMONT <tsegismont at gmail.com>
> wrote:
>
>> 2018-03-26 13:16 GMT+02:00 Pedro Ruivo <pedro at infinispan.org>:
>>
>>>
>>>
>>> On 23-03-2018 15:06, Thomas SEGISMONT wrote:
>>> > Hi Pedro,
>>> >
>>> > 2018-03-23 13:25 GMT+01:00 Pedro Ruivo <pedro at infinispan.org>:
>>> >
>>> > Hi Thomas,
>>> >
>>> > Is the test in question using any counter/lock?
>>> >
>>> >
>>> > I have seen the problem on a test for counters, on another one for
>>> > locks, as well as on tests using caches only.
>>> > But Vert.x starts the ClusteredLockManager and the CounterManager in
>>> > all cases (even if no lock/counter is created/used).
>>> >
>>> >
>>> > I did see similar behavior with the counters in our server test
>>> > suite. The partition handling makes the cache degraded because nodes
>>> > are starting and stopping concurrently.
>>> >
>>> >
>>> > As for me, I was able to observe the problem even when stopping
>>> > nodes one after the other and waiting for the cluster to go back to
>>> > HEALTHY status.
>>> > Is it possible that the status of the counter and lock caches is not
>>> > taken into account in cluster health?
>>>
>>> The counter and lock caches are private. So they aren't included in the
>>> cluster health, nor are their names returned by the getCacheNames()
>>> method.
>>>
>>
>> Thanks for the details.
>>
>> I'm not concerned with these internal caches not being listed when
>> calling getCacheNames.
>>
>> However, the cluster health status should include their status as well.
>> Cluster status testing is the recommended way to implement readiness
>> checks on Kubernetes, for example (see the sketch below).
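>>
>> For the record, such a readiness check is essentially this (a minimal
>> sketch against the 9.2 Health API; as things stand it can report
>> HEALTHY even while the private counter/lock caches are degraded):
>>
>>     import org.infinispan.health.HealthStatus;
>>     import org.infinispan.manager.EmbeddedCacheManager;
>>
>>     // E.g. wired into a Kubernetes readiness probe endpoint.
>>     static boolean isReady(EmbeddedCacheManager cacheManager) {
>>         return cacheManager.getHealth().getClusterHealth()
>>                 .getHealthStatus() == HealthStatus.HEALTHY;
>>     }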
>>
>> What do you think Sebastian?
>>
>>
>>>
>>> >
>>> >
>>> > I'm not sure if there is any JIRA tracking this. Ryan, Dan, do you
>>> > know? If there is none, it should be created.
>>> >
>>> > I improved the counters by making the cache start lazily when you
>>> > first get or define a counter [1]. This workaround solved the issue
>>> > for us.
>>> >
>>> > As a workaround for your test suite, I suggest making sure the caches
>>> > (___counter_configuration and org.infinispan.LOCKS) have finished
>>> > their state transfer before stopping the cache managers, by invoking
>>> > DefaultCacheManager.getCache(*cache-name*) in all the cache managers,
>>> > as sketched below.
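>>> >
>>> > In code, before shutting down (a sketch; the cache names are the
>>> > ones mentioned above):
>>> >
>>> >     // Touch the internal caches so their state transfer finishes
>>> >     // before the managers are stopped.
>>> >     cacheManager.getCache("___counter_configuration");
>>> >     cacheManager.getCache("org.infinispan.LOCKS");
>>> >     // ... once every node has done the same:
>>> >     cacheManager.stop();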
>>> >
>>> > Sorry for the inconvenience and the delay in replying.
>>> >
>>> >
>>> > No problem.
>>> >
>>> >
>>> > Cheers,
>>> > Pedro
>>> >
>>> > [1] https://issues.jboss.org/browse/ISPN-8860
>>> >
>>> > On 21-03-2018 16:16, Thomas SEGISMONT wrote:
>>> > > Hi everyone,
>>> > >
>>> > > I am working on integrating Infinispan 9.2.Final in
>>> > > vertx-infinispan. Before merging I wanted to make sure the test
>>> > > suite passed, but it doesn't. It's not always the same test that
>>> > > fails.
>>> > >
>>> > > In the logs, I see a lot of messages like "After merge (or
>>> > > coordinator change), cache still hasn't recovered a majority of
>>> > > members and must stay in degraded mode."
>>> > > The caches involved are "___counter_configuration" and
>>> > > "org.infinispan.LOCKS".
>>> > >
>>> > > Most often it's harmless but, sometimes, I also see this exception:
>>> > > "ISPN000210: Failed to request state of cache"
>>> > > Again, the cache involved is either "___counter_configuration" or
>>> > > "org.infinispan.LOCKS".
>>> > > After this exception, the cache manager is unable to stop. It
>>> > > blocks in the "terminate" method (join on cache future).
>>> > >
>>> > > I thought the test suite was too rough (we stop all nodes at the
>>> > > same time). So I changed it to make sure that:
>>> > > - nodes start one after the other
>>> > > - a new node is started only when the previous one indicates
>>> > >   HEALTHY status
>>> > > - nodes stop one after the other
>>> > > - a node is stopped only when it indicates HEALTHY status
>>> > > Pretty much what we do on Kubernetes for the readiness check,
>>> > > actually (see the sketch below). But it didn't get any better.
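>>> > >
>>> > > The stop side boils down to this (a sketch of the test logic;
>>> > > polling interval and interrupt handling omitted):
>>> > >
>>> > >     // Stop nodes one by one, waiting for HEALTHY before each stop.
>>> > >     for (EmbeddedCacheManager node : nodes) {
>>> > >         while (node.getHealth().getClusterHealth().getHealthStatus()
>>> > >                 != HealthStatus.HEALTHY) {
>>> > >             Thread.sleep(100);
>>> > >         }
>>> > >         node.stop();
>>> > >     }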
>>> > >
>>> > > Attached are the logs of such a failing test.
>>> > >
>>> > > Note that the Vert.x test itself does not fail; it's only when
>>> > > closing nodes that we have issues.
>>> > >
>>> > > Here's our XML config:
>>> > >
>>> > > https://github.com/vert-x3/vertx-infinispan/blob/ispn92/src/main/resources/default-infinispan.xml
>>> > >
>>> > > Does that ring a bell? Do you need more info?
>>> > >
>>> > > Regards,
>>> > > Thomas
>>> > >
>>> > >
>>> > >