[infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT tsegismont at gmail.com
Fri Mar 23 11:06:37 EDT 2018


Hi Pedro,

2018-03-23 13:25 GMT+01:00 Pedro Ruivo <pedro at infinispan.org>:

> Hi Thomas,
>
> Is the test in question using any counter/lock?
>

I have seen the problem in a test for counters, in another one for locks,
as well as in tests that use caches only.
But Vert.x starts the ClusteredLockManager and the CounterManager in all
cases (even if no lock/counter is ever created or used).
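
For context, this is roughly how we obtain them at startup (a minimal
sketch; the class and method names are mine, the factory calls are the
embedded counter/lock APIs as I understand them):

import org.infinispan.counter.EmbeddedCounterManagerFactory;
import org.infinispan.counter.api.CounterManager;
import org.infinispan.lock.EmbeddedClusteredLockManagerFactory;
import org.infinispan.lock.api.ClusteredLockManager;
import org.infinispan.manager.EmbeddedCacheManager;

class ManagerBootstrap {
   // Merely obtaining the managers starts their internal caches,
   // whether or not a counter/lock is used afterwards.
   static void initManagers(EmbeddedCacheManager cacheManager) {
      CounterManager counters = EmbeddedCounterManagerFactory.asCounterManager(cacheManager);
      ClusteredLockManager locks = EmbeddedClusteredLockManagerFactory.from(cacheManager);
   }
}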


>
> I did see similar behavior with the counters in our server test suite.
> The partition handling makes the cache degraded because nodes are
> starting and stopping concurrently.
>

As for me, I was able to observe the problem even when stopping nodes one
after the other and waiting for the cluster to go back to HEALTHY status.
Is it possible that the status of the counter and lock caches is not taken
into account in the cluster health?
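
For reference, the HEALTHY check between node starts/stops is essentially
this polling loop (a sketch; waitUntilHealthy is my helper name, the calls
are the Health API as I understand it):

import org.infinispan.health.HealthStatus;
import org.infinispan.manager.EmbeddedCacheManager;

class HealthCheck {
   // Poll the cluster health until it reports HEALTHY
   static void waitUntilHealthy(EmbeddedCacheManager cacheManager) throws InterruptedException {
      while (cacheManager.getHealth().getClusterHealth().getHealthStatus() != HealthStatus.HEALTHY) {
         Thread.sleep(100);
      }
   }
}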


>
> I'm not sure if there is any JIRA tracking this. Ryan, Dan, do you know?
> If there is none, it should be created.
>
> I improved the counters by making the cache start lazily when you first
> get or define a counter [1]. This workaround solved the issue for us.
>
> As a workaround for your test suite, I suggest making sure the caches
> (___counter_configuration and org.infinispan.LOCKS) have finished their
> state transfer before stopping the cache managers, by invoking
> DefaultCacheManager.getCache(*cache-name*) on all the cache managers.
>
> Sorry for the inconvenience and the delay in replying.
>

No problem.
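
For the record, the workaround you describe would look roughly like this
in our suite (a sketch; the class and method names are mine, the cache
names are the ones from your mail and from our logs):

import org.infinispan.manager.DefaultCacheManager;

class ShutdownHelper {
   // Touch the internal caches so their state transfer completes
   // before stop() is called on each cache manager
   static void ensureInternalCachesStarted(DefaultCacheManager cacheManager) {
      cacheManager.getCache("___counter_configuration");
      cacheManager.getCache("org.infinispan.LOCKS");
   }
}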


>
> Cheers,
> Pedro
>
> [1] https://issues.jboss.org/browse/ISPN-8860
>
> On 21-03-2018 16:16, Thomas SEGISMONT wrote:
> > Hi everyone,
> >
> > I am working on integrating Infinispan 9.2.Final into vertx-infinispan.
> > Before merging I wanted to make sure the test suite passed, but it
> > doesn't. It's not always the same test that fails.
> >
> > In the logs, I see a lot of messages like "After merge (or coordinator
> > change), cache still hasn't recovered a majority of members and must
> > stay in degraded mode."
> > The caches involved are "___counter_configuration" and
> > "org.infinispan.LOCKS".
> >
> > Most often it's harmless, but sometimes I also see this exception:
> > "ISPN000210: Failed to request state of cache".
> > Again, the cache involved is either "___counter_configuration" or
> > "org.infinispan.LOCKS".
> > After this exception, the cache manager is unable to stop. It blocks in
> > the "terminate" method (joining on the cache future).
> >
> > I thought the test suite was too rough (we stop all nodes at the same
> > time). So I changed it to make sure that:
> > - nodes start one after the other
> > - a new node is started only when the previous one indicates HEALTHY status
> > - nodes stop one after the other
> > - a node is stopped only when it indicates HEALTHY status
> > Pretty much what we do on Kubernetes for the readiness check actually.
> > But it didn't get any better.
> >
> > Attached are the logs of such a failing test.
> >
> > Note that the Vert.x test itself does not fail, it's only when closing
> > nodes that we have issues.
> >
> > Here's our XML config:
> > https://github.com/vert-x3/vertx-infinispan/blob/ispn92/src/main/resources/default-infinispan.xml
> >
> > Does that ring a bell? Do you need more info?
> >
> > Regards,
> > Thomas
> >
> >
> >