At the moment, the cluster health status checker enumerates all caches in
the cache manager [1] and checks whether those caches are running and not
in degraded mode [2].
I'm not sure how counter caches have been implemented. One thing is for
sure - they should be taken into account in this loop [3].
[1]
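To make the point concrete, here is a minimal sketch of the kind of loop I mean. It is a toy model under stated assumptions, not the actual Infinispan code: `Health`, `CacheState`, and `ClusterHealthSketch` are made-up names, the cache name "users" in the usage note is invented, and the `includeInternal` flag exists only to contrast skipping the private caches with taking them into account.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Infinispan types.
enum Health { HEALTHY, UNHEALTHY }

final class CacheState {
    final String name;
    final boolean running;   // the cache has been started and not stopped
    final boolean degraded;  // partition handling put the cache in degraded mode
    final boolean internal;  // private cache, e.g. the counter or lock cache

    CacheState(String name, boolean running, boolean degraded, boolean internal) {
        this.name = name;
        this.running = running;
        this.degraded = degraded;
        this.internal = internal;
    }
}

final class ClusterHealthSketch {
    // Reports HEALTHY only if every inspected cache is running and not degraded.
    static Health check(List<CacheState> caches, boolean includeInternal) {
        for (CacheState cache : caches) {
            if (cache.internal && !includeInternal) {
                continue; // private caches are invisible to the check
            }
            if (!cache.running || cache.degraded) {
                return Health.UNHEALTHY;
            }
        }
        return Health.HEALTHY;
    }
}
```

With a healthy "users" cache and a degraded "___counter_configuration" cache, `check(caches, false)` reports HEALTHY while `check(caches, true)` reports UNHEALTHY, which is the gap discussed in this thread.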
On Mon, Mar 26, 2018 at 1:59 PM Thomas SEGISMONT <tsegismont(a)gmail.com>
wrote:
2018-03-26 13:16 GMT+02:00 Pedro Ruivo <pedro(a)infinispan.org>:
>
>
> On 23-03-2018 15:06, Thomas SEGISMONT wrote:
> > Hi Pedro,
> >
> > 2018-03-23 13:25 GMT+01:00 Pedro Ruivo <pedro(a)infinispan.org>:
> >
> > Hi Thomas,
> >
> > Is the test in question using any counter/lock?
> >
> >
> > I have seen the problem on a test for counters, on another one for
> > locks, and also on tests using caches only.
> > But Vert.x starts the ClusteredLockManager and the CounterManager in all
> > cases (even if no lock/counter is created/used)
> >
> >
> > I did see similar behavior with the counters in our server test suite.
> > The partition handling makes the cache degraded because nodes are
> > starting and stopping concurrently.
> >
> >
> > As for me, I was able to observe the problem even when stopping nodes one
> > after the other and waiting for the cluster to go back to HEALTHY status.
> > Is it possible that the status of the counter and lock caches is not
> > taken into account in cluster health?
>
> The counter and lock caches are private. So, they aren't included in the
> cluster health, nor are their names returned by the getCacheNames() method.
>
Thanks for the details.
I'm not concerned with these internal caches not being listed when calling
getCacheNames.
However, the cluster health status should include their status as well.
Cluster status testing is the recommended way to implement readiness
checks on Kubernetes, for example.
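As an illustration of such a readiness check, here is a minimal sketch that polls a status source until it reports HEALTHY or a timeout elapses. Everything in it is an assumption made for the example: `ReadinessSketch` and `awaitHealthy` are made-up names, and the status is supplied as a plain string by the caller, who would wrap whatever actually exposes the cluster health.

```java
import java.time.Duration;
import java.util.function.Supplier;

final class ReadinessSketch {
    // Polls the supplied status until it equals "HEALTHY" or the timeout elapses.
    // Returns true if HEALTHY was observed, false on timeout or interruption.
    static boolean awaitHealthy(Supplier<String> status, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if ("HEALTHY".equals(status.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing HEALTHY
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

A check like this is only as good as the status it polls: if the internal caches are not reflected in the reported status, it can return true while one of them is still degraded.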
What do you think, Sebastian?
>
> >
> >
> > I'm not sure if there is a JIRA tracking this. Ryan, Dan, do you know?
> > If there is none, one should be created.
> >
> > I improved the counters by making the cache start lazily when you first
> > get or define a counter [1]. This workaround solved the issue for us.
> >
> > As a workaround for your test suite, I suggest making sure the caches
> > (___counter_configuration and org.infinispan.LOCKS) have finished their
> > state transfer before stopping the cache managers, by invoking
> > DefaultCacheManager.getCache(*cache-name*) on all the cache managers.
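The ordering in this workaround can be sketched as follows. This is only an illustration under stated assumptions: `ManagerLike` is a made-up minimal interface standing in for a cache manager, `ShutdownSketch` and `stopAll` are made-up names, the lock cache name follows the log messages quoted further down the thread, and `getCache` is assumed to return only once the cache has started on that node.

```java
import java.util.List;

// Hypothetical minimal view of a cache manager, for illustration only.
interface ManagerLike {
    Object getCache(String name); // assumed to block until the cache has started on this node
    void stop();
}

final class ShutdownSketch {
    static final List<String> INTERNAL_CACHES =
            List.of("___counter_configuration", "org.infinispan.LOCKS");

    static void stopAll(List<ManagerLike> managers) {
        // Step 1: touch both internal caches on every manager before anything stops.
        for (ManagerLike manager : managers) {
            for (String name : INTERNAL_CACHES) {
                manager.getCache(name);
            }
        }
        // Step 2: only then stop the managers, one after the other.
        for (ManagerLike manager : managers) {
            manager.stop();
        }
    }
}
```

The point of the two separate loops is that no manager is stopped while another one may still be starting these caches.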
> >
> > Sorry for the inconvenience and the delay in replying.
> >
> >
> > No problem.
> >
> >
> > Cheers,
> > Pedro
> >
> > [1] https://issues.jboss.org/browse/ISPN-8860
> >
> > On 21-03-2018 16:16, Thomas SEGISMONT wrote:
> > > Hi everyone,
> > >
> > > I am working on integrating Infinispan 9.2.Final in vertx-infinispan.
> > > Before merging I wanted to make sure the test suite passed but it
> > > doesn't. It's not always the same test involved.
> > >
> > > In the logs, I see a lot of messages like "After merge (or coordinator
> > > change), cache still hasn't recovered a majority of members and must
> > > stay in degraded mode."
> > > The contexts involved are "___counter_configuration" and
> > > "org.infinispan.LOCKS".
> > >
> > > Most often it's harmless but, sometimes, I also see this exception:
> > > "ISPN000210: Failed to request state of cache"
> > > Again the cache involved is either "___counter_configuration" or
> > > "org.infinispan.LOCKS".
> > > After this exception, the cache manager is unable to stop. It blocks in
> > > method "terminate" (join on cache future).
> > >
> > > I thought the test suite was too rough (we stop all nodes at the same
> > > time). So I changed it to make sure that:
> > > - nodes start one after the other
> > > - a new node is started only when the previous one indicates HEALTHY status
> > > - nodes stop one after the other
> > > - a node is stopped only when it indicates HEALTHY status
> > > Pretty much what we do on Kubernetes for the readiness check, actually.
> > > But it didn't get any better.
> > >
> > > Attached are the logs of such a failing test.
> > >
> > > Note that the Vert.x test itself does not fail; it's only when closing
> > > nodes that we have issues.
> > >
> > > Here's our XML config:
> > >
> > > https://github.com/vert-x3/vertx-infinispan/blob/ispn92/src/main/resource...
> > >
> > > Does that ring a bell? Do you need more info?
> > >
> > > Regards,
> > > Thomas
> > >
> > >
> > >
> > > _______________________________________________
> > > infinispan-dev mailing list
> > > infinispan-dev(a)lists.jboss.org
> > > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> > >