[infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

William Burns mudokonman at gmail.com
Fri Apr 20 14:10:02 EDT 2018


On Fri, Apr 20, 2018 at 9:43 AM Thomas SEGISMONT <tsegismont at gmail.com>
wrote:

> I tried our test suite on a slower machine (iMac from 2011). It passes
> consistently there.
>
> On my laptop, I keep seeing this from time to time (in different tests):
>
> 2018-04-19T19:53:09.513 WARN [Context=org.infinispan.LOCKS]ISPN000320:
> After merge (or coordinator change), cache still hasn't recovered a
> majority of members and must stay in degraded mode. Current members are
> [sombrero-19385], lost members are [sombrero-42917], stable members are
> [sombrero-42917, sombrero-19385]
>

I would expect the nodes to leave gracefully, which shouldn't cause a merge,
so I am not sure how your test is producing one. Could you capture a TRACE log
and open a JIRA for it?

However, if there is a merge and only a single node of the two remains, the
cache will always be in DEGRADED mode when partition handling is enabled. This
is because the remaining node does not have a simple majority, as described in
http://infinispan.org/docs/stable/user_guide/user_guide.html#partition_handling
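
For example, a programmatic configuration along these lines (just a sketch,
assuming a replicated cache with the DENY_READ_WRITES policy, not your actual
setup) will leave the surviving node of a two-node cluster in DEGRADED mode:

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.partitionhandling.PartitionHandling;

public class PartitionHandlingExample {
   public static Configuration replicatedWithPartitionHandling() {
      // With DENY_READ_WRITES, a two-node cluster that loses one member has
      // no simple majority left, so the surviving node stays in DEGRADED mode.
      return new ConfigurationBuilder()
            .clustering().cacheMode(CacheMode.REPL_SYNC)
            .partitionHandling().whenSplit(PartitionHandling.DENY_READ_WRITES)
            .build();
   }
}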


>
> It happens when we shut down nodes one after the other (even when waiting
> for the cluster status to be "healthy" plus an extra 2 seconds).
>
> After that the nodes remain blocked in DefaultCacheManager.stop:
>
> 2018-04-19T19:49:29.242 WARNING Thread
> Thread[vert.x-worker-thread-5,5,main] has been blocked for 60774 ms, time
> limit is 60000
> io.vertx.core.VertxException: Thread blocked
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>     at
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>     at
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>     at
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
>     at
> org.infinispan.manager.DefaultCacheManager.terminate(DefaultCacheManager.java:688)
>     at
> org.infinispan.manager.DefaultCacheManager.stopCaches(DefaultCacheManager.java:734)
>     at
> org.infinispan.manager.DefaultCacheManager.stop(DefaultCacheManager.java:711)
>     at
> io.vertx.ext.cluster.infinispan.InfinispanClusterManager.lambda$leave$5(InfinispanClusterManager.java:285)
>     at
> io.vertx.ext.cluster.infinispan.InfinispanClusterManager$$Lambda$421/578931659.handle(Unknown
> Source)
>     at
> io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:265)
>     at io.vertx.core.impl.ContextImpl$$Lambda$27/1330754528.run(Unknown
> Source)
>
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:748)
>

This looks like the exact issue that Radim mentioned in
https://issues.jboss.org/browse/ISPN-8859.
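
Until that is fixed, one possible (untested) mitigation is to bound the
blocking stop() call with a timeout so the Vert.x worker thread is not blocked
indefinitely. A rough sketch; the helper name and timeout value are just
illustrative, not part of either API:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.infinispan.manager.DefaultCacheManager;

public class BoundedStopExample {
   // Hypothetical helper: stop the cache manager, but give up after the given
   // timeout instead of blocking the calling thread forever.
   public static void stopWithTimeout(DefaultCacheManager manager, long timeoutSeconds) {
      CompletableFuture<Void> stopFuture = CompletableFuture.runAsync(manager::stop);
      try {
         stopFuture.get(timeoutSeconds, TimeUnit.SECONDS);
      } catch (TimeoutException e) {
         // stop() is still blocked (e.g. ISPN-8859); log and release the
         // calling thread while the background thread keeps waiting.
         System.err.println("Cache manager did not stop within " + timeoutSeconds + "s");
      } catch (Exception e) {
         throw new RuntimeException(e);
      }
   }
}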


>
>
>
> 2018-04-18 17:00 GMT+02:00 Thomas SEGISMONT <tsegismont at gmail.com>:
>
>> So here's the suppressed exception with the circular reference:
>>
>> [stateTransferExecutor-thread--p221-t33] 2018-04-18T16:15:06.662+02:00
>> WARN [org.infinispan.statetransfer.InboundTransferTask]  ISPN000210: Failed
>> to request state of cache __vertx.subs from node sombrero-25286, segments
>> {0}
>> org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400:
>> Node sombrero-25286 was suspected
>>     at
>> org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
>>     at
>> org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
>>     at
>> org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
>>     at
>> org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
>>     at
>> org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:51)
>>     at
>> org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
>>     at
>> org.infinispan.remoting.transport.jgroups.JGroupsTransport.addRequest(JGroupsTransport.java:921)
>>     at
>> org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeCommand(JGroupsTransport.java:815)
>>     at
>> org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeCommand(JGroupsTransport.java:123)
>>     at
>> org.infinispan.remoting.rpc.RpcManagerImpl.invokeCommand(RpcManagerImpl.java:138)
>>     at
>> org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
>>     at
>> org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
>>     at
>> org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest.lambda$requestState$2(StateReceiverImpl.java:164)
>>     at
>> org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:101)
>>     at
>> org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
>>     at
>> org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
>>     at
>> org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
>>     at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>>     Suppressed: java.util.concurrent.ExecutionException:
>> org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400:
>> Node sombrero-25286 was suspected
>>         at
>> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>>         at
>> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>>         at
>> org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
>>         at
>> org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:260)
>>         ... 10 more
>>     [CIRCULAR
>> REFERENCE:org.infinispan.remoting.transport.jgroups.SuspectException:
>> ISPN000400: Node sombrero-25286 was suspected]
>>
>> It does not happen with 9.2.0.Final and prevents using embedded Infinispan
>> with Logback. Do you want me to file an issue?
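>>
>> For reference, here is a tiny standalone sketch (contrived, not the actual
>> Infinispan code) of the cause/suppressed cycle that trips Logback's
>> ThrowableProxy:
>>
>> import java.util.concurrent.ExecutionException;
>>
>> public class CircularSuppressedReproducer {
>>    public static void main(String[] args) {
>>       RuntimeException root =
>>             new RuntimeException("ISPN000400: Node sombrero-25286 was suspected");
>>       ExecutionException wrapper = new ExecutionException(root);
>>       // The wrapper's cause is root and root suppresses the wrapper, so any
>>       // naive walk of the cause/suppressed graph never terminates.
>>       root.addSuppressed(wrapper);
>>       // The JDK detects the cycle and prints "[CIRCULAR REFERENCE: ...]",
>>       // but Logback's ThrowableProxy recurses until StackOverflowError.
>>       root.printStackTrace();
>>    }
>> }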
>>
>> 2018-04-18 11:45 GMT+02:00 Thomas SEGISMONT <tsegismont at gmail.com>:
>>
>>> Hi folks,
>>>
>>> Sorry I've been busy on other things and couldn't get back to you
>>> earlier.
>>>
>>> I tried running the vertx-infinispan test suite with 9.2.1.Final today.
>>> There are still some problems, but I can't say which ones yet because I hit:
>>> https://jira.qos.ch/browse/LOGBACK-1027
>>>
>>> We use Logback for the test logs, and all I get is:
>>>
>>> 2018-04-18 11:37:46,678 [stateTransferExecutor-thread--p4453-t24] ERROR
>>> o.i.executors.LimitedExecutor - Exception in task
>>> java.lang.StackOverflowError: null
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:54)
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
>>>     at
>>> ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
>>> ... and so on
>>>
>>> I will run the suite again without logback and tell you what the actual
>>> problem is.
>>>
>>> Regards,
>>> Thomas
>>>
>>> 2018-03-27 11:15 GMT+02:00 Pedro Ruivo <pedro at infinispan.org>:
>>>
>>>> JIRA: https://issues.jboss.org/browse/ISPN-8994
>>>>
>>>> On 27-03-2018 10:08, Pedro Ruivo wrote:
>>>> >
>>>> >
>>>> > On 27-03-2018 09:03, Sebastian Laskawiec wrote:
>>>> >> At the moment, the cluster health status checker enumerates all caches
>>>> >> in the cache manager [1] and checks whether those caches are running
>>>> >> and not in degraded mode [2].
>>>> >>
>>>> >> I'm not sure how the counter caches have been implemented. One thing is
>>>> >> for sure - they should be taken into account in this loop [3].
>>>> >
>>>> > The private caches aren't listed by CacheManager.getCacheNames(). We
>>>> > have to check them via InternalCacheRegistry.getInternalCacheNames().
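>>>> >
>>>> > Something along these lines (a rough sketch, the helper method name is
>>>> > made up):
>>>> >
>>>> > import java.util.HashSet;
>>>> > import java.util.Set;
>>>> > import org.infinispan.manager.EmbeddedCacheManager;
>>>> > import org.infinispan.registry.InternalCacheRegistry;
>>>> >
>>>> > public class HealthCheckCacheNames {
>>>> >    // Collect both user-visible and internal cache names so the health
>>>> >    // check loop also covers private caches such as the counter caches.
>>>> >    public static Set<String> allCacheNames(EmbeddedCacheManager cacheManager) {
>>>> >       InternalCacheRegistry registry = cacheManager.getGlobalComponentRegistry()
>>>> >             .getComponent(InternalCacheRegistry.class);
>>>> >       Set<String> names = new HashSet<>(cacheManager.getCacheNames());
>>>> >       names.addAll(registry.getInternalCacheNames());
>>>> >       return names;
>>>> >    }
>>>> > }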
>>>> >
>>>> > I'll open a JIRA if you don't mind :)
>>>> >
>>>> >>
>>>> >> [1]
>>>> >>
>>>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
>>>> >>
>>>> >> [2]
>>>> >>
>>>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
>>>> >>
>>>> >> [3]
>>>> >>
>>>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24
>>>>
>>>
>>>
>>