[infinispan-issues] [JBoss JIRA] (ISPN-11033) Cluster fails while inserting data for a while

Jens Reimann (Jira) issues at jboss.org
Wed Dec 4 09:47:00 EST 2019


Jens Reimann created ISPN-11033:
-----------------------------------

             Summary: Cluster fails while inserting data for a while
                 Key: ISPN-11033
                 URL: https://issues.jboss.org/browse/ISPN-11033
             Project: Infinispan
          Issue Type: Bug
          Components: Server
    Affects Versions: 10.0.1.Final
         Environment: 12 node Infinispan cluster, OpenShift 4.2
            Reporter: Jens Reimann
         Attachments: deviceManagement.proto, infinispan.xml

Inserting data into an Infinispan cluster works for a while, and then the cluster fails, showing the following log messages in one pod:


{code}
14:20:34,432 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p4-t1) ISPN000136: Error executing command ReplaceCommand on Cache 'devices', writing keys [WrappedByteArray{bytes=8201\*\i\o\.\e\n\m\a\s\s\e\.\i\o\t\.\i\n\f\i\n\i\s\p\a\n\.\d\e\v\i\c\e\.\D\e\v\i\c\e\K\e\y8A01\<0A1F\j\b\t\e\s\t\.\i\o\t\/\2\0\1\9\-\1\2\-\0\4\T\0\8\:\2\5\:\3\4\Z1219\h\t\t\p\-\i\n\s\e\r\t\e\r\-\f\r\8\l\m\1\5\2\2\4\7, hashCode=-381217399}]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key WrappedByteArray{bytes=8201\*\i\o\.\e\n\m\a\s\s\e\.\i\o\t\.\i\n\f\i\n\i\s\p\a\n\.\d\e\v\i\c\e\.\D\e\v\i\c\e\K\e\y8A01\<0A1F\j\b\t\e\s\t\.\i\o\t\/\2\0\1\9\-\1\2\-\0\4\T\0\8\:\2\5\:\3\4\Z1219\h\t\t\p\-\i\n\s\e\r\t\e\r\-\f\r\8\l\m\1\5\2\2\4\7, hashCode=-381217399} and requestor GlobalTx:infinispan-8-8720:1383960. Lock is held by GlobalTx:infinispan-8-8720:33804
	at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:292)
	at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:222)
	at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:440)
	at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.lambda$toInvocationStage$3(InfinispanLock.java:416)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
	at org.infinispan.commons.util.concurrent.CallerRunsRejectOnShutdownPolicy.rejectedExecution(CallerRunsRejectOnShutdownPolicy.java:19)
	at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
	at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
	at org.infinispan.executors.LazyInitializingExecutorService.execute(LazyInitializingExecutorService.java:138)
	at java.base/java.util.concurrent.CompletableFuture$UniCompletion.claim(CompletableFuture.java:568)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:638)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.notifyListeners(InfinispanLock.java:527)
	at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.cancel(InfinispanLock.java:382)
	at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.call(DefaultLockManager.java:286)
	at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.call(DefaultLockManager.java:222)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

{code}
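The ISPN000299 timeout above indicates the key's lock is held by another transaction for longer than the 15-second acquire timeout. One knob worth checking while diagnosing this (a sketch only, assuming the {{devices}} cache is declared roughly like this in the attached infinispan.xml; not a confirmed fix) is the lock acquisition timeout:

{code:xml}
<!-- Hypothetical excerpt; adapt to the actual attached infinispan.xml. -->
<distributed-cache name="devices">
    <!-- ISPN000299 reports the acquire timeout; raising it from the
         default 15000 ms only masks contention, but helps confirm
         whether the lock holder eventually completes or is stuck. -->
    <locking acquire-timeout="30000"/>
</distributed-cache>
{code}

If the exception still occurs with a longer timeout, the holding transaction (GlobalTx:infinispan-8-8720:33804 in the log) is likely stuck rather than merely slow.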

Meanwhile, the other nodes log the following messages:

{code}
14:44:26,310 ERROR [org.jgroups.protocols.TCP] (jgroups-133,infinispan-3-50867) JGRP000034: infinispan-3-50867: failure sending message to infinispan-8-17029: java.net.SocketTimeoutException: connect timed out
14:44:28,611 ERROR [org.jgroups.protocols.TCP] (jgroups-133,infinispan-3-50867) JGRP000034: infinispan-3-50867: failure sending message to infinispan-8-17029: java.net.SocketTimeoutException: connect timed out
14:44:30,912 ERROR [org.jgroups.protocols.TCP] (jgroups-126,infinispan-3-50867) JGRP000034: infinispan-3-50867: failure sending message to infinispan-8-17029: java.net.SocketTimeoutException: connect timed out
{code}

The node showing the exception is eventually killed by Kubernetes:

{code}
NAME            READY   STATUS                 RESTARTS   AGE
infinispan-0    1/1     Running                0          83m
infinispan-1    1/1     Running                0          83m
infinispan-10   1/1     Running                0          83m
infinispan-11   1/1     Running                0          83m
infinispan-2    1/1     Running                0          83m
infinispan-3    1/1     Running                0          83m
infinispan-4    1/1     Running                0          83m
infinispan-5    1/1     Running                0          83m
infinispan-6    1/1     Running                0          83m
infinispan-7    1/1     Running                0          83m
infinispan-8    0/1     CreateContainerError   3          83m
infinispan-9    1/1     Running                0          83m
{code}

However, it never becomes ready again.



--
This message was sent by Atlassian Jira
(v7.13.8#713008)