[infinispan-issues] [JBoss JIRA] (ISPN-11017) Cluster fails and doesn't recover under load

Monday, 2 December 2019

     [ https://issues.jboss.org/browse/ISPN-11017?page=com.atlassian.jira.plugin... ]

Jens Reimann updated ISPN-11017:
--------------------------------
    Attachment: infinispan.xml

...
 Cluster fails and doesn't recover under load
 --------------------------------------------

                 Key: ISPN-11017
                 URL: https://issues.jboss.org/browse/ISPN-11017
             Project: Infinispan
          Issue Type: Bug
          Components: Server
    Affects Versions: 10.0.1.Final
         Environment: Running in OpenShift, with a stateful set of 12 nodes, a distributed
cache with 3 owners, async indexing enabled, persistence with rocksdb.
            Reporter: Jens Reimann
            Priority: Blocker
         Attachments: infinispan.xml

 After running the load test for a few seconds, the Infinispan cluster stops accepting
requests and the nodes start to split off from the cluster. The server log contains
many exceptions like:
 {code:java}
 10:42:26,939 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor]
(timeout-thread--p4-t1) ISPN000136: Error executing command PutKeyValueCommand on Cache
'___protobuf_metadata', writing keys [deviceRegistry.proto]:
org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after
10 seconds for key deviceRegistry.proto and requestor GlobalTx:infinispan-2-61958:249.
Lock is held by GlobalTx:infinispan-2-61958:248
 {code}
 Stopping the load test doesn't let the cluster recover. Most (not all) of the
liveness checks fail and the pods get restarted. Even after 1 hour, the cluster is still
in a non-working state.

--
This message was sent by Atlassian Jira
(v7.13.8#713008)
