Dan Berindei commented on ISPN-9762:
------------------------------------
Looks like the problem is the {{storesMutex}}, which is a read-write lock that favors writers
over readers. Once a writer is queued, no new readers are allowed to acquire the lock
until all the current read-lock holders finish their work, so a reader cannot rely on
another reader being able to run in parallel.
This is exactly what happens in the RocksDB store, which uses a blocking queue to write
expiration metadata to a separate DB. Insertion threads write to the blocking queue while
holding {{storesMutex.readLock}} and assume that the purge thread can acquire
{{storesMutex.readLock}} in parallel and drain the queue. Once the availability-check
thread tries to acquire {{storesMutex.writeLock}}, everything stops.
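The stall can be reproduced in isolation with a fair {{java.util.concurrent.locks.ReentrantReadWriteLock}} standing in for {{storesMutex}} and a bounded queue standing in for the expiration queue. The class and field names are illustrative, and the fair-lock choice is only a stand-in for the writer-preferring behaviour described above; the real implementation may differ:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ExpirationStallSketch {

    // Stand-ins for storesMutex and the store's expiration queue. A fair
    // ReentrantReadWriteLock queues new readers behind a waiting writer.
    static final ReentrantReadWriteLock storesMutex = new ReentrantReadWriteLock(true);
    static final BlockingQueue<String> expiredQueue = new ArrayBlockingQueue<>(1);

    /** Returns true if the purge thread could not acquire the read lock. */
    static boolean purgeThreadStalls() throws Exception {
        // Insertion thread: takes the read lock, then fills the bounded queue.
        // Its next put() would block until the purge thread drains the queue.
        storesMutex.readLock().lock();
        expiredQueue.put("expired-key");

        // Availability check: queues for the write lock behind the held read lock.
        Thread availabilityCheck = new Thread(() -> {
            storesMutex.writeLock().lock();
            storesMutex.writeLock().unlock();
        });
        availabilityCheck.start();
        while (!storesMutex.hasQueuedThreads()) {
            Thread.onSpinWait();
        }

        // Purge thread: needs the read lock to drain expiredQueue, but the
        // queued writer keeps new readers out, so the timed acquire fails.
        // A real insertion thread blocked on put() would now wait forever.
        FutureTask<Boolean> purgeAttempt = new FutureTask<>(() -> {
            boolean acquired = storesMutex.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                expiredQueue.clear();
                storesMutex.readLock().unlock();
            }
            return acquired;
        });
        new Thread(purgeAttempt).start();
        boolean purgeAcquired = purgeAttempt.get();

        storesMutex.readLock().unlock();   // insertion thread finally releases
        availabilityCheck.join();
        return !purgeAcquired;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("purge thread stalled = " + purgeThreadStalls());
        // prints: purge thread stalled = true
    }
}
```

Without the queued writer (or with a lock that lets readers barge), the purge thread would acquire the read lock immediately and drain the queue, which is exactly the reader-runs-in-parallel assumption the store makes.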
Luckily, {{storesMutex.writeLock}} is only used during startup, when disabling a store,
and when doing an availability check. Setting {{<persistence
availability-interval="111000">}} should effectively disable the availability
check and work around the issue.
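For reference, a sketch of where that attribute would sit relative to the configuration below, assuming the store element is wrapped in the cache's {{persistence}} element as the schema normally requires (the interval value is the one suggested above, in milliseconds):

```xml
<distributed-cache name="DEVICES" owners="2" segments="256" mode="SYNC">
    <!-- a large availability-interval effectively disables the availability check -->
    <persistence availability-interval="111000">
        <rocksdb-store preload="true" path="/data/rocksdb/devices/data">
            <expiration path="/data/rocksdb/devices/expired"/>
        </rocksdb-store>
    </persistence>
</distributed-cache>
```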
[~ryanemerson] [~william.burns] we need to consider this for non-blocking
stores as well; maybe we can find an alternative that doesn't block the insertion
threads while doing the availability checks. Ideally I'd like to move the expiration
metadata into the main RocksDB database as well and remove the blocking queue.
Cache hangs during rebalancing
------------------------------
Key: ISPN-9762
URL: https://issues.jboss.org/browse/ISPN-9762
Project: Infinispan
Issue Type: Bug
Affects Versions: 9.4.2.Final
Reporter: Sergey Chernolyas
Priority: Blocker
Attachments: hang_node.txt, normal_node.txt, stat_bad_node.png,
stat_good_node.png
I have a cluster with two nodes. One node starts without problems. The second node hangs
while rebalancing the cache DEVICES.
Configuration of the cache:
{code:xml}
<distributed-cache name="DEVICES" owners="2" segments="256" mode="SYNC">
    <state-transfer await-initial-transfer="true" enabled="true"
                    timeout="2400000" chunk-size="2048"/>
    <partition-handling when-split="ALLOW_READ_WRITES"
                        merge-policy="PREFERRED_ALWAYS"/>
    <memory>
        <object size="300000" strategy="REMOVE"/>
    </memory>
    <rocksdb-store preload="true" path="/data/rocksdb/devices/data">
        <expiration path="/data/rocksdb/devices/expired"/>
    </rocksdb-store>
    <indexing index="LOCAL">
        <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
        <property name="default.directory_provider">infinispan</property>
        <property name="default.worker.execution">async</property>
        <property name="default.index_flush_interval">500</property>
        <property name="default.indexwriter.merge_factor">30</property>
        <property name="default.indexwriter.merge_max_size">1024</property>
        <property name="default.indexwriter.ram_buffer_size">256</property>
        <property name="default.locking_cachename">LuceneIndexesLocking_devices</property>
        <property name="default.data_cachename">LuceneIndexesData_devices</property>
        <property name="default.metadata_cachename">LuceneIndexesMetadata_devices</property>
    </indexing>
    <expiration max-idle="172800000"/>
</distributed-cache>
{code}
The cache contains 70,000 elements.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)