[infinispan-issues] [JBoss JIRA] (ISPN-9044) In Cluster - Infinispan - SingleFileStore - fetchPersistentState/StateTransfer not transferring complete data to Joining Node

Thu Apr 5 10:47:00 EDT 2018

Debashish Bharali created ISPN-9044:
---------------------------------------

             Summary: In Cluster - Infinispan - SingleFileStore - fetchPersistentState/StateTransfer not transferring complete data to Joining Node
                 Key: ISPN-9044
                 URL: https://issues.jboss.org/browse/ISPN-9044
             Project: Infinispan
          Issue Type: Bug
          Components: Lucene Directory
    Affects Versions: 8.2.5.Final
            Reporter: Debashish Bharali
            Priority: Critical
         Attachments: neutrino-hibernatesearch-infinispan.xml

Infinispan - SingleFileStore - fetchPersistentState/StateTransfer not transferring complete data to Joining Node.

Related to ISPN-8980 (https://issues.jboss.org/browse/ISPN-8980).
We are using Hibernate Search, with its Lucene indexes stored in Infinispan and persisted through a SingleFileStore.

With more than one node (for example, 4 nodes), we observe the behaviour described below.
Steps to reproduce:

# We start up the first node *'N1'* in maintenance mode with the MassIndexer, which creates the initial indexes.
# After all the MassIndexer/EntityLoader threads have ended (after 1-2 hours), i.e. once MassIndexing has completed, we start up the other 3 nodes *'N2', 'N3' and 'N4'* without the MassIndexer.
# Under moderate to heavy application usage (concurrency), we again get the same exception: *Exception occurred java.io.FileNotFoundException: Error loading metadata for index file.* This indicates that {color:red}some entries are not present in the cache.{color}
# *This exception occurs only on the other 3 nodes (N2, N3 and N4), not on the first node N1.*
# On checking the sizes of the cache stores on all the nodes, the 3 nodes (N2, N3 and N4) have almost equal sizes (600 MB), which is 50%-70% of the size of the cache stores on N1 (1.2 GB).
# We have repeated these steps multiple times. We also moved the MassIndexing role to each of the other 3 nodes, and reduced the number of nodes to 2.
# *The behaviour is exactly the same in every case, i.e. the exception occurs on all the nodes except the initial node that ran the MassIndexing.*
# {color:red}It seems that the persistent state of *'N1'*'s cache store is not being fetched by *'N2', 'N3' and 'N4'* when these nodes join.{color}
# This is indicated by the fact that the FileNotFoundException does not occur on 'N1'. It occurs only on the nodes that joined later (N2, N3 and N4), and the cache store *'.DAT'* files on those nodes are smaller than those on *'N1'*.
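
The size comparison described in the steps above could be scripted along the following lines. This is only a minimal sketch and not part of the original report: the class name StoreSizeCheck, its method names and the two directory arguments are hypothetical, and it assumes that each node's SingleFileStore files sit under a single directory and end in '.dat'.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical helper: compares the on-disk size of two cache store directories.
public class StoreSizeCheck {

    // Sums the sizes (in bytes) of all regular files ending in ".dat"
    // (case-insensitive) under the given directory, recursively.
    static long totalDatBytes(Path storeDir) throws IOException {
        try (Stream<Path> files = Files.walk(storeDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString()
                              .toLowerCase(Locale.ROOT).endsWith(".dat"))
                .mapToLong(p -> {
                    try {
                        return Files.size(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .sum();
        }
    }

    // Returns the joiner's store size as a percentage of the reference node's size.
    static double percentOfReference(long joinerBytes, long referenceBytes) {
        if (referenceBytes <= 0) {
            throw new IllegalArgumentException("reference size must be positive");
        }
        return 100.0 * joinerBytes / referenceBytes;
    }

    // Usage: java StoreSizeCheck <reference-store-dir> <joiner-store-dir>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StoreSizeCheck <reference-store-dir> <joiner-store-dir>");
            return;
        }
        long reference = totalDatBytes(Paths.get(args[0]));
        long joiner = totalDatBytes(Paths.get(args[1]));
        System.out.printf("reference=%d bytes, joiner=%d bytes (%.1f%% of reference)%n",
                reference, joiner, percentOfReference(joiner, reference));
    }
}
```

Run against a copy of N1's store directory and a copy of a joining node's store directory, a percentage well below 100 would match the observation above. File size alone does not prove that entries are missing, so it is only a first indicator.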

We require urgent support.
The corresponding Infinispan config file (neutrino-hibernatesearch-infinispan.xml) is attached.

--
This message was sent by Atlassian JIRA
(v7.5.0#75005)