[ https://issues.jboss.org/browse/ISPN-8980?page=com.atlassian.jira.plugin.... ]
Debashish Bharali edited comment on ISPN-8980 at 3/29/18 4:22 AM:
------------------------------------------------------------------
We aren't using "LuceneIndexesLocking" as a replicated-cache. It is still a
local-cache.
Below is our rationale for keeping it local rather than replicated:
# We are using *JGroups as the worker backend* (i.e. master selection is automatic).
# With this, when the second node *'N2'* joins the cluster, the lock remains with
*'N1'* while node *'N2'* becomes the master *(logic in
AutoNodeSelector.class)*.
# This results in an *{color:red}'Unable to acquire Lock Exception'{color}* on
node *'N2'*, and all our operations fail.
# Another issue: when we are using a cache store and the current master node
'N1' shuts down, it leaves *'orphan locks'* behind. Starting up any node
(even the earlier master) then results in an *'Unable to acquire Lock Exception'*, and all
operations fail.
# As suggested in one of the forums, the workaround for this scenario with the JGroups
worker backend is to make *'LuceneIndexesLocking' local* (see the configuration sketch
below). This is a logical but only partial solution for us.
# Without it, we were not able to even start with Hibernate Search Lucene
index clustering.
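For illustration, here is a minimal programmatic sketch of the cache layout we mean (this is not our actual neutrino-hibernatesearch-infinispan.xml; only the cache names follow the Infinispan Lucene directory defaults, everything else is illustrative): data and metadata caches clustered, LuceneIndexesLocking defined as a local cache.
{code:java}
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class IndexCacheLayoutSketch {
    public static void main(String[] args) {
        // Clustered cache manager (JGroups transport with default settings).
        DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());

        // Data and metadata caches are shared across the cluster.
        Configuration replicated = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.REPL_SYNC)
                .build();

        // Locking cache kept LOCAL, so each node manages its own index lock
        // (the workaround described in the list above).
        Configuration local = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.LOCAL)
                .build();

        manager.defineConfiguration("LuceneIndexesData", replicated);
        manager.defineConfiguration("LuceneIndexesMetadata", replicated);
        manager.defineConfiguration("LuceneIndexesLocking", local);

        manager.stop();
    }
}
{code}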
High concurrency : Infinispan Directory Provider: Lucene : Error
loading metadata for index file
------------------------------------------------------------------------------------------------
Key: ISPN-8980
URL:
https://issues.jboss.org/browse/ISPN-8980
Project: Infinispan
Issue Type: Bug
Components: Lucene Directory
Affects Versions: 8.2.5.Final
Reporter: Debashish Bharali
Assignee: Gustavo Fernandes
Priority: Critical
Attachments: SysOutLogs.txt, neutrino-hibernate-search-worker-jgroups.xml,
neutrino-hibernatesearch-infinispan.xml
During high concurrency of actions, we are getting *{color:red}'Error loading metadata
for index file'{color}* even in a *{color:red}non-clustered{color}* environment.
*Hibernate Search Indexes (Lucene Indexes) - 5.7.0.Final*
*Infinispan - 8.2.5.Final*
*infinispan-directory-provider-8.2.5.Final*
*jgroups-3.6.7.Final*
*Worker Backend : JGroups*
*Worker Execution: Sync*
*write_metadata_async: false (implicitly)*
*Note:* Currently we are on a non-clustered environment. We are moving to a clustered
environment within a few days (the property sketch below shows how these settings are configured).
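As we understand it, the settings listed above correspond to the following Hibernate Search 5.7 properties. This is only a sketch written from memory; in particular the exact key for write_metadata_async is our assumption and should be checked against the Infinispan directory provider documentation.
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the Hibernate Search properties behind the settings listed above.
// These would normally live in persistence.xml / hibernate.cfg.xml, or be passed
// as a properties map when bootstrapping the persistence unit.
public class SearchPropertiesSketch {
    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Infinispan directory provider for all indexes.
        props.put("hibernate.search.default.directory_provider", "infinispan");
        // JGroups worker backend with automatic master selection.
        props.put("hibernate.search.default.worker.backend", "jgroups");
        // Synchronous worker execution.
        props.put("hibernate.search.default.worker.execution", "sync");
        // Spell out the implicit default mentioned above
        // (exact property key is our assumption; check the provider docs).
        props.put("hibernate.search.default.write_metadata_async", "false");

        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
{code}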
On analyzing the code, and after putting some additional SYSOUT loggers into the
FileListOperations and DirectoryImplementor classes, we have established the following points:
# This is happening during high concurrency on a non-clustered environment.
# One thread *'T1'* is deleting a segment, and its segment name *'SEG1'*, from
the *'FileListCacheKey'* list stored in the *MetadataCache*.
# Concurrently, another thread *'T2'* is looping through a 'copy list' of that
FileList [obtained for *FileListCacheKey* from the MetadataCache via the toArray
method of *FileListOperations*], while thread T1 keeps modifying the corresponding
original list.
# *'T2'* is calling the openInput method on each segment name, getting the
corresponding metadata segment from the *MetadataCache*.
# However, for *'T2'*, the *'copy list'* still contains the name of
segment *'SEG1'*.
# So while looping through the list, *'T2'* tries to get the segment from the
MetadataCache for segment name *'SEG1'*.
# But at this instant, the *segment* corresponding to segment name *'SEG1'* has
already been removed from the *MetadataCache* by *'T1'*.
# This results in *'java.io.FileNotFoundException: Error loading metadata for index
file'* for segment name *'SEG1'* (illustrated in the sketch after this list).
# As mentioned earlier, this happens more often during high concurrency.
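To make the interleaving concrete, below is a minimal, self-contained sketch using plain JDK collections; it is not the actual Infinispan code, and the class and variable names are illustrative only. T2 works from a snapshot obtained via toArray, as FileListOperations does, while T1 removes the entry; under an unlucky interleaving T2 resolves a segment name whose metadata entry is already gone, which is the analogue of the FileNotFoundException above.
{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for the race between T1 and T2; not Infinispan code.
public class MetadataRaceSketch {

    // Stand-in for the MetadataCache: segment name -> metadata.
    static final Map<String, String> metadataCache = new ConcurrentHashMap<>();
    // Stand-in for the FileListCacheKey entry: the set of known segment names.
    static final Set<String> fileList = ConcurrentHashMap.newKeySet();

    public static void main(String[] args) throws InterruptedException {
        metadataCache.put("SEG1", "metadata-of-SEG1");
        fileList.add("SEG1");

        // T2: snapshots the file list (like FileListOperations.toArray())
        // and then resolves each name against the metadata cache.
        Thread t2 = new Thread(() -> {
            String[] snapshot = fileList.toArray(new String[0]);
            for (String name : snapshot) {
                if (metadataCache.get(name) == null) {
                    // Analogue of "Error loading metadata for index file: SEG1".
                    System.out.println("Missing metadata for " + name);
                }
            }
        });

        // T1: deletes the segment, removing it from both structures.
        Thread t1 = new Thread(() -> {
            fileList.remove("SEG1");
            metadataCache.remove("SEG1");
        });

        // Whether the message appears depends on the thread interleaving,
        // which is why the error is intermittent and concurrency-dependent.
        t2.start();
        t1.start();
        t1.join();
        t2.join();
    }
}
{code}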
*{color:red}On a standalone server (non-clustered), we are getting the below error
intermittently:{color}*
Full Stack trace:
2018-03-19 17:29:11,938 ERROR [Hibernate Search sync consumer thread for index
com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer]
o.h.s.e.i.LogErrorHandler [LogErrorHandler.java:69]
*{color:red}HSEARCH000058: Exception occurred java.io.FileNotFoundException: Error
loading metadata for index file{color}*:
M|segments_w6|com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer|-1
Primary Failure:
Entity com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer Id
1649990024999813056 Work Type org.hibernate.search.backend.AddLuceneWork
java.io.FileNotFoundException: Error loading metadata for index file:
M|segments_w6|com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer|-1
at
org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:138)
~[infinispan-lucene-directory-8.2.5.Final.jar:8.2.5.Final]
at org.infinispan.lucene.impl.DirectoryLucene.openInput(DirectoryLucene.java:102)
~[infinispan-lucene-directory-8.2.5.Final.jar:8.2.5.Final]
at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109)
~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz -
2017-02-08 19:08:03]
at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:294)
~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz -
2017-02-08 19:08:03]
at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:171)
~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz -
2017-02-08 19:08:03]
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:949)
~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz -
2017-02-08 19:08:03]
at
org.hibernate.search.backend.impl.lucene.IndexWriterHolder.createNewIndexWriter(IndexWriterHolder.java:126)
~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
at
org.hibernate.search.backend.impl.lucene.IndexWriterHolder.getIndexWriter(IndexWriterHolder.java:92)
~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
at
org.hibernate.search.backend.impl.lucene.AbstractCommitPolicy.getIndexWriter(AbstractCommitPolicy.java:33)
~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
at
org.hibernate.search.backend.impl.lucene.SharedIndexCommitPolicy.getIndexWriter(SharedIndexCommitPolicy.java:77)
~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
at
org.hibernate.search.backend.impl.lucene.SharedIndexWorkspaceImpl.getIndexWriter(SharedIndexWorkspaceImpl.java:36)
~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
at
org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.getIndexWriterDelegate(AbstractWorkspaceImpl.java:203)
[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
at
org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.applyUpdates(LuceneBackendQueueTask.java:81)
[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
at
org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.run(LuceneBackendQueueTask.java:46)
[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
at
org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.applyChangesets(SyncWorkProcessor.java:165)
[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
at
org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.run(SyncWorkProcessor.java:151)
[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
at java.lang.Thread.run(Thread.java:785) [na:1.8.0-internal]
*As per our understanding, this issue should not occur in a
{color:red}'non-clustered'{color} environment. It should also not arise when worker
execution is {color:red}'sync'{color}.*
*We have debugged the code and confirmed that the value of
{color:red}'write_metadata_async'{color} is indeed coming through as 'false' (as
expected).*