[infinispan-issues] [JBoss JIRA] (ISPN-8980) High concurrency : Infinispan Directory Provider: Lucene : Error loading metadata for index file

Thu Mar 29 03:46:00 EDT 2018

    [ https://issues.jboss.org/browse/ISPN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553490#comment-13553490 ] 

Debashish Bharali commented on ISPN-8980:
-----------------------------------------

Hi,

Your suggestion of removing the *locking_strategy* property has :
# resolved the issue - when we are using only single node.
# And has minimized the exception drastically when on cluster (2 to 4 nodes).
# We even didn't face any issue at the initial cluster testing too. (Updated in my previous comment)

But on doing some further interesting testing, we have made some observations.
Below are the steps:

# We startup the first node *'N1'* in maintenance mode - with MassIndexer - creating initial indexes.
# Now after all the MassIndexer/EntityLoader threads ends (after 1-2 Hrs). I.e. MassIndexing has been completed. We startup all other 3 nodes *'N2' , 'N3' and 'N4'*. Without MassIndexer.
# Now on moderate to heavy application usage (concurrency), we are again getting the same exception of *Exception occurred java.io.FileNotFoundException: Error loading metadata for index file*.
# But only on the other 3 nodes (N2, N3 and N4). Not on the first node N1.
# On checking the sizes of the Cache stores in all the Nodes, the 3 Nodes (N2,N3 and N4) are having almost equal size, which is 50%-70% of the size of Cache Stores of N1.
# We have repeated these steps multiple times. Even switched MassIndexing node to other 3 nodes too. We have even reduced the number of nodes to 2.
# *But the behaviour is exactly same. I.e. Exception on all the nodes except the initial node doing MassIndexing.*

Please comment.

> High concurrency : Infinispan Directory Provider: Lucene : Error loading metadata for index file
> ------------------------------------------------------------------------------------------------
>
>                 Key: ISPN-8980
>                 URL: https://issues.jboss.org/browse/ISPN-8980
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Lucene Directory
>    Affects Versions: 8.2.5.Final
>            Reporter: Debashish Bharali
>            Assignee: Gustavo Fernandes
>            Priority: Critical
>         Attachments: SysOutLogs.txt, neutrino-hibernate-search-worker-jgroups.xml, neutrino-hibernatesearch-infinispan.xml
>
>
> During high concurrency of action, we are getting *{color:red}'Error loading metadata for index file'{color}* even in *{color:red}Non-Clustered{color}* env.
> *Hibernate Search Indexes (Lucene Indexes) - 5.7.0.Final*
> *Infinispan - 8.2.5.Final*
> *infinispan-directory-provider-8.2.5.Final*
> *jgroups-3.6.7.Final*
> *Worker Backend : JGroups*
> *Worker Execution: Sync*
> *write_metadata_async: false (implicitly)*
> *Note:* Currently we are on Non-Clustered env. We are moving to Clustered Env within few days.
> On analyzing the code, and putting some additional SYSOUT loggers into FileListOperations and DirectoryImplementor classes, we have established the following points:
> # This is happening during high concurrency on non-clustered env.
> #  One thread *'T1'* is deleting a segment and segment name *'SEG1'* from the *'FileListCacheKey'* list* stored in MetaDatacache*.
> # Concurrently, at the same time, another thread *'T2'* is looping through the FileList ['copy list' from MetadataCache - for -FileListCacheKey - provided by toArray method of *FileListOperations* (changes also being done in the corresponding original list by T1 thread) ].
> # *'T2'* is calling open input method on each segment name - getting corresponding Metadata segment from *MetadataCache*.
> # However, for *'T2'*, the *'copy list'* still contains the name of segment *'SEG1'*.
> # So while looping through the list, *'T2'* tries to get Segment from MetadataCache for segment name *'SEG1'*.
> # But at this instant, *segment* corresponding to segment name *'SEG1'*, has been already removed from *MetadataCache* by *'T1'*.
> # This results in *'java.io.FileNotFoundException: Error loading metadata for index file'* for segment name *'SEG1'*
> # As mentioned earlier, this happens more often during high concurrency.
> *{color:red}On a standalone server (non-clustered), we are getting below error intermittently:{color}*
> Full Stack trace:
> 2018-03-19 17:29:11,938  ERROR [Hibernate Search sync consumer thread for index com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer] o.h.s.e.i.LogErrorHandler [LogErrorHandler.java:69]
> 				*{color:red}HSEARCH000058: Exception occurred java.io.FileNotFoundException: Error loading metadata for index file{color}*: M|segments_w6|com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer|-1
> Primary Failure:
> 	Entity com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer  Id 1649990024999813056  Work Type  org.hibernate.search.backend.AddLuceneWork
> java.io.FileNotFoundException: Error loading metadata for index file: M|segments_w6|com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer|-1
> 	at org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:138) ~[infinispan-lucene-directory-8.2.5.Final.jar:8.2.5.Final]
> 	at org.infinispan.lucene.impl.DirectoryLucene.openInput(DirectoryLucene.java:102) ~[infinispan-lucene-directory-8.2.5.Final.jar:8.2.5.Final]
> 	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109) ~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz - 2017-02-08 19:08:03]
> 	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:294) ~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz - 2017-02-08 19:08:03]
> 	at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:171) ~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz - 2017-02-08 19:08:03]
> 	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:949) ~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz - 2017-02-08 19:08:03]
> 	at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.createNewIndexWriter(IndexWriterHolder.java:126) ~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
> 	at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.getIndexWriter(IndexWriterHolder.java:92) ~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
> 	at org.hibernate.search.backend.impl.lucene.AbstractCommitPolicy.getIndexWriter(AbstractCommitPolicy.java:33) ~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
> 	at org.hibernate.search.backend.impl.lucene.SharedIndexCommitPolicy.getIndexWriter(SharedIndexCommitPolicy.java:77) ~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
> 	at org.hibernate.search.backend.impl.lucene.SharedIndexWorkspaceImpl.getIndexWriter(SharedIndexWorkspaceImpl.java:36) ~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
> 	at org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.getIndexWriterDelegate(AbstractWorkspaceImpl.java:203) [hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
> 	at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.applyUpdates(LuceneBackendQueueTask.java:81) [hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
> 	at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.run(LuceneBackendQueueTask.java:46) [hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
> 	at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.applyChangesets(SyncWorkProcessor.java:165) [hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
> 	at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.run(SyncWorkProcessor.java:151) [hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
> 	at java.lang.Thread.run(Thread.java:785) [na:1.8.0-internal]
> *As per our understanding, this issue should not come in {color:red}'non-clustered'{color} env. Also it should not arise when worker execution is {color:red}'sync'{color}.*
> *We have debugged the code, and confirmed that the value for {color:red}'write_metadata_async'{color} is coming as 'false' only (as expected).*

--
This message was sent by Atlassian JIRA
(v7.5.0#75005)