[
https://issues.jboss.org/browse/ISPN-6425?page=com.atlassian.jira.plugin....
]
kostd kostd commented on ISPN-6425:
-----------------------------------
[~gustavonalle], we have a similar issue in our production environment. Environment: WildFly
8.2.0.Final, Infinispan 6.0.2.Final, Hibernate Search 4.5.1.Final,
hibernate-search-infinispan 4.5.1.Final, two nodes in the Hibernate Search cluster,
clustered via JGroups 3.4.5.Final.
We use async data and metadata caches because that is what is recommended for performance:
{quote}
if you need high performance on writes with the Lucene Directory the best option is to
disable any CacheStore; the second best option is to configure the CacheStore as async .
{quote}
{code:title=our infinispan config}
<global>
   <!-- Duplicate domains are allowed so that multiple deployments with default configuration
        of Hibernate Search applications work - if possible it would be better to use JNDI to
        share the CacheManager across applications -->
   <globalJmxStatistics enabled="true" cacheManagerName="HibernateSearch"
      allowDuplicateDomains="true" />
   <!-- If the transport is omitted, there is no way to create distributed or clustered caches.
        There is no added cost to defining a transport but not creating a cache that uses one,
        since the transport is created and initialized lazily. -->
   <transport clusterName="${argus.textsearch.infinispan.cluster-name}"
      distributedSyncTimeout="240000">
      <!-- Note that the JGroups transport uses sensible defaults if no configuration property
           is defined. See the JGroupsTransport javadocs for more flags -->
      <properties>
         <property name="configurationFile"
            value="${jboss.home.dir}/domain/configuration/hibernatesearch-infinispan-jgroups-tcp.xml" />
      </properties>
   </transport>
   <!-- Note that the JGroups transport uses sensible defaults if no configuration property is
        defined. See the Infinispan wiki for more JGroups settings:
        http://community.jboss.org/wiki/ClusteredConfigurationQuickStart -->
   <!-- Used to register JVM shutdown hooks. hookBehavior: DEFAULT, REGISTER, DONT_REGISTER.
        Hibernate Search takes care to stop the CacheManager so registering is not needed -->
   <shutdown hookBehavior="DONT_REGISTER" />
</global>
<!-- *************************** -->
<!-- Default "template" settings -->
<!-- *************************** -->
<default>
   <locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="500"
      useLockStriping="false" />
   <invocationBatching enabled="false" />
   <!-- This element specifies that the cache is clustered. Modes supported: distribution (d),
        replication (r) or invalidation (i). Don't use invalidation to store Lucene indexes (as
        with Hibernate Search DirectoryProvider). Replication is recommended for best
        performance of Lucene indexes, but make sure you have enough memory to store the index
        in your heap. Also distribution scales much better than replication on a high number of
        nodes in the cluster. -->
   <clustering mode="replication">
      <!-- Prefer loading all data at startup than later -->
      <stateTransfer timeout="480000" fetchInMemoryState="true" />
      <!-- Network calls are synchronous by default -->
      <sync replTimeout="30000" />
   </clustering>
   <jmxStatistics enabled="true" />
   <eviction maxEntries="-1" strategy="NONE" />
   <expiration maxIdle="-1" />
</default>
<!-- *************************************** -->
<!-- Cache to store Lucene's file metadata  -->
<!-- *************************************** -->
<namedCache name="LuceneIndexesMetadata">
   <persistence passivation="false">
      <singleFile fetchPersistentState="true" ignoreModifications="false" preload="true"
         purgeOnStartup="false" shared="false"
         location="${jboss.server.data.dir}/textsearch-store/${argus.db.name}/">
         <async enabled="true" />
      </singleFile>
   </persistence>
</namedCache>
<!-- **************************** -->
<!-- Cache to store Lucene data  -->
<!-- **************************** -->
<namedCache name="LuceneIndexesData">
   <persistence passivation="false">
      <singleFile fetchPersistentState="true" ignoreModifications="false" preload="true"
         purgeOnStartup="false" shared="false"
         location="${jboss.server.data.dir}/textsearch-store/${argus.db.name}/">
         <async enabled="true" />
      </singleFile>
   </persistence>
</namedCache>
{code}
Why does the change in this issue only correct the default value, and do nothing for cases
where the async metadata cache was selected explicitly?
We want a fast async metadata cache and do not want to regularly catch FileNotFoundException.
Can we keep it, or should we migrate to a synchronous metadata (data?) cache?
Maybe it is not possible to fix the FileNotFoundException for an async cache at all? Or maybe
our old hibernate-search-infinispan-6.0.2.Final.jar is not affected by this issue? Please help.
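If migrating is the answer, would it be enough to persist the metadata store synchronously? For illustration, this is just our {{LuceneIndexesMetadata}} cache from the configuration above with the {{<async />}} element removed - a sketch, not a tested config:
{code:title=sketch: synchronous metadata store}
<namedCache name="LuceneIndexesMetadata">
   <persistence passivation="false">
      <singleFile fetchPersistentState="true" ignoreModifications="false" preload="true"
         purgeOnStartup="false" shared="false"
         location="${jboss.server.data.dir}/textsearch-store/${argus.db.name}/">
      </singleFile>
   </persistence>
</namedCache>
{code}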
FileNotFoundException with async indexing backend
-------------------------------------------------
Key: ISPN-6425
URL: https://issues.jboss.org/browse/ISPN-6425
Project: Infinispan
Issue Type: Bug
Components: Embedded Querying, Lucene Directory
Affects Versions: 8.2.0.Final
Reporter: Gustavo Fernandes
Assignee: Gustavo Fernandes
Fix For: 8.2.1.Final, 9.0.0.Alpha1, 9.0.0.Final
The Infinispan directory defaults to {{write_metadata_async=true}} when the indexing
backend is configured as async, i.e. {{default.worker.execution}} is {{true}}.
With {{write_metadata_async=true}}, the directory uses {{cache.putAsync}} to write the index
file metadata, while still deleting and creating files synchronously. This can lead to
stale metadata, causing FileNotFoundExceptions when executing queries:
Suppose a Lucene directory contains the files \[segments_4, _4.si\]. During normal operation,
apart from the user thread, two other threads can be changing the index: the periodic commit
thread (since the backend is async) and the async file-deletion thread.
The following race can happen:
||Time||Thread||work type||work||
|T1|Hibernate Search: Commit Scheduler for index|SYNC|write files segments_5 and _5.si to the index|
|T2|Hibernate Search: Commit Scheduler for index|ASYNC|write the new file list containing \[segments_4, _4.si, segments_5, _5.si\]|
|T3|Hibernate Search: Commit Scheduler for index|ASYNC|enqueue a deletion task for files segments_4 and _4.si|
|T4|Hibernate Search: async deletion of index|SYNC|dequeue deletion task for files segments_4 and _4.si|
|T5|Hibernate Search: async deletion of index|SYNC|delete files segments_4 and _4.si from the index|
|T6|Hibernate Search: async deletion of index|ASYNC|write the new file list containing \[segments_5, _5.si\]|
|T7|User-thread| |open index reader; file list is \[segments_4, _4.si\], highest segment number is 4 (file list is not updated yet)|
|T8|User-thread| |open segments_4|
|T9|User-thread| |FileNotFoundException!|
|T10|remote-thread-User| |new file list received \[segments_4, _4.si, segments_5, _5.si\]|
|T11|remote-thread-User| |new file list received \[segments_5, _5.si\]|
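The interleaving above can be reproduced deterministically with a small language-agnostic simulation (a sketch only: a dict stands in for the data cache, a list of pending writes stands in for the {{putAsync}} queue; no Infinispan API is involved):

```python
# Simulation of the T1-T11 race: file writes/deletes are synchronous,
# metadata (file list) writes are only enqueued, as with cache.putAsync.
files = {"segments_4": b"", "_4.si": b""}   # data cache (synchronous)
metadata = ["segments_4", "_4.si"]          # file list (metadata cache)
pending_async = []                          # stands in for putAsync's queue

def open_input(name):
    """Reader resolves a file name against the data cache."""
    if name not in files:
        raise FileNotFoundError(name)
    return files[name]

# T1 (SYNC): commit scheduler writes the new segment files
files["segments_5"] = b""
files["_5.si"] = b""
# T2 (ASYNC): the new file list is only enqueued, not yet visible
pending_async.append(["segments_4", "_4.si", "segments_5", "_5.si"])
# T3-T5 (SYNC): the deletion task removes the old files immediately
del files["segments_4"]
del files["_4.si"]
# T6 (ASYNC): the final file list is enqueued as well
pending_async.append(["segments_5", "_5.si"])

# T7-T9: a reader opens the index before any async metadata write lands
stale_list = list(metadata)                 # still [segments_4, _4.si]
try:
    open_input(stale_list[0])               # tries to open segments_4
    raced = False
except FileNotFoundError:
    raced = True                            # the reported exception

# T10-T11: the async writes finally land, too late for the reader
for file_list in pending_async:
    metadata = file_list
```

The window exists because the file list lags behind the files themselves: the list still names segments_4 after the deletion task has already removed it.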
This race can be observed in {{MassIndexerAsyncBackendTest#testMassIndexOnAsync}}, which
fails intermittently with the exception:
{noformat}
Caused by: java.io.FileNotFoundException: Error loading metadata for index file: M|segments_4|commonIndex|-1
	at org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:138) ~[infinispan-lucene-directory-9.0.0-SNAPSHOT.jar:9.0.0-SNAPSHOT]
	at org.infinispan.lucene.impl.DirectoryLucene.openInput(DirectoryLucene.java:102) ~[infinispan-lucene-directory-9.0.0-SNAPSHOT.jar:9.0.0-SNAPSHOT]
	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:294) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:493) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:490) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:490) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:344) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:300) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:263) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:251) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
{noformat}
We should not enable {{write_metadata_async=true}} for async backends. The file list is
already {{DeltaAware}}, so writing it synchronously should not add meaningful overhead.
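The effect of the proposed fix can be shown in the same simulation style (again a sketch, not Infinispan code): when the file-list write is applied synchronously, at the point it takes effect every file it names still exists, so a reader cannot observe a list referencing already-deleted files.

```python
# Same sequence as the race above, but the file list is written
# synchronously (the analogue of cache.put instead of cache.putAsync).
files = {"segments_4": b"", "_4.si": b""}   # data cache
metadata = ["segments_4", "_4.si"]          # file list (metadata cache)

def open_input(name):
    if name not in files:
        raise FileNotFoundError(name)
    return files[name]

# T1: commit scheduler writes the new segment files
files["segments_5"] = b""
files["_5.si"] = b""
# T2: file list updated synchronously - immediately visible
metadata = ["segments_4", "_4.si", "segments_5", "_5.si"]
# T3-T5: deletion task removes the old files
del files["segments_4"]
del files["_4.si"]
# T6: file list updated synchronously again, before any reader
# can act on the now-deleted names
metadata = ["segments_5", "_5.si"]

# T7+: a reader now sees a list consistent with the data cache
opened = [open_input(name) is not None for name in metadata]
```

There is no longer a window in which the visible file list points at files the deletion thread has already removed.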
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)