[
https://issues.jboss.org/browse/ISPN-6425?page=com.atlassian.jira.plugin....
]
kostd kostd commented on ISPN-6425:
-----------------------------------
[~gustavonalle], we have a similar issue in our production environment. Environment: WildFly
8.2.0.Final, Infinispan 6.0.2.Final, Hibernate Search 4.5.1.Final,
hibernate-search-infinispan 4.5.1.Final, two nodes in the Hibernate Search cluster,
clustered via JGroups 3.4.5.Final.
We use async data and metadata caches because that is what is recommended for performance:
{quote}
if you need high performance on writes with the Lucene Directory the best option is to
disable any CacheStore; the second best option is to configure the CacheStore as async .
{quote}
{code:title=our infinispan config}
<global>
   <!-- Duplicate domains are allowed so that multiple deployments with default configuration
        of Hibernate Search applications work - if possible it would be better to use JNDI to
        share the CacheManager across applications -->
   <globalJmxStatistics enabled="true" cacheManagerName="HibernateSearch"
      allowDuplicateDomains="true" />
   <!-- If the transport is omitted, there is no way to create distributed or clustered caches.
        There is no added cost to defining a transport but not creating a cache that uses one,
        since the transport is created and initialized lazily. -->
   <transport clusterName="${argus.textsearch.infinispan.cluster-name}"
      distributedSyncTimeout="240000">
      <!-- Note that the JGroups transport uses sensible defaults if no configuration property
           is defined. See the JGroupsTransport javadocs for more flags -->
      <properties>
         <property name="configurationFile"
            value="${jboss.home.dir}/domain/configuration/hibernatesearch-infinispan-jgroups-tcp.xml" />
      </properties>
   </transport>
   <!-- Note that the JGroups transport uses sensible defaults if no configuration property is
        defined. See the Infinispan wiki for more JGroups settings:
        http://community.jboss.org/wiki/ClusteredConfigurationQuickStart -->
   <!-- Used to register JVM shutdown hooks. hookBehavior: DEFAULT, REGISTER, DONT_REGISTER.
        Hibernate Search takes care to stop the CacheManager so registering is not needed -->
   <shutdown hookBehavior="DONT_REGISTER" />
</global>
<!-- *************************** -->
<!-- Default "template" settings -->
<!-- *************************** -->
<default>
   <locking lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="500"
      useLockStriping="false" />
   <invocationBatching enabled="false" />
   <!-- This element specifies that the cache is clustered. Modes supported: distribution (d),
        replication (r) or invalidation (i). Don't use invalidation to store Lucene indexes (as
        with Hibernate Search DirectoryProvider). Replication is recommended for best
        performance of Lucene indexes, but make sure you have enough memory to store the index
        in your heap. Also distribution scales much better than replication on a high number of
        nodes in the cluster. -->
   <clustering mode="replication">
      <!-- Prefer loading all data at startup than later -->
      <stateTransfer timeout="480000" fetchInMemoryState="true" />
      <!-- Network calls are synchronous by default -->
      <sync replTimeout="30000" />
   </clustering>
   <jmxStatistics enabled="true" />
   <eviction maxEntries="-1" strategy="NONE" />
   <expiration maxIdle="-1" />
</default>
<!-- *************************************** -->
<!-- Cache to store Lucene's file metadata  -->
<!-- *************************************** -->
<namedCache name="LuceneIndexesMetadata">
   <persistence passivation="false">
      <singleFile fetchPersistentState="true" ignoreModifications="false" preload="true"
         purgeOnStartup="false" shared="false"
         location="${jboss.server.data.dir}/textsearch-store/${argus.db.name}/">
         <async enabled="true" />
      </singleFile>
   </persistence>
</namedCache>
<!-- **************************** -->
<!-- Cache to store Lucene data  -->
<!-- **************************** -->
<namedCache name="LuceneIndexesData">
   <persistence passivation="false">
      <singleFile fetchPersistentState="true" ignoreModifications="false" preload="true"
         purgeOnStartup="false" shared="false"
         location="${jboss.server.data.dir}/textsearch-store/${argus.db.name}/">
         <async enabled="true" />
      </singleFile>
   </persistence>
</namedCache>
{code}
Why does the change in this issue only correct the default value, and do nothing for cases
where the async metadata cache was selected explicitly?
We want a fast async metadata cache and do not want to regularly catch FileNotFoundException.
Can we keep it, or should we migrate to a synchronous metadata (data?) cache?
Maybe it is not possible to fix the FileNotFoundException for an async cache at all? Or maybe
our old hibernate-search-infinispan-6.0.2.Final.jar is not affected by this issue? Please help.
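If migrating is the answer, would it be enough to persist the metadata store synchronously? For illustration, this is just our {{LuceneIndexesMetadata}} cache from the configuration above with the {{<async />}} element removed - a sketch, not a tested config:
{code:title=sketch: synchronous metadata store}
<namedCache name="LuceneIndexesMetadata">
   <persistence passivation="false">
      <singleFile fetchPersistentState="true" ignoreModifications="false" preload="true"
         purgeOnStartup="false" shared="false"
         location="${jboss.server.data.dir}/textsearch-store/${argus.db.name}/">
      </singleFile>
   </persistence>
</namedCache>
{code}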
FileNotFoundException with async indexing backend
-------------------------------------------------
Key: ISPN-6425
URL: https://issues.jboss.org/browse/ISPN-6425
Project: Infinispan
Issue Type: Bug
Components: Embedded Querying, Lucene Directory
Affects Versions: 8.2.0.Final
Reporter: Gustavo Fernandes
Assignee: Gustavo Fernandes
Fix For: 8.2.1.Final, 9.0.0.Alpha1, 9.0.0.Final
The Infinispan directory defaults to {{write_metadata_async=true}} when the indexing
backend is configured as async, i.e. {{default.worker.execution}} is {{true}}.
With {{write_metadata_async=true}}, the directory uses {{cache.putAsync}} to write the index
file metadata, while still deleting and creating files synchronously. This can lead to
stale metadata, causing FileNotFoundExceptions when executing queries:
Suppose a Lucene directory contains the files \[segments_4, _4.si\]. During normal operation,
apart from the user thread, two other threads can be changing the index: the periodic commit
thread (since the backend is async) and the async file-deletion thread.
The following race can happen:
||Time||Thread||work type||work||
|T1|Hibernate Search: Commit Scheduler for index|SYNC|write files segments_5 and _5.si to the index|
|T2|Hibernate Search: Commit Scheduler for index|ASYNC|write the new file list containing \[segments_4, _4.si, segments_5, _5.si\]|
|T3|Hibernate Search: Commit Scheduler for index|ASYNC|enqueue a deletion task for files segments_4 and _4.si|
|T4|Hibernate Search: async deletion of index|SYNC|dequeue deletion task for files segments_4 and _4.si|
|T5|Hibernate Search: async deletion of index|SYNC|delete files segments_4 and _4.si from the index|
|T6|Hibernate Search: async deletion of index|ASYNC|write the new file list containing \[segments_5, _5.si\]|
|T7|User-thread| |open index reader; file list is \[segments_4, _4.si\], highest segment number is 4 (file list is not updated yet)|
|T8|User-thread| |open segments_4|
|T9|User-thread| |FileNotFoundException!|
|T10|remote-thread-User| |new file list received \[segments_4, _4.si, segments_5, _5.si\]|
|T11|remote-thread-User| |new file list received \[segments_5, _5.si\]|
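The interleaving above can be reproduced deterministically with a small language-agnostic simulation (a sketch only: a dict stands in for the data cache, a list of pending writes stands in for the {{putAsync}} queue; no Infinispan API is involved):

```python
# Simulation of the T1-T11 race: file writes/deletes are synchronous,
# metadata (file list) writes are only enqueued, as with cache.putAsync.
files = {"segments_4": b"", "_4.si": b""}   # data cache (synchronous)
metadata = ["segments_4", "_4.si"]          # file list (metadata cache)
pending_async = []                          # stands in for putAsync's queue

def open_input(name):
    """Reader resolves a file name against the data cache."""
    if name not in files:
        raise FileNotFoundError(name)
    return files[name]

# T1 (SYNC): commit scheduler writes the new segment files
files["segments_5"] = b""
files["_5.si"] = b""
# T2 (ASYNC): the new file list is only enqueued, not yet visible
pending_async.append(["segments_4", "_4.si", "segments_5", "_5.si"])
# T3-T5 (SYNC): the deletion task removes the old files immediately
del files["segments_4"]
del files["_4.si"]
# T6 (ASYNC): the final file list is enqueued as well
pending_async.append(["segments_5", "_5.si"])

# T7-T9: a reader opens the index before any async metadata write lands
stale_list = list(metadata)                 # still [segments_4, _4.si]
try:
    open_input(stale_list[0])               # tries to open segments_4
    raced = False
except FileNotFoundError:
    raced = True                            # the reported exception

# T10-T11: the async writes finally land, too late for the reader
for file_list in pending_async:
    metadata = file_list
```

The window exists because the file list lags behind the files themselves: the list still names segments_4 after the deletion task has already removed it.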
This race can be observed in {{MassIndexerAsyncBackendTest#testMassIndexOnAsync}}, which
fails intermittently with the exception:
{noformat}
Caused by: java.io.FileNotFoundException: Error loading metadata for index file: M|segments_4|commonIndex|-1
	at org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:138) ~[infinispan-lucene-directory-9.0.0-SNAPSHOT.jar:9.0.0-SNAPSHOT]
	at org.infinispan.lucene.impl.DirectoryLucene.openInput(DirectoryLucene.java:102) ~[infinispan-lucene-directory-9.0.0-SNAPSHOT.jar:9.0.0-SNAPSHOT]
	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:294) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:493) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:490) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:490) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:344) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:300) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:263) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:251) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
{noformat}
We should not enable {{write_metadata_async=true}} for async backends. The file list is
already {{DeltaAware}}, so writing it synchronously should not add meaningful overhead.
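The effect of the proposed fix can be shown in the same simulation style (again a sketch, not Infinispan code): when the file-list write is applied synchronously, at the point it takes effect every file it names still exists, so a reader cannot observe a list referencing already-deleted files.

```python
# Same sequence as the race above, but the file list is written
# synchronously (the analogue of cache.put instead of cache.putAsync).
files = {"segments_4": b"", "_4.si": b""}   # data cache
metadata = ["segments_4", "_4.si"]          # file list (metadata cache)

def open_input(name):
    if name not in files:
        raise FileNotFoundError(name)
    return files[name]

# T1: commit scheduler writes the new segment files
files["segments_5"] = b""
files["_5.si"] = b""
# T2: file list updated synchronously - immediately visible
metadata = ["segments_4", "_4.si", "segments_5", "_5.si"]
# T3-T5: deletion task removes the old files
del files["segments_4"]
del files["_4.si"]
# T6: file list updated synchronously again, before any reader
# can act on the now-deleted names
metadata = ["segments_5", "_5.si"]

# T7+: a reader now sees a list consistent with the data cache
opened = [open_input(name) is not None for name in metadata]
```

There is no longer a window in which the visible file list points at files the deletion thread has already removed.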
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)