[infinispan-issues] [JBoss JIRA] (ISPN-6425) FileNotFoundException with async indexing backend
kostd kostd (JIRA)
issues at jboss.org
Thu Apr 21 13:20:00 EDT 2016
[ https://issues.jboss.org/browse/ISPN-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195395#comment-13195395 ]
kostd kostd commented on ISPN-6425:
-----------------------------------
[~gustavonalle], thank u. we already upgraded our stack to WF-10.0.0.Final + ISPN-8.1.0.Final + HS-5.5.1.Final + Lucene-5.3.1, but only in our prospective version, while our production ver stays on old stack, and we _must_ fix this in production.
I learned ISPN-2981 (patch for concurrency issue from thence we applied, got better, but still remains), LUCENE-3422, LUCENE-5541
now will try to avoid multiple merge threads in ConcurrentMergeScheduler, because nothing more clever can`t think. In File.exists magic we cant believe.
I understand that ISPN 6.0.2 is too old and unsupported, but will be grateful for any help. Just in case adduce our last stacks:
{code:title=last stack, after ISPN-2981 concurrency patch applied}
-- before FNFE, we got short-time cluster(network) problems and TopologyOutdatedException. May be merge is not atomic? not, this cant be true.
09:25:43,804 ERROR [InvocationContextInterceptor] (Lucene Merge Thread #72 for index rms/ResourceUnit) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 9
at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:93)
at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:234)
at org.infinispan.statetransfer.StateTransferInterceptor.visitReplaceCommand(StateTransferInterceptor.java:161)
at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:55)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)
at org.infinispan.interceptors.CacheMgmtInterceptor.visitReplaceCommand(CacheMgmtInterceptor.java:139)
at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:55)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73)
at org.infinispan.commands.AbstractVisitor.visitReplaceCommand(AbstractVisitor.java:42)
at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:55)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)
at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1306)
at org.infinispan.CacheImpl.replaceInternal(CacheImpl.java:966)
at org.infinispan.CacheImpl.replace(CacheImpl.java:957)
at org.infinispan.DecoratedCache.replace(DecoratedCache.java:446)
at org.infinispan.lucene.readlocks.DistributedSegmentReadLocker.acquireReadLock(DistributedSegmentReadLocker.java:117)
at org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:156)
at org.infinispan.lucene.impl.DirectoryLuceneV3.openInput(DirectoryLuceneV3.java:130)
at org.apache.lucene.store.Directory.openInput(Directory.java:145)
at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:65)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:75)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:696)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4238)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3908)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
-- and after that lovely mass-FNFE, this is only first:
09:26:42,404 ERROR [LogErrorHandler] (Hibernate Search: Index updates queue processor for index rms/ResourceUnit-1) HSEARCH000058: HSEARCH000117: IOException on the IndexWriter: java.io.FileNotFoundException: No sub-file with id .tii found (fileName=_v8.cfs files: [.fnm, .tis, .frq, .fdt, .nrm, .prx, .fdx])
at org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157)
at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:113)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:83)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:696)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:671)
at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:244)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3615)
at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3376)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3485)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3467)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3451)
at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.commitIndexWriter(IndexWriterHolder.java:158)
at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.commitIndexWriter(IndexWriterHolder.java:171)
at org.hibernate.search.backend.impl.lucene.ExclusiveIndexWorkspaceImpl.afterTransactionApplied(ExclusiveIndexWorkspaceImpl.java:45)
at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.applyUpdates(LuceneBackendQueueTask.java:124)
at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.run(LuceneBackendQueueTask.java:67)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}
{code:title=older stack, concurrency fix not yet applied}
-- cluster problems again, but merge not interrupted:
14:03:18,314 INFO [JGroupsTransport] (Incoming-2,argusapp1-65137) ISPN000093: Received new, MERGED cluster view: MergeView::[argusapp1-65137|3] (2) [argusapp1-65137, argusapp2-59672], 2 subgroups: [argusapp2-59672|2] (1) [argusapp2-59672], [argusapp1-65137|1] (1) [argusapp1-65137]
...
-- and now:
14:03:28,100 ERROR [LogErrorHandler] (Lucene Merge Thread #234 for index rms/ResourceUnit) HSEARCH000058: HSEARCH000118: Exception during index Merge operation: org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: No sub-file with id .nrm found (fileName=_2n6.cfs files: [.tii, .fnm, .tis, .frq, .fdt, .prx, .fdx])
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:509)
at org.hibernate.search.backend.impl.lucene.overrides.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:58)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.FileNotFoundException: No sub-file with id .nrm found (fileName=_2n6.cfs files: [.tii, .fnm, .tis, .frq, .fdt, .prx, .fdx])
at org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157)
at org.apache.lucene.index.SegmentReader.openNorms(SegmentReader.java:639)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:121)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:696)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4368)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3908)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
{code}
> FileNotFoundException with async indexing backend
> -------------------------------------------------
>
> Key: ISPN-6425
> URL: https://issues.jboss.org/browse/ISPN-6425
> Project: Infinispan
> Issue Type: Bug
> Components: Embedded Querying, Lucene Directory
> Affects Versions: 8.2.0.Final
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
> Fix For: 8.2.1.Final, 9.0.0.Alpha1, 9.0.0.Final
>
>
> The Infinispan directory defaults to {{write_metadata_async=true}} when the indexing backend is configured as async, i.e. {{default.worker.execution}} is {{true}}.
> The {{write_metadata_async=true}} will use {{cache.putAsync}} to write the index file metadata, while still deleting and creating files syncronously. This can lead to
> a stale metadata causing FileNotFoundExceptions when executing queries:
> Suppose a lucene directory contains files \[segments_4, _4.si\]. During normal regime, apart from the user thread, there could be other 2 threads that could be changing the index, the periodic commit thread (since backend is async) and the async deletion of files.
> The following race can happen:
> ||Time||Thread||work type||work||
> |T1|Hibernate Search: Commit Scheduler for index| SYNC | write files segments_5 and _5.si to the index
> |T2|Hibernate Search: Commit Scheduler for index| ASYNC | write the new file list containing \[segments_4, _4.si, segments_5,_5.si\]
> |T3|Hibernate Search: Commit Scheduler for index| ASYNC | enqueue a deletion task for files segments_4 and _4.si
> |T4|Hibernate Search: async deletion of index| SYNC | dequeue deletion task for files segments_4 and _4.si
> |T5|Hibernate Search: async deletion of index| SYNC | delete files segments_4 and _4.si from the index
> |T6|Hibernate Search: async deletion of index| ASYNC | write the new file list containing \[segments_5,_5.si\]
> |T7|User-thread| |open index reader, file list is \[segments_4, _4.si\], highest segment number is 4 (file list is not updated yet)
> |T8|User-thread| |open segments_4
> |T9|User-thread| |FileNotFoundException!
> |T10|remote-thread-User| | new file list received \[segments_4, _4.si, segments_5,_5.si\]
> |T11|remote-thread-User| | new file list received \[segments_5,_5.si\]
> This race can be observed in {{MassIndexerAsyncBackendTest#testMassIndexOnAsync}} that fails intermittently with the exception:
> {noformat}
> Caused by: java.io.FileNotFoundException: Error loading metadata for index file: M|segments_4|commonIndex|-1
> at org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:138) ~[infinispan-lucene-directory-9.0.0-SNAPSHOT.jar:9.0.0-SNAPSHOT]
> at org.infinispan.lucene.impl.DirectoryLucene.openInput(DirectoryLucene.java:102) ~[infinispan-lucene-directory-9.0.0-SNAPSHOT.jar:9.0.0-SNAPSHOT]
> at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:294) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:493) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:490) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:490) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:344) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:300) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:263) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:251) ~[lucene-core-5.5.0.jar:5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:18:34]
> {noformat}
> We should not enable {{write_metadata_async=true}} for async backends. The file list is already {{DeltaAware}}, so writing should not pose a meaningfull overhead when done synchronously.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
More information about the infinispan-issues
mailing list