While setting up JMS replication for the Lucene index via Hibernate Search, I came across an error complaining about a *TokenStream contract violation* (see the stack trace at the end of the description).
After some research on the web I found that this is usually caused by an update to the TokenStream API, which now requires a defined consumer workflow ( https://lucene.apache.org/core/5_3_0/core/org/apache/lucene/analysis/TokenStream.html ).
After investigating all possible causes in my application, I reviewed the source code of the serialization classes in Hibernate Search and discovered that you use the TokenStream in your class _org.hibernate.search.indexes.serialization.impl.CopyTokenStream_ but do not comply with the defined workflow. In your method _createAttributeLists_ you need to insert one short line to reset the "input" TokenStream. I added the method call to test my idea, and everything works as expected. Below is my code for the createAttributeLists method in org.hibernate.search.indexes.serialization.impl.CopyTokenStream:
{code:java}
private static List<List<AttributeImpl>> createAttributeLists(TokenStream input) throws IOException {
	List<List<AttributeImpl>> results = new ArrayList<>();
	// added input.reset(), see TokenStream API
	input.reset();
	while ( input.incrementToken() ) {
		List<AttributeImpl> attrs = new ArrayList<>();
		results.add( attrs );
		Iterator<AttributeImpl> iter = input.getAttributeImplsIterator();
		while ( iter.hasNext() ) {
			//we need to clone as AttributeImpl instances can be reused across incrementToken() calls
			attrs.add( iter.next().clone() );
		}
	}
	input.end();
	return results;
}
{code}
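For reference, the workflow the TokenStream Javadoc mandates is: reset() once, then loop over incrementToken(), then end(), then close(). The following is a simplified, self-contained stand-in (SimpleTokenStream and Workflow are illustrative names, not Lucene classes) showing why skipping reset() fails fast with the same IllegalStateException seen in the stack trace:

```java
import java.util.Iterator;
import java.util.List;

// Simplified illustration of the consumer contract Lucene's TokenStream
// enforces (not Lucene's actual implementation): the stream tracks whether
// reset() was called and throws if a consumer skips it.
class SimpleTokenStream {
    private final Iterator<String> tokens;
    private boolean wasReset = false;
    private String current;

    SimpleTokenStream(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    // Must be called exactly once before the first incrementToken()
    void reset() {
        wasReset = true;
    }

    boolean incrementToken() {
        if (!wasReset) {
            // Lucene raises the same kind of error for the same reason
            throw new IllegalStateException(
                "TokenStream contract violation: reset() call missing");
        }
        if (!tokens.hasNext()) {
            return false;
        }
        current = tokens.next();
        return true;
    }

    String current() {
        return current;
    }

    // Called once after the consuming loop, mirroring TokenStream.end()
    void end() {
    }
}

public class Workflow {
    public static void main(String[] args) {
        SimpleTokenStream stream = new SimpleTokenStream(List.of("foo", "bar"));
        stream.reset(); // the call that is missing in createAttributeLists
        while (stream.incrementToken()) {
            System.out.println(stream.current());
        }
        stream.end();
    }
}
```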
Is there a hidden reason why you don't call reset(), or is it just a bug?
*Below the mentioned stack trace:*
{code}
org.hibernate.search.exception.SearchException: HSEARCH000083: Unable to serialize List<LuceneWork>
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.toSerializedModel(LuceneWorkSerializerImpl.java:109)
	at org.hibernate.search.backend.jms.impl.JmsBackendQueueTask.run(JmsBackendQueueTask.java:61)
	at org.hibernate.search.backend.jms.impl.JmsBackendQueueProcessor.applyWork(JmsBackendQueueProcessor.java:88)
	at org.hibernate.search.indexes.spi.DirectoryBasedIndexManager.performOperations(DirectoryBasedIndexManager.java:112)
	at org.hibernate.search.backend.impl.WorkQueuePerIndexSplitter.commitOperations(WorkQueuePerIndexSplitter.java:49)
	at org.hibernate.search.backend.impl.BatchedQueueingProcessor.performWorks(BatchedQueueingProcessor.java:81)
	at org.hibernate.search.backend.impl.PostTransactionWorkQueueSynchronization.flushWorks(PostTransactionWorkQueueSynchronization.java:114)
	at org.hibernate.search.backend.impl.TransactionalWorker.flushWorks(TransactionalWorker.java:165)
	at org.hibernate.search.impl.FullTextSessionImpl.flushToIndexes(FullTextSessionImpl.java:87)
	at com.sobis.jaf.JAFApplication.createIndexFor(JAFApplication.java:919)
	at com.sobis.jaf.JAFApplication.createIndexAndVerify(JAFApplication.java:820)
	at com.sobis.jaf.JAFApplication.createIndex(JAFApplication.java:796)
	at com.sobis.jaf.JAFApplication.createIndex(JAFApplication.java:672)
	at com.sobis.jaf.JAFApplication$1.performAction(JAFApplication.java:486)
	at com.sobis.jaf.services.thread.JAFThread.run(JAFThread.java:71)
Caused by: java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
	at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:111)
	at org.apache.lucene.analysis.core.KeywordTokenizer.incrementToken(KeywordTokenizer.java:68)
	at org.hibernate.search.indexes.serialization.impl.CopyTokenStream.createAttributeLists(CopyTokenStream.java:85)
	at org.hibernate.search.indexes.serialization.impl.CopyTokenStream.buildSerializableTokenStream(CopyTokenStream.java:39)
	at org.hibernate.search.indexes.serialization.spi.LuceneFieldContext.getTokenStream(LuceneFieldContext.java:137)
	at org.hibernate.search.indexes.serialization.avro.impl.AvroSerializer.addFieldWithTokenStreamData(AvroSerializer.java:281)
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.serializeField(LuceneWorkSerializerImpl.java:237)
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.serializeDocument(LuceneWorkSerializerImpl.java:175)
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.toSerializedModel(LuceneWorkSerializerImpl.java:97)
	... 14 more
{code}