While setting up JMS replication for the Lucene index via Hibernate Search, I came across an error complaining about a *TokenStream contract violation* (see the stack trace at the end of the description).
After some research on the web I found that this is usually caused by an update to the TokenStream API, which now requires a defined consumer workflow ( https://lucene.apache.org/core/5_3_0/core/org/apache/lucene/analysis/TokenStream.html ).
After investigating all possible causes in my application, I reviewed the source code of the serialization classes in Hibernate Search and discovered that you use the TokenStream in your class _org.hibernate.search.indexes.serialization.impl.CopyTokenStream_ but do not comply with the defined workflow. In your method _createAttributeLists_ you need to insert one short line to reset the "input" TokenStream. I added the method call to test my idea, and everything works as expected. Below is my code for the createAttributeLists method in org.hibernate.search.indexes.serialization.impl.CopyTokenStream:
{code:java}
private static List<List<AttributeImpl>> createAttributeLists(TokenStream input) throws IOException {
	List<List<AttributeImpl>> results = new ArrayList<>();
	// added input.reset(), see TokenStream API
	input.reset();
	while ( input.incrementToken() ) {
		List<AttributeImpl> attrs = new ArrayList<>();
		results.add( attrs );
		Iterator<AttributeImpl> iter = input.getAttributeImplsIterator();
		while ( iter.hasNext() ) {
			//we need to clone as AttributeImpl instances can be reused across incrementToken() calls
			attrs.add( iter.next().clone() );
		}
	}
	input.end();
	return results;
}
{code}
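For reference, the workflow the TokenStream Javadoc mandates is: reset() once, then loop over incrementToken(), then end(), then close(). The following is a simplified, self-contained stand-in (SimpleTokenStream and Workflow are illustrative names, not Lucene classes) showing why skipping reset() fails fast with the same IllegalStateException seen in the stack trace:

```java
import java.util.Iterator;
import java.util.List;

// Simplified illustration of the consumer contract Lucene's TokenStream
// enforces (not Lucene's actual implementation): the stream tracks whether
// reset() was called and throws if a consumer skips it.
class SimpleTokenStream {
    private final Iterator<String> tokens;
    private boolean wasReset = false;
    private String current;

    SimpleTokenStream(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    // Must be called exactly once before the first incrementToken()
    void reset() {
        wasReset = true;
    }

    boolean incrementToken() {
        if (!wasReset) {
            // Lucene raises the same kind of error for the same reason
            throw new IllegalStateException(
                "TokenStream contract violation: reset() call missing");
        }
        if (!tokens.hasNext()) {
            return false;
        }
        current = tokens.next();
        return true;
    }

    String current() {
        return current;
    }

    // Called once after the consuming loop, mirroring TokenStream.end()
    void end() {
    }
}

public class Workflow {
    public static void main(String[] args) {
        SimpleTokenStream stream = new SimpleTokenStream(List.of("foo", "bar"));
        stream.reset(); // the call that is missing in createAttributeLists
        while (stream.incrementToken()) {
            System.out.println(stream.current());
        }
        stream.end();
    }
}
```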
Is there a hidden reason why you don't call reset(), or is it just a bug?
*Below the mentioned stack trace:*
{code}
org.hibernate.search.exception.SearchException: HSEARCH000083: Unable to serialize List<LuceneWork>
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.toSerializedModel(LuceneWorkSerializerImpl.java:109)
	at org.hibernate.search.backend.jms.impl.JmsBackendQueueTask.run(JmsBackendQueueTask.java:61)
	at org.hibernate.search.backend.jms.impl.JmsBackendQueueProcessor.applyWork(JmsBackendQueueProcessor.java:88)
	at org.hibernate.search.indexes.spi.DirectoryBasedIndexManager.performOperations(DirectoryBasedIndexManager.java:112)
	at org.hibernate.search.backend.impl.WorkQueuePerIndexSplitter.commitOperations(WorkQueuePerIndexSplitter.java:49)
	at org.hibernate.search.backend.impl.BatchedQueueingProcessor.performWorks(BatchedQueueingProcessor.java:81)
	at org.hibernate.search.backend.impl.PostTransactionWorkQueueSynchronization.flushWorks(PostTransactionWorkQueueSynchronization.java:114)
	at org.hibernate.search.backend.impl.TransactionalWorker.flushWorks(TransactionalWorker.java:165)
	at org.hibernate.search.impl.FullTextSessionImpl.flushToIndexes(FullTextSessionImpl.java:87)
	at com.sobis.jaf.JAFApplication.createIndexFor(JAFApplication.java:919)
	at com.sobis.jaf.JAFApplication.createIndexAndVerify(JAFApplication.java:820)
	at com.sobis.jaf.JAFApplication.createIndex(JAFApplication.java:796)
	at com.sobis.jaf.JAFApplication.createIndex(JAFApplication.java:672)
	at com.sobis.jaf.JAFApplication$1.performAction(JAFApplication.java:486)
	at com.sobis.jaf.services.thread.JAFThread.run(JAFThread.java:71)
Caused by: java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
	at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:111)
	at org.apache.lucene.analysis.core.KeywordTokenizer.incrementToken(KeywordTokenizer.java:68)
	at org.hibernate.search.indexes.serialization.impl.CopyTokenStream.createAttributeLists(CopyTokenStream.java:85)
	at org.hibernate.search.indexes.serialization.impl.CopyTokenStream.buildSerializableTokenStream(CopyTokenStream.java:39)
	at org.hibernate.search.indexes.serialization.spi.LuceneFieldContext.getTokenStream(LuceneFieldContext.java:137)
	at org.hibernate.search.indexes.serialization.avro.impl.AvroSerializer.addFieldWithTokenStreamData(AvroSerializer.java:281)
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.serializeField(LuceneWorkSerializerImpl.java:237)
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.serializeDocument(LuceneWorkSerializerImpl.java:175)
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.toSerializedModel(LuceneWorkSerializerImpl.java:97)
	... 14 more
{code}