While setting up JMS replication for the Lucene index via Hibernate Search, I came across an error complaining about a TokenStream contract violation (see the stack trace at the end of the description). After researching this on the web I found that it is usually caused by the updated TokenStream API, which now requires a defined consuming workflow ( https://lucene.apache.org/core/5_3_0/core/org/apache/lucene/analysis/TokenStream.html ). After ruling out all possible problems in my own application, I reviewed the source code of the serialization classes in Hibernate Search. There I discovered that org.hibernate.search.indexes.serialization.impl.CopyTokenStream consumes a TokenStream without complying with the defined workflow: in the createAttributeLists method, a short line is missing that resets the "input" TokenStream. To test this idea I added the method call, and everything works as expected. Below is my version of the createAttributeLists method in org.hibernate.search.indexes.serialization.impl.CopyTokenStream:
private static List<List<AttributeImpl>> createAttributeLists(TokenStream input) throws IOException {
    List<List<AttributeImpl>> results = new ArrayList<>();
    {color:green}input.reset(); // comply with the TokenStream contract: reset() before the first incrementToken(){color}
    while ( input.incrementToken() ) {
        List<AttributeImpl> attrs = new ArrayList<>();
        results.add( attrs );
        Iterator<AttributeImpl> iter = input.getAttributeImplsIterator();
        while ( iter.hasNext() ) {
            attrs.add( iter.next().clone() );
        }
    }
    input.end();
    return results;
}
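For reference, the change matches the consuming workflow described in the Lucene javadoc linked above. The sketch below shows that workflow in isolation; the KeywordTokenizer, the sample input and the class name are just placeholders for illustration, not code taken from Hibernate Search:

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenStreamWorkflowSketch {
    public static void main(String[] args) throws IOException {
        // Placeholder stream; in CopyTokenStream the "input" stream is provided by the caller.
        KeywordTokenizer stream = new KeywordTokenizer();
        stream.setReader( new StringReader( "some field value" ) );
        CharTermAttribute term = stream.addAttribute( CharTermAttribute.class );
        try {
            stream.reset();                      // mandatory before the first incrementToken()
            while ( stream.incrementToken() ) {  // consume tokens only after reset()
                System.out.println( term.toString() );
            }
            stream.end();                        // record end-of-stream state (e.g. final offset)
        }
        finally {
            stream.close();                      // release underlying resources
        }
    }
}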
Is there a hidden purpose behind not calling reset(), or is it just a bug? Below is the mentioned stack trace:

org.hibernate.search.exception.SearchException: HSEARCH000083: Unable to serialize List<LuceneWork>
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.toSerializedModel(LuceneWorkSerializerImpl.java:109)
	at org.hibernate.search.backend.jms.impl.JmsBackendQueueTask.run(JmsBackendQueueTask.java:61)
	at org.hibernate.search.backend.jms.impl.JmsBackendQueueProcessor.applyWork(JmsBackendQueueProcessor.java:88)
	at org.hibernate.search.indexes.spi.DirectoryBasedIndexManager.performOperations(DirectoryBasedIndexManager.java:112)
	at org.hibernate.search.backend.impl.WorkQueuePerIndexSplitter.commitOperations(WorkQueuePerIndexSplitter.java:49)
	at org.hibernate.search.backend.impl.BatchedQueueingProcessor.performWorks(BatchedQueueingProcessor.java:81)
	at org.hibernate.search.backend.impl.PostTransactionWorkQueueSynchronization.flushWorks(PostTransactionWorkQueueSynchronization.java:114)
	at org.hibernate.search.backend.impl.TransactionalWorker.flushWorks(TransactionalWorker.java:165)
	at org.hibernate.search.impl.FullTextSessionImpl.flushToIndexes(FullTextSessionImpl.java:87)
	at com.sobis.jaf.JAFApplication.createIndexFor(JAFApplication.java:919)
	at com.sobis.jaf.JAFApplication.createIndexAndVerify(JAFApplication.java:820)
	at com.sobis.jaf.JAFApplication.createIndex(JAFApplication.java:796)
	at com.sobis.jaf.JAFApplication.createIndex(JAFApplication.java:672)
	at com.sobis.jaf.JAFApplication$1.performAction(JAFApplication.java:486)
	at com.sobis.jaf.services.thread.JAFThread.run(JAFThread.java:71)
Caused by: java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
	at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:111)
	at org.apache.lucene.analysis.core.KeywordTokenizer.incrementToken(KeywordTokenizer.java:68)
	at org.hibernate.search.indexes.serialization.impl.CopyTokenStream.createAttributeLists(CopyTokenStream.java:85)
	at org.hibernate.search.indexes.serialization.impl.CopyTokenStream.buildSerializableTokenStream(CopyTokenStream.java:39)
	at org.hibernate.search.indexes.serialization.spi.LuceneFieldContext.getTokenStream(LuceneFieldContext.java:137)
	at org.hibernate.search.indexes.serialization.avro.impl.AvroSerializer.addFieldWithTokenStreamData(AvroSerializer.java:281)
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.serializeField(LuceneWorkSerializerImpl.java:237)
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.serializeDocument(LuceneWorkSerializerImpl.java:175)
	at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.toSerializedModel(LuceneWorkSerializerImpl.java:97)
	... 14 more
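For what it's worth, the IllegalStateException in the trace can be reproduced outside Hibernate Search with a plain KeywordTokenizer. This is only a minimal sketch showing that Lucene itself enforces the reset() requirement; the class name and input are made up for illustration:

import java.io.StringReader;

import org.apache.lucene.analysis.core.KeywordTokenizer;

public class MissingResetSketch {
    public static void main(String[] args) throws Exception {
        KeywordTokenizer stream = new KeywordTokenizer();
        stream.setReader( new StringReader( "value" ) );
        // reset() is deliberately skipped here, mirroring createAttributeLists():
        // until reset() is called the Tokenizer keeps an "illegal state" reader installed,
        // so the first incrementToken() throws the TokenStream contract violation shown above.
        stream.incrementToken();
    }
}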