[hibernate-dev] HSEARCH: Removing dynamic analyzer mapping?

Tue Jun 30 07:57:28 EDT 2015

Among the many changes in Apache Lucene 5, it is no longer possible to
override the Analyzer on a per-document basis.

You have to pick a single Analyzer when opening the IndexWriter.
Of course the Analyzer can still return a different tokenization chain
for each field, but the field->tokenizer mapping has to be consistent
for the lifecycle of the IndexWriter.
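Here is a minimal pure-JDK model of the constraint. It is not Lucene code. `PerFieldModel`, `lowercaseWhitespace` and `keyword` are hypothetical names, and an "analyzer" is reduced to a function from text to tokens. The writer gets one analyzer up front, and that analyzer can only branch on the field name. In Lucene 5 the closest real counterpart is a `PerFieldAnalyzerWrapper` passed to `new IndexWriterConfig(analyzer)`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model, NOT Lucene's API: an "analyzer" is reduced to a
// function from field text to a list of tokens.
final class PerFieldModel {

    private final Function<String, List<String>> defaultChain;
    private final Map<String, Function<String, List<String>>> chainsByField;

    // Everything is decided once, like the single Analyzer handed to the
    // IndexWriter when it is opened.
    PerFieldModel(Function<String, List<String>> defaultChain,
                  Map<String, Function<String, List<String>>> chainsByField) {
        this.defaultChain = defaultChain;
        // Copy and freeze the mapping: later changes to the caller's map have
        // no effect, mirroring "consistent for the lifecycle of the writer".
        this.chainsByField = Collections.unmodifiableMap(new HashMap<>(chainsByField));
    }

    // The field name is the only selector: there is no per-document
    // parameter that could override the chain.
    List<String> analyze(String field, String text) {
        return chainsByField.getOrDefault(field, defaultChain).apply(text);
    }

    // Hypothetical chain: lowercase, then split on whitespace.
    static Function<String, List<String>> lowercaseWhitespace() {
        return text -> {
            String trimmed = text.trim();
            return trimmed.isEmpty()
                    ? Collections.<String>emptyList()
                    : Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+"));
        };
    }

    // Hypothetical chain: the whole value becomes a single token.
    static Function<String, List<String>> keyword() {
        return text -> Collections.singletonList(text);
    }
}
```

With `id` mapped to `keyword()`, `analyze("id", "ABC-123 X")` keeps one token, while `analyze("title", "Hello World")` falls back to the default and yields `hello`, `world`. There is no way to request a different chain for one particular document.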

This means we might need to drop our "Dynamic Analyzer" feature:
 http://docs.jboss.org/hibernate/search/5.4/reference/en-US/html_single/#_dynamic_analyzer_selection

I have asked the Lucene team to restore the functionality:
https://issues.apache.org/jira/browse/LUCENE-6212

So, the alternatives I'm seeing:
 # Dropping the Dynamic Analyzer feature
 # Cheat and pass in a mutable Analyzer - needs some care regarding concurrent usage
 # Cheat and pass in a pre-analyzed Document
 # Fork & patch the IndexWriter
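Option 2 could look roughly like the sketch below. It is again a pure-JDK model with hypothetical names, not a real Lucene `Analyzer` subclass. The writer keeps a single long-lived analyzer, and the per-document choice is stored in a `ThreadLocal` so that concurrent indexing threads cannot overwrite each other's selection. A plain shared mutable field would race. A real implementation would also need to account for Lucene caching per-thread token stream components (see `Analyzer.ReuseStrategy`). Otherwise the cache could hand back components built for a previously selected analyzer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical model of option 2, NOT a Lucene Analyzer subclass: one
// long-lived "analyzer" whose delegate can be switched per document.
final class SwitchableAnalyzer {

    private final Function<String, List<String>> defaultAnalyzer;

    // Per-thread selection: a plain shared mutable field would let two
    // threads indexing different documents overwrite each other's choice.
    private final ThreadLocal<Function<String, List<String>>> selected = new ThreadLocal<>();

    SwitchableAnalyzer(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Selects an analyzer for the document this thread is about to index,
    // runs the indexing action, then restores the previous selection even
    // if the action throws.
    <T> T withAnalyzer(Function<String, List<String>> analyzer, Supplier<T> indexAction) {
        Function<String, List<String>> previous = selected.get();
        selected.set(analyzer);
        try {
            return indexAction.get();
        } finally {
            if (previous == null) {
                selected.remove();
            } else {
                selected.set(previous);
            }
        }
    }

    // What the writer would call for each field: uses this thread's current
    // selection, or the default when none is active.
    List<String> analyze(String text) {
        Function<String, List<String>> current = selected.get();
        return (current != null ? current : defaultAnalyzer).apply(text);
    }

    // Hypothetical chain: lowercase, then split on whitespace.
    static Function<String, List<String>> lowercaseWhitespace() {
        return text -> {
            String trimmed = text.trim();
            return trimmed.isEmpty()
                    ? Collections.<String>emptyList()
                    : Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+"));
        };
    }

    // Hypothetical chain: the whole value becomes a single token.
    static Function<String, List<String>> keyword() {
        return text -> Collections.singletonList(text);
    }
}
```

For example, `sa.withAnalyzer(SwitchableAnalyzer.keyword(), () -> sa.analyze("Hello World"))` returns the single token `Hello World`. A plain `sa.analyze("Hello World")` afterwards, or one issued from another thread during the override, still gets `hello`, `world`.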

Patching the functionality back into Lucene is trivial, but the Lucene
team first needs to agree on the use case, and even then it will be a
long time before it is released.

We should discuss both a short-term solution and the better long-term solution.

My favourite long-term solution would be to do pre-analysis: in our
master/slave clustering approach, that would have several other
benefits:
 - move the analyzer work to the slaves
 - reduce the network payloads
 - remove the need to be able to serialize analyzers
But I'd prefer to do this in a second "polishing phase" rather than
treat such a backend rewrite as a blocker for moving to Lucene 5.
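Below is a rough model of the pre-analysis idea. It assumes the slave ships tokens instead of analyzer definitions. All class and method names are hypothetical, and the master's "index" is a toy term-to-document map rather than an IndexWriter. Because the analyzer only runs on the slave, it can be chosen per document without the master ever needing it. In Lucene terms, the master would presumably pass such tokens to the writer through a TokenStream-valued field, for example `new TextField(name, tokenStream)`, and the writer's own analyzer is not applied to that field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical model of pre-analysis, NOT Hibernate Search or Lucene code.
final class PreAnalysisModel {

    // What a slave would send for one field: the field name plus the tokens
    // it already produced. No analyzer definition travels over the wire.
    static final class PreAnalyzedField {
        final String name;
        final List<String> tokens;

        PreAnalyzedField(String name, List<String> tokens) {
            this.name = name;
            this.tokens = Collections.unmodifiableList(new ArrayList<>(tokens));
        }
    }

    // Slave side: the analyzer runs here, so it can be picked per document
    // (the dynamic analyzer case) and never needs to be serialized.
    static PreAnalyzedField analyzeOnSlave(String field, String text,
                                           Function<String, List<String>> analyzer) {
        return new PreAnalyzedField(field, analyzer.apply(text));
    }

    // Master side: a toy inverted index ("field:token" -> doc ids) built from
    // the shipped tokens alone; the master holds no analyzers at all.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    void indexOnMaster(int docId, List<PreAnalyzedField> fields) {
        for (PreAnalyzedField field : fields) {
            for (String token : field.tokens) {
                postings.computeIfAbsent(field.name + ":" + token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    Set<Integer> docsFor(String field, String token) {
        SortedSet<Integer> docs = postings.get(field + ":" + token);
        return docs == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(docs);
    }

    // Hypothetical chain: lowercase, then split on whitespace.
    static Function<String, List<String>> lowercaseWhitespace() {
        return text -> {
            String trimmed = text.trim();
            return trimmed.isEmpty()
                    ? Collections.<String>emptyList()
                    : Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+"));
        };
    }

    // Hypothetical chain: the whole value becomes a single token.
    static Function<String, List<String>> keyword() {
        return text -> Collections.singletonList(text);
    }
}
```

In this toy, two documents analyzed with different analyzers land in the same master index. `Hello World` becomes `hello`, `world`, while `Hello There` stays one keyword token. That is the behaviour the dynamic analyzer feature needs.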

WDYT?

Thanks,
Sanne