[hibernate-dev] [SEARCH] Query-only analyzers with Elasticsearch - new annotation?

Yoann Rodiere yoann at hibernate.org
Wed Jan 4 11:00:37 EST 2017


Hello team,

I'm currently working on HSEARCH-2534, "Query-only analyzer definitions are
never added to the index settings with Elasticsearch".
This issue is about using analyzers only when querying with Elasticsearch.
It is already possible with Lucene, but not in Elasticsearch, because we
assume that any analyzer definition that is not referenced by an @Analyzer
annotation is a Lucene analyzer [1].

To be precise, the exact place where query-only analyzers are used is in
EntityContext.overridesForField [2], and the overrides are leveraged even
with Elasticsearch, for instance in ConnectedMultiFieldsTermQueryBuilder
[3].

I can see two solutions to the issue:

   1. Make all analyzer definitions available for all indexing services.
   2. Allow users to define, for each entity, which analyzer definitions
   will be necessary when querying, even though the definitions are not used
   when indexing.

Solution 1 seems quite hard to implement correctly.
First we'd have to have a different namespace for each indexing service,
but I've already implemented that much.
Second, some analyzer definitions are only valid for one indexing service,
and not for the other.
For instance, analyzer definitions using ElasticsearchTokenFilterFactory
are specific to Elasticsearch. And analyzer definitions using
the WhitespaceTokenizerFactory with the "rule" parameter are only valid
with embedded Lucene. And so on. To sum up, I'm not sure we can do
something smart.

Solution 2 is easier to implement, but requires adding a bit of API: the
way for users to declare that a given analyzer definition is to be
available when querying a given entity. I would add type-level
@QueryAnalyzer(definition = "foo") and @QueryAnalyzers annotations.
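
To make the proposal concrete, here is a rough sketch of what the two
annotations could look like and how the engine might collect the declared
definition names for an entity. Only the names @QueryAnalyzer, @QueryAnalyzers
and the "definition" attribute come from the proposal above; the retention,
the target, the use of @Repeatable, the Book entity and the
queryAnalyzerDefinitions helper are assumptions made for illustration, not
existing Hibernate Search API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class QueryAnalyzerSketch {

    // Hypothetical annotation: names one analyzer definition that must be
    // available when querying the annotated entity, even though no field
    // uses it at indexing time.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @Repeatable(QueryAnalyzers.class) // assumption: repeatable (Java 8+)
    public @interface QueryAnalyzer {
        String definition();
    }

    // Hypothetical container annotation for several @QueryAnalyzer entries.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface QueryAnalyzers {
        QueryAnalyzer[] value();
    }

    // Illustrative entity; "foo" and "bar" stand for analyzer definition
    // names declared elsewhere (for instance through @AnalyzerDef).
    @QueryAnalyzer(definition = "foo")
    @QueryAnalyzer(definition = "bar")
    public static class Book {
        String title;
    }

    // Hypothetical helper: collects the definition names declared on a type,
    // whether they are written as repeated annotations or wrapped in an
    // explicit @QueryAnalyzers container.
    public static List<String> queryAnalyzerDefinitions(Class<?> entityType) {
        List<String> names = new ArrayList<>();
        for (QueryAnalyzer qa : entityType.getAnnotationsByType(QueryAnalyzer.class)) {
            names.add(qa.definition());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(queryAnalyzerDefinitions(Book.class));
    }
}
```

With something along these lines, the Elasticsearch integration could add
the listed definitions to the index settings of the entity's index, without
having to guess which unreferenced definitions are meant for which indexing
service.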

I know nobody wants to add new annotations in a minor release, but right now that
seems to be the only workable solution.

What do you think?

[1]
https://github.com/hibernate/hibernate-search/blob/1847bd222128395056cdf6e7cfb601ceed5e40c3/engine/src/main/java/org/hibernate/search/engine/impl/ConfigContext.java#L277
[2]
https://github.com/hibernate/hibernate-search/blob/1847bd222128395056cdf6e7cfb601ceed5e40c3/engine/src/main/java/org/hibernate/search/query/dsl/EntityContext.java#L14
[3]
https://github.com/hibernate/hibernate-search/blob/1847bd222128395056cdf6e7cfb601ceed5e40c3/engine/src/main/java/org/hibernate/search/query/dsl/impl/ConnectedMultiFieldsTermQueryBuilder.java#L222


Yoann Rodière <yoann at hibernate.org>
Hibernate NoORM Team

