[hibernate-dev] [Search] Elasticsearch - Thoughts about analyzers and fieldbridges

Sanne Grinovero sanne at hibernate.org
Thu Feb 25 12:48:52 EST 2016

On 25 February 2016 at 17:16, Guillaume Smet <guillaume.smet at gmail.com> wrote:
> Hi,
> Running the DSLTest is quite interesting as it runs a lot of different
> queries with different configurations.
> Here are some thoughts while working on it.
> 1/ Analyzers
> =========
> Gunnar, I was wondering what you had in mind for analyzers: a) deal with
> them in Hibernate Search or b) let Elasticsearch do the analyze thingy.
> I'm not sure we can get a/ to work correctly (see below) and for b) we're
> going to need a way to disable the analyzers in Search for the entities
> managed by Elasticsearch.

Right, we need to validate and/or push the analyzer configuration to
Elasticsearch and then let it do the analysis work.


Davide is exploring this. We were just now discussing in chat how (if?) to
validate the analyzer names which a user is attempting to use, when the named
analyzer is not otherwise defined (on a Lucene backend this would cause an error
during bootstrap as a "model validation failure").

For example, Elasticsearch defines a bunch of default analyzers out of the box,
such as one named "whitespace".
We should be able to validate that this is a valid analyzer name
even though this wasn't explicitly defined using an @AnalyzerDef.
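
A minimal sketch of that name check, assuming we collect @AnalyzerDef names
at bootstrap and keep a static list of ES built-ins. The class and method
names are hypothetical, and the built-in list below is deliberately partial
(it may also differ between ES versions):

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical validator: user-declared analyzers plus some ES built-ins. */
public class AnalyzerNameValidator {

    // Partial, assumed list of analyzers Elasticsearch ships with.
    private static final Set<String> ES_BUILT_IN = Set.of(
            "standard", "simple", "whitespace", "stop", "keyword", "pattern", "english");

    private final Set<String> declared = new HashSet<>();

    /** Registers a name collected from an @AnalyzerDef during bootstrap. */
    public void registerAnalyzerDef(String name) {
        declared.add(name);
    }

    /** True if the name is user-declared or one of the listed ES built-ins. */
    public boolean isKnown(String name) {
        return declared.contains(name) || ES_BUILT_IN.contains(name);
    }

    /** Fails fast, like the Lucene backend's bootstrap "model validation failure". */
    public void validate(String name) {
        if (!isKnown(name)) {
            throw new IllegalStateException("Unknown analyzer: " + name);
        }
    }
}
```

A static list can't see analyzers configured on the server by other means,
which is where a server round trip comes in.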

By using the "analysis test" API of ES (_analyze), we should also be able to
validate names which have been configured on the ES server by other means.
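
A rough sketch of such a probe using the JDK 11 HttpClient. It POSTs an
{"analyzer": ..., "text": ...} body to /{index}/_analyze and assumes ES answers
200 for a resolvable analyzer and an error status otherwise; that status
assumption, the class name and the JSON escaping (quotes and backslashes only)
are mine, not verified behavior:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper probing analyzer names through the ES _analyze API. */
public class AnalyzeApiProbe {

    // Minimal escaping: quotes and backslashes only, enough for analyzer names.
    public static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** JSON body: the analyzer to resolve plus a throwaway sample text. */
    public static String requestBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escapeJson(analyzer)
                + "\",\"text\":\"" + escapeJson(text) + "\"}";
    }

    public static HttpRequest buildRequest(String baseUrl, String index, String analyzer) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(analyzer, "test")))
                .build();
    }

    /** Assumption: 200 means the analyzer resolved; anything else means it didn't. */
    public static boolean analyzerExists(HttpClient client, String baseUrl,
                                         String index, String analyzer)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(
                buildRequest(baseUrl, index, analyzer), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200;
    }
}
```

Usage would be something like analyzerExists(HttpClient.newHttpClient(),
"http://localhost:9200", "books", "whitespace"), where host and index are
placeholders. One request per analyzer name at bootstrap seems acceptable.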

One catch is the QueryDSL: AFAIR there are a couple of cases in which we invoke
the analyzer directly to "preprocess" the tokens. This either needs to be killed,
or we invoke the ES server to do the same operation, but that seems inefficient.
We'll see; I don't remember now for which cases we need such things, and hopefully
there will be an alternative strategy when it comes to running queries on ES.
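
One possible alternative, sketched under the assumption that we can send the
raw user text instead of pre-analyzed terms: ES's match_phrase query analyzes
its "query" text server-side, optionally with a named analyzer, so no local
preprocessing or extra round trip is needed. The class name and the field name
used below are hypothetical:

```java
/** Hypothetical builder: let ES analyze the raw text instead of doing it locally. */
public class RemoteAnalysisQueries {

    // Minimal escaping: quotes and backslashes only.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** match_phrase query; ES runs the given analyzer on the text itself. */
    public static String matchPhrase(String field, String text, String analyzer) {
        return "{\"query\":{\"match_phrase\":{\"" + escapeJson(field) + "\":{"
                + "\"query\":\"" + escapeJson(text) + "\","
                + "\"analyzer\":\"" + escapeJson(analyzer) + "\"}}}}";
    }
}
```

Since the phrase is sent unmodified, stopword and synonym handling stay
entirely on the ES side.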

As a second step, we should also see if we can "push" new analyzer definitions
to the ES server, taking them from our @AnalyzerDef annotations; not sure if that's doable.
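
A rough sketch of what "pushing" could produce: custom analyzers are declared
in the "analysis" section of the index settings, typically supplied when the
index is created. SimpleAnalyzerDef is a hypothetical stand-in for an
@AnalyzerDef. It assumes the hard part, mapping our Lucene tokenizer and filter
factories to ES tokenizer and filter names, has already been done:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical translation of analyzer definitions into ES index settings JSON. */
public class AnalyzerSettingsWriter {

    /** Simplified stand-in for @AnalyzerDef, already expressed with ES names. */
    public static final class SimpleAnalyzerDef {
        final String name;
        final String tokenizer;     // ES tokenizer name, e.g. "standard"
        final List<String> filters; // ES token filter names, e.g. "lowercase"

        public SimpleAnalyzerDef(String name, String tokenizer, List<String> filters) {
            this.name = name;
            this.tokenizer = tokenizer;
            this.filters = filters;
        }
    }

    // Quotes a JSON string, escaping quotes and backslashes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds {"settings":{"analysis":{"analyzer":{...}}}} with one custom analyzer per def. */
    public static String toIndexSettings(List<SimpleAnalyzerDef> defs) {
        String analyzers = defs.stream()
                .map(d -> quote(d.name) + ":{\"type\":\"custom\",\"tokenizer\":" + quote(d.tokenizer)
                        + ",\"filter\":["
                        + d.filters.stream().map(AnalyzerSettingsWriter::quote)
                                .collect(Collectors.joining(","))
                        + "]}")
                .collect(Collectors.joining(","));
        return "{\"settings\":{\"analysis\":{\"analyzer\":{" + analyzers + "}}}}";
    }
}
```

Definitions that use Lucene components with no ES equivalent would have to be
rejected or flagged, which is probably where this gets hard.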

> Take a phrase query "colder and whitening" extracted from DSLTest. As "and"
> is considered a stopword, the resulting PhraseQuery only contains colder &
> whitening so it's not possible to rebuild the phrase before sending it to
> Elasticsearch. We could try to use span queries but I'm not sure we'll be
> able to get it right with synonyms and so on.
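
To illustrate the problem with a simplified analyzer model (lowercase, split
on whitespace, drop stopwords; not Lucene's actual implementation): assuming
the stopword filter keeps the position gap it leaves, the analyzed phrase
still knows that *something* sat between the two terms, but the word itself is
gone, so the best rebuild is "colder ? whitening". All names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Simplified model of stopword removal that preserves token positions. */
public class StopwordPhraseDemo {

    public static final class Token {
        public final String term;
        public final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /** Lowercases, splits on whitespace, drops stopwords but keeps original positions. */
    public static List<Token> analyze(String text, Set<String> stopwords) {
        List<Token> tokens = new ArrayList<>();
        String[] words = text.toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            if (!stopwords.contains(words[pos])) {
                tokens.add(new Token(words[pos], pos));
            }
        }
        return tokens;
    }

    /** Best-effort rebuild: each position gap becomes "?" because the removed word is lost. */
    public static String rebuild(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        int expected = 0;
        for (Token t : tokens) {
            for (; expected < t.position; expected++) {
                sb.append("? ");
            }
            sb.append(t.term).append(' ');
            expected = t.position + 1;
        }
        return sb.toString().trim();
    }
}
```

Sending "colder whitening" to ES would then be analyzed as a different phrase,
which is the mismatch described above.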
> 2/ FieldBridges
> ===========
> Currently, the conversion from Date to JSON string is managed by a
> fieldbridge specific to the Elasticsearch backend (and it's not that easy
> to plug it in, as we have seen in a ticket).
> I'm wondering whether it's a good fit, or whether we should invent an
> entirely new mechanism to deal with the transformation between a given Java
> type and what a backend expects.
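
A minimal sketch of such a mechanism, kept separate from FieldBridge: a
hypothetical per-backend converter contract keyed by Java type, with a Date
converter producing an ISO-8601 UTC string. I believe ES's default date mapping
accepts that format, but that's an assumption to check:

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;

/** Hypothetical backend-level conversion contract, independent of FieldBridge. */
public class BackendConverters {

    /** Converts a Java value of type J into the representation a backend expects. */
    public interface BackendValueConverter<J, B> {
        Class<J> javaType();

        B toBackend(J value);
    }

    /** Date -> ISO-8601 instant string in UTC, e.g. "2016-02-25T17:48:52Z". */
    public static final class DateToIsoStringConverter
            implements BackendValueConverter<Date, String> {
        @Override
        public Class<Date> javaType() {
            return Date.class;
        }

        @Override
        public String toBackend(Date value) {
            return DateTimeFormatter.ISO_INSTANT.format(value.toInstant());
        }
    }
}
```

Each backend would register its own converters; the Lucene backend could keep
its current encoding while ES gets JSON-friendly strings.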

I agree, the current abstraction model doesn't cut it.
Other than discussing a "FieldBridge 2", which is significantly different and
will take quite some work (and probably require a 6.0 due to API changes), I
don't have a concrete plan for the short term. You might want to propose
something in the scope of the current APIs?

Keep in mind I'm now working to remove the Lucene Document notion from the
LuceneWork types. In this pre-6 phase I think we'll want a translation phase
between what the old fieldbridges are producing and the modernized backends.
Essentially I hope we'll be rewriting it from the backend upwards, with the
public API as the last step. Until we're done, there needs to be a translation
step, and it might live at a certain level initially and need to be moved as
we make progress.
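
A rough sketch of what that translation step could look like, assuming the old
fieldbridges' output can be captured as flat (name, value) pairs with scalar,
non-null values. LegacyField and translate() are hypothetical names; the output
is a backend-neutral map where a repeated field name becomes a list:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical adapter between legacy fieldbridge output and newer backends. */
public class LegacyDocumentTranslator {

    /** Stand-in for one (name, value) pair emitted by an old FieldBridge. */
    public static final class LegacyField {
        final String name;
        final Object value;

        public LegacyField(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Groups flat fields: single values stay scalar, repeated names become lists. */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> translate(List<LegacyField> fields) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (LegacyField f : fields) {
            if (!doc.containsKey(f.name)) {
                doc.put(f.name, f.value);
            } else {
                Object existing = doc.get(f.name);
                if (existing instanceof List) {
                    // Already multi-valued (values are assumed never to be lists themselves).
                    ((List<Object>) existing).add(f.value);
                } else {
                    List<Object> values = new ArrayList<>();
                    values.add(existing);
                    values.add(f.value);
                    doc.put(f.name, values);
                }
            }
        }
        return doc;
    }
}
```

Because the backends would only see the map, this adapter could later be moved
up or removed without touching them.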

> What made me think of it is that a test from DSLTest is using
> ignoreFieldBridge() on a date field and it disables the Date -> JSON string
> conversion entirely.
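
A minimal sketch of how the two steps could be decoupled so this doesn't
happen: ignoreFieldBridge() skips only the (user or field) bridge, while a
backend-level encoding step, here Date to ISO-8601 string, always runs.
QueryValueEncoding and its methods are hypothetical:

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.function.Function;

/** Hypothetical query-value pipeline: bridge step is optional, backend encoding is not. */
public class QueryValueEncoding {

    /** Backend-level encoding, applied whatever the DSL options say. */
    static Object encodeForBackend(Object value) {
        if (value instanceof Date) {
            return DateTimeFormatter.ISO_INSTANT.format(((Date) value).toInstant());
        }
        return value;
    }

    /** ignoreFieldBridge skips only the bridge, never the backend encoding. */
    public static Object encode(Object value, Function<Object, Object> fieldBridge,
                                boolean ignoreFieldBridge) {
        Object bridged = ignoreFieldBridge ? value : fieldBridge.apply(value);
        return encodeForBackend(bridged);
    }
}
```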
> Comments, thoughts?
> --
> Guillaume
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev
