On 25 February 2016 at 17:16, Guillaume Smet <guillaume.smet(a)gmail.com> wrote:
Hi,
Running the DSLTest is quite interesting as it runs a lot of different
queries with different configurations.
Here are some thoughts while working on it.
1/ Analyzers
=========
Gunnar, I was wondering what you had in mind for analyzers: a) deal with
them in Hibernate Search or b) let Elasticsearch do the analysis.
I'm not sure we can get a) to work correctly (see below), and for b) we're
going to need a way to disable the analyzers in Search for the entities
managed by Elasticsearch.
Right, we need to validate and/or push the analyzer configuration to
Elasticsearch, and then let it do the analysis work.
https://hibernate.atlassian.net/browse/HSEARCH-2108
Davide is exploring this. We were just now discussing in chat how (if?) to
validate the analyzer names which a user is attempting to use, when the named
analyzer is not otherwise defined (on a Lucene backend this would cause an error
during bootstrap as a "model validation failure").
For example, Elasticsearch defines a bunch of default analyzers out of the box,
such as one named "whitespace".
We should be able to validate that this is a valid analyzer name
even though this wasn't explicitly defined using an @AnalyzerDef.
By using the "analysis test" API of ES we should be able to validate names
which have been configured on the ES server by other means too.
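
To make the validation idea concrete, here is a minimal sketch in plain Java.
Everything in it is an assumption for illustration: AnalyzerNameValidator is a
hypothetical class (not part of Hibernate Search), the built-in list is only a
small subset of what Elasticsearch ships, and serverLookup stands in for
whatever call to the ES server ends up being used to check names configured
there by other means.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not an existing Hibernate Search class.
final class AnalyzerNameValidator {

    // A few of the analyzers Elasticsearch defines out of the box (not exhaustive).
    private static final Set<String> BUILT_IN = new HashSet<>(
            Arrays.asList("standard", "simple", "whitespace", "stop", "keyword"));

    // Names declared locally, e.g. collected from @AnalyzerDef annotations.
    private final Set<String> analyzerDefNames;

    // Assumed hook: returns true if the ES server knows the name.
    private final Predicate<String> serverLookup;

    AnalyzerNameValidator(Set<String> analyzerDefNames, Predicate<String> serverLookup) {
        this.analyzerDefNames = new HashSet<>(analyzerDefNames);
        this.serverLookup = serverLookup;
    }

    // Cheap local checks first; the (potentially remote) lookup runs last.
    boolean isKnown(String name) {
        return analyzerDefNames.contains(name)
                || BUILT_IN.contains(name)
                || serverLookup.test(name);
    }

    // Fails fast, similar in spirit to a model validation failure at bootstrap.
    void validate(String name) {
        if (!isKnown(name)) {
            throw new IllegalArgumentException("Unknown analyzer: '" + name + "'");
        }
    }
}
```

The order of the checks is a design choice: only names that are neither
declared locally nor known built-ins would cost a round trip to the server.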
One catch is the QueryDSL: AFAIR there are a couple of cases in which we invoke
the analyzer directly to "preprocess" the tokens. This either needs to be
killed, or we invoke the ES server to do the same operation, but that seems
inefficient. We'll see; I don't remember now for which cases we need such
things, and hopefully there will be an alternative strategy when it comes to
running queries on ES.
As a second step, we should also see if we can "push" new analyzer definitions
to the ES server, taking them from our @AnalyzerDef. Not sure if that's doable.
Take a phrase query "colder and whitening" extracted from DSLTest. As "and"
is considered a stopword, the resulting PhraseQuery only contains "colder"
and "whitening", so it's not possible to rebuild the original phrase before
sending it to Elasticsearch. We could try to use span queries but I'm not
sure we'll be able to get it right with synonyms and so on.
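
The information loss described above can be illustrated with a small
self-contained sketch. It does not use Lucene: analyze() is a deliberately
simplified stand-in (lowercase, split on whitespace, drop stopwords) and all
names are hypothetical. It only shows that once "and" is dropped, the
surviving terms cannot be joined back into the phrase the user typed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified illustration only; this is not how Lucene analyzers are implemented.
final class StopwordPhraseDemo {

    // A surviving term together with its position in the original token stream.
    static final class PositionedTerm {
        final String term;
        final int position;

        PositionedTerm(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Lowercases, splits on whitespace and drops stopwords, keeping positions.
    static List<PositionedTerm> analyze(String text, Set<String> stopwords) {
        List<PositionedTerm> terms = new ArrayList<>();
        int position = 0;
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!stopwords.contains(token)) {
                terms.add(new PositionedTerm(token, position));
            }
            position++;
        }
        return terms;
    }

    // Naive attempt to rebuild the phrase from the terms that survived analysis.
    static String rebuild(List<PositionedTerm> terms) {
        StringBuilder sb = new StringBuilder();
        for (PositionedTerm t : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<PositionedTerm> terms =
                analyze("colder and whitening", Collections.singleton("and"));
        // The stopword is gone, so the rebuilt text differs from the input.
        System.out.println(rebuild(terms));
    }
}
```

The positions record that a token is missing between the two terms, but not
which one, so the text sent to Elasticsearch would not match the user's input.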
2/ FieldBridges
===========
Currently, the conversion from Date to JSON string is managed by a
fieldbridge specific to the Elasticsearch backend (and it's not that easy
to plug it in, as we have seen in a ticket).
I'm wondering if it's a good fit, and whether we should invent an entirely
new thing to deal with the transformation between a given Java type and
what a backend expects.
I agree, the current abstraction model doesn't cut it.
Other than discussing a "FieldBridge 2", which would be significantly
different and take quite some work (and probably require a 6.0 due to
API changes), I don't have a concrete plan for the short term.
You might want to propose something in the scope of the current APIs?
Keep in mind I'm now working to remove the Lucene Document notion from
the LuceneWork types; in this pre-6 phase I think we'll want to have a
translation phase between what the old fieldbridges are producing and the
modernized backends. Essentially, I hope we'll be rewriting it from the
backend upwards, with the public API as the last step. Until we're done
there needs to be a translation step, and this might live at a certain
level initially and need to be moved as we make progress.
What made me think of it is that a test from DSLTest is using
ignoreFieldBridge() on a date field and it disables the Date -> JSON string
conversion entirely.
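
As a rough illustration of the problem, here is a sketch of a Date -> JSON
string conversion and of what happens when it is skipped. The class and method
names are hypothetical, and the ISO-8601 UTC format is an assumption: the
format actually produced by the Elasticsearch-specific fieldbridge may differ.

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical sketch; not the fieldbridge used by the Elasticsearch backend.
final class DateToJsonStringSketch {

    // Assumed target format: ISO-8601 instant in UTC.
    static String toJsonString(Date date) {
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    // Mimics the effect of ignoreFieldBridge(): when the bridge is bypassed,
    // the raw Date reaches the backend without any conversion.
    static Object toIndexedValue(Date date, boolean ignoreFieldBridge) {
        return ignoreFieldBridge ? date : toJsonString(date);
    }
}
```

With the bridge applied the backend receives a String; with it ignored it
receives a java.util.Date it has no rule for, which is why a
backend-level translation step separate from the fieldbridge looks attractive.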
Comments, thoughts?
--
Guillaume