On 04 Mar 2014, at 15:02, Guillaume Smet <guillaume.smet(a)gmail.com> wrote:
On Tue, Mar 4, 2014 at 1:36 PM, Emmanuel Bernard
<emmanuel(a)hibernate.org> wrote:
> OK so you want the words hotel + swimming pool to be present somewhere in the combined
> text of title and description. That's the second case I was describing then.
Indeed. And it kind of fails if you don't order by score but rather alphabetically or by
distance.
> Have you considered the following: your query should only consider the top n results, or
> the results whose score reaches 70% of the top score, and then do your business sort on
> this subset.
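Emmanuel's suggestion amounts to post-processing the hits in two steps. A minimal sketch in plain Java, assuming the hits have already been fetched together with their scores (the `Hit` record, `filterThenSort` and the sample data are hypothetical, not Hibernate Search or Lucene API), keeps the hits scoring at least 70% of the best score and then applies an alphabetical business sort to that subset:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ScoreThresholdSort {

    // Hypothetical hit: a business sort key (the name) and the relevance score.
    public record Hit(String name, float score) {}

    // Keeps the hits whose score reaches `ratio` of the best score,
    // then re-sorts that subset alphabetically (the "business sort").
    public static List<Hit> filterThenSort(List<Hit> hits, float ratio) {
        if (hits.isEmpty()) {
            return List.of();
        }
        float top = hits.stream().map(Hit::score).max(Float::compare).get();
        float threshold = top * ratio;
        return hits.stream()
                .filter(h -> h.score() >= threshold)
                .sorted(Comparator.comparing(Hit::name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("Hotel Zephyr", 9.0f),
                new Hit("Hotel Azur", 7.0f),
                new Hit("Camping Bellevue", 2.0f));
        // 70% of the top score (9.0) is about 6.3, so the camping hit is dropped.
        System.out.println(filterThenSort(hits, 0.7f));
    }
}
```

A top-n variant would instead sort by descending score and `limit(n)` before the business sort. Either way the cut-off depends on relative scores rather than on whether every searched word matched, which is why it can leave out real matches or keep extra ones.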
That doesn't work for us. Users want the results that really match, and we
can't have missing results or extra results.
They search for something and they want to find exactly what they are
looking for.
Note: this is really 99.9% of our use cases, probably because we
mostly develop business applications.
> Anyway, to address this, one needs to target fields that are:
> - using the same fieldbridge
> - using the same analyzer
> - do the trick I was describing around filters like ngrams (and then or)
That's where I stopped my work on HSEARCH-917. I wasn't sure I could
reasonably require such conditions, at least not with the current API.
I started to wonder if we could introduce a text() branch alongside
keyword() and phrase(), but never really posted about it.
I would like to separate the user's responsibility from the developer's
responsibility:
- the user defines his search query. It's a little more clever than
a plain term search: he can use +, - and "" (for phrases), which is why
I would like to use a QueryParser directly (most of our users don't use
this syntax but some of them need it);
- the developer defines how the search is done: it can span several
fields, and for each field the developer can define a boost (this is
supported by the SimpleQueryParser) AND whether it's a fuzzy query (not
supported out of the box by the SimpleQueryParser).
(We could even imagine supporting minimum should match, as the dismax
parser does.)
Because this is really what we need on a daily basis: my users don't
really know whether their search needs to be fuzzy or not, and I would
like to be able to make that decision for them because I know the
corpus of documents and I know fuzziness is going to be needed.
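The split between user and developer responsibilities can be illustrated with a minimal sketch in plain Java. It assumes the user input only uses `+`, `-` and `""`; the developer's per-field boost and fuzzy choices live in a hypothetical `FieldSettings` record, and each user clause is expanded into one group across all configured fields, rendered in Lucene's classic query syntax. `TextQuerySketch`, `FieldSettings` and `toLuceneSyntax` are illustrative names, not an existing Hibernate Search or Lucene API; a real implementation would more likely build `Query` objects than a string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextQuerySketch {

    // Developer-side settings for one field (hypothetical type).
    public record FieldSettings(float boost, boolean fuzzy) {}

    // One user clause: operator ('+', '-' or ' ' for none) and a term or phrase.
    record Clause(char occur, String text, boolean phrase) {}

    // Optional + or -, then either a "quoted phrase" or a bare word.
    private static final Pattern CLAUSE = Pattern.compile("([+-]?)(?:\"([^\"]*)\"|(\\S+))");

    static List<Clause> parseUserInput(String input) {
        List<Clause> clauses = new ArrayList<>();
        Matcher m = CLAUSE.matcher(input);
        while (m.find()) {
            char occur = m.group(1).isEmpty() ? ' ' : m.group(1).charAt(0);
            boolean phrase = m.group(2) != null;
            String text = phrase ? m.group(2) : m.group(3);
            if (!text.isBlank()) {
                clauses.add(new Clause(occur, text, phrase));
            }
        }
        return clauses;
    }

    // Backslash-escapes the characters Lucene's classic query syntax treats as special.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&/".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Expands every user clause over all developer-configured fields.
    public static String toLuceneSyntax(String userInput, Map<String, FieldSettings> fields) {
        List<String> groups = new ArrayList<>();
        for (Clause clause : parseUserInput(userInput)) {
            List<String> perField = new ArrayList<>();
            for (Map.Entry<String, FieldSettings> e : fields.entrySet()) {
                FieldSettings s = e.getValue();
                String escaped = escape(clause.text());
                StringBuilder sb = new StringBuilder(e.getKey()).append(':');
                if (clause.phrase()) {
                    sb.append('"').append(escaped).append('"');
                } else {
                    sb.append(escaped);
                    // Fuzziness is the developer's call; never applied to phrases or exclusions.
                    if (s.fuzzy() && clause.occur() != '-') {
                        sb.append('~');
                    }
                }
                if (s.boost() != 1.0f) {
                    sb.append('^').append(s.boost());
                }
                perField.add(sb.toString());
            }
            // Unprefixed clauses follow the parser's default operator.
            String prefix = clause.occur() == ' ' ? "" : String.valueOf(clause.occur());
            groups.add(prefix + "(" + String.join(" ", perField) + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        Map<String, FieldSettings> fields = new LinkedHashMap<>();
        fields.put("title", new FieldSettings(2.0f, true));
        fields.put("description", new FieldSettings(1.0f, false));
        System.out.println(toLuceneSyntax("+hotel +\"swimming pool\" -camping", fields));
    }
}
```

With `title` boosted to 2.0 and marked fuzzy, and `description` left unboosted and non-fuzzy, the input `+hotel +"swimming pool" -camping` renders as `+(title:hotel~^2.0 description:hotel) +(title:"swimming pool"^2.0 description:"swimming pool") -(title:camping^2.0 description:camping)`. Each required word then has to match in at least one of the fields, which is the "somewhere in title and description" behaviour discussed at the top of the thread. Minimum should match is not covered by this sketch.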
Does this look like something interesting to you?
Yes, a text() branch that takes whatever the user typed and lets the developer customise
what needs to be searched makes sense to me.
We can explore that, but I am a bit skeptical that it will turn into a true `text()` clause
rather than into friendlier options in the other branches of the DSL.
I’m happy to be proven wrong, but I am a bit confused and would love a code example to
start from (as in, code doing what is required).
searchInput