On 21 Feb 2014, at 01:15, Sanne Grinovero <sanne(a)hibernate.org> wrote:
I still suspect that a DisMax approach would provide a better scoring
model, but this is an implementation detail we should iterate on in a
second phase.
Essentially taking the example of "albino elephants" I agree on the
behaviour you described but I think there are some additional aspects
to consider when you're evaluating how a partial match "albino" scores
against a full match "albino elephant" in a single field, rather than
split up, or how "albino" could score less in field A rather than
field B, so even swapping positions of terms on different fields could
provide a less valuable match.
Probably better explained with an example on a larger data set but
alas I won't be able to craft one soon. Still, it's not a blocker at
all, as in this first phase I think we should 1) have a working
solution and 2) focus on API effectiveness. Performance and a
sophisticated scoring system will necessarily have to follow: I'm
unpacking a large data set to play with, and I'm pretty sure we'll have
plenty of follow-up improvements.
If I deciphered you correctly, you think that applying a dismax query at the top of MLT (i.e.
the junction between the graphs of queries related to each field) would be useful to favor
a field that gets a better score and downplay an average resemblance spread over several
fields?
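To make the question concrete, here is a minimal sketch in plain Java (no Lucene dependency) of the two ways per-field scores could be combined at that junction. The dismax formula used here, best sub-score plus a tie-breaker times the sum of the other sub-scores, is how Lucene's DisjunctionMaxQuery is documented to behave; the class name, method names, and the per-field scores are all made up for illustration and are not taken from MLT or from the DSL.

```java
import java.util.Arrays;

public class DisMaxSketch {

    // Boolean-style junction: the per-field scores simply add up,
    // so a mediocre match on many fields can outrank one strong match.
    static double sumScore(double... fieldScores) {
        return Arrays.stream(fieldScores).sum();
    }

    // DisMax-style junction: the best field dominates; the remaining
    // fields only contribute through the tie-breaker multiplier.
    static double disMaxScore(double tieBreaker, double... fieldScores) {
        double max = Arrays.stream(fieldScores).max().orElse(0.0);
        double sum = Arrays.stream(fieldScores).sum();
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical per-field scores for two candidate documents.
        double[] strongOnOneField = {0.9, 0.1, 0.1};
        double[] averageEverywhere = {0.4, 0.4, 0.4};
        double tieBreaker = 0.1; // assumed value, purely illustrative

        System.out.println("sum, strong on one field:     " + sumScore(strongOnOneField));
        System.out.println("sum, average everywhere:      " + sumScore(averageEverywhere));
        System.out.println("dismax, strong on one field:  " + disMaxScore(tieBreaker, strongOnOneField));
        System.out.println("dismax, average everywhere:   " + disMaxScore(tieBreaker, averageEverywhere));
    }
}
```

With these made-up numbers the plain sum ranks the "average everywhere" document first, while the dismax combination ranks the document with one strong field first, which is the effect I understand you to be after.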
It is much to anticipate that now (that we would need a boolean / dismax work) because it
does impact the context sequence of the DSL at least when the addition is complex.