[hibernate-dev] Search: Dynamic Document boosting

Sanne Grinovero sanne.grinovero at gmail.com
Thu Apr 30 08:11:15 EDT 2009


2009/4/30 Emmanuel Bernard <emmanuel at hibernate.org>:
>
> On  Apr 30, 2009, at 13:01, Sanne Grinovero wrote:
>
>> Basically I need a function to convert a user-proposed term to a
>> series of proposals
>> of "similar" terms but giving a higher rank to the terms I'd prefer
>> him to choose as they
>> are the correct names used in my domain.
>> You can think of it as a spellchecker/dictionary (using synonyms
>> too), but giving priority to
>> a selected form of each term, known as the "root", or the standard
>> form in the domain.
>
> Have you considered indexing the same property twice:
>  - once unaltered and giving it a high boost
>  - once with synonyms and giving it a low boost
>
> That's how I would solve your use case personally.
>

That would be clever if I had only two levels of boost, but that's not the case.
Also, most properties are already indexed in 5+n different fields,
n being the number of supported languages for snowball (currently 15,
so the next step
will be to try the programmatic configuration to remove all these annotations).
So adding more fields would dramatically increase their number
(number of boost
levels * (5 + number of enabled stemmers))
and make the index unnecessarily large, and it still would not solve the
time-fading requirement.
My BI suite returns a nice float which I'm storing in the
entity itself as a property;
it would make my life a lot easier if I could just use this float
value as the document boost.
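
A rough sketch of the idea in plain Java, to make the intended behaviour
concrete. Nothing here is existing Hibernate Search API: the @Boost
annotation is redeclared locally as a stand-in, and ConstantBoostScorer,
RankScorer, Product and effectiveBoost are hypothetical names used only for
illustration. It assumes the static value and the scorer result are
multiplied.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class DynamicBoostSketch {

    /** Proposed interface: computes a boost from the annotated value. */
    public interface BoostScorer {
        float score(Object value);
    }

    /** Hypothetical default: a neutral boost, so existing mappings are unaffected. */
    public static class ConstantBoostScorer implements BoostScorer {
        public float score(Object value) {
            return 1.0f;
        }
    }

    /** Local stand-in for @Boost with the extra impl() parameter (an assumption). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD, ElementType.FIELD})
    public @interface Boost {
        float value() default 1.0f;
        Class<? extends BoostScorer> impl() default ConstantBoostScorer.class;
    }

    /** Example scorer: reads the precomputed float stored on the entity. */
    public static class RankScorer implements BoostScorer {
        public float score(Object value) {
            return ((Product) value).rank;
        }
    }

    /** Example entity carrying a per-instance rank, e.g. computed by a BI suite. */
    @Boost(value = 2.0f, impl = RankScorer.class)
    public static class Product {
        final float rank;

        public Product(float rank) {
            this.rank = rank;
        }
    }

    /** Static value multiplied by the per-instance score; 1.0f if not annotated. */
    public static float effectiveBoost(Object entity) {
        Boost boost = entity.getClass().getAnnotation(Boost.class);
        if (boost == null) {
            return 1.0f;
        }
        try {
            BoostScorer scorer = boost.impl().getDeclaredConstructor().newInstance();
            return boost.value() * scorer.score(entity);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate scorer", e);
        }
    }

    public static void main(String[] args) {
        // Static value 2.0f scaled by the instance rank 1.5f.
        System.out.println(effectiveBoost(new Product(1.5f)));
    }
}
```

The same RankScorer could be reused on another type with a different static
value, which is the reason for keeping both parameters.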


>>
>>
>> This is the simple explanation which I believe is useful in all indexing
>> modes;
>> actually I'm also going to use it for the 2 reasons you are
>> listing, and it's not a problem for me as I rebuild the index
>> regularly and prefer for
>> several reasons to pre-compute the boosts needed for "categories in
>> marketing mood"
>> and "fade by age" effects.
>>
>>
>> 2009/4/30 Emmanuel Bernard <emmanuel at hibernate.org>:
>>>
>>> What's your use case?
>>> I am not against that feature but I don't think it covers all use cases:
>>>  - I want a higher boost for more recent documents
>>>  - I change priority in my categories depending on the marketing mood
>>>
>>> Put differently, could we have a true dynamic boost defined at query
>>> time, not at indexing time?
>>> I think Solr has something named FunctionQuery that can do that.
>>>
>>> On  Apr 29, 2009, at 16:47, Sanne Grinovero wrote:
>>>
>>>> Hello,
>>>> I currently need to be able to define a different Boost per entity
>>>> INSTANCE, not just per type.
>>>>
>>>> Currently I could obtain this functionality by using a custom
>>>> classbridge, but the entity is quite complex, and by building my own
>>>> classbridge I would have to map all fields myself, losing the
>>>> flexibility of annotations for the current type and
>>>> all @IndexedEmbedded.
>>>>
>>>> I'd like to add a new parameter to @Boost; currently it has a
>>>> mandatory float value,
>>>> and I'd like to add:
>>>>
>>>> Class<? extends BoostScorer> impl();
>>>>
>>>> where BoostScorer is an interface having something like
>>>>
>>>> public float score(Object value);
>>>>
>>>> The annotation would default to an implementation returning constant
>>>> 1.0f.
>>>>
>>>> This would have an interaction with the existing "value" parameter:
>>>> IMHO they should
>>>> multiply each other, so I'd change the existing value to also have a
>>>> default of 1.0f
>>>> and people might want to change one or both values.
>>>>
>>>> Setting both values might be useful to reuse the same impl on
>>>> different types/fields
>>>> and still be able to statically scale the result from the
>>>> score(Object) function, without having to
>>>> rewrite a new implementation.
>>>>
>>>> What do you think?
>>>>
>>>> Sanne
>>>> _______________________________________________
>>>> hibernate-dev mailing list
>>>> hibernate-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>>
>>>
>
>



