[hibernate-dev] Search: Dynamic Document boosting

Thu Apr 30 12:27:34 EDT 2009

On  Apr 30, 2009, at 14:11, Sanne Grinovero wrote:

> 2009/4/30 Emmanuel Bernard <emmanuel at hibernate.org>:
>>
>> On  Apr 30, 2009, at 13:01, Sanne Grinovero wrote:
>>
>>> Basically I need a function to convert a user-proposed term to a
>>> series of proposals
>>> of "similar" terms but giving a higher rank to the terms I'd prefer
>>> him to choose as they
>>> are the correct names used in my domain.
>>> You can think of it as a spellchecker/dictionary (using synonyms
>>> too), but giving priority to
>>> a selected form of each term, known as the "root", or the standard
>>> form in the domain.
>>
>> Have you considered indexing the same property twice:
>>  - once unaltered and giving it a high boost
>>  - once with synonyms and giving it a low boost
>>
>> That's how I would solve your use case personally.
>>
>
> That would be clever if I had only two levels of boost, but that's not
> the case.
> Also, most properties are already indexed in 5+n different fields,
> n being the number of languages supported by Snowball (currently 15,
> so the next step will be to try the programmatic configuration to
> remove all these annotations).
> So adding more fields would dramatically increase their number
> (number of boost levels * (5 + number of enabled stemmers))

Number of boost levels? The boost level is defined at query time, so I
guess there would only be the clean data and the synonym data, i.e.
2 * (5 + languages), right?
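
To make the arithmetic concrete, here is a small sketch of the field count under both readings. FieldCount and its methods are hypothetical names used only for illustration, not part of Hibernate Search; the figures 5 and 15 are the ones quoted in this thread.

```java
// Hypothetical helper for illustration only; not part of Hibernate Search.
final class FieldCount {

    // Fields per property for one copy of the data:
    // 5 fixed fields plus one field per enabled stemmer.
    static int perCopy(int stemmers) {
        return 5 + stemmers;
    }

    // Total fields per property when the data is indexed `copies` times.
    static int total(int copies, int stemmers) {
        return copies * perCopy(stemmers);
    }

    public static void main(String[] args) {
        int stemmers = 15; // Snowball languages, as quoted in the thread

        // Current situation: a single copy of the data.
        System.out.println(total(1, stemmers)); // 20

        // Clean copy + synonym copy, with the boost chosen at query time.
        System.out.println(total(2, stemmers)); // 40
    }
}
```

With the boost applied at query time the multiplier stays at 2 however many boost values the queries use; it only grows to "number of boost levels" if each level needs its own indexed copy.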

>
> and make the index size unnecessarily large, and I'm not solving the
> time-fading requirement.

Well, yes, but dynamic boosting as you define it does not solve the
time-fading requirement either, right?
FunctionQuery does (or can).
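
As a rough illustration of why time fading fits query time better than index time, here is a sketch of a decay factor computed from a document's age. It is plain Java, not the FunctionQuery API; the TimeFading class, the exponential shape and the 30-day half-life are all assumptions made for the example.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a time-fading factor; not the FunctionQuery API.
final class TimeFading {

    // Exponential decay: the factor is 1.0 for a brand-new document
    // and halves every `halfLife`. Future timestamps are treated as age 0.
    static double factor(Instant indexedAt, Instant now, Duration halfLife) {
        double ageMillis = Math.max(0, Duration.between(indexedAt, now).toMillis());
        return Math.pow(0.5, ageMillis / halfLife.toMillis());
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2009-04-30T16:27:34Z");
        Duration halfLife = Duration.ofDays(30); // assumed value

        System.out.println(factor(now, now, halfLife));                            // 1.0
        System.out.println(factor(now.minus(Duration.ofDays(30)), now, halfLife)); // 0.5
    }
}
```

Because the factor depends on "now", a value written into the index at indexing time would go stale unless the document were reindexed, whereas a function evaluated at query time always sees the current age.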

>
> My BI suite is returning some nice float which I'm storing in the
> entity itself as a property,
> it would make my life a lot easier if I could just use this float
> value as the document boost.

I'd rather see something more generic, like the @AnalyzerDiscriminator
approach we've used, if you really, really want to set that at indexing
time.
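
A minimal sketch of what such a discriminator-style hook could look like, in plain Java. BoostDiscriminator, Term and BiScoreBoost are hypothetical names invented for this example; nothing here is an existing Hibernate Search API, and the fallback to a neutral boost of 1.0 is an assumption.

```java
// Hypothetical callback, modelled loosely on the discriminator idea:
// the indexing engine would ask it for a boost per entity instance.
interface BoostDiscriminator {
    float boostFor(Object entity);
}

// Example entity carrying the float produced by the BI suite.
final class Term {
    final String name;
    final float biScore;

    Term(String name, float biScore) {
        this.name = name;
        this.biScore = biScore;
    }
}

// Uses the stored BI score as the document boost.
final class BiScoreBoost implements BoostDiscriminator {

    @Override
    public float boostFor(Object entity) {
        if (entity instanceof Term) {
            float score = ((Term) entity).biScore;
            // Assumption: non-positive scores fall back to the neutral boost.
            return score > 0f ? score : 1.0f;
        }
        return 1.0f; // neutral boost for any other type
    }

    public static void main(String[] args) {
        BoostDiscriminator strategy = new BiScoreBoost();
        System.out.println(strategy.boostFor(new Term("root form", 2.5f))); // 2.5
        System.out.println(strategy.boostFor("not a Term"));                // 1.0
    }
}
```

Keeping the decision behind a callback rather than reading a float property directly leaves room for other strategies without changing the annotation model.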