[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-133) Allow overriding DefaultSimilarity for indexing and searching

John Griffin (JIRA) noreply at atlassian.com
Sat Nov 17 13:47:21 EST 2007


    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_28871 ] 

John Griffin commented on HSEARCH-133:
--------------------------------------

The method of Lucene scoring in the DefaultSimilarity class is:

   score(q,d) = coord(q,d) * queryNorm(q) * SUM[t in q] ( tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d) )

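To make the arithmetic of the formula concrete, here is a minimal plain-Java sketch. It does not use any Lucene classes; the class name ScoreSketch, the method score, and the layout of the termFactors array are assumptions made purely for illustration.

```java
// Plain-Java sketch of the scoring formula above; no Lucene classes are used.
// ScoreSketch and its array layout are hypothetical, for illustration only.
class ScoreSketch {

    /**
     * Computes coord * queryNorm * SUM over terms of (tf * idf^2 * boost * norm).
     * Each row of termFactors holds {tf, idf, boost, norm} for one query term t.
     */
    static float score(float coord, float queryNorm, float[][] termFactors) {
        float sum = 0.0f;
        for (float[] t : termFactors) {
            float tf = t[0], idf = t[1], boost = t[2], norm = t[3];
            sum += tf * idf * idf * boost * norm; // idf enters the sum squared
        }
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        // Two query terms with made-up factor values.
        float[][] terms = {
            {2.0f, 3.0f, 1.0f, 0.5f},
            {1.0f, 2.0f, 1.0f, 0.5f}
        };
        System.out.println(score(0.5f, 1.0f, terms));
    }
}
```

Because every factor is multiplied in, setting any single one of them to a constant (as in the use cases below) removes its influence on the ranking without touching the others.
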
Use cases:

Let's say that I don't care how often a term occurs in a document (its term frequency, tf) when the
document is scored. I only want to know whether the term appeared at all. I would override the
tf(float freq) method to return 1.0.
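
A minimal sketch of that override follows. So that it compiles without Lucene on the classpath, it extends a hypothetical stand-in class, StandInTfSimilarity, whose tf mirrors the square-root default of DefaultSimilarity. In a real project the subclass would presumably extend org.apache.lucene.search.DefaultSimilarity instead. The name BinaryTfSimilarity is also an assumption.

```java
// Hypothetical stand-in for DefaultSimilarity, reduced to the one method needed here.
class StandInTfSimilarity {
    // Default behaviour: the score grows with the square root of the term frequency.
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// Ignores how often a term occurs; tf is only consulted for documents that match.
class BinaryTfSimilarity extends StandInTfSimilarity {
    @Override
    public float tf(float freq) {
        return 1.0f;
    }

    public static void main(String[] args) {
        System.out.println(new StandInTfSimilarity().tf(9.0f)); // default stand-in
        System.out.println(new BinaryTfSimilarity().tf(9.0f));  // overridden
    }
}
```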

Let's say that I want to increase a document's score by a factor of two if it contains more than one of the
terms I am querying by. I would override the coord(int overlap, int maxOverlap) method to return
(overlap * 2) / (float) maxOverlap. The cast matters: both arguments are ints, so without it the
division would truncate.
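
The coord override can be sketched the same way. StandInCoordSimilarity is a hypothetical stand-in that mirrors the default overlap / maxOverlap calculation, and DoubledCoordSimilarity is an illustrative name, not an existing class.

```java
// Hypothetical stand-in for DefaultSimilarity, reduced to coord().
class StandInCoordSimilarity {
    // Default behaviour: fraction of the query terms found in the document.
    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}

// Doubles the coordination factor described in the use case above.
class DoubledCoordSimilarity extends StandInCoordSimilarity {
    @Override
    public float coord(int overlap, int maxOverlap) {
        // Cast before dividing; (overlap * 2) / maxOverlap on ints would truncate.
        return (overlap * 2) / (float) maxOverlap;
    }

    public static void main(String[] args) {
        DoubledCoordSimilarity sim = new DoubledCoordSimilarity();
        System.out.println(sim.coord(1, 4)); // one of four query terms matched
        System.out.println(sim.coord(3, 4)); // three of four query terms matched
    }
}
```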

As a final example, I want to develop a 'ThresholdSimilarity' to increase the score of a document if and only
if it contains more than a certain number of query terms, regardless of the repository document count. Below
that threshold, the scoring calculation remains the same as the default. I would override the
idf(int docFreq, int numDocs) method to perform this calculation.
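
The comment does not spell out the threshold calculation, so the following is only one possible reading, with loudly stated assumptions: since idf receives nothing but docFreq and numDocs, the threshold here is compared against docFreq, and above it the default value is multiplied by a hypothetical boostFactor. StandInIdfSimilarity is a stand-in that mirrors the default log(numDocs / (docFreq + 1)) + 1 calculation.

```java
// Hypothetical stand-in for DefaultSimilarity, reduced to idf().
class StandInIdfSimilarity {
    // Default behaviour: log(numDocs / (docFreq + 1)) + 1.
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

// One possible reading of the 'ThresholdSimilarity' idea; the rule is an assumption.
class ThresholdSimilarity extends StandInIdfSimilarity {
    private final int threshold;     // assumed: compared against docFreq
    private final float boostFactor; // assumed: multiplier applied above the threshold

    ThresholdSimilarity(int threshold, float boostFactor) {
        this.threshold = threshold;
        this.boostFactor = boostFactor;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        float base = super.idf(docFreq, numDocs);
        // At or below the threshold the default calculation is left untouched.
        return docFreq > threshold ? base * boostFactor : base;
    }

    public static void main(String[] args) {
        ThresholdSimilarity sim = new ThresholdSimilarity(3, 2.0f);
        System.out.println(sim.idf(1, 10)); // below threshold: default value
        System.out.println(sim.idf(5, 10)); // above threshold: boosted value
    }
}
```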


> Allow overriding DefaultSimilarity for indexing and searching
> -------------------------------------------------------------
>
>                 Key: HSEARCH-133
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-133
>             Project: Hibernate Search
>          Issue Type: New Feature
>          Components: engine, query
>    Affects Versions: 3.0.1
>            Reporter: John Griffin
>            Assignee: John Griffin
>
> The ability to override DefaultSimilarity for both indexing and searching should be implemented. Access is necessary in both places because changing the similarity on only one side and then querying produces undefined scores.
