[
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-133?pag...
]
John Griffin commented on HSEARCH-133:
--------------------------------------
The method of Lucene scoring in the DefaultSimilarity class is:
score(q,d) = coord(q,d) * queryNorm(q) * SIGMA[t in q] ( tf(t in d) * idf(t)^2 *
             t.getBoost() * norm(t,d) )
Use cases:
Let's say that I don't care what the term frequency (tf) of a term is in a
document when it is scored; I only want to know whether the term appears at all.
I would override the tf(float freq) method (abstract in Similarity, implemented
by DefaultSimilarity) to return 1.0.
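A minimal sketch of that first use case follows. To keep it runnable without
Lucene on the classpath, DefaultTfSketch is a hypothetical stand-in that models
only the tf factor of DefaultSimilarity (square root of the frequency); in real
code the subclass would extend DefaultSimilarity instead. The class names are
illustrative, and returning 0 for a zero frequency is an added assumption, not
something the comment above asks for.

```java
// Hypothetical stand-in for Lucene's DefaultSimilarity; only tf is modelled.
class DefaultTfSketch {
    /** Default term-frequency factor: square root of the raw frequency. */
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

/** Illustrative subclass: a term scores the same whether it occurs once or many times. */
class BinaryTfSimilarity extends DefaultTfSketch {
    @Override
    float tf(float freq) {
        // Assumption: a frequency of zero still contributes nothing.
        return freq > 0 ? 1.0f : 0.0f;
    }
}

class BinaryTfDemo {
    public static void main(String[] args) {
        DefaultTfSketch standard = new DefaultTfSketch();
        DefaultTfSketch binary = new BinaryTfSimilarity();
        System.out.println(standard.tf(9.0f)); // 3.0
        System.out.println(binary.tf(9.0f));   // 1.0
    }
}
```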
Let's say that I want to increase a document's score by a factor of two if it
contains more than one of the terms I am querying by. I would override
coord(int overlap, int maxOverlap) to return (overlap * 2) / (float) maxOverlap,
using floating-point division so that the result is not truncated to an integer.
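The coord use case might look roughly like the sketch below. DefaultCoordSketch
is a hypothetical stand-in that models only the coord factor of
DefaultSimilarity (overlap divided by maxOverlap); real code would extend
DefaultSimilarity. Two details are assumptions on my part: the doubling is
applied only when overlap is greater than one, to match the stated goal, and
the division is done in float, since integer division would discard the
fraction.

```java
// Hypothetical stand-in for Lucene's DefaultSimilarity; only coord is modelled.
class DefaultCoordSketch {
    /** Default coordination factor: fraction of query terms found in the document. */
    float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}

/** Illustrative subclass: doubles the factor when more than one query term matches. */
class DoubledCoordSimilarity extends DefaultCoordSketch {
    @Override
    float coord(int overlap, int maxOverlap) {
        float base = super.coord(overlap, maxOverlap);
        // Assumption: documents matching a single term keep the default factor.
        return overlap > 1 ? base * 2.0f : base;
    }
}

class DoubledCoordDemo {
    public static void main(String[] args) {
        DefaultCoordSketch sim = new DoubledCoordSimilarity();
        System.out.println(sim.coord(1, 4)); // 0.25
        System.out.println(sim.coord(2, 4)); // 1.0
    }
}
```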
As a final example, I want to develop a 'ThresholdSimilarity' to increase the
score of a document if and only if it contains more than a certain number of
query terms, regardless of the repository document count. Below that 'threshold'
the scoring calculation remains the same as the default. I would override the
idf(int docFreq, int numDocs) method to perform this calculation.
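The comment does not spell out the threshold calculation, so the following is
only one possible reading, sketched under explicit assumptions: the threshold
is an absolute count compared against docFreq (the only count idf receives),
and the increase is a fixed multiplier. Both the threshold and boost parameters
are hypothetical. DefaultIdfSketch is a stand-in that models only the idf
factor of DefaultSimilarity, log(numDocs / (docFreq + 1)) + 1; real code would
extend DefaultSimilarity.

```java
// Hypothetical stand-in for Lucene's DefaultSimilarity; only idf is modelled.
class DefaultIdfSketch {
    /** Default inverse document frequency: log(numDocs / (docFreq + 1)) + 1. */
    float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

/** Illustrative subclass: default idf at or below the threshold, boosted above it. */
class ThresholdSimilarity extends DefaultIdfSketch {
    private final int threshold; // hypothetical absolute count, not a fraction of numDocs
    private final float boost;   // hypothetical multiplier applied above the threshold

    ThresholdSimilarity(int threshold, float boost) {
        this.threshold = threshold;
        this.boost = boost;
    }

    @Override
    float idf(int docFreq, int numDocs) {
        float base = super.idf(docFreq, numDocs);
        // At or below the threshold the calculation is unchanged from the default.
        return docFreq > threshold ? base * boost : base;
    }
}

class ThresholdDemo {
    public static void main(String[] args) {
        DefaultIdfSketch sim = new ThresholdSimilarity(10, 2.0f);
        System.out.println(sim.idf(5, 100));  // default value
        System.out.println(sim.idf(20, 100)); // twice the default value
    }
}
```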
Allow overriding DefaultSimilarity for indexing and searching
-------------------------------------------------------------
Key: HSEARCH-133
URL:
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-133
Project: Hibernate Search
Issue Type: New Feature
Components: engine, query
Affects Versions: 3.0.1
Reporter: John Griffin
Assignee: John Griffin
Ability to override DefaultSimilarity for indexing and searching should be implemented.
Access is necessary in both places because changing only one and then querying has
undefined results as far as scores are concerned.