[hibernate-dev] Hibernate Search 3.1

Wed Feb 27 10:51:15 EST 2008

On 27/02/2008, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>
>  On  Feb 27, 2008, at 08:40, Nick Vincent wrote:
>
>  > Hi Emmanuel,
>  >
>  > On 26/02/2008, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>  >>
>  >>  On  Feb 26, 2008, at 06:41, Nick Vincent wrote:
>  >>>
>  >>>
>  >>> 2) Explaining results
>  >>>
>  >>> This uses the new DOCUMENT_ID projection introduced in 3.0.1  to
>  >>> explain query results (we need this so the customer can understand
>  >>> their search results in the backoffice interface).  I added an
>  >>> explain
>  >>> method to both implementations of FullTextQueryImpl which is only
>  >>> available by casting (i.e. no interface changes).  I think explain()
>  >>> is probably a fairly advanced function which it's acceptable to
>  >>> access
>  >>> by casting.
>  >>
>  >>
>  >> Wouldn't it make sense to expose the explain result (I imagine an
>  >>  Explanation object) as a projected field?
>  >
>  > The Lucene javadoc says "Computing an explanation is as expensive as
>  > executing the query over the entire index.".
>  >
>  > http://lucene.apache.org/java/2_3_1/api/core/org/apache/lucene/
>  > search/Searcher.html#explain(org.apache.lucene.search.Query,%20int)
>  >
>  > which is why I didn't consider projecting this.  If that's true the
>  > effort to project an explanation onto the results will increase
>  > linearly with the number of hits.  For this reason I think the
>  > method of accessing an Explanation that I proposed is reasonable
>  > (although not necessarily right).
>
>
> My concern really is that the reader used to explain might be
>  different from the reader that returns hits, and hence be out of
>  sync. But the projection idea seems too resource intensive.
>
>  What is your use case then? The user asks for the explanation of a
>  single result manually after the query? (i.e. there is a human think
>  time between the query and the explanation?)

Yes, our use case is that our customer will test a search on their
page and then ask "why does this particular result rank higher than
the rest?".  We then need to provide an explanation (preferably without
them having to ring us and ask).  The think time should be fairly
short and it's only for a backoffice system so it's not 100% critical
that it's correct, although it would be nice if it were.

>  >>> 3) Counting results
>  >>>
>  >>> In the current implementation we only want to perform one Lucene
>  >>> query
>  >>> per search (all projected).  In order to get a resultcount and the
>  >>> results themselves it is currently necessary to invoke the Lucene
>  >>> query twice.
>  >>
>  >>
>  >> This is not true.
>  >>
>  >>  query.list(); //triggers a lucene query
>  >>  query.getResultSize(); //does not since list() has already computed it
>  >
>  > You are right, and I don't need to make any alterations.  I've worked
>  > out what caused the behaviour that made me think this was a
>  > problem.  It took a bit of digging around the source to work out what
>  > we'd done wrong, and perhaps it might be useful to include in an FAQ
>  > or the documentation.  If you make the calls in this order:
>  >
>  > query.getResultSize();  // Hits retrieved, hitcount cached and returned
>  > query.list(); // Hits retrieved
>  >
>  > then the query gets run twice as resultCount is cached in
>  > FullTextQueryImpl but the Hits object is not.
>  >
>  > A subtle effect, but when you're using something like JSF you're not
>  > always sure in which order the properties of your underlying beans are
>  > retrieved during the render cycle.  This was the cause of our double
>  > querying behaviour.
>
>
> http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-157
>
>  I don't keep the hits around because it would mean keeping the readers
>  open. I guess a simple helper class could do what your code was
>  doing (i.e. build a result-size-aware list).

It's probably not necessary provided the information about correct
usage is readily available.  We have done pretty much that and created
a bean which when asked for the hitcount runs the query and caches
both the results and the total hitcount, which alleviates the problem
for us.
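
For anyone hitting the same thing, here is a minimal sketch of that kind
of bean. It is an illustration only, not our production code: the names
SearchOutcome and CachedSearchBean are hypothetical, and the Supplier
stands in for whatever runs the real query (for example calling
query.list() first and query.getResultSize() second, which per the
discussion above triggers only one Lucene query). The bean is not
thread-safe; it assumes one instance per request.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical holder for one query execution: the results plus the total hit count. */
final class SearchOutcome<T> {
    final List<T> results;
    final int totalHits;

    SearchOutcome(List<T> results, int totalHits) {
        this.results = results;
        this.totalHits = totalHits;
    }
}

/**
 * Hypothetical bean that runs the underlying query at most once,
 * regardless of which getter the view layer happens to call first.
 */
final class CachedSearchBean<T> {
    private final Supplier<SearchOutcome<T>> queryRunner; // assumed to execute the real query
    private SearchOutcome<T> cached;                      // null until the first getter call
    private int executions;                               // kept only to make the behaviour visible

    CachedSearchBean(Supplier<SearchOutcome<T>> queryRunner) {
        this.queryRunner = queryRunner;
    }

    /** Runs the query on first use and caches both results and hit count. */
    private SearchOutcome<T> outcome() {
        if (cached == null) {
            cached = queryRunner.get();
            executions++;
        }
        return cached;
    }

    public int getResultSize() {
        return outcome().totalHits;
    }

    public List<T> getResults() {
        return outcome().results;
    }

    int getExecutions() {
        return executions;
    }
}
```

With this shape it no longer matters whether the render cycle asks for
the hit count or the result list first; either call fills the cache and
the other one reuses it.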

Nick