[hibernate-dev] [HSearch] Faceting feedback

Mon Feb 28 13:56:59 EST 2011

Hi Hardy,
great job!
I've been reading this focusing on the tests and the Collectors implementation.
If you want me to have a look at all the patches, could you please
squash some commits :)

Comments:
1) Why is it named SimpleFacetRequest? Are you planning a more advanced one?

2) I couldn't find a test that facets on more than one field,
yet it looks like the Collector and FieldCacheContainer
are prepared to deal with that.
As you know, the collector is a very hot area for the performance of a
Query, especially
"void collect(int doc)".

While working on the FieldCache support I also started with a similar
Map<Field, containers>, but then I realized that the number of fields
I'd work on is definitely limited. So I removed all Map lookups (and
all puts and foreach loops) and supported only a single field per
collector instance; that way you can specialize the single instance
and chain instances together when more than one field is requested.
A side effect is that the code turns out a bit simpler, and you can
save some memory on the "per field" containers, as they can sometimes
be stored in simpler types: I had some nested maps converted to
arrays. Maybe that's not your case, but it might be worth trying.
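
To make the single-field idea concrete, here is a minimal sketch in plain
Java. It deliberately does not use the Lucene API: DocCollector,
SingleFieldFacetCollector and the demo class are made-up names, and the
int[] of per-document term ordinals only stands in for whatever a
FieldCache lookup would provide. Treat it as an illustration of the shape,
not as the actual implementation.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the per-document callback of a collector. */
interface DocCollector {
    void collect(int doc);
}

/** Counts facet values for exactly one field; no Map lookups in collect(). */
final class SingleFieldFacetCollector implements DocCollector {
    // ordinals[doc] = index of the document's term for this field
    // (assumption: one term per document, resolved up front).
    private final int[] ordinals;
    private final int[] counts;
    private final DocCollector next; // next collector in the chain, may be null

    SingleFieldFacetCollector(int[] ordinals, int termCount, DocCollector next) {
        this.ordinals = ordinals;
        this.counts = new int[termCount];
        this.next = next;
    }

    @Override
    public void collect(int doc) {
        counts[ordinals[doc]]++;   // hot path: an array read and an increment
        if (next != null) {
            next.collect(doc);     // delegate to the collector for the next field
        }
    }

    /** Returns a copy of the per-term counts. */
    int[] counts() {
        return counts.clone();
    }
}

final class SingleFieldFacetDemo {
    public static void main(String[] args) {
        // Four documents; term tables are implied: make has 3 terms, capacity has 2.
        int[] makeOrdinals = {1, 0, 1, 2};
        int[] capacityOrdinals = {0, 1, 1, 0};

        SingleFieldFacetCollector capacity =
                new SingleFieldFacetCollector(capacityOrdinals, 2, null);
        SingleFieldFacetCollector make =
                new SingleFieldFacetCollector(makeOrdinals, 3, capacity);

        // Pretend the query matched documents 0, 2 and 3.
        for (int doc : new int[] {0, 2, 3}) {
            make.collect(doc);
        }
        System.out.println(Arrays.toString(make.counts()));
        System.out.println(Arrays.toString(capacity.counts()));
    }
}
```

Chaining one instance per field keeps the work in collect() at an array
read and an increment per field, and the per-field state is a plain int[]
rather than a nested map.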

In case you support faceting on multiple facets/fields, how are you
going to specify the sort order?
I'm not suggesting you should, I'd rather remove support for it.

Some more comments inline:

2011/2/28 Hardy Ferentschik <hibernate at ferentschik.de>:
>
> Hi,
>
> I thought it would be great to get some feedback on my faceting work.
> You can see the latest on my Search fork -
> https://github.com/hferentschik/hibernate-search/commits/HSEARCH-667
>
> Technically I decided to use a simple custom Collector. I abandoned the
> idea for using bobo browse, since it did
> not really seem to fit our architecture and I am not sure how well
> maintained the code is.

Seems reasonable, same thing I was thinking when you talked about it,
but I trust you had more insight.
Is there any non-trivial faceting that we would miss this way?
The examples I know of are quite simple to implement; generally
speaking it seems to be more an API and integration issue
so we likely have our own itches to scratch anyway.

> Within the custom Collector I am using Lucene's FieldCache to cache and
> collect the count values during faceting
> (as a reminder, faceting means, for example, that I am searching for all
> cars of a certain make, but then also want
> to group the matching cars by their cubic capacity. See also -
> http://en.wikipedia.org/wiki/Faceted_search)
> Using the FieldCache is quite memory costly, but there are other ways to
> implement the faceting itself.

Are you relying on the FieldCache only, or do you have an alternative way too?
I'm just wondering if alternatives are possible, as FieldCaches seem
very expensive memory-wise.

>
> At the moment I am mostly interested in the feedback around the public
> API. The public classes can be found in
> the package org.hibernate.search.query.facet -
> https://github.com/hferentschik/hibernate-search/tree/3a9877e2bbc47a8bd6e438dbbab56d93ac9101fd/hibernate-search/src/main/java/org/hibernate/search/query/facet
>
> The idea is to write a fulltext query as usual and then add/enable a facet:
>
> FacetRequest request = new SimpleFacetRequest( indexFieldName,
> FacetSortOrder.COUNT_DESC, false );
> TermQuery term = new TermQuery( new Term( "make", "honda" ) );
> FullTextQuery query = fullTextSession.createFullTextQuery( term, Car.class );
> query.enableQueryFacet( "foo", request );

Wouldn't it make sense to have a
fullTextSession.createFacetingQuery( LuceneQuery, facetName, targetTypes );
?
At least people won't need to cast the return type to your special container.

> Then you run the query. This enables the facet collector, and after the
> query has executed you get access to a map from facet name to
> FacetResult. Each FacetResult contains a list of Facets, which hold the
> actual field values and counts:
>
> Map<String, FacetResult> results = query.getFacetResults();
> FacetResult facetResult = results.get( "foo" );
> List<Facet> facetList = facetResult.getFacets();
> assertEquals( "Wrong facet count for facet ", 100, facetList.get( 0 ).getCount() );

Looks great.
Would it be possible (in the future, maybe) to return managed entities or
a selected projection instead of only their frequency?
I don't think we should do that for the next Alpha, but it would be great
to have lazily loaded proxies for the contents of each facet. Usability-wise,
I'd expect people to "expand" a facet to show the actual results,
which are likely going to be the usual entities, or some projection.

>
> More actual tests can be found here -
> https://github.com/hferentschik/hibernate-search/tree/3a9877e2bbc47a8bd6e438dbbab56d93ac9101fd/hibernate-search/src/test/java/org/hibernate/search/test/query/facet
>
> At the moment you are able to facet on simple (string-based) values or on
> number ranges (e.g. price ranges 0 - 100, ...). For that I have created
> subclasses of
> FacetRequest - SimpleFacetRequest and RangeFacetRequest (a
> DateRangeFacetRequest might be interesting as well)

Nice.
Do you think we need the user to bother with the field type? Don't you
have it on the FieldBridge?
I'm wondering, as I'm stuck at this point with my own patches :)

>
> Some concrete questions:
> * Atm, I am only exposing a programmatic API for creating FacetRequests. I
> guess we want to have annotations for this as well, right?

I wasn't expecting that. Do you have an example of what you were thinking?

>   Would we keep the programmatic configuration as a public API?
> * I made the FacetRequest classes immutable atm, but this way I have a
> multitude of constructors catering for a whole range of parameters
>   (sort order, include zero counts, ...). Any opinions around immutable
> objects vs objects with setters for configuring options after creation.

I usually favour immutable objects only when creating them is (very)
expensive and so you'd like
to reuse them in other threads that have different parameters. Were you
thinking of a FacetRequest
as something "define once, run multiple times", like a named query?
I'd give priority to the performance of the Collector, if you have to choose.
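
On the multitude of constructors: if the requests stay immutable, a small
builder is one way to avoid one constructor per option combination. A rough
sketch follows, with invented names throughout; SketchFacetRequest, its
SortOrder enum, the option names and the defaults are all assumptions for
illustration and are not taken from the HSEARCH-667 branch.

```java
/** Hypothetical immutable facet request; not the API from the branch. */
final class SketchFacetRequest {
    enum SortOrder { COUNT_DESC, COUNT_ASC }

    private final String fieldName;
    private final SortOrder sortOrder;
    private final boolean includeZeroCounts;

    private SketchFacetRequest(Builder builder) {
        this.fieldName = builder.fieldName;
        this.sortOrder = builder.sortOrder;
        this.includeZeroCounts = builder.includeZeroCounts;
    }

    String fieldName() { return fieldName; }
    SortOrder sortOrder() { return sortOrder; }
    boolean includeZeroCounts() { return includeZeroCounts; }

    /** Entry point: the field name is the only mandatory parameter. */
    static Builder onField(String fieldName) {
        return new Builder(fieldName);
    }

    /** Mutable while configuring; the built request has no setters. */
    static final class Builder {
        private final String fieldName;
        private SortOrder sortOrder = SortOrder.COUNT_DESC; // assumed default
        private boolean includeZeroCounts = false;          // assumed default

        private Builder(String fieldName) {
            this.fieldName = fieldName;
        }

        Builder sortOrder(SortOrder sortOrder) {
            this.sortOrder = sortOrder;
            return this;
        }

        Builder includeZeroCounts(boolean includeZeroCounts) {
            this.includeZeroCounts = includeZeroCounts;
            return this;
        }

        SketchFacetRequest build() {
            return new SketchFacetRequest(this);
        }
    }
}
```

A request with setters would be shorter to write; the point of this shape is
only that the built object can be shared between threads without copying,
which matters mainly if requests are meant to be defined once and reused.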

>
> Of course I am interested in any other feedback as well.
>
> --Hardy

Sanne