[hibernate-dev] [HSearch] Faceting feedback

Mon Feb 28 15:33:15 EST 2011

Thanks for the feedback Sanne!

On Mon, 28 Feb 2011 19:56:59 +0100, Sanne Grinovero <sanne at hibernate.org>  
wrote:

> I've been reading this focusing on the tests and the Collectors  
> implementation.

That's the most interesting part. There are a few unrelated cleanup and  
testing
related commits which as you say I should probably squash. I probably  
should go through
the commits tomorrow and sort things out a little. Maybe we could even  
pull in
this initial cut to have a common ground to work from again.

> Comments:
> 1) why is it named SimpleFacetRequest, are you planning for a more  
> advanced one?

SimpleFacetRequest is probably a bad name. Maybe ValueFacetRequest would  
be better.
It is basically just asking for the different field values and their count  
for the faceted
field. In contrast RangeFacetRequest groups counts into numeric ranges. As  
mentioned
another subclass could be DateRangeFacetRequest

> 2) I couldn't find a test doing a faceting on more than one field,
> still it looks like the Collector and FieldCacheContainer
> are prepared to deal with that.

It depends what you mean. I think there should be a test with multiple  
in-depended
facets, eg facet on engine size and color in the car example.
There is also the use case pivoting where you set multiple fields into  
relation to
each other. For example you facet on engine size and within that you facet  
further on color.
I haven't implemented this usecase yet. Not sure if we should add this  
feature right away.

> while working on the FieldCache support I also started with similar
> Map<Field, containers>, but then I realized that
> the number of fields I'd work on is definitely limited, so I removed
> all Map lookups (and all puts and foreach's) to support
> only a single field per collector instance, so you can specialize the
> single instance and eventually chain them up when
> more than a field is being requested.

Hmm, that's an interesting idea. Instead of using maps you basically have  
one collector per
facet request. This idea is definitely worth exploring. It might even  
clean out some of the code.
Do you have any experience what this really brings runtime wise? But I  
will definitely explore your idea.

> In case you support faceting on multiple facets/fields, how are you
> going to specify the sort order?
> I'm not suggesting you should, I'd rather remove support for it.

Not sure whether you are talking about pivoting here. In the case you are  
using multiple in-dependent
facet requests, each facet result is ordered by itself. I think having a  
sort order is important, at least
you should be able to either sort by count or by value

> Are you relying on fieldCache only, or do you have an alternative way  
> too? I'm just wondering if alternatives are possible, as fieldCaches seem
> very expensive memory-wise.

Yes, the current implementation is based on field caches. As you say, the  
main concern is memory in this cases.
There is an alternative I haven't explored yet working with TermEnum and  
doc sets intersections.

> Wouldn't it make sense to have a
> fullTextSession.createFacetingQuery( LuceneQuery, facetName, targetTypes  
> );
> ?
> At least people won't need to cast the return type to your special  
> container.

Not sure if I follow here. Faceting is for me more like filtering. You en-  
and disable
facets before running the query. You still need the main result list, but  
also a way to access the
facet results.

>> Map<String, FacetResult> results = query.getFacetResults();
>> FacetResult facetResult = results.get( "foo" );
>> List<Facet> facetList = facetResult.getFacets();
>> assertEquals( "Wrong facet count for facet ", 100, facetList.get( 0
>> ).getCount() );
>
> Looks great.
> Would it be possible (in future maybe) to return managed entities /
> selected projection instead of their frequency only?

That's an awesome idea. This feature could set our faceting apart from the  
just count based approach

> I don't think we should do that for next Alpha, but it would be great
> to have lazily loaded proxies of the contents of each facet, as
> usability I'd expect people to "expand" on the facet to show the
> actual results,
> which are likely going to be the usual entities, or some projection.

One thing I want to add is the ability to pass in the current query into a  
Facet and get returned a BooleanQuery
which combines the two.

> Nice.
> Do you think we need the user to bother for the field type? don't you
> have it on the FieldBridge?
> I'm wondering, as I'm stuck at this point with my own patches :)

Right. It's on my todo list. In fact I am missing at the moment the meta  
data api
we have been talking about to "reflect" on the field configuration. I need
to check what is possible atm.

>>   Would we keep the programmatic configuration as a public API?
>> * I made the FacetRequest classes immutable atm, but this way I have a
>> multitude of constructors catering for a whole range of parameters
>>   (sort order, include zero counts, ...). Any opinions around immutable
>> objects vs objects with setters for configuring options after creation.
>
> I usually favour immutable objects only when creating them is (very)
> expensive and so you'd like to reuse them in other threads having  
> different parameters.
> Where you thinking of a FacetRequest
> as something "define once, run multiple times", like a named query?

right. I think I lean towards mutable request objects in this case. I  
wanted to see where thigs
are going with this immutable request object, but I think it is not worth  
it in this case

> I'd give priority to the performance of the Collector, if you have to  
> choose.

Sure

Thanks again,

--Hardy