[hibernate-dev] HSearch Projection queries & multiple fields with same name

Ales Justin ales.justin at gmail.com
Thu Jan 3 09:29:28 EST 2013


I think anything handled outside of Lucene is wrong, since Lucene, in this case,
is the only engine that can properly address all query details: ordering, filtering, ...

So, imo, the only way to do this is to have multiple documents, one per cartesian product element.
How to go from there is a big TODO. :-)
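
To make the two result shapes discussed in this thread concrete, here is a
minimal sketch in plain Java (no Lucene or Hibernate Search involved).
ProjectionRows, collapsed and cartesian are hypothetical names used only for
illustration, not existing HSearch API, and the input is assumed to be the
already-extracted stored values of each projected field for one matching
document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hibernate Search or Lucene.
final class ProjectionRows {

    /** One row per document: each projected field keeps all its values as a List. */
    static Object[] collapsed(List<List<Object>> valuesPerField) {
        return valuesPerField.toArray();
    }

    /**
     * One row per element of the cartesian product of the fields' values.
     * A field with no values at all yields zero rows.
     */
    static List<Object[]> cartesian(List<List<Object>> valuesPerField) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[0]); // seed with a single empty row
        for (List<Object> values : valuesPerField) {
            List<Object[]> next = new ArrayList<>();
            for (Object[] prefix : rows) {
                for (Object value : values) {
                    // extend every existing row with each value of this field
                    Object[] row = Arrays.copyOf(prefix, prefix.length + 1);
                    row[prefix.length] = value;
                    next.add(row);
                }
            }
            rows = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        // "foo" stored twice, plus an assumed single-valued second field
        List<List<Object>> doc = Arrays.asList(
                Arrays.<Object>asList("aaa", "bbb"),
                Arrays.<Object>asList(42));
        for (Object[] row : cartesian(doc)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For a document with foo = {"aaa", "bbb"} and a single-valued second field,
cartesian() gives two rows while collapsed() gives one. This only expands rows
after the query has run, so it does not address ordering or filtering per row,
which is the part that would need one document per product element.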

-Ales

On Jan 3, 2013, at 3:17 PM, Emmanuel Bernard <emmanuel at hibernate.org> wrote:

> I don't think it's as simple as you imply. If we go that route, the
> engine atop Hibernate Search would need to know that a given field is
> multi-valued (it could be a serialized List otherwise) and look into
> each entry in the Object[] to "cartesianize" it.
> 
> It's doable, but it seems it would be easier if Hibernate Search
> did the work. I don't see that as being the default behavior, though.
> Depending on the situation you want:
> 
> - an array entry with a List in your Object[] representing a row
> - or you want n entries in your List<Object[]> with duplicated values in
>  the Object[] except for the multivalued element
> 
> Is the current behavior good for anything? We certainly did not design
> it with multivalue fields stored in mind.
> 
> So we might want to allow setting the behavior globally, with ways
> to override it per association.
> 
> Thoughts?
> 
> Emmanuel
> 
> On Thu 2013-01-03 14:42, Sanne Grinovero wrote:
>> Hi Marko,
>> this is what our typical users expect: you normally only have multiple
>> field values on tokenized fields, and you won't project those.
>> Occasionally someone calls _addFieldToDocument_ multiple times to give
>> the illusion of merging multiple strings to be tokenized in the same
>> stream. Conversely, even when an analyzer is applied to a field, you
>> sometimes know for sure that the output is a single element, so we
>> don't enforce it.
>> 
>> Projection, on the other hand, can't be applied to all fields: it is
>> really only expected on stored fields, and typically one stores the
>> field only once.
>> 
>> We can discuss how to improve this for your use case, but I'd like to
>> better understand what you need.
>> I don't think you would need to change EntityInfo to List<EntityInfo>:
>> it still represents a *single* Document which matched your search
>> criteria. It looks like what you need is for one of the projected
>> fields to be a multi-valued element, but this would still be an
>> element of the one and only EntityInfo.
>> 
>> This implies that, since the return type of a projection is Object[],
>> there is no need to break any API to implement such a feature: one of
>> those Object elements could be a Set or an array.
>> 
>> Also consider that there is no way to recover the multiple values in
>> their original order. It might seem at first glance that order is
>> maintained, but this is not guaranteed across index reorganization
>> (merges, optimisations). I'd think carefully before relying on
>> multi-valued field encodings, as you're entering out-of-scope usage,
>> but if all you need is to return multiple strings, that should be doable.
>> 
>> Sanne
>> 
>> On 3 January 2013 14:11, Marko Lukša <marko.luksa at gmail.com> wrote:
>>> Hi,
>>> 
>>> we've found the following problem regarding projection queries when
>>> dealing with documents containing multiple fields with the same name.
>>> 
>>> Let's say we add field "foo" with two different values to the same document:
>>> 
>>> luceneOptions.addFieldToDocument("foo", "aaa", document);
>>> luceneOptions.addFieldToDocument("foo", "bbb", document);
>>> 
>>> If we now do a projection query on field "foo", one would expect the
>>> resultset to contain exactly two results ({"aaa"}  and {"bbb"}), but
>>> HSearch returns only a single result (the property value of the result
>>> is either "aaa" or "bbb", because Document.getFieldable("foo"), which is
>>> called in o.h.search.engine.impl.DocumentBuilderHelper, returns the
>>> first field that matches the given name).
>>> 
>>> DocumentExtractor.extract() returns a single EntityInfo, but in order
>>> for it to properly handle projections as described in the previous
>>> paragraph, it should really be modified to return List<EntityInfo>.
>>> 
>>> This sounds pretty reasonable when the query is projecting only a single
>>> field. When projecting multiple multi-valued fields, the resultset
>>> should actually return a cartesian product.
>>> 
>>> This is one way of doing it.  The other way of doing it is if we
>>> consider multiple fields with the same name as a single multi-valued
>>> field. When projecting such fields, the resultset would contain the same
>>> number of results as there are matching documents, with the projected
>>> value being a collection of all the values stored in the field.
>>> 
>>> Actually, in CapeDwarf we need the cartesian product, as this is the way
>>> Google AppEngine does it.
>>> 
>>> What do you guys think?
>>> 
>>> Marko
>>> _______________________________________________
>>> hibernate-dev mailing list
>>> hibernate-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev