[hibernate-dev] HSearch Projection queries & multiple fields with same name

Sanne Grinovero sanne at hibernate.org
Thu Jan 3 10:00:33 EST 2013


On 3 January 2013 15:29, Ales Justin <ales.justin at gmail.com> wrote:
> I think anything not handled by Lucene is wrong.

I'm afraid Lucene won't do it, so we have no option. It's definitely
not designed to do this: even a custom Collector can't return more
results than Documents in its segment, as all representations work by
using int as relative ids.

> As Lucene, in this case, is the only engine that can properly address all query details - order, filtering, ...

Filtering, for example, will not be able to exclude only some of the
rows stemming from the same Document: it's going to be all or nothing
for each Document.

>
> So, imo, the only way to do this is by having multiple documents, one per cartesian product element.
> How to go from there, is a big TODO. :-)
>
> -Ales
>
> On Jan 3, 2013, at 3:17 PM, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>
>> I don't think it's as simple as you imply. If we go that route the
>> engine atop Hibernate Search would need to know that a given field is
>> multivaluable (it could be a serialized List otherwise) and look into
>> each entry in the Object[] to "cartesianize it".

Right, I'm assuming indeed that the engine atop has a specific
knowledge of the model: after all it applied this indexing strategy.
The DocumentExtractor helps a bit: if you apply a TwoWayFieldBridge on
the collection, Search will help by applying it to the projected
result transparently and return the collection of elements.
The question is how far we should go with "more help", and at which
level it is most appropriate to apply it. I don't think it's the document
extraction phase; this is more something that should happen during the
iteration phase, which also happens to be where we transform
EntityInfo elements into user-facing resultsets: we have a chance
there to decode the EntityInfo elements in a different way.

>>
>> It's doable but it seems that it would be easier if Hibernate Search
>> does the work. But I don't see that as being the default value.
>> Depending on the situation you want:
>>
>> - an array entry with a List in your Object[] representing a row
>> - or you want n entries in your List<Object[]> with duplicated values in
>>  the Object[] except for the multivalued element
>>
>> Is the current behavior good for anything? We certainly did not design
>> it with multivalue fields stored in mind.
>>
>> So we might want to allow for setting a given value globally with ways
>> to override that per association.

The same association might be targeted by different queries: I'd
rather expect such an option expressed on FullTextQuery.
We might then even detect if filters are applied, as I think that
would need to be considered illegal.
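
To make the per-query idea concrete, a minimal sketch follows. Everything
in it is hypothetical: the enum, the class and its methods are invented
for illustration and are not existing FullTextQuery API. It only shows the
shape of the option and of the "filters are illegal" check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: none of this is Hibernate Search API.
enum MultiValuedProjection {
    AS_COLLECTION, // one row per Document, multi-valued slot holds a collection
    AS_ROWS        // one row per value, other slots duplicated
}

final class ProjectionQuerySketch {
    private final List<String> filterNames = new ArrayList<>();
    private MultiValuedProjection mode = MultiValuedProjection.AS_COLLECTION;

    ProjectionQuerySketch enableFilter(String name) {
        filterNames.add(name);
        return this;
    }

    ProjectionQuerySketch multiValuedProjection(MultiValuedProjection mode) {
        this.mode = mode;
        return this;
    }

    // Checked at execution time, so the order of the calls above is irrelevant.
    MultiValuedProjection validatedMode() {
        if (mode == MultiValuedProjection.AS_ROWS && !filterNames.isEmpty()) {
            throw new IllegalStateException(
                    "Row expansion cannot be combined with filters: " + filterNames);
        }
        return mode;
    }
}
```

Keeping the setting on the query object rather than on the mapping means
two queries against the same association can each pick their own mode.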

Sanne

>>
>> Thoughts?
>>
>> Emmanuel
>>
>> On Thu 2013-01-03 14:42, Sanne Grinovero wrote:
>>> Hi Marko,
>>> this is what our typical users expect: you normally only have multiple
>>> values on tokenized fields, and you won't project those. Occasionally
>>> someone calls _addFieldToDocument_ multiple times to give the
>>> illusion of merging multiple strings to be tokenized in the same
>>> stream; in other cases, even though an analyzer is applied to the
>>> field, you know for sure the output is a single element. So we don't
>>> enforce it.
>>>
>>> Projection, on the other hand, can't be applied to all fields: it really
>>> is expected on stored fields only, and typically one stores such a field
>>> only once.
>>>
>>> We can discuss how to improve this for your use case, but I'd like to
>>> better understand what you need.
>>> I don't think you would need to change EntityInfo to List<EntityInfo>:
>>> it still represents a *single* Document which matched your search
>>> criteria. It looks like what you need is for one of the projected
>>> fields to actually be a multivalued element; but this would still be an
>>> element of the one and only EntityInfo.
>>>
>>> This implies that, since the return type of a projection is Object[],
>>> there is no need to break any API to implement such a feature: one of
>>> those Object elements could be a Set or an array.
>>>
>>> Also consider that there is no way to recover the multiple values in
>>> the same order; it might seem that order is maintained at first glance,
>>> but during index reorganization (merges, optimisations) this is not
>>> guaranteed. I'd think carefully before relying on multi-valued field
>>> encodings as you're entering an out-of-scope usage, but if all you
>>> need is to return multiple strings, that should be doable.
>>>
>>> Sanne
>>>
>>> On 3 January 2013 14:11, Marko Lukša <marko.luksa at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> we've found the following problem regarding projection queries when
>>>> dealing with documents containing multiple fields with the same name.
>>>>
>>>> Let's say we add field "foo" with two different values to the same document:
>>>>
>>>> luceneOptions.addFieldToDocument("foo", "aaa", document);
>>>> luceneOptions.addFieldToDocument("foo", "bbb", document);
>>>>
>>>> If we now do a projection query on field "foo", one would expect the
>>>> resultset to contain exactly two results ({"aaa"}  and {"bbb"}), but
>>>> HSearch returns only a single result (the property value of the result
>>>> is either "aaa" or "bbb", because Document.getFieldable("foo"), which is
>>>> called in o.h.search.engine.impl.DocumentBuilderHelper, returns the
>>>> first field that matches the given name).
>>>>
>>>> DocumentExtractor.extract() returns a single EntityInfo, but in order
>>>> for it to properly handle projections as described in the previous
>>>> paragraph, it should really be modified to return List<EntityInfo>.
>>>>
>>>> This sounds pretty reasonable when the query is projecting only a single
>>>> field. When projecting multiple multi-valued fields, the resultset
>>>> should actually return a cartesian product.
>>>>
>>>> This is one way of doing it.  The other way of doing it is if we
>>>> consider multiple fields with the same name as a single multi-valued
>>>> field. When projecting such fields, the resultset would contain the same
>>>> number of results as there are matching documents, with the projected
>>>> value being a collection of all the values stored in the field.
>>>>
>>>> Actually, in CapeDwarf we need the cartesian product, as this is the way
>>>> Google AppEngine does it.
>>>>
>>>> What do you guys think?
>>>>
>>>> Marko
>>>> _______________________________________________
>>>> hibernate-dev mailing list
>>>> hibernate-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev


