Re: [hibernate-dev] HSearch Projection queries & multiple fields with same name

Thursday, 3 January 2013

Hi Marko,
this is what our typical users expect: you normally only have multiple
values on tokenized fields, and you wouldn't project those. Occasionally
someone calls _addFieldToDocument_ multiple times to give the illusion
of merging multiple strings to be tokenized in the same stream, and
occasionally, even when an analyzer is applied to the field, you know
for sure that the output is a single element, so we don't enforce it.

Projection, on the other hand, can't be applied to all fields; it is
really only expected on stored fields, and typically one stores the
field only once.

We can discuss how to improve this for your use case, but I'd like to
better understand what you need.
I don't think you would need to change EntityInfo to List<EntityInfo>:
it still represents a *single* Document which matched your search
criteria. It looks like what you need is for one of the projected
fields to be a multivalued element, but this would still be an
element of the one and only EntityInfo.

This implies that, since the return type of a projection is Object[],
there is no need to break any API to implement such a feature: one of
those Object elements could be a Set or an array.
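
To make the idea concrete, here is a minimal sketch in plain Java. It
does not use the Hibernate Search or Lucene APIs: the Map of field name
to stored values is a hypothetical stand-in for a Document, and the
class and method names are invented for illustration. A field stored
once projects to a String, a field stored several times projects to a
collection, and the row is still a single Object[].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiValuedProjectionSketch {

    // storedFields is a hypothetical stand-in for one indexed Document:
    // field name -> every value stored under that name.
    static Object[] project(Map<String, List<String>> storedFields, String... fieldNames) {
        Object[] row = new Object[fieldNames.length];
        for (int i = 0; i < fieldNames.length; i++) {
            List<String> values = storedFields.get(fieldNames[i]);
            if (values == null || values.isEmpty()) {
                row[i] = null;                     // field not stored
            } else if (values.size() == 1) {
                row[i] = values.get(0);            // usual case: stored once
            } else {
                row[i] = new ArrayList<>(values);  // multivalued: one collection element
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", Arrays.asList("42"));
        doc.put("foo", Arrays.asList("aaa", "bbb"));

        Object[] row = project(doc, "id", "foo");
        System.out.println(Arrays.toString(row)); // prints [42, [aaa, bbb]]
    }
}
```

The sketch uses a List only to keep the printed output predictable; a
Set or an array would fit equally well, and callers should not rely on
the order of the values.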

Also consider that there is no way to recover the multiple values in
their original order; it might seem at first glance that order is
maintained, but this is not guaranteed during index reorganization
(merges, optimisations). I'd think carefully before relying on
multi-valued field encodings, as you're entering out-of-scope usage,
but if all you need is to return multiple strings, that should be doable.

Sanne

On 3 January 2013 14:11, Marko Lukša <marko.luksa(a)gmail.com> wrote:
...
 Hi,

 we've found the following problem regarding projection queries when
 dealing with documents containing multiple fields with the same name.

 Let's say we add field "foo" with two different values to the same
 document:

 luceneOptions.addFieldToDocument("foo", "aaa", document);
 luceneOptions.addFieldToDocument("foo", "bbb", document);

 If we now do a projection query on field "foo", one would expect the
 resultset to contain exactly two results ({"aaa"} and {"bbb"}), but
 HSearch returns only a single result (the property value of the result
 is either "aaa" or "bbb", because Document.getFieldable("foo"), which
 is called in o.h.search.engine.impl.DocumentBuilderHelper, returns the
 first field that matches the given name).

 DocumentExtractor.extract() returns a single EntityInfo, but in order
 for it to properly handle projections as described in the previous
 paragraph, it should really be modified to return List<EntityInfo>.

 This sounds pretty reasonable when the query is projecting only a single
 field. When projecting multiple multi-valued fields, the resultset
 should actually return a cartesian product.
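
 The cartesian-product behaviour described above could be sketched
 roughly as follows. This is plain Java only, not Hibernate Search or
 App Engine code; the class and method names are hypothetical, and each
 inner list stands for the values stored under one projected field of a
 single matching document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CartesianProjectionSketch {

    // valuesPerField.get(i) holds every stored value of the i-th projected field.
    // Returns one Object[] row per combination of values, in field order.
    // Note: a field with no values yields no rows at all, as in any cartesian product.
    static List<Object[]> cartesianRows(List<List<String>> valuesPerField) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[0]); // start from a single empty row
        for (List<String> values : valuesPerField) {
            List<Object[]> expanded = new ArrayList<>();
            for (Object[] prefix : rows) {
                for (String value : values) {
                    // extend each partial row with one value of the current field
                    Object[] row = Arrays.copyOf(prefix, prefix.length + 1);
                    row[prefix.length] = value;
                    expanded.add(row);
                }
            }
            rows = expanded;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> fields = Arrays.asList(
                Arrays.asList("aaa", "bbb"), // values of "foo"
                Arrays.asList("x", "y"));    // values of a second, made-up field
        for (Object[] row : cartesianRows(fields)) {
            System.out.println(Arrays.toString(row));
        }
        // prints [aaa, x], [aaa, y], [bbb, x], [bbb, y] on separate lines
    }
}
```

 With a single projected field holding "aaa" and "bbb", the same method
 returns the two one-element rows {"aaa"} and {"bbb"}.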

 This is one way of doing it.  The other way of doing it is if we
 consider multiple fields with the same name as a single multi-valued
 field. When projecting such fields, the resultset would contain the same
 number of results as there are matching documents, with the projected
 value being a collection of all the values stored in the field.

 Actually, in CapeDwarf we need the cartesian product, as this is the way
 Google AppEngine does it.

 What do you guys think?

 Marko
 _______________________________________________
 hibernate-dev mailing list
 hibernate-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev 
