Hi,
we've found the following problem regarding projection queries when
dealing with documents containing multiple fields with the same name.
Let's say we add field "foo" with two different values to the same
document:
luceneOptions.addFieldToDocument("foo", "aaa", document);
luceneOptions.addFieldToDocument("foo", "bbb", document);
If we now run a projection query on field "foo", we would expect the
result set to contain exactly two results ({"aaa"} and {"bbb"}), but
HSearch returns only a single result whose projected value is either
"aaa" or "bbb". The reason is that Document.getFieldable("foo"), which
is called in o.h.search.engine.impl.DocumentBuilderHelper, returns only
the first field that matches the given name.
DocumentExtractor.extract() currently returns a single EntityInfo. To
handle projections as described in the previous paragraph, it would
have to be modified to return List<EntityInfo>.
This seems reasonable when the query projects only a single field. When
it projects several multi-valued fields, the result set should contain
the cartesian product of their values.
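
To make the cartesian-product idea concrete, here is a rough sketch in
plain Java with no Lucene or HSearch types involved. The class name
CartesianProjection, the method product() and the second field "bar"
are made up for illustration only; this is not how HSearch does it
today, just one way the row expansion could look.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CartesianProjection {

    // Each inner list holds all stored values of one projected field,
    // in projection order. Returns one row per combination of values.
    static List<List<String>> product(List<List<String>> fieldValues) {
        List<List<String>> rows = new ArrayList<List<String>>();
        rows.add(new ArrayList<String>()); // start with a single empty row
        for (List<String> values : fieldValues) {
            List<List<String>> next = new ArrayList<List<String>>();
            for (List<String> row : rows) {
                for (String value : values) {
                    // extend every existing row with every value of this field
                    List<String> extended = new ArrayList<String>(row);
                    extended.add(value);
                    next.add(extended);
                }
            }
            rows = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> rows = product(Arrays.asList(
                Arrays.asList("aaa", "bbb"),  // values of "foo"
                Arrays.asList("x", "y")));    // values of hypothetical "bar"
        System.out.println(rows);
        // prints [[aaa, x], [aaa, y], [bbb, x], [bbb, y]]
    }
}
```

Note that in this sketch a projected field with no values at all makes
the whole document disappear from the result set, so that case would
need a separate decision.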
That is one way of doing it. The other is to treat multiple fields with
the same name as a single multi-valued field. When such a field is
projected, the result set would contain one result per matching
document, with the projected value being a collection of all the values
stored in that field.
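
A rough sketch of this second option, again in plain Java. StoredField
and valuesOf() are hypothetical stand-ins for a stored Lucene field and
for whatever the extractor would do; they are not existing HSearch or
Lucene classes. The point is only that the document yields one row whose
projected value is the whole list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiValuedProjection {

    // Hypothetical stand-in for one stored field: a (name, value) pair.
    static final class StoredField {
        final String name;
        final String value;

        StoredField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Collects every value stored under fieldName, in insertion order,
    // instead of stopping at the first matching field.
    static List<String> valuesOf(List<StoredField> document, String fieldName) {
        List<String> values = new ArrayList<String>();
        for (StoredField field : document) {
            if (field.name.equals(fieldName)) {
                values.add(field.value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<StoredField> document = Arrays.asList(
                new StoredField("foo", "aaa"),
                new StoredField("foo", "bbb"),
                new StoredField("bar", "x"));
        System.out.println(valuesOf(document, "foo")); // prints [aaa, bbb]
    }
}
```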
Actually, in CapeDwarf we need the cartesian product, as this is the way
Google AppEngine does it.
What do you guys think?
Marko