See how {{ org.hibernate.search.integrationtest.backend.tck.search.aggregation.SingleFieldAggregationBaseIT#multiValued }} is disabled due to {{ org.hibernate.search.integrationtest.backend.lucene.testsupport.util.LuceneTckBackendFeatures#aggregationsOnMultiValuedFields }} .
Before HSEARCH-3839, we couldn't even index multiple values for numeric fields in Lucene. After HSEARCH-3839, we can, but we pick a single value when aggregating, so aggregations are still incorrect.
Ideally, when counting documents per field value, multi-valued documents should be counted once per value that appears in the field. So if a single document has values {{ 1 }} and {{ 2 }} for a single field, it should increment the count for both {{ 1 }} and {{ 2 }} . At least that's what happens on Elasticsearch.
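The intended counting rule can be sketched in plain Java, independent of any Lucene or Hibernate Search API. The class and method names below are hypothetical and exist only for illustration. Which single value the Lucene backend actually picks is not stated here, so taking the first value is an arbitrary assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiValuedCountSketch {

    // Expected behavior: each document is counted once per distinct value of the field.
    public static Map<Integer, Long> countPerValue(List<List<Integer>> docs) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (List<Integer> values : docs) {
            values.stream().distinct().forEach(v -> counts.merge(v, 1L, Long::sum));
        }
        return counts;
    }

    // Single-value behavior: each document is counted for one value only.
    // ASSUMPTION: the first value is used; the real selection rule may differ.
    public static Map<Integer, Long> countFirstValueOnly(List<List<Integer>> docs) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (List<Integer> values : docs) {
            if (!values.isEmpty()) {
                counts.merge(values.get(0), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Document 1 has value 1; document 2 has values 1 and 2.
        List<List<Integer>> docs = List.of(List.of(1), List.of(1, 2));
        System.out.println(countPerValue(docs));       // {1=2, 2=1}
        System.out.println(countFirstValueOnly(docs)); // {1=2}
    }
}
```

With the same two documents, the per-value count yields a bucket for {{ 2 }}, while the single-value count drops it entirely.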
How to test the behavior on Elasticsearch:
{code}
curl -XDELETE -H "Content-Type: application/json" localhost:9200/mytest1/ 1>&2 2>/dev/null;
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/\?pretty -d'{"mappings":{"properties":{"num":{"type":"integer" }}} }'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/1 -d'{"num":1}'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/2 -d'{"num":[1,2]}'
curl -XPOST -H "Content-Type: application/json" localhost:9200/mytest1/_search\?pretty -d'{"aggs":{"foo":{"terms":{"field":"num" }}} }'
{code}
Result:
{noformat}
{
  ...
  "aggregations" : {
    "foo" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 1,
          "doc_count" : 2
        },
        {
          "key" : 2,
          "doc_count" : 1
        }
      ]
    }
  }
}
{noformat}
So document 2 was counted twice: once in the bucket for {{ 1 }} and once in the bucket for {{ 2 }}.