See how org.hibernate.search.integrationtest.backend.tck.search.aggregation.SingleFieldAggregationBaseIT#multiValued is disabled because of org.hibernate.search.integrationtest.backend.lucene.testsupport.util.LuceneTckBackendFeatures#aggregationsOnMultiValuedFields. Before HSEARCH-3839, we couldn't even index multiple values for numeric fields in Lucene. After HSEARCH-3839, we can, but we pick a single value per document when aggregating, so aggregations on multi-valued fields are still incorrect. Ideally, when counting documents per field value, a multi-valued document should be counted once for each value that appears in the field: if a document has values 1 and 2 for a given field, it should increment the count for both 1 and 2. At least that's what happens on Elasticsearch. How to test the behavior on Elasticsearch:
curl -XDELETE -H "Content-Type: application/json" localhost:9200/mytest1/ 1>&2 2>/dev/null; curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/\?pretty -d'{"mappings":{"properties":{"num":{"type":"integer" }}} }'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/1 -d'{"num":1}'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/2 -d'{"num":[1,2]}'
curl -XPOST -H "Content-Type: application/json" localhost:9200/mytest1/_search\?pretty -d'{"aggs":{"foo":{"terms":{"field":"num" }}} }'
Result:
So document 2 was counted twice: once in the bucket for value 1 and once in the bucket for value 2.
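For reference, the expected counting semantics can be sketched in plain Java. This is only an illustration of the desired behavior, not the Lucene backend's implementation: the class MultiValuedCountSketch and the helper countPerValue are hypothetical names, and documents are modeled as simple lists of integer values for the "num" field.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiValuedCountSketch {

    // Hypothetical helper: each inner list holds the values of one document's field.
    static Map<Integer, Long> countPerValue(List<List<Integer>> documents) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (List<Integer> values : documents) {
            // De-duplicate so a document increments a given bucket at most once.
            for (Integer value : new HashSet<>(values)) {
                counts.merge(value, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same data as the curl commands: document 1 has num=1, document 2 has num=[1,2].
        List<List<Integer>> documents = List.of(List.of(1), List.of(1, 2));
        System.out.println(countPerValue(documents)); // prints {1=2, 2=1}
    }
}
```

With the two documents indexed above, value 1 gets a count of 2 and value 2 gets a count of 1. Picking a single value per document would instead lose one of the two increments for document 2.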