[JIRA] (HSEARCH-3856) Aggregations on multi-valued numeric fields for Lucene
by Yoann Rodière (JIRA)
Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%... ) *created* an issue
Hibernate Search ( https://hibernate.atlassian.net/browse/HSEARCH?atlOrigin=eyJpIjoiNmY1NGJh... ) / Improvement / HSEARCH-3856 ( https://hibernate.atlassian.net/browse/HSEARCH-3856?atlOrigin=eyJpIjoiNmY... ) Aggregations on multi-valued numeric fields for Lucene
Issue Type: Improvement
Assignee: Unassigned
Components: backend-lucene
Created: 05/Mar/2020 07:59 AM
Fix Versions: 6.0.0-Bonus-backlog
Priority: Major
Reporter: Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%... )
See how org.hibernate.search.integrationtest.backend.tck.search.aggregation.SingleFieldAggregationBaseIT#multiValued is disabled due to org.hibernate.search.integrationtest.backend.lucene.testsupport.util.LuceneTckBackendFeatures#aggregationsOnMultiValuedFields.
Before HSEARCH-3839 ( https://hibernate.atlassian.net/browse/HSEARCH-3839 ), we couldn't even index multiple values for numeric fields in Lucene. After HSEARCH-3839, we can, but we pick a single value per document when aggregating, so aggregations are still incorrect.
Ideally, when counting documents per field value, multi-valued documents should be counted once per value that appears in the field. So if a single document has values 1 and 2 for a single field, it should increment the count for both 1 and 2. At least that's what happens on Elasticsearch.
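To make the expected counting rule concrete, here is a minimal plain-Java sketch. It does not use Lucene or Hibernate Search APIs; the class and method names are hypothetical, documents are modeled as simple lists of integers, and the "pick a single value" variant assumes the first value is picked, which is only an assumption for illustration. Collapsing repeated values within a single document is also an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not the Lucene backend implementation.
public class MultiValuedCountSketch {

    // Expected semantics: a document increments the count of every
    // distinct value it holds for the field.
    static Map<Integer, Long> countPerValue(List<List<Integer>> documents) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (List<Integer> values : documents) {
            // Assumption: a value repeated within one document counts once.
            for (Integer value : new LinkedHashSet<>(values)) {
                counts.merge(value, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Problematic semantics: only one value per document is considered.
    // Assumption: the first value is the one picked.
    static Map<Integer, Long> countSingleValue(List<List<Integer>> documents) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (List<Integer> values : documents) {
            if (!values.isEmpty()) {
                counts.merge(values.get(0), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Document 1 has value 1; document 2 has values 1 and 2.
        List<List<Integer>> documents = List.of(List.of(1), List.of(1, 2));
        System.out.println(countPerValue(documents));    // {1=2, 2=1}
        System.out.println(countSingleValue(documents)); // {1=2}
    }
}
```

With a single value picked per document, the bucket for value 2 is missing entirely, which is the kind of incorrect result this issue is about.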
How to test the behavior on Elasticsearch:
curl -XDELETE -H "Content-Type: application/json" localhost:9200/mytest1/ 1>&2 2>/dev/null
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/\?pretty -d '{ "mappings" :{ "properties" :{ "num" :{ "type" : "integer" }}} }'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/1 -d '{ "num" :1}'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/2 -d '{ "num" :[1,2]}'
curl -XPOST -H "Content-Type: application/json" localhost:9200/mytest1/_search\?pretty -d '{ "aggs" :{ "foo" :{ "terms" :{ "field" : "num" }}} }'
Result:
{
  ...
  "aggregations" : {
    "foo" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 1,
          "doc_count" : 2
        },
        {
          "key" : 2,
          "doc_count" : 1
        }
      ]
    }
  }
}
So document 2 was counted twice: once in the bucket for key 1 and once in the bucket for key 2.