Yoann Rodière (
https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%...
) *created* an issue
Hibernate Search (
https://hibernate.atlassian.net/browse/HSEARCH?atlOrigin=eyJpIjoiNmY1NGJh...
) / Improvement HSEARCH-3856 (
https://hibernate.atlassian.net/browse/HSEARCH-3856?atlOrigin=eyJpIjoiNmY...
) Aggregations on multi-valued numeric fields for Lucene
Issue Type: Improvement
Assignee: Unassigned
Components: backend-lucene
Created: 05/Mar/2020 07:59 AM
Fix Versions: 6.0.0-Bonus-backlog
Priority: Major
Reporter: Yoann Rodière
See how
org.hibernate.search.integrationtest.backend.tck.search.aggregation.SingleFieldAggregationBaseIT#multiValued
is disabled due to
org.hibernate.search.integrationtest.backend.lucene.testsupport.util.LuceneTckBackendFeatures#aggregationsOnMultiValuedFields.
Before HSEARCH-3839 ( https://hibernate.atlassian.net/browse/HSEARCH-3839 , status: Open),
we couldn't even index multiple values for numeric fields in Lucene. After HSEARCH-3839,
we can, but we pick a single value per document when aggregating, so aggregations on
multi-valued fields are still incorrect.
Ideally, when counting documents per field value, multi-valued documents should be counted
once per value that appears in the field. So if a single document has values 1 and 2 for a
single field, it should increment the count for both 1 and 2. At least that's what
happens on Elasticsearch.
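To make the expected semantics concrete, here is a minimal sketch in plain Java, with no Lucene or Hibernate Search APIs involved. The class and method names are hypothetical and only serve the illustration. It assumes that a value repeated within one document counts once, and it models the current "single value" behavior by taking the first value of each document, which is an assumption about which value gets picked.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class, not part of Hibernate Search.
public class MultiValuedCountSketch {

    // Expected behavior: each document increments the count of every
    // distinct value it holds for the field.
    static Map<Integer, Integer> countPerValue(List<List<Integer>> docs) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> values : docs) {
            // Assumption: duplicates inside one document count only once.
            for (Integer value : new HashSet<>(values)) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Contrast: each document is counted under a single value only.
    // Assumption: the first value is the one picked.
    static Map<Integer, Integer> countSingleValue(List<List<Integer>> docs) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> values : docs) {
            if (!values.isEmpty()) {
                counts.merge(values.get(0), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Document 1 holds [1], document 2 holds [1, 2].
        List<List<Integer>> docs = List.of(List.of(1), List.of(1, 2));
        System.out.println(countPerValue(docs));    // {1=2, 2=1}
        System.out.println(countSingleValue(docs)); // {1=2}
    }
}
```

With single-value counting, the bucket for value 2 is missing entirely, which is the kind of incorrect result described above.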
How to test the behavior on Elasticsearch:
# Drop any previous test index, discarding output and errors
curl -XDELETE -H "Content-Type: application/json" localhost:9200/mytest1/ >/dev/null 2>&1
# Create the index with a single integer field
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/\?pretty -d '{"mappings":{"properties":{"num":{"type":"integer"}}}}'
# Index one single-valued and one multi-valued document;
# refresh=true makes each document visible to the search that follows
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/1\?refresh=true -d '{"num":1}'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/2\?refresh=true -d '{"num":[1,2]}'
# Run a terms aggregation on the field
curl -XPOST -H "Content-Type: application/json" localhost:9200/mytest1/_search\?pretty -d '{"aggs":{"foo":{"terms":{"field":"num"}}}}'
Result:
{
  ...
  "aggregations" : {
    "foo" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 1,
          "doc_count" : 2
        },
        {
          "key" : 2,
          "doc_count" : 1
        }
      ]
    }
  }
}
So document 2 was counted twice: once under key 1 and once under key 2.