[hibernate-issues] [JIRA] (HSEARCH-3856) Aggregations on multi-valued numeric fields for Lucene

Yoann Rodière (JIRA) jira at hibernate.atlassian.net
Fri Mar 6 02:43:04 EST 2020


Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3A58fa1ced-171a-4c00-97e8-5d70d442cc4b ) *updated* an issue

Hibernate Search ( https://hibernate.atlassian.net/browse/HSEARCH?atlOrigin=eyJpIjoiNzljMzU4MDE4ZjM3NGQ3OWEwZjJlYzg2YmViMmI0MWYiLCJwIjoiaiJ9 ) / Improvement ( https://hibernate.atlassian.net/browse/HSEARCH-3856?atlOrigin=eyJpIjoiNzljMzU4MDE4ZjM3NGQ3OWEwZjJlYzg2YmViMmI0MWYiLCJwIjoiaiJ9 ) HSEARCH-3856 ( https://hibernate.atlassian.net/browse/HSEARCH-3856?atlOrigin=eyJpIjoiNzljMzU4MDE4ZjM3NGQ3OWEwZjJlYzg2YmViMmI0MWYiLCJwIjoiaiJ9 ) Aggregations on multi-valued numeric fields for Lucene ( https://hibernate.atlassian.net/browse/HSEARCH-3856?atlOrigin=eyJpIjoiNzljMzU4MDE4ZjM3NGQ3OWEwZjJlYzg2YmViMmI0MWYiLCJwIjoiaiJ9 )

Change By: Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3A58fa1ced-171a-4c00-97e8-5d70d442cc4b )

See how  {{ org.hibernate.search.integrationtest.backend.tck.search.aggregation.SingleFieldAggregationBaseIT#multiValued }}  is disabled due to  {{ org.hibernate.search.integrationtest.backend.lucene.testsupport.util.LuceneTckBackendFeatures#aggregationsOnMultiValuedFields }} .

Before HSEARCH-3839, we couldn't even index multiple values for numeric fields in Lucene. After HSEARCH-3839, we can, but we pick a single value when aggregating, so aggregations are still incorrect.

Ideally, when counting documents per field value, multi-valued documents should be counted once per value that appears in the field. So if a single document has values  {{ 1 }}  and  {{ 2 }}  for a single field, it should increment the count for both  {{ 1 }}  and  {{ 2 }} . At least that's what happens on Elasticsearch.

How to test the behavior on Elasticsearch:

{code}
curl -XDELETE -H "Content-Type: application/json" localhost:9200/mytest1/ 1>&2 2>/dev/null; curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/\?pretty -d'{"mappings":{"properties":{"num":{"type":"integer" }}} }'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/1 -d'{"num":1}'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/2 -d'{"num":[1,2]}'
curl -XPOST -H "Content-Type: application/json" localhost:9200/mytest1/_search\?pretty -d'{"aggs":{"foo":{"terms":{"field":"num" }}} }'
{code}

Result:

{noformat}
{
...
"aggregations" : {
"foo" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 2
},
{
"key" : 2,
"doc_count" : 1
}
]
}
}
}
{noformat}

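So document 2, which holds both values, was counted twice: once in the bucket for key {{ 1 }} and once in the bucket for key {{ 2 }}.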
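To make the expected counting rule concrete, here is a minimal sketch in plain Java that reproduces the bucket counts above for the same two documents. It does not use the Lucene or Hibernate Search APIs; the class and method names ({{ MultiValuedTermsCount }}, {{ countPerValue }}) are hypothetical, and deduplicating repeated values within one document is an assumption, not something stated in this issue.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: not part of Hibernate Search or Lucene.
final class MultiValuedTermsCount {

    // Counts documents per field value. A multi-valued document contributes
    // once to the count of each distinct value it holds.
    static Map<Integer, Long> countPerValue(List<List<Integer>> documents) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (List<Integer> values : documents) {
            // Assumption: a value repeated inside one document counts once.
            Set<Integer> distinct = new TreeSet<>(values);
            for (Integer value : distinct) {
                counts.merge(value, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Document 1 has num = 1, document 2 has num = [1, 2].
        List<List<Integer>> documents = List.of(List.of(1), List.of(1, 2));
        System.out.println(countPerValue(documents)); // prints {1=2, 2=1}
    }
}
```

Picking a single value per document, as the Lucene backend currently does, would instead yield either {{ 1 -> 2 }} alone or {{ 1 -> 1, 2 -> 1 }}, depending on which value is picked.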
