[hibernate-issues] [JIRA] (HSEARCH-3856) Aggregations on multi-valued numeric fields for Lucene

Yoann Rodière (JIRA) jira at hibernate.atlassian.net
Thu Mar 5 10:59:40 EST 2020


Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3A58fa1ced-171a-4c00-97e8-5d70d442cc4b ) *created* an issue

Hibernate Search / Improvement / HSEARCH-3856: Aggregations on multi-valued numeric fields for Lucene ( https://hibernate.atlassian.net/browse/HSEARCH-3856?atlOrigin=eyJpIjoiNmY1NGJhOWJiODU3NGM4NzhlZjgwMzQxMTFlYmM4ODIiLCJwIjoiaiJ9 )

Issue Type: Improvement
Assignee: Unassigned
Components: backend-lucene
Created: 05/Mar/2020 07:59 AM
Fix Versions: 6.0.0-Bonus-backlog
Priority: Major
Reporter: Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3A58fa1ced-171a-4c00-97e8-5d70d442cc4b )

See how org.hibernate.search.integrationtest.backend.tck.search.aggregation.SingleFieldAggregationBaseIT#multiValued is disabled due to org.hibernate.search.integrationtest.backend.lucene.testsupport.util.LuceneTckBackendFeatures#aggregationsOnMultiValuedFields.

Before HSEARCH-3839 ( https://hibernate.atlassian.net/browse/HSEARCH-3839 , status: Open), we couldn't even index multiple values for numeric fields in Lucene. After HSEARCH-3839, we can index them, but we still pick a single value per document when aggregating, so aggregation results are still incorrect.

Ideally, when counting documents per field value, multi-valued documents should be counted once per value that appears in the field. So if a single document has values 1 and 2 for a single field, it should increment the count for both 1 and 2. At least that's what happens on Elasticsearch.
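A minimal Java sketch of that counting rule, for illustration only: it does not use the Hibernate Search or Lucene APIs, and the class name MultiValuedTermsCount and method countPerValue are hypothetical. It assumes each document's values arrive sorted in ascending order with duplicates kept, which is how we understand Lucene's SortedNumericDocValues to expose them; under that assumption, a value repeated within one document is skipped so the document counts once for that value.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the desired counting rule (not a Hibernate Search or Lucene API):
// each document adds 1 to the bucket of every distinct value it holds for the field.
public class MultiValuedTermsCount {

    // docs: one array per document, holding that document's values for the field,
    // assumed sorted in ascending order with duplicates kept.
    // Returns the document count per value, keyed by value in ascending order.
    static Map<Long, Long> countPerValue(List<long[]> docs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long[] values : docs) {
            for (int i = 0; i < values.length; i++) {
                // Values are sorted, so a duplicate directly follows its first occurrence:
                // skip it so the document is counted only once for that value.
                if (i > 0 && values[i] == values[i - 1]) {
                    continue;
                }
                counts.merge(values[i], 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same data as the Elasticsearch reproduction: doc 1 has num = 1, doc 2 has num = [1, 2]
        List<long[]> docs = List.of(new long[] {1}, new long[] {1, 2});
        System.out.println(countPerValue(docs));
    }
}
```

Running main with the same two documents as the Elasticsearch reproduction prints {1=2, 2=1}, which matches the buckets Elasticsearch returns: document 2 contributes to both buckets.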

How to test the behavior on Elasticsearch:

curl -XDELETE -H "Content-Type: application/json" localhost:9200/mytest1/ 1>&2 2>/dev/null
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/\?pretty -d '{ "mappings" :{ "properties" :{ "num" :{ "type" : "integer" }}} }'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/1\?refresh=true -d '{ "num" :1}'
curl -XPUT -H "Content-Type: application/json" localhost:9200/mytest1/_doc/2\?refresh=true -d '{ "num" :[1,2]}'
curl -XPOST -H "Content-Type: application/json" localhost:9200/mytest1/_search\?pretty -d '{ "aggs" :{ "foo" :{ "terms" :{ "field" : "num"  }}} }'

Result:

{
 ...
 "aggregations" : {
   "foo" : {
     "doc_count_error_upper_bound" : 0,
     "sum_other_doc_count" : 0,
     "buckets" : [
       {
         "key" : 1,
         "doc_count" : 2
       },
       {
         "key" : 2,
         "doc_count" : 1
       }
     ]
   }
 }
}

So document 2 was counted twice: once in the bucket for key 1 and once in the bucket for key 2.
