[hibernate-issues] [JIRA] (HSEARCH-3856) Aggregations on multi-valued numeric fields for Lucene

Friday, 6 March 2020

Yoann Rodière (
https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%...
) *commented* on HSEARCH-3856 (
https://hibernate.atlassian.net/browse/HSEARCH-3856?atlOrigin=eyJpIjoiZjU...
)

Re: Aggregations on multi-valued numeric fields for Lucene (
https://hibernate.atlassian.net/browse/HSEARCH-3856?atlOrigin=eyJpIjoiZjU...
)

...

 Now there are actually four aggregation options for nested documents and
 five options for flat documents. But we could add a "none" option, or, if
 no option is set, aggregate all values without applying any function to
 combine them.

Yes, that's the plan. By default, I don't think we should apply "per-document
aggregations" (sum, avg, lowest, etc.) in aggregations, so as to behave
consistently:

* Between string aggregations and numeric aggregations: we can't compute sum/avg/... for
strings, and lowest/highest don't make much sense for terms found in text.
* Between Lucene numeric aggregations and Elasticsearch numeric aggregations:
Elasticsearch takes all values into account by default, rather than their sum/avg/lowest/etc.
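A minimal Python sketch can make the two behaviors concrete. It is not Hibernate Search, Lucene, or Elasticsearch code: `terms_aggregation` and the mode names `SUM`, `AVG`, `MIN`, `MAX` and `MEDIAN` are hypothetical stand-ins for the per-document options discussed above, and `mode=None` stands for the proposed default where all values are taken into account. It also assumes (not verified against Elasticsearch internals) that a document is counted at most once per bucket, even if it holds the same value twice.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical per-document reduction modes (names are illustrative only).
REDUCERS = {
    "SUM": sum,
    "AVG": mean,
    "MIN": min,
    "MAX": max,
    "MEDIAN": median,
}


def terms_aggregation(docs, mode=None):
    """Count documents per term for a multi-valued numeric field.

    docs: list of lists, one list of field values per document.
    mode: None to take every value into account, or a key of REDUCERS
          to first reduce each document's values to a single value.
    """
    counts = Counter()
    for values in docs:
        if not values:
            continue  # documents without a value contribute to no bucket
        if mode is None:
            # Default (assumption): each distinct value of the document gets
            # one count, so a single document may land in several buckets.
            counts.update(set(values))
        else:
            # Per-document aggregation: one reduced value, hence one bucket.
            counts[REDUCERS[mode](values)] += 1
    return dict(counts)
```

With `docs = [[1, 3], [3], [2, 4]]`, the default yields `{1: 1, 2: 1, 3: 2, 4: 1}`: five counts for three documents, since the first and third documents each land in two buckets. With `"SUM"`, the result is `{3: 1, 4: 1, 6: 1}`, with buckets for 4 and 6 that no document actually contains. When every document has exactly one value, all modes return the same counts.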

Also, I don't think we can request per-document sum/avg/lowest/etc. for numeric
terms/range aggregations in Elasticsearch, so we can't expose the feature in generic
APIs that both Elasticsearch and Lucene must implement. We could move it to
Lucene-specific APIs, I suppose, but there isn't really a use case, is there? You just
implemented this so that aggregations would somehow work on multi-valued fields?

...

 In practice, you could set the sorting option to "none". But then it would
 have to count each document as many times as there are nested documents, or
 duplicate values in the flat model field.

Yes, some documents would be counted multiple times. That's what Elasticsearch does by
default, and I think it's a decent default.
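The same counting rule can be illustrated on a range aggregation. Below is a minimal Python sketch, again not taken from Hibernate Search or Elasticsearch: the hypothetical `range_aggregation` helper assumes half-open `[lower, upper)` ranges and counts a document once in every range that contains at least one of its values.

```python
def range_aggregation(docs, ranges):
    """Count documents per half-open [lower, upper) range (hypothetical helper).

    docs: list of lists, one list of numeric values per document.
    ranges: list of (lower, upper) tuples.
    A document is counted once in every range that contains at least one
    of its values, so the same document may be counted in several ranges.
    """
    counts = {r: 0 for r in ranges}
    for values in docs:
        for lower, upper in ranges:
            # any() ensures that two values in the same range count only once.
            if any(lower <= v < upper for v in values):
                counts[(lower, upper)] += 1
    return counts
```

With `docs = [[1, 15], [5], [12, 18]]` and ranges `(0, 10)` and `(10, 20)`, each range gets a count of 2, for a total of 4 over three documents: the first document is counted in both ranges, while the third, whose two values fall in the same range, is counted there only once.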

...

 Especially if paging is used.

Paging is not relevant for aggregations, which are applied to all matching documents,
not just the current page. I don't think performance is an issue here, if that's what
you're suggesting. The problem is more that we have to move away from our "legacy"
implementation of aggregations that relied on Lucene's faceting.

Anyway, this is all something I'm suggesting to do as a second step. After your work,
sorts on multi-valued fields work correctly, and aggregations on multi-valued fields work
correctly as long as there is effectively only one value per document (which will probably
be the case once you add filtering anyway).


