[hibernate-issues] [JIRA] (HSEARCH-3856) Aggregations on multi-valued numeric fields for Lucene

Yoann Rodière (JIRA) jira at hibernate.atlassian.net
Fri Mar 6 02:42:36 EST 2020


Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3A58fa1ced-171a-4c00-97e8-5d70d442cc4b ) *commented* on HSEARCH-3856 ( https://hibernate.atlassian.net/browse/HSEARCH-3856?atlOrigin=eyJpIjoiZjUyYzRjOWVkYTZlNGZkMGI5YjRlMmI2ZjNlNTcwODIiLCJwIjoiaiJ9 )

Re: Aggregations on multi-valued numeric fields for Lucene ( https://hibernate.atlassian.net/browse/HSEARCH-3856?atlOrigin=eyJpIjoiZjUyYzRjOWVkYTZlNGZkMGI5YjRlMmI2ZjNlNTcwODIiLCJwIjoiaiJ9 )

> Now there are actually four aggregation options for nested documents and
> five options for flat documents. But you could add "none", or, if the
> option is not set, all fields could be aggregated without performing
> linking functions on them.

Yes, that's the plan. By default, I don't think we should apply "per-document aggregations" (sum, avg, lowest, etc.) in aggregations, so as to behave consistently:

* Between string aggregations and numeric aggregations: we can't compute sum/avg/... for strings, and lowest/highest don't make much sense for terms found in text.
* Between Lucene numeric aggregations and Elasticsearch numeric aggregations: Elasticsearch takes into account all values by default, not the sum/avg/lowest/etc.

Also, I don't think we can request per-document sum/avg/lowest/etc. for numeric terms/range aggregations in Elasticsearch, so we can't expose the feature in generic APIs that both Elasticsearch and Lucene must implement. We could move it to Lucene-specific APIs, I suppose, but there isn't really a use case, is there? You just implemented this so that aggregations would somehow work on multi-valued fields?

> You can practically set the sorting option to none. But it would have to
> return as many document repetitions as the nested or duplicate values in
> the flat model field.

Yes, some documents would be counted multiple times. That's what Elasticsearch does by default, and I think it's a decent default.
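
To make the difference concrete, here is a minimal sketch in plain Java. It does not use the Hibernate Search, Lucene or Elasticsearch APIs; the class and method names are hypothetical, and it assumes that "taking all values into account" means a document is counted once in every range bucket that contains at least one of its values. The second method shows the per-document "lowest" variant for comparison, where each document lands in at most one bucket.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: not part of any Hibernate Search API.
public class MultiValuedRangeAggregation {

    // For each half-open range [from, to), counts the documents that have
    // at least one value in that range. A multi-valued document may
    // therefore be counted in several buckets.
    public static int[] countUsingAllValues(List<List<Integer>> docs, int[][] ranges) {
        int[] counts = new int[ranges.length];
        for (List<Integer> values : docs) {
            for (int i = 0; i < ranges.length; i++) {
                final int from = ranges[i][0];
                final int to = ranges[i][1];
                if (values.stream().anyMatch(v -> v >= from && v < to)) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    // Reduces each document to its lowest value first, then counts that
    // single value. Each document contributes to at most one bucket.
    public static int[] countUsingLowestValue(List<List<Integer>> docs, int[][] ranges) {
        int[] counts = new int[ranges.length];
        for (List<Integer> values : docs) {
            if (values.isEmpty()) {
                continue; // a document without a value matches no bucket
            }
            int lowest = Collections.min(values);
            for (int i = 0; i < ranges.length; i++) {
                if (lowest >= ranges[i][0] && lowest < ranges[i][1]) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three documents; the first and third are multi-valued.
        List<List<Integer>> docs = List.of(List.of(5, 15), List.of(12), List.of(3, 25));
        int[][] ranges = { { 0, 10 }, { 10, 20 }, { 20, 30 } };

        System.out.println(Arrays.toString(countUsingAllValues(docs, ranges)));   // [2, 2, 1]
        System.out.println(Arrays.toString(countUsingLowestValue(docs, ranges))); // [2, 1, 0]
    }
}
```

With all values taken into account, the bucket counts add up to 5 even though there are only 3 documents, which is the "counted multiple times" effect described above.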

> Especially if paging is used.

Paging is not relevant for aggregations, which are applied on the whole index.
I don't think performance is an issue here, if that's what you're suggesting. The problem is more that we have to move away from our "legacy" implementation of aggregations that relied on Lucene's faceting.

Anyway, this is all something I'm suggesting to do as a second step. After your work, sorts on multi-valued fields work correctly, and aggregations on multi-valued fields work correctly as long as there is effectively only one value per document (which will probably be the case once you add filtering anyway).
