[hibernate-issues] [JIRA] (HSEARCH-3313) Add support for dynamic index "partitioning"

Yoann Rodière (JIRA) jira at hibernate.atlassian.net
Mon Jul 27 06:57:02 EDT 2020


Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3A58fa1ced-171a-4c00-97e8-5d70d442cc4b ) *updated* an issue

Hibernate Search ( https://hibernate.atlassian.net/browse/HSEARCH?atlOrigin=eyJpIjoiNDUyZDk2OWU3N2IyNDBlYzg2YjJkNTIxMGEwMmQ1Y2MiLCJwIjoiaiJ9 ) / New Feature ( https://hibernate.atlassian.net/browse/HSEARCH-3313?atlOrigin=eyJpIjoiNDUyZDk2OWU3N2IyNDBlYzg2YjJkNTIxMGEwMmQ1Y2MiLCJwIjoiaiJ9 ) HSEARCH-3313 ( https://hibernate.atlassian.net/browse/HSEARCH-3313?atlOrigin=eyJpIjoiNDUyZDk2OWU3N2IyNDBlYzg2YjJkNTIxMGEwMmQ1Y2MiLCJwIjoiaiJ9 ) Add support for dynamic index "partitioning" ( https://hibernate.atlassian.net/browse/HSEARCH-3313?atlOrigin=eyJpIjoiNDUyZDk2OWU3N2IyNDBlYzg2YjJkNTIxMGEwMmQ1Y2MiLCJwIjoiaiJ9 )

Change By: Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3A58fa1ced-171a-4c00-97e8-5d70d442cc4b )

*WARNING:* See HSEARCH-3683 / HSEARCH-3971 before trying to implement this; they may lay the groundwork for dynamic index partitioning.

*WARNING*: We should probably investigate this before 6.0.0.Final, if only to be sure it won’t require API changes.

Dynamic sharding is a bit tricky to implement in Search 6, because:

# it doesn't fit the concept of sharding in Elasticsearch very well (Elasticsearch only allows "static" sharding)
# as implemented in Search 5, it conflates the entity being mapped with the resulting document (see {{org.hibernate.search.store.ShardIdentifierProvider#getShardIdentifier}})

However, there are real use cases for "dynamic sharding". There are two reasons to want it:

# For performance. Sharding data according to a key extracted from business data, assigning one shard to each key, may improve performance. For example, if we know that most searches will focus on a given language, then let's shard on the "language" key and have one index per language.
# To allow for slightly different configuration for each shard. In particular, we may want to index the same field with a different analyzer depending on the language (see also HSEARCH-3311).
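
To make the two reasons above concrete, here is a minimal sketch of key-based partitioning: one index name per language, plus a per-language analyzer choice. All names ({{LanguagePartitions}}, the "book" base name, the analyzer names) are assumptions made up for illustration; none of this is Hibernate Search API.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Hibernate Search.
final class LanguagePartitions {

    // Assumed analyzer names per language key.
    private static final Map<String, String> ANALYZERS = Map.of(
            "en", "english",
            "fr", "french",
            "de", "german");

    private LanguagePartitions() {
    }

    // One partition (index) per language key, e.g. "book-fr".
    static String indexName(String baseName, String language) {
        return baseName + "-" + language.toLowerCase(Locale.ROOT);
    }

    // Falls back to a generic "standard" analyzer when no dedicated one is configured.
    static String analyzerFor(String language) {
        return ANALYZERS.getOrDefault(language.toLowerCase(Locale.ROOT), "standard");
    }
}
```

With this sketch, a document in French would go to {{book-fr}} and have its text fields analyzed with the "french" analyzer, while a search restricted to French would only need to hit that one index.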

As far as I (Yoann) am concerned, combining "static", hash-based sharding with the solution mentioned in HSEARCH-3311 seems to do the trick and should be enough. It may be a bit less efficient (we'd need an additional "language = xxx" predicate to take into account the fact that multiple languages may end up in the same index, see [how it's usually done with Elasticsearch|https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html#search-routing]), but would also be much simpler.
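
A rough sketch of that alternative, assuming an in-memory stand-in for the backend: documents are routed to a fixed number of shards by hashing the language, and searches add a "language = xxx" filter because several languages may share a shard. The class and method names are hypothetical, and the hash function is plain {{String#hashCode}}, not whatever Elasticsearch uses for routing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of static, hash-based sharding with routing.
final class StaticRouting {

    static final class Doc {
        final String id;
        final String language;

        Doc(String id, String language) {
            this.id = id;
            this.language = language;
        }
    }

    private final List<List<Doc>> shards = new ArrayList<>();

    StaticRouting(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
    }

    // The shard count is fixed up front; floorMod keeps the result non-negative.
    static int shardFor(String routingKey, int shardCount) {
        return Math.floorMod(routingKey.hashCode(), shardCount);
    }

    void index(Doc doc) {
        shards.get(shardFor(doc.language, shards.size())).add(doc);
    }

    // Only one shard is visited, but the extra language check is still required
    // because other languages may hash to the same shard.
    List<String> searchIds(String language) {
        List<String> ids = new ArrayList<>();
        for (Doc doc : shards.get(shardFor(language, shards.size()))) {
            if (doc.language.equals(language)) {
                ids.add(doc.id);
            }
        }
        return ids;
    }
}
```

The trade-off is visible in {{searchIds}}: routing narrows the search to one shard, and the predicate does the rest, at the cost of scanning documents from other languages that happen to share that shard.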

Here is what we decided during the F2F meeting. To be revisited depending on the other priorities...

{quote}
DECISION: we don’t need dynamic sharding at the backend level. However:
# We want this at the mapper level because it addresses real business needs, such as having one index per language.
# Let’s call it “dynamic partitioning”? “dynamic index mapping”? The idea would be not to confuse it with sharding at the backend level.
# We will have to require users to implement an SPI that returns a list of all indexes created so far (called on each query) and gets notified when we create an index.
# Targeting class “Book” will, by default, target all relevant index managers; applying a filter will remove some index managers from the target.
# We should probably investigate this before 6.0.0.Final, at least to be sure it won’t require API changes.
# While we’re at it, we should investigate mapping a single entity object to multiple documents in different indexes. What does it mean when querying in particular? This could wait, though.
{quote}
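
To illustrate the SPI and targeting points from the decision above, here is one possible shape, purely as a sketch. The interface name {{PartitionRegistry}}, its methods, and the {{PartitionTargeting}} helper are all hypothetical; the actual SPI, if it is ever implemented, may look quite different.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical SPI that users would implement; not an existing Hibernate Search type.
interface PartitionRegistry {

    // Called on each query: returns the keys of all partitions created so far.
    Set<String> listPartitions();

    // Called when a new partition (index) has been created.
    void partitionCreated(String partitionKey);
}

// Simplest possible implementation: keeps the keys in memory, in creation order.
final class InMemoryPartitionRegistry implements PartitionRegistry {

    private final Set<String> keys = new LinkedHashSet<>();

    @Override
    public synchronized Set<String> listPartitions() {
        return new LinkedHashSet<>(keys);
    }

    @Override
    public synchronized void partitionCreated(String partitionKey) {
        keys.add(partitionKey);
    }
}

final class PartitionTargeting {

    private PartitionTargeting() {
    }

    // By default every known partition is targeted; the filter removes some of them.
    static List<String> targets(String baseIndexName, PartitionRegistry registry,
            Predicate<String> filter) {
        List<String> result = new ArrayList<>();
        for (String key : registry.listPartitions()) {
            if (filter.test(key)) {
                result.add(baseIndexName + "-" + key);
            }
        }
        return result;
    }
}
```

Under these assumptions, targeting "Book" with a filter that accepts everything resolves to all partitions registered so far, and a filter such as {{key -> key.equals("fr")}} narrows the target to the French partition only. A real implementation would presumably persist the partition list so that it survives restarts.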
