[hibernate-issues] [JIRA] (HSEARCH-3313) Add support for dynamic index "partitioning"

Monday, 27 July 2020

Yoann Rodière (
https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%...
) *updated* an issue

Hibernate Search (
https://hibernate.atlassian.net/browse/HSEARCH?atlOrigin=eyJpIjoiNDUyZDk2...
) / New Feature (
https://hibernate.atlassian.net/browse/HSEARCH-3313?atlOrigin=eyJpIjoiNDU...
) HSEARCH-3313 (
https://hibernate.atlassian.net/browse/HSEARCH-3313?atlOrigin=eyJpIjoiNDU...
) Add support for dynamic index "partitioning" (
https://hibernate.atlassian.net/browse/HSEARCH-3313?atlOrigin=eyJpIjoiNDU...
)

Change By: Yoann Rodière (
https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%...
)

*WARNING:* See HSEARCH-3683 / HSEARCH-3971 before trying to implement this; it's
possible that HSEARCH-3683 / HSEARCH-3971 will lay the groundwork for dynamic index
partitioning.

*WARNING:* We should probably investigate this before 6.0.0.Final, at least to be
sure it won’t require API changes.

Dynamic sharding is a bit tricky to implement in Search 6, because:

# it doesn't fit the concept of sharding in Elasticsearch very well (Elasticsearch
only allows "static" sharding)
# as implemented in Search 5, it mixes the concept of entity being mapped and the
resulting document (see
{{org.hibernate.search.store.ShardIdentifierProvider#getShardIdentifier}})

However, there are real use cases for "dynamic sharding". There are two reasons
to want it:

# For performance. Sharding data according to a key extracted from business data,
assigning one shard to each key, may improve performance. For example, if we know that most
searches will focus on a given language, we can shard on the "language"
key and have one index per language.
# To allow for slightly different configuration for each shard. In particular, we may want
to index the same field with a different analyzer depending on the language (see also
HSEARCH-3311).
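
To make the two motivations concrete, here is a minimal sketch of what "one index per language, each with its own analyzer" could look like. Everything in it is hypothetical: the class {{LanguagePartitioningSketch}}, the {{Partition}} type, the {{book-<language>}} naming scheme and the analyzer names are assumptions for illustration, not Hibernate Search APIs.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Hibernate Search.
final class LanguagePartitioningSketch {

    // Settings of one partition: its index name and the analyzer applied to text fields.
    static final class Partition {
        final String indexName;
        final String analyzer;

        Partition(String indexName, String analyzer) {
            this.indexName = indexName;
            this.analyzer = analyzer;
        }
    }

    private final String baseIndexName;
    private final Map<String, String> analyzerByLanguage;
    // Partitions created so far, keyed by normalized language code.
    private final Map<String, Partition> partitions = new LinkedHashMap<>();

    LanguagePartitioningSketch(String baseIndexName, Map<String, String> analyzerByLanguage) {
        this.baseIndexName = baseIndexName;
        this.analyzerByLanguage = new HashMap<>(analyzerByLanguage);
    }

    // Returns the partition for a language key, creating it on first use.
    // Unknown languages fall back to an assumed default analyzer named "standard".
    Partition partitionFor(String language) {
        String key = language.toLowerCase(Locale.ROOT);
        return partitions.computeIfAbsent(key, k -> new Partition(
                baseIndexName + "-" + k,
                analyzerByLanguage.getOrDefault(k, "standard")));
    }

    // Number of partitions created so far.
    int partitionCount() {
        return partitions.size();
    }
}
```

With this layout, a search restricted to one language would only need to target the single index returned by {{partitionFor}}, which is where the expected performance gain comes from.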

As far as I (Yoann) am concerned, combining "static", hash-based sharding with
the solution mentioned in HSEARCH-3311 seems to do the trick and should be enough. It may
be a bit less efficient (we'd need an additional "language = xxx" predicate
to take into account the fact that multiple languages may end up in the same index, see
[how it's usually done with
Elasticsearch|https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html#search-routing]),
but would also be much simpler.
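
As a rough illustration of the "static" alternative, the sketch below hashes a routing key onto a fixed number of shards and then applies the extra "language = xxx" predicate on the selected shard. The class and method names are hypothetical, and the in-memory lists merely stand in for real shards; this is not how Hibernate Search or Elasticsearch implement routing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of static, hash-based sharding with routing.
final class StaticShardingSketch {

    static final class Doc {
        final String id;
        final String language;

        Doc(String id, String language) {
            this.id = id;
            this.language = language;
        }
    }

    // The number of shards is fixed at construction time ("static" sharding).
    private final List<List<Doc>> shards = new ArrayList<>();

    StaticShardingSketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
    }

    // Maps a routing key to a shard; floorMod keeps the result non-negative.
    int shardFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), shards.size());
    }

    // Indexes a document, using its language as the routing key.
    void index(Doc doc) {
        shards.get(shardFor(doc.language)).add(doc);
    }

    // Searches only the shard the language routes to. The language predicate
    // is still required, because several languages may hash to the same shard.
    List<Doc> searchByLanguage(String language) {
        List<Doc> hits = new ArrayList<>();
        for (Doc doc : shards.get(shardFor(language))) {
            if (doc.language.equals(language)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

The filter inside {{searchByLanguage}} is the cost mentioned above: without it, documents from other languages sharing the shard would leak into the results.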

Here is what we decided during the F2F meeting. To be revisited depending on the other
priorities...

{quote}
DECISION: we don’t need dynamic sharding at the backend level. However:
# We want this at the mapper level because it addresses real business needs, such as
having one index per language.
# Let’s call it “dynamic partitioning”? “dynamic index mapping”? The idea would be not to
confuse it with sharding at the backend level.
# We will have to require users to implement an SPI that returns a list of all indexes
created so far (called on each query) and gets notified when we create an index.
# Targeting class “Book” will, by default, target all relevant index managers; applying a
filter will remove some index managers from the target.
# We should probably investigate this before 6.0.0.Final, at least to be sure it won’t
require API changes.
# While we’re at it, we should investigate mapping a single entity object to multiple
documents in different indexes. What does it mean when querying in particular? This could
wait, though.
{quote}
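
The SPI and targeting rules from the decision above could be sketched roughly as follows. This is only one possible reading of those bullet points: the interface {{IndexRegistry}}, its method names, the in-memory implementation and the {{<entity>-<key>}} index naming convention are all assumptions, and nothing here is an actual or planned Hibernate Search API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the user-implemented SPI and of target resolution.
final class DynamicPartitioningSpiSketch {

    // SPI a user would implement to track dynamically created indexes.
    interface IndexRegistry {
        // Called on each query: all indexes created so far.
        Set<String> listIndexes();

        // Notification that a new index was just created.
        void indexCreated(String indexName);
    }

    // Simplest possible implementation; a real one would likely persist the names.
    static final class InMemoryIndexRegistry implements IndexRegistry {
        private final Set<String> names = new LinkedHashSet<>();

        @Override
        public synchronized Set<String> listIndexes() {
            return new LinkedHashSet<>(names); // defensive copy
        }

        @Override
        public synchronized void indexCreated(String indexName) {
            names.add(indexName);
        }
    }

    // Targeting an entity selects every index named "<entity>-<key>" by default;
    // the filter then removes some of those indexes from the target.
    static List<String> resolveTargets(IndexRegistry registry, String entityName,
            Predicate<String> filter) {
        List<String> targets = new ArrayList<>();
        for (String indexName : registry.listIndexes()) {
            if (indexName.startsWith(entityName + "-") && filter.test(indexName)) {
                targets.add(indexName);
            }
        }
        return targets;
    }
}
```

For example, after notifications for {{book-en}} and {{book-fr}}, targeting "book" with a filter that accepts everything selects both indexes, while a filter such as {{name -> name.endsWith("-en")}} narrows the target to {{book-en}}.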

