
Change By:	Yoann Rodière

*WARNING:* See HSEARCH-3683 / HSEARCH-3971 before trying to implement this; those tickets may lay the groundwork for dynamic index partitioning.

*WARNING:* We should probably investigate this before 6.0.0.Final, at least to be sure it won’t require API changes.

Dynamic sharding is a bit tricky to implement in Search 6, because:

# it doesn't fit the concept of sharding in Elasticsearch very well (Elasticsearch only allows "static" sharding)
# as implemented in Search 5, it mixes two concepts: the entity being mapped and the resulting document (see {{org.hibernate.search.store.ShardIdentifierProvider#getShardIdentifier}})

However, there are real use cases for "dynamic sharding". There are two reasons to want it:

# For performance. Sharding data according to a key extracted from business data, assigning one shard to each key, may improve performance. For example, if we know that most searches will focus on a given language, we can shard on the "language" key and have one index per language.
# To allow for slightly different configuration for each shard. In particular, we may want to index the same field with a different analyzer depending on the language (see also HSEARCH-3311).

As far as I (Yoann) am concerned, combining "static", hash-based sharding with the solution mentioned in HSEARCH-3311 seems to do the trick and should be enough. It may be a bit less efficient (we'd need an additional "language = xxx" predicate to take into account the fact that multiple languages may end up in the same index, see [how it's usually done with Elasticsearch|https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html#search-routing]), but would also be much simpler.
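To make the trade-off concrete, here is a minimal, self-contained sketch of static, hash-based sharding combined with an extra "language = xxx" filter. It is plain Java and does not use any Hibernate Search or Elasticsearch API; the class {{StaticShardingSketch}}, its {{Doc}} type and all method names are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticShardingSketch {

    /** Hypothetical document: only the fields needed for the illustration. */
    public static final class Doc {
        public final String id;
        public final String language;

        public Doc(String id, String language) {
            this.id = id;
            this.language = language;
        }
    }

    // The number of shards is fixed at creation time: this is "static" sharding.
    private final List<List<Doc>> shards = new ArrayList<>();

    public StaticShardingSketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
    }

    /** The shard is a pure function of the routing key (here, the language). */
    public int shardFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), shards.size());
    }

    public void index(Doc doc) {
        shards.get(shardFor(doc.language)).add(doc);
    }

    /**
     * Only one shard is searched, but the language filter is still required:
     * several languages may hash to the same shard.
     */
    public List<Doc> searchByLanguage(String language) {
        List<Doc> hits = new ArrayList<>();
        for (Doc doc : shards.get(shardFor(language))) {
            if (doc.language.equals(language)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

With two shards, "en" and "de" happen to hash to the same shard, so dropping the filter in {{searchByLanguage}} would return German documents for an English query. That is the extra cost mentioned above, paid in exchange for not having to create indexes dynamically.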

Here is what we decided during the F2F meeting. To be revisited depending on the other priorities...

{quote}
DECISION: we don’t need dynamic sharding at the backend level. However:
# We want this at the mapper level because it addresses real business needs, such as having one index per language.
# Let’s call it “dynamic partitioning”? “dynamic index mapping”? The idea would be not to confuse it with sharding at the backend level.
# We will have to require users to implement an SPI that returns a list of all indexes created so far (called on each query) and gets notified when we create an index.
# Targeting class “Book” will, by default, target all relevant index managers; applying a filter will remove some index managers from the target.
# We should probably investigate this before 6.0.0.Final, at least to be sure it won’t require API changes.
# While we’re at it, we should investigate mapping a single entity object to multiple documents in different indexes. What does it mean when querying in particular? This could wait, though.
{quote}
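One possible shape for the SPI and the targeting rule from the decision above is sketched below. Nothing here is an existing Hibernate Search API: {{PartitionRegistry}}, {{InMemoryPartitionRegistry}}, {{indexNameFor}}, {{resolveTargets}} and the {{entity_key}} index naming scheme are all assumptions made for the sake of illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

public class DynamicPartitioningSketch {

    /** Hypothetical SPI that users would have to implement. */
    public interface PartitionRegistry {
        /** Returns the keys of all partitions created so far; called on each query. */
        Set<String> listPartitions();

        /** Notification that an index was created for the given partition key. */
        void partitionCreated(String partitionKey);
    }

    /** Toy implementation; a real one would need to persist the keys across restarts. */
    public static final class InMemoryPartitionRegistry implements PartitionRegistry {
        private final Set<String> keys = new LinkedHashSet<>();

        @Override
        public Set<String> listPartitions() {
            return new LinkedHashSet<>(keys);
        }

        @Override
        public void partitionCreated(String partitionKey) {
            keys.add(partitionKey);
        }
    }

    /** Indexing side: pick the index for a key, notifying the registry on first use. */
    public static String indexNameFor(String entityName, String partitionKey,
            PartitionRegistry registry) {
        if (!registry.listPartitions().contains(partitionKey)) {
            registry.partitionCreated(partitionKey);
        }
        return entityName + "_" + partitionKey;
    }

    /**
     * Query side: targeting an entity targets every known partition by default;
     * the filter removes some partitions from the target.
     */
    public static Set<String> resolveTargets(String entityName,
            PartitionRegistry registry, Predicate<String> partitionFilter) {
        Set<String> targets = new LinkedHashSet<>();
        for (String key : registry.listPartitions()) {
            if (partitionFilter.test(key)) {
                targets.add(entityName + "_" + key);
            }
        }
        return targets;
    }
}
```

In this sketch, targeting "book" with a filter that accepts every key resolves to all indexes created so far, while a filter such as {{"en"::equals}} narrows the target to the single English index. Whether the filter should operate on partition keys or on something richer is left open, as in the decision above.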
