*WARNING:* See HSEARCH-3683 / HSEARCH-3971 before trying to implement this: those tickets may lay the groundwork for dynamic index partitioning.
*WARNING:* We should probably investigate this before 6.0.0.Final, at least to be sure it won’t require API changes.
Dynamic sharding is a bit tricky to implement in Search 6, because:
# it doesn't fit the concept of sharding in Elasticsearch very well (Elasticsearch only allows "static" sharding)
# as implemented in Search 5, it mixes the concept of entity being mapped and the resulting document (see {{org.hibernate.search.store.ShardIdentifierProvider#getShardIdentifier}})
However, there are real use cases for "dynamic sharding". There are two reasons to want it:
# For performance. Sharding data according to a key extracted from business data, assigning one shard to each key, may improve performance. For example if we know that most searches will focus on a given language, then let's shard on the "language" key and have one index per language.
# To allow for slightly different configuration for each shard. In particular, we may want to index the same field with a different analyzer depending on the language (see also HSEARCH-3311).
As far as I (Yoann) am concerned, combining "static", hash-based sharding with the solution mentioned in HSEARCH-3311 seems to do the trick and should be enough. It may be a bit less efficient (we'd need an additional "language = xxx" predicate to take into account the fact that multiple languages may end up in the same index, see [how it's usually done with Elasticsearch|https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html#search-routing]), but would also be much simpler.
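To make the trade-off concrete, here is a minimal sketch of static, hash-based routing combined with a language predicate. It is plain Java with no Hibernate Search or Elasticsearch API involved; {{StaticRoutingSketch}}, {{Doc}}, {{shardFor}} and {{searchTitles}} are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: an in-memory stand-in for a statically sharded index.
class StaticRoutingSketch {

    /** Hypothetical document carrying only the fields needed here. */
    static final class Doc {
        final String language;
        final String title;

        Doc(String language, String title) {
            this.language = language;
            this.title = title;
        }
    }

    private final List<List<Doc>> shards = new ArrayList<>();

    /** The number of shards is fixed up front: this is what makes the sharding "static". */
    StaticRoutingSketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
    }

    /** Hash-based routing: several routing keys may map to the same shard. */
    int shardFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), shards.size());
    }

    /** Indexing: the language is used as the routing key. */
    void index(Doc doc) {
        shards.get(shardFor(doc.language)).add(doc);
    }

    /**
     * Searching: routing narrows the search to a single shard, but that shard
     * may also hold other languages, hence the extra "language = xxx" predicate.
     */
    List<String> searchTitles(String language) {
        List<String> titles = new ArrayList<>();
        for (Doc doc : shards.get(shardFor(language))) {
            if (doc.language.equals(language)) {
                titles.add(doc.title);
            }
        }
        return titles;
    }
}
```

With a single shard, every language ends up in the same place, and only the predicate keeps a search for "en" from returning French documents. This is the extra cost mentioned above.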
Here is what we decided during the F2F meeting. To be revisited depending on the other priorities...
{quote}
DECISION: we don’t need dynamic sharding at the backend level. However:
# We want this at the mapper level because it addresses real business needs, such as having one index per language.
# Let’s call it “dynamic partitioning”? “dynamic index mapping”? The idea would be not to confuse it with sharding at the backend level.
# We will have to require users to implement an SPI that returns a list of all indexes created so far (called on each query) and gets notified when we create an index.
# Targeting class “Book” will, by default, target all relevant index managers; applying a filter will remove some index managers from the target.
# We should probably investigate this before 6.0.0.Final, at least to be sure it won’t require API changes.
# While we’re at it, we should investigate mapping a single entity object to multiple documents in different indexes. What does it mean when querying in particular? This could wait, though.
{quote}
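As a starting point for discussion, the SPI and the targeting behavior described in the decision could look roughly like the sketch below. Nothing here exists in Hibernate Search: {{IndexRegistry}}, {{knownIndexes}}, {{indexCreated}}, {{resolveIndex}} and {{targetIndexes}} are hypothetical names, and the in-memory registry only stands in for whatever persistent storage a user would provide.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Illustration only: a possible shape for mapper-level "dynamic partitioning".
class DynamicPartitioningSketch {

    /** Hypothetical SPI that users would implement. */
    interface IndexRegistry {
        /** Called on each query: the names of all indexes created so far. */
        Set<String> knownIndexes();

        /** Notification sent when the mapper creates a new index. */
        void indexCreated(String indexName);
    }

    /** Simplest possible implementation, assumed here for the example. */
    static final class InMemoryIndexRegistry implements IndexRegistry {
        private final Set<String> names = new LinkedHashSet<>();

        @Override
        public Set<String> knownIndexes() {
            return names;
        }

        @Override
        public void indexCreated(String indexName) {
            names.add(indexName);
        }
    }

    private final String baseName;
    private final IndexRegistry registry;

    DynamicPartitioningSketch(String baseName, IndexRegistry registry) {
        this.baseName = baseName;
        this.registry = registry;
    }

    /** Indexing side: one index per partition key, registered on first use. */
    String resolveIndex(String partitionKey) {
        String indexName = baseName + "_" + partitionKey;
        if (!registry.knownIndexes().contains(indexName)) {
            registry.indexCreated(indexName);
        }
        return indexName;
    }

    /** Query side: all known indexes are targeted unless the filter removes them. */
    List<String> targetIndexes(Predicate<String> filter) {
        List<String> targets = new ArrayList<>();
        for (String indexName : registry.knownIndexes()) {
            if (filter.test(indexName)) {
                targets.add(indexName);
            }
        }
        return targets;
    }
}
```

In this sketch, targeting "book" with an always-true filter returns every partition created so far, while a filter on the language suffix narrows the target to a single index. Whether the filter should operate on index names or on partition keys is left open.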