| WARNING: We should probably investigate this before 6.0.0.Final, at least to be sure it won’t require API changes.
Dynamic sharding is a bit tricky to implement in Search 6, because:
- it doesn't fit the concept of sharding in Elasticsearch very well (Elasticsearch only allows "static" sharding)
- as implemented in Search 5, it mixes two concepts: the entity being mapped and the resulting document (see org.hibernate.search.store.ShardIdentifierProvider#getShardIdentifier)
However, there are real use cases for "dynamic sharding". There are two reasons to want dynamic sharding:
- For performance. Sharding data according to a key extracted from business data, assigning one shard to each key, may improve performance. For example, if we know that most searches will focus on a given language, we can shard on the "language" key and have one index per language.
- To allow for slightly different configuration for each shard. In particular, we may want to index the same field with a different analyzer depending on the language (see also HSEARCH-3311).
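To make the performance argument concrete, here is a minimal sketch contrasting key-based routing (one index per language) with static, hash-based routing over a fixed number of shards. Everything in it is an assumption made for illustration: the class `ShardRoutingSketch`, its methods, and the `book_` index prefix are hypothetical and not part of Hibernate Search or Elasticsearch.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Hibernate Search.
final class ShardRoutingSketch {

    private ShardRoutingSketch() {
    }

    // Dynamic sharding: one index per distinct business key, created on demand.
    static String dynamicShard(String language) {
        return "book_" + language;
    }

    // Static sharding: a fixed number of shards, selected by hashing the key.
    // floorMod keeps the result non-negative even for negative hash codes.
    static String staticShard(String language, int shardCount) {
        return "book_" + Math.floorMod(language.hashCode(), shardCount);
    }

    // Collects the distinct shards that a list of keys is routed to.
    static Set<String> staticShards(List<String> languages, int shardCount) {
        Set<String> shards = new LinkedHashSet<>();
        for (String language : languages) {
            shards.add(staticShard(language, shardCount));
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> languages = List.of("en", "fr", "de");
        for (String language : languages) {
            System.out.println(language + " -> " + dynamicShard(language)
                    + " / " + staticShard(language, 2));
        }
    }
}
```

With key-based routing, a search restricted to one language touches exactly one index. With static routing and fewer shards than languages, several languages necessarily share a shard, so a query for one language must also filter on the language field.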
As far as I (Yoann) am concerned, combining "static", hash-based sharding with the solution mentioned in HSEARCH-3311 seems to do the trick and should be enough. It may be a bit less efficient (we'd need an additional "language = xxx" predicate to take into account the fact that multiple languages may end up in the same index), but it would also be much simpler.
Here is what we decided during the F2F meeting. To be revisited depending on the other priorities...
DECISION: we don’t need dynamic sharding at the backend level. However:
- We want this at the mapper level because it addresses real business needs, such as having one index per language.
- Let’s call it “dynamic partitioning”? “dynamic index mapping”? The idea would be not to confuse it with sharding at the backend level.
- We will have to require users to implement an SPI that returns a list of all indexes created so far (called on each query) and gets notified when we create an index.
- Targeting class “Book” will, by default, target all relevant index managers; applying a filter will remove some index managers from the target.
- We should probably investigate this before 6.0.0.Final, at least to be sure it won’t require API changes.
- While we’re at it, we should investigate mapping a single entity object to multiple documents in different indexes. In particular, what would that mean when querying? This could wait, though.
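The SPI and targeting behavior described in the decision could look roughly like the sketch below. This is only an illustration under assumptions: `IndexRegistry`, `InMemoryIndexRegistry`, `DynamicPartitioningSketch`, and the `<entity>_<key>` index naming scheme are all hypothetical, and nothing here reflects an actual Hibernate Search API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical SPI that users would implement; not an actual Hibernate Search interface.
interface IndexRegistry {
    // Called on each query: returns every index created so far.
    List<String> allIndexes();

    // Notification that a new index has been created.
    void indexCreated(String indexName);
}

// In-memory implementation, for illustration only (a real one would persist the list).
final class InMemoryIndexRegistry implements IndexRegistry {
    private final Set<String> indexes = new LinkedHashSet<>();

    @Override
    public synchronized List<String> allIndexes() {
        return new ArrayList<>(indexes);
    }

    @Override
    public synchronized void indexCreated(String indexName) {
        indexes.add(indexName);
    }
}

// Hypothetical mapper-level component; assumes indexes are named "<entity>_<key>".
final class DynamicPartitioningSketch {
    private final IndexRegistry registry;

    DynamicPartitioningSketch(IndexRegistry registry) {
        this.registry = registry;
    }

    // Indexing: pick the index for the key and notify the registry if it is new.
    String indexFor(String entityName, String key) {
        String indexName = entityName + "_" + key;
        if (!registry.allIndexes().contains(indexName)) {
            registry.indexCreated(indexName);
        }
        return indexName;
    }

    // Querying: all indexes of the entity by default; the filter removes some of them.
    List<String> targets(String entityName, Predicate<String> keyFilter) {
        String prefix = entityName + "_";
        List<String> result = new ArrayList<>();
        for (String indexName : registry.allIndexes()) {
            if (indexName.startsWith(prefix)
                    && keyFilter.test(indexName.substring(prefix.length()))) {
                result.add(indexName);
            }
        }
        return result;
    }
}
```

In this sketch, targeting "Book" with a filter that accepts every key returns all Book indexes known to the registry, while a filter such as `key -> key.equals("fr")` narrows the target to the French index only. Whether the registry should be consulted on every query, as here, or cached is one of the points the investigation would need to settle.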
|