Hi,
I am currently working on HSEARCH-472 [1] - dynamic sharding. The work is built on
Emmanuel's prototype, and you can find the current code on my fork [2].
Right now I am wondering about how to configure (dynamic) sharding. Here is how things
worked prior to dynamic sharding. Basically, there are two properties
driving the shard configuration:
- hibernate.search.[indexName].sharding_strategy
- hibernate.search.[indexName].sharding_strategy.nbr_of_shards
The first property determines the implementation class of IndexShardingStrategy and the
second the number of shards to create. So far we had two implementations
of IndexShardingStrategy, namely NotShardedStrategy and IdHashShardingStrategy.
To configure sharding it was enough to set nbr_of_shards to a value > 1. This would
automatically select IdHashShardingStrategy and shard depending on the configured
number of shards. The idea was to make it simple for the user and only require a single
configuration change to enable sharding.
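
As a rough illustration of the selection rule described above, here is a minimal
sketch. It is not the actual Hibernate Search code: the class LegacyShardingConfig
and the method resolveStrategy are hypothetical, the keys are written relative to
the hibernate.search.[indexName]. prefix, and letting an explicitly configured
strategy win over nbr_of_shards is an assumption.

```java
import java.util.Properties;

public class LegacyShardingConfig {

    // Hypothetical helper: returns the simple name of the strategy the legacy
    // rules would select. Keys are relative to hibernate.search.[indexName].
    public static String resolveStrategy(Properties indexProps) {
        String explicit = indexProps.getProperty("sharding_strategy");
        if (explicit != null) {
            // Assumption: an explicit implementation class always wins,
            // regardless of what nbr_of_shards says.
            return explicit.trim();
        }
        String nbr = indexProps.getProperty("sharding_strategy.nbr_of_shards");
        if (nbr != null && Integer.parseInt(nbr.trim()) > 1) {
            // Setting more than one shard implicitly enables id-hash sharding.
            return "IdHashShardingStrategy";
        }
        return "NotShardedStrategy";
    }
}
```

With only sharding_strategy.nbr_of_shards=4 set, this sketch selects
IdHashShardingStrategy; with nothing set, it falls back to NotShardedStrategy.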
However, it creates inconsistencies. For example, what if I select NotShardedStrategy and
set nbr_of_shards > 1?
Or what if I set a custom sharding strategy which does not care about the number of shards?
IMO the important factor is to set the right sharding strategy, and nbr_of_shards should
just be an (optional) parameter to the sharding strategy.
With dynamic sharding things get more complicated. Right now you configure dynamic
sharding by setting 'nbr_of_shards' to the literal 'dynamic'. Under the hood, this
selects the right IndexShardingStrategy (DynamicShardingStrategy). I find it misleading
on multiple levels. First, 'dynamic' is not a number, and second, I want to configure a
strategy, not the number of shards. It is also inconsistent with how we select/configure
other pluggable components in Search. For that reason I suggest:
- The type of sharding is configured by setting
hibernate.search.[indexName].sharding_strategy. 'nbr_of_shards' is a parameter
which gets passed to the strategy and which might get ignored depending on the
sharding implementation. Implementations are free to check the property and,
e.g., print out a warning if the setting does not apply to them.
- We introduce short names for the provided sharding strategies - 'none',
'id-hash', 'dynamic'. This will avoid the need to reference concrete
implementation classes
- For dynamic sharding we have the additional sub-property
'shard_identity_provider' which specifies the ShardIdentifierProvider (new
contract needed for dynamic sharding). This property is only relevant for
dynamic sharding and will be handled in the same way as 'nbr_of_shards'.
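
To make the suggestion more concrete, here is a minimal sketch of how the proposed
resolution could look. Everything in it is illustrative: the class
ProposedShardingConfig and its methods are hypothetical, defaulting to 'none' when
nothing is configured is an assumption, the key
sharding_strategy.shard_identity_provider is assumed by analogy with
nbr_of_shards, and which built-in strategy ignores which parameter is also an
assumption. Keys are again relative to hibernate.search.[indexName].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ProposedShardingConfig {

    // Proposed short names mapped to the implementations named in this mail.
    private static final Map<String, String> SHORT_NAMES = Map.of(
            "none", "NotShardedStrategy",
            "id-hash", "IdHashShardingStrategy",
            "dynamic", "DynamicShardingStrategy");

    // The strategy is chosen by sharding_strategy alone; any value that is
    // not a short name is treated as a custom implementation class name.
    public static String resolveStrategy(Properties indexProps) {
        String configured = indexProps.getProperty("sharding_strategy", "none").trim();
        return SHORT_NAMES.getOrDefault(configured, configured);
    }

    // Collects warnings for parameters that the selected built-in strategy
    // would ignore. Custom strategies are left to do their own checks.
    public static List<String> ignoredSettings(Properties indexProps) {
        String strategy = resolveStrategy(indexProps);
        boolean none = strategy.equals("NotShardedStrategy");
        boolean idHash = strategy.equals("IdHashShardingStrategy");
        boolean dynamic = strategy.equals("DynamicShardingStrategy");
        List<String> warnings = new ArrayList<>();
        if (indexProps.getProperty("sharding_strategy.nbr_of_shards") != null
                && (none || dynamic)) {
            warnings.add("nbr_of_shards is ignored by " + strategy);
        }
        if (indexProps.getProperty("sharding_strategy.shard_identity_provider") != null
                && (none || idHash)) {
            warnings.add("shard_identity_provider is ignored by " + strategy);
        }
        return warnings;
    }
}
```

With sharding_strategy=dynamic and sharding_strategy.nbr_of_shards=4, this sketch
selects DynamicShardingStrategy and reports a single warning for the unused
nbr_of_shards, instead of silently switching strategies.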
Thoughts?
--Hardy
[1] https://hibernate.atlassian.net/browse/HSEARCH-472
[2] https://github.com/hferentschik/hibernate-search/compare/HSEARCH-472