[hibernate-dev] [Search] Dynamic sharding configuration
Hardy Ferentschik
hardy at hibernate.org
Fri Sep 20 06:37:45 EDT 2013
Hi,
I am currently working on HSEARCH-472 [1] - dynamic sharding. The work builds on Emmanuel's prototype and you can find the current code on my fork [2].
Right now I am wondering how to configure (dynamic) sharding. Here is how things worked prior to dynamic sharding. Basically there are two properties
driving the shard configuration:
- hibernate.search.[indexName].sharding_strategy
- hibernate.search.[indexName].sharding_strategy.nbr_of_shards
The first property determines the implementation class of IndexShardingStrategy and the second the number of shards to create. So far we had two implementations
of IndexShardingStrategy, namely NotShardedStrategy and IdHashShardingStrategy.
To configure sharding it was enough to set nbr_of_shards to a value > 1. This would automatically select IdHashShardingStrategy and shard depending on the configured
number of shards. The idea was to make it simple for the user and only require a single configuration change to enable sharding.
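
To make the old rule concrete, here is a minimal sketch of the implicit selection as described above. It is not the actual Hibernate Search code: the class and method names are hypothetical, the property keys are assumed to be relative to the hibernate.search.[indexName] prefix, and letting an explicitly set strategy win over nbr_of_shards is an assumption of the sketch.

```java
import java.util.Properties;

// Hypothetical illustration of the pre-dynamic-sharding selection rule.
class LegacyShardingSelection {

    // Returns the simple name of the strategy the old rule would pick.
    static String selectStrategy(Properties indexProps) {
        String explicit = indexProps.getProperty("sharding_strategy");
        String shards = indexProps.getProperty("sharding_strategy.nbr_of_shards");
        if (explicit != null) {
            // Assumption: an explicitly configured strategy takes precedence.
            return explicit.trim();
        }
        if (shards != null && Integer.parseInt(shards.trim()) > 1) {
            // Setting only nbr_of_shards > 1 silently enables id-hash sharding.
            return "IdHashShardingStrategy";
        }
        return "NotShardedStrategy";
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("sharding_strategy.nbr_of_shards", "3");
        System.out.println(selectStrategy(props));
    }
}
```

Note that the strategy is chosen as a side effect of a parameter value, which is where the inconsistencies come from.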
However, it creates inconsistencies. For example what if I select NotShardedStrategy and nbr_of_shards >1?
Or I set a custom sharding strategy which does not care about the number of shards?
IMO the important factor is to set the right sharding strategy, and nbr_of_shards should just be an (optional) parameter to the sharding strategy.
With dynamic sharding things get more complicated. Right now you configure dynamic sharding by setting 'nbr_of_shards' to the literal 'dynamic'. Under the hood this selects the
right IndexShardingStrategy (DynamicShardingStrategy). I find it misleading on multiple levels. First, 'dynamic' is not a number, and second, I want to configure a strategy,
not the number of shards. It is also inconsistent with how we select/configure other pluggable components in Search. For that reason I suggest:
- The type of sharding is configured by setting hibernate.search.[indexName].sharding_strategy. 'nbr_of_shards' is a parameter which gets passed to the strategy and which
might get ignored depending on the sharding implementation. Implementations are free to check the property and e.g. print out a warning if the setting does not apply to them.
- We introduce short names for the provided sharding strategies - 'none', 'id-hash', 'dynamic'. This avoids the need to reference concrete implementation classes.
- For dynamic sharding we have the additional sub-property 'shard_identity_provider' which specifies the ShardIdentifierProvider (new contract needed for dynamic sharding).
This property is only relevant for dynamic sharding and will be handled in the same way as 'nbr_of_shards'
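
The three points above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the class and method names are hypothetical, keys are relative to the hibernate.search.[indexName] prefix, 'shard_identity_provider' is assumed to live under sharding_strategy like 'nbr_of_shards', an unset strategy is assumed to mean 'none', and any value that is not a short name is treated as a custom class name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration of the proposed configuration handling.
class ProposedShardingConfig {

    static final String NBR_OF_SHARDS = "sharding_strategy.nbr_of_shards";
    // Assumed location of the sub-property.
    static final String SHARD_IDENTITY_PROVIDER = "sharding_strategy.shard_identity_provider";

    // Short names from the proposal mapped to the strategy classes named in this mail.
    static final Map<String, String> SHORT_NAMES = Map.of(
            "none", "NotShardedStrategy",
            "id-hash", "IdHashShardingStrategy",
            "dynamic", "DynamicShardingStrategy");

    // The strategy is always selected via sharding_strategy, never via a parameter.
    static String resolveStrategy(Properties indexProps) {
        String value = indexProps.getProperty("sharding_strategy", "none").trim();
        // Anything that is not a short name is treated as a custom class name.
        return SHORT_NAMES.getOrDefault(value, value);
    }

    // Built-in strategies warn about parameters that do not apply to them;
    // custom strategies decide for themselves, so nothing is reported here.
    static List<String> warnings(Properties indexProps) {
        String strategy = resolveStrategy(indexProps);
        List<String> result = new ArrayList<>();
        if (!SHORT_NAMES.containsValue(strategy)) {
            return result;
        }
        if (indexProps.getProperty(NBR_OF_SHARDS) != null
                && !strategy.equals("IdHashShardingStrategy")) {
            result.add(NBR_OF_SHARDS + " is ignored by " + strategy);
        }
        if (indexProps.getProperty(SHARD_IDENTITY_PROVIDER) != null
                && !strategy.equals("DynamicShardingStrategy")) {
            result.add(SHARD_IDENTITY_PROVIDER + " is ignored by " + strategy);
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("sharding_strategy", "dynamic");
        props.setProperty(NBR_OF_SHARDS, "4");
        System.out.println(resolveStrategy(props));
        System.out.println(warnings(props));
    }
}
```

With this shape, the NotShardedStrategy plus nbr_of_shards > 1 case is no longer ambiguous: the strategy stays as configured and the unused parameter only produces a warning.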
Thoughts?
--Hardy
[1] https://hibernate.atlassian.net/browse/HSEARCH-472
[2] https://github.com/hferentschik/hibernate-search/compare/HSEARCH-472