[hibernate-dev] [Search] Dynamic sharding configuration

Hardy Ferentschik hardy at hibernate.org
Mon Sep 23 08:58:13 EDT 2013


On 23 Jan 2013, at 1:55 PM, Sanne Grinovero <sanne at hibernate.org> wrote:

>> Or I set a custom sharding strategy which does not care about the number of shards?
> 
> I think that's far fetched. The NBR_OF_SHARDS option defines the size
> of the array of indexes passed to the IndexShardingStrategy so it's
> hard to ignore. Sure it's possible, we could throw a
> log.userIsAnIdiotException() but someone might not see the humor :-).
> Worst case it degenerates in a case similar to your example of
> "NotShardedStrategy and nbr_of_shards >1", or the user would notice
> with an ArrayOutOfBounds.. usercode problem.

I think I am thinking about a custom dynamic sharding strategy in this case.

>> IMO the important factor is to set the right sharding strategy and nbr_of_shards should just be a (optional) parameter to the sharding strategy.
> 
> Note that so far we don't expect users to explicitly set the
> NotShardedStrategy: it's simply a consequence of not having set any
> option; if the user sets only the number of shards but omits picking a
> specific strategy, we automatically assume he's going for the
> IdHashShardingStrategy

I get that, but as I said, I see it as a bit of misguided "ease of use", since it can introduce 
problems (see above) and is not consistent.

> As soon as a different IndexShardingStrategy is chosen, then I think
> it's quite self-explanatory that setting NBR_OF_SHARDS is quite
> useful: the user will have coded an explicit IndexShardingStrategy and
> consequentially have a clear idea of how many shards he wants, at
> least for the static sharding so far.

Right, for static sharding maybe, but we are going to make this dynamic now.

>> With dynamic sharding things get more complicated. Right now you configure dynamic sharding by setting 'nbr_of_shards' to the literal 'dynamic'. This selects under the hood the
>> right IndexShardingStrategy (DynamicShardingStrategy). I find it misleading on multiple levels. First 'dynamic' is not a number and secondly I want to configure a strategy
>> not the number of shards. It is also inconsistent with how we select/configure other pluggable components in Search. For that reason I suggest:
>> 
>> - The type of sharding is configured via setting hibernate.search.[indexName].sharding_strategy. 'nbr_of_shards' is a parameter which gets passed to the strategy and which
>>   might get ignored depending on the sharding implementation. Implementations are free to check the property and e.g. print out a warning if the settings does not apply to them
> 
> Conceptually it sounds nice.
> I see two downsides:
> - it pushes complexity to the IndexShardingStrategy implementor (the
> user) as he needs to parse it and somehow he needs to request those
> indexes from the SearchFactory to be built. Pushing both these
> responsibilities to the end user in exchange for a one-liner in the
> configuration file seems like an odd choice? I would agree if it was
> us to write the code, but I really expect most people to plug their
> own strategy as IdHashShardingStrategy isn't very useful in a real
> world app.

I think that if the API to create an index manager is simple it basically unifies static
and dynamic sharding.

> - today we pre-initialize the indexes (IndexManagers) before they are
> passed to the IndexShardingStrategy # initialize method. We would need
> to pass instead some lifecycle-controlling objects which allows the
> user to trigger index initialization. Again I essentially agree but
> that sounds much like dynamic sharding?

With the right API both would be possible in a simple to understand way.

> I don't think we can change these in the scope of 4.4 as it affects
> current API. Shall we take this inconsistency point as
> yet-another-reason to migrate to Dynamic Sharding? While the new
> feature matures, I suspect it could completely replace the static one.

I think so as well, in which case we need to make sure that we get the API right. A
new/updated initialise contract might be exactly what we need instead of yet another patch.
See also my email regarding ShardIdentifierProvider.

>> - We introduce short names for the provided sharding strategies - 'none', 'id-hash', 'dynamic'. This will avoid the need to reference concrete implementation classes
> 
> -1 : as I reminded above, I don't expect id-hash to be of practical
> use, people want to plug their own strategies which implies we need
> the concrete implementation classes. I'd rather see the
> IdHashShardingStrategy as a concrete example we're providing (not just
> an example, I guess someone might find it useful in production, I just
> think it's a minority of the IndexShardingStrategy users).

IdHashShardingStrategy is in use right now, at least if you enable sharding without any other specific 
IndexShardingStrategy implementation. Providing a short name of it is inline with configuration options
like 'ram' or 'filesystem' for directory provides. We could use 'default' to hide the fact which impl
we are using. This way we could even replace the impl in case we find a better one.

>> - For dynamic sharding we have the additional sub-property 'shard_identity_provider' which specifies the ShardIdentifierProvider (new contract needed for dynamic sharding).
>>  This property is only relevant for dynamic sharding and will be handled in the same way as 'nbr_of_shards'
> 
> To recap today we have
>    hibernate.search.[indexName].sharding_strategy = [implementation
> of IndexShardingStrategy]
> 
> Would it not be nice if I could either specify an implementation of
> IndexShardingStrategy or a ShardIdentifierProvider ?
>    hibernate.search.[indexName].sharding_strategy = [implementation
> of IndexShardingStrategy | ShardIdentifierProvider]

hmm, I have not thought about it this way. So far I was more thinking along the
lines of removing ShardIdentifierProvider. But you are proposing to keep it
and maybe in the long run remove IndexShardingStrategy? 

> -  being the same property keeps it clear that you can either specify
> one OR the other.

Except that we are talking about to different interfaces. Hardly good practice to offer this type of
confguration.

--Hardy


More information about the hibernate-dev mailing list