[hibernate-dev] [Search] Dynamic sharding configuration

Sanne Grinovero sanne at hibernate.org
Mon Sep 23 07:55:00 EDT 2013


On 20 September 2013 11:37, Hardy Ferentschik <hardy at hibernate.org> wrote:
> Hi,
>
> I am currently working on HSEARCH-472 [1] - dynamic sharding. The work is built on Emmanuel's prototype and you can find the current code on my fork [2].
>
> Right now I am wondering about how to configure (dynamic) sharding. Here is how things worked prior to dynamic sharding. Basically there are two properties
> driving the shard configuration:
>
> - hibernate.search.[indexName].sharding_strategy
> - hibernate.search.[indexName].sharding_strategy.nbr_of_shards
>
> The first property determines the implementation class of IndexShardingStrategy and the second the number of shards to create. So far we had two implementations
> of IndexShardingStrategy, namely NotShardedStrategy and IdHashShardingStrategy.

As a reminder to other readers: those are the two implementations
included in Hibernate Search, but the general expectation is that you
plug in your own IndexShardingStrategy to make the most of it.

> To configure sharding it was enough to set nbr_of_shards to a value > 1. This would automatically select IdHashShardingStrategy and shard depending on the configured
> number of shards. The idea was to make it simple for the user and only require a single configuration change to enable sharding.
>
> However, it creates inconsistencies. For example what if I select NotShardedStrategy and nbr_of_shards >1?

+1, should throw an exception. Currently it will initialize multiple
shards but just use the first one; not too bad.
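
For illustration, a minimal sketch of such a check. The class and
method names (ShardingConfigCheck, validate) are hypothetical and the
index prefix is assumed to be already stripped from the property keys;
this is not the actual Hibernate Search bootstrap code.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of Hibernate Search.
final class ShardingConfigCheck {

    // Keys relative to "hibernate.search.[indexName]." (prefix assumed stripped).
    static final String STRATEGY = "sharding_strategy";
    static final String NBR_OF_SHARDS = "sharding_strategy.nbr_of_shards";

    // Throws if NotShardedStrategy is combined with more than one shard.
    static void validate(Properties indexProps) {
        String strategy = indexProps.getProperty(STRATEGY);
        String nbr = indexProps.getProperty(NBR_OF_SHARDS);
        if (strategy == null || nbr == null) {
            return; // at most one of the two is set: nothing to cross-check
        }
        int shards = Integer.parseInt(nbr.trim()); // assumes a numeric value
        if (strategy.endsWith("NotShardedStrategy") && shards > 1) {
            throw new IllegalArgumentException(
                "NotShardedStrategy cannot be combined with nbr_of_shards=" + shards);
        }
    }
}
```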

> Or I set a custom sharding strategy which does not care about the number of shards?

I think that's far-fetched. The NBR_OF_SHARDS option defines the size
of the array of indexes passed to the IndexShardingStrategy so it's
hard to ignore. Sure it's possible, we could throw a
log.userIsAnIdiotException() but someone might not see the humor :-).
Worst case it degenerates into a case similar to your example of
"NotShardedStrategy and nbr_of_shards >1", or the user would notice
with an ArrayIndexOutOfBoundsException: a user code problem.
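
To make the relation between NBR_OF_SHARDS and that array concrete,
here is a minimal sketch of id-based routing over a pre-built array.
IdHashRouting and pick are hypothetical names, and the hash function
is an assumption: the real IdHashShardingStrategy may hash differently.

```java
// Hypothetical sketch; not the Hibernate Search implementation.
final class IdHashRouting {

    // Picks one element of the pre-built shard array for the given id.
    static <T> T pick(T[] shards, String id) {
        // floorMod keeps the index in [0, shards.length) even when
        // hashCode() is negative.
        int index = Math.floorMod(id.hashCode(), shards.length);
        return shards[index];
    }
}
```

A strategy that hard-codes an index instead of deriving it from the
array length is the one that would run past the end of the array.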

> IMO the important factor is to set the right sharding strategy and nbr_of_shards should just be a (optional) parameter to the sharding strategy.

Note that so far we don't expect users to explicitly set the
NotShardedStrategy: it's simply a consequence of not having set any
option; if the user sets only the number of shards but omits picking a
specific strategy, we automatically assume he's going for the
IdHashShardingStrategy.
As soon as a different IndexShardingStrategy is chosen, I think
it's self-explanatory that setting NBR_OF_SHARDS is
useful: the user will have coded an explicit IndexShardingStrategy and
consequently have a clear idea of how many shards he wants, at
least for the static sharding we have so far.
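
The defaulting rules described above could be sketched as follows.
ShardingDefaults and effectiveStrategy are hypothetical names, and the
strategies are represented by their simple class names only; this is
my reading of the current behaviour, not a copy of the real code.

```java
// Hypothetical sketch of the defaulting rules; names are illustrative.
final class ShardingDefaults {

    // configuredStrategy: value of sharding_strategy, or null if unset.
    // nbrOfShards: value of sharding_strategy.nbr_of_shards, or null if unset.
    static String effectiveStrategy(String configuredStrategy, String nbrOfShards) {
        if (configuredStrategy != null) {
            return configuredStrategy; // an explicit choice always wins
        }
        if (nbrOfShards != null && Integer.parseInt(nbrOfShards.trim()) > 1) {
            return "IdHashShardingStrategy"; // only the shard count was set
        }
        return "NotShardedStrategy"; // nothing relevant was set
    }
}
```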

> With dynamic sharding things get more complicated. Right now you configure dynamic sharding by setting 'nbr_of_shards' to the literal 'dynamic'. This selects under the hood the
> right IndexShardingStrategy (DynamicShardingStrategy). I find it misleading on multiple levels. First 'dynamic' is not a number and secondly I want to configure a strategy
> not the number of shards. It is also inconsistent with how we select/configure other pluggable components in Search. For that reason I suggest:
>
> - The type of sharding is configured via setting hibernate.search.[indexName].sharding_strategy. 'nbr_of_shards' is a parameter which gets passed to the strategy and which
>    might get ignored depending on the sharding implementation. Implementations are free to check the property and e.g. print out a warning if the setting does not apply to them

Conceptually it sounds nice.
I see two downsides:
 - it pushes complexity to the IndexShardingStrategy implementor (the
user) as he needs to parse the property and somehow request those
indexes to be built by the SearchFactory. Pushing both these
responsibilities to the end user in exchange for a one-liner in the
configuration file seems like an odd choice? I would agree if we were
the ones writing the code, but I really expect most people to plug in
their own strategy, as IdHashShardingStrategy isn't very useful in a
real world app.
 - today we pre-initialize the indexes (IndexManagers) before they are
passed to the IndexShardingStrategy # initialize method. We would need
to pass instead some lifecycle-controlling objects which allow the
user to trigger index initialization. Again I essentially agree but
that sounds much like dynamic sharding?

I don't think we can change these in the scope of 4.4 as it affects
the current API. Shall we take this inconsistency as
yet-another-reason to migrate to Dynamic Sharding? As the new
feature matures, I suspect it could completely replace the static one.
Let's move there gradually?

> - We introduce short names for the provided sharding strategies - 'none', 'id-hash', 'dynamic'. This will avoid the need to reference concrete implementation classes

-1 : as I noted above, I don't expect id-hash to be of practical
use; people want to plug in their own strategies, which implies we
need the concrete implementation classes. I'd rather see the
IdHashShardingStrategy as a concrete example we're providing (not just
an example, I guess someone might find it useful in production, I just
think it's a minority of the IndexShardingStrategy users).

> - For dynamic sharding we have the additional sub-property 'shard_identity_provider' which specifies the ShardIdentifierProvider (new contract needed for dynamic sharding).
>   This property is only relevant for dynamic sharding and will be handled in the same way as 'nbr_of_shards'

To recap today we have
    hibernate.search.[indexName].sharding_strategy = [implementation
of IndexShardingStrategy]

Would it not be nice if I could either specify an implementation of
IndexShardingStrategy or a ShardIdentifierProvider ?
    hibernate.search.[indexName].sharding_strategy = [implementation
of IndexShardingStrategy | ShardIdentifierProvider]

In case a ShardIdentifierProvider is passed, it's obviously dynamic.
In this case a value specified for .nbr_of_shards would be ignored
with a warning. The good thing about using this same property is that:
-  we can gradually migrate from the static sharding without changing
the props
-  being the same property, it stays clear that you can specify either
one OR the other.
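
A rough sketch of how the single property could be resolved to either
mode. Everything here is hypothetical: the two StandIn interfaces only
stand in for IndexShardingStrategy and ShardIdentifierProvider so the
example compiles on its own, and ShardingModeResolver is an
illustrative name, not a proposed API.

```java
// Stand-ins for the two contracts so the sketch compiles on its own;
// the real interfaces live in Hibernate Search and declare methods.
interface IndexShardingStrategyStandIn {}
interface ShardIdentifierProviderStandIn {}

// Sample classes a user might name in sharding_strategy (hypothetical).
final class MyStaticStrategy implements IndexShardingStrategyStandIn {}
final class MyShardIdentifierProvider implements ShardIdentifierProviderStandIn {}

enum ShardingMode { STATIC, DYNAMIC }

final class ShardingModeResolver {

    // Decides the mode from the type of the configured class.
    static ShardingMode resolve(Class<?> configured) {
        boolean isStrategy = IndexShardingStrategyStandIn.class.isAssignableFrom(configured);
        boolean isProvider = ShardIdentifierProviderStandIn.class.isAssignableFrom(configured);
        if (isStrategy == isProvider) {
            // implements neither contract, or both: cannot pick a mode
            throw new IllegalArgumentException(
                configured.getName() + " must implement exactly one of the two contracts");
        }
        return isProvider ? ShardingMode.DYNAMIC : ShardingMode.STATIC;
    }

    // True when nbr_of_shards was set but will be ignored (dynamic mode).
    static boolean shouldWarnAboutNbrOfShards(ShardingMode mode, String nbrOfShards) {
        return mode == ShardingMode.DYNAMIC && nbrOfShards != null;
    }
}
```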

-- Sanne

>
> Thoughts?
>
> --Hardy
>
>
>
> [1] https://hibernate.atlassian.net/browse/HSEARCH-472
> [2] https://github.com/hferentschik/hibernate-search/compare/HSEARCH-472
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev

