On 23 Jan 2013, at 1:55 PM, Sanne Grinovero <sanne(a)hibernate.org> wrote:
> Or I set a custom sharding strategy which does not care about the
number of shards?
I think that's far fetched. The NBR_OF_SHARDS option defines the size
of the array of indexes passed to the IndexShardingStrategy so it's
hard to ignore. Sure it's possible, we could throw a
log.userIsAnIdiotException() but someone might not see the humor :-).
Worst case it degenerates in a case similar to your example of
"NotShardedStrategy and nbr_of_shards >1", or the user would notice
with an ArrayOutOfBounds.. usercode problem.
I think I am thinking about a custom dynamic sharding strategy in this case.
> IMO the important factor is to set the right sharding strategy
and nbr_of_shards should just be a (optional) parameter to the sharding strategy.
Note that so far we don't expect users to explicitly set the
NotShardedStrategy: it's simply a consequence of not having set any
option; if the user sets only the number of shards but omits picking a
specific strategy, we automatically assume he's going for the
IdHashShardingStrategy
I get that, but as I said, I see it as a bit of misguided "ease of use", since
it can introduce
problems (see above) and is not consistent.
As soon as a different IndexShardingStrategy is chosen, then I think
it's quite self-explanatory that setting NBR_OF_SHARDS is quite
useful: the user will have coded an explicit IndexShardingStrategy and
consequentially have a clear idea of how many shards he wants, at
least for the static sharding so far.
Right, for static sharding maybe, but we are going to make this dynamic now.
> With dynamic sharding things get more complicated. Right now you
configure dynamic sharding by setting 'nbr_of_shards' to the literal
'dynamic'. This selects under the hood the
> right IndexShardingStrategy (DynamicShardingStrategy). I find it misleading on
multiple levels. First 'dynamic' is not a number and secondly I want to configure
a strategy
> not the number of shards. It is also inconsistent with how we select/configure other
pluggable components in Search. For that reason I suggest:
>
> - The type of sharding is configured via setting
hibernate.search.[indexName].sharding_strategy. 'nbr_of_shards' is a parameter
which gets passed to the strategy and which
> might get ignored depending on the sharding implementation. Implementations are
free to check the property and e.g. print out a warning if the settings does not apply to
them
Conceptually it sounds nice.
I see two downsides:
- it pushes complexity to the IndexShardingStrategy implementor (the
user) as he needs to parse it and somehow he needs to request those
indexes from the SearchFactory to be built. Pushing both these
responsibilities to the end user in exchange for a one-liner in the
configuration file seems like an odd choice? I would agree if it was
us to write the code, but I really expect most people to plug their
own strategy as IdHashShardingStrategy isn't very useful in a real
world app.
I think that if the API to create an index manager is simple it basically unifies static
and dynamic sharding.
- today we pre-initialize the indexes (IndexManagers) before they
are
passed to the IndexShardingStrategy # initialize method. We would need
to pass instead some lifecycle-controlling objects which allows the
user to trigger index initialization. Again I essentially agree but
that sounds much like dynamic sharding?
With the right API both would be possible in a simple to understand way.
I don't think we can change these in the scope of 4.4 as it
affects
current API. Shall we take this inconsistency point as
yet-another-reason to migrate to Dynamic Sharding? While the new
feature matures, I suspect it could completely replace the static one.
I think so as well, in which case we need to make sure that we get the API right. A
new/updated initialise contract might be exactly what we need instead of yet another
patch.
See also my email regarding ShardIdentifierProvider.
> - We introduce short names for the provided sharding strategies -
'none', 'id-hash', 'dynamic'. This will avoid the need to
reference concrete implementation classes
-1 : as I reminded above, I don't expect id-hash to be of practical
use, people want to plug their own strategies which implies we need
the concrete implementation classes. I'd rather see the
IdHashShardingStrategy as a concrete example we're providing (not just
an example, I guess someone might find it useful in production, I just
think it's a minority of the IndexShardingStrategy users).
IdHashShardingStrategy is in use right now, at least if you enable sharding without any
other specific
IndexShardingStrategy implementation. Providing a short name of it is inline with
configuration options
like 'ram' or 'filesystem' for directory provides. We could use
'default' to hide the fact which impl
we are using. This way we could even replace the impl in case we find a better one.
> - For dynamic sharding we have the additional sub-property
'shard_identity_provider' which specifies the ShardIdentifierProvider (new
contract needed for dynamic sharding).
> This property is only relevant for dynamic sharding and will be handled in the same
way as 'nbr_of_shards'
To recap today we have
hibernate.search.[indexName].sharding_strategy = [implementation
of IndexShardingStrategy]
Would it not be nice if I could either specify an implementation of
IndexShardingStrategy or a ShardIdentifierProvider ?
hibernate.search.[indexName].sharding_strategy = [implementation
of IndexShardingStrategy | ShardIdentifierProvider]
hmm, I have not thought about it this way. So far I was more thinking along the
lines of removing ShardIdentifierProvider. But you are proposing to keep it
and maybe in the long run remove IndexShardingStrategy?
- being the same property keeps it clear that you can either
specify
one OR the other.
Except that we are talking about to different interfaces. Hardly good practice to offer
this type of
confguration.
--Hardy