[hibernate-dev] [HSEARCH] Dynamic Sharding and directory template

Emmanuel Bernard emmanuel at hibernate.org
Fri Apr 12 08:32:23 EDT 2013


Hiding behind a long and deep email won't hide the fact that you have not
answered my question :)

More inline.

On Thu 2013-04-11 17:47, Sanne Grinovero wrote:
> Man your simple question is actually super complex.
> 
> Conclusion first: I think it's important we can always identify any
> index just with a simple String, but you're very welcome to add some
> kind of register indexName -> StuffWeKnowAboutIt.
> 
> This has been biting in several forms. Let's recap the different aspects:
> 
> In the left corner of the ring we have this user friendly element:
> 
>     @Indexed(index="customers")
>     class Customer extends Person {
>     ...
> 
>     @Indexed(index="person")
>     class Person {
>     ...
> 
> And I think we all will agree that this simplicity has to stay.
> 
> For each index name we map directly configuration properties, but
> allow overrides in the more specific section:
> 
> hibernate.search.default.indexwriter.max_merge_docs=12
> hibernate.search.customers.locking_strategy = native
> hibernate.search.customers.sharding_strategy.nbr_of_shards = 4
> hibernate.search.cusomters.1.locking_strategy = none
> hibernate.search.customers.2.locking_strategy = simple
> 
> 1# Problem: mis-typed properties
> The first problem is that the fourth line above is going to be
> ignored: because of the typo in the index name, the user will not even
> see a warning, as we fail to inspect the option.
> For example, since I'm using an international keyboard layout, I often
> mistakenly insert an invisible character along with the double quotes,
> and it is total hell to figure out why my configuration is being ignored.
> 
> 2# Problem: Unnatural sharding support
> This is valid today:
> 
>     @Indexed(index="customers.3")
>     class EnglishCustomer extends Customer {
>     ...
> 
> The above is going to automatically store all EnglishCustomer
> instances in the 3rd shard - together with other Customer instances if
> the strategy allows. I'm not endorsing it, but it does allow for some
> interesting flexibility.
> 
> Q: Would we be ok in suddenly considering this illegal?
> 
> As I don't think this could still work if we start considering the
> identifier of the index as only a sub-part of the string above.
> 
> 
> One aspect of the problem is that in Infinispan Query we have to boot
> the SearchFactory without knowing all the indexed types: there is no
> classpath scanning (nor will there be in the foreseeable future).
> 
> So let's say the engine receives the following domain object - never
> seen before:
> 
>     @Indexed(index="starships.gamma7")
>     class DiskShapedUFO extends Classified {
>     ....
> 
> we'll be looking for specific properties to boot - among others - the
> DirectoryProvider:
> 
> hibernate.search.default.directory_provider = filesystem
> [ hibernate.search.starships.gamma7.directory_provider ?undefined? ]
> 
> Currently we'd pick FSDirectory but maybe this existed:
> hibernate.search.starships.directory_provider = infinispan
> 
> so we would be running on the wrong index, which is not acceptable.
> 
> Seems like we would need to drop support for such a sharding option,
> or give a very specific meaning to the "." dot character for
> sub-shard identification so that we could infer the need to look
> for a "starships" index configuration.

Maybe there are situations where directly mapping an entity's index to a
specific shard of another entity's index makes sense, but it seems to me
that we should not support or allow such a situation.

> 
> Let's not even try to tackle the complexities arising from shard names such as
>     @Indexed(index="home.worker")
> as worker is a prefix for one of the follow-up options.
> 
> So considering these tricky aspects I agree we need a better way to
> group all configuration properties for a specific index, but also you
> ultimately need to be able to identify an index using the simple name.

It seems that dot as a separator was a poor choice and that we should
depart from it. Something like

    hibernate.search.starships[gamma7].directory_provider = ram

That seems clearer and less ambiguous. Of course it does not help anyone
who has already used the dotted approach, but at least we can throw an
exception from now on if we find [] in an @Indexed index name.

If we move to the [] approach, we need to provide a migration path to
users though.
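
To make the idea concrete, here is a minimal sketch of how keys in the
bracket form could be split into index, shard and option. Nothing here is
existing Hibernate Search API: the class name BracketKeyParser, the exact
key layout and both methods are assumptions for illustration only. It
also assumes index names contain no dots, so it only addresses the shard
ambiguity, not names like "home.worker".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hibernate Search.
public class BracketKeyParser {

    // Assumed key layout: hibernate.search.<index>[<shard>].<option>
    // The [<shard>] part is optional; index names may not contain '.', '[' or ']'.
    private static final Pattern KEY = Pattern.compile(
            "hibernate\\.search\\.([^.\\[\\]]+)(?:\\[([^\\[\\]]+)\\])?\\.(.+)");

    /** Returns { index, shard or null, option }. */
    public static String[] parse(String key) {
        Matcher m = KEY.matcher(key);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an index property: " + key);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    /** Rejects index names that would clash with the bracket syntax. */
    public static void checkIndexName(String indexName) {
        if (indexName.contains("[") || indexName.contains("]")) {
            throw new IllegalArgumentException(
                    "Index name must not contain '[' or ']': " + indexName);
        }
    }
}
```

With this, a key without brackets is simply an index-level (or default)
option, and the shard name is never confused with an option prefix.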

> 
> For HSEARCH-1295 I would expect a register of properties using the
> index name as key, something very similar to the IndexManagerHolder ?

Not sure I follow you. My pull-request code already does that, no?

> When we created the IndexManager the intention really was to group the
> related services for a specific index; today sharding works in front
> of the IndexManager, but we could also consider sharding to be a tree
> of IndexManager(s), where the root instance holds the strategies and
> options common to the shards, especially to be able to boot new ones
> as needed.

We can, but what's the benefit? At the end of the day, what really
matters is the ability to select a specific IM shard and the associated
reader sub-selection.

> 
> If we could aggressively scan all property keys and from there
> pre-construct the configuration metadata for each index, this would
> pave the road for validation of the configuration.
> We would need to recognize shard-index names and pre-build a tree for
> the shards; then it would be easy for the DirectoryProvider to figure
> out its parent options and look up an associated sharding strategy.

I am not sure I entirely follow, but I imagine what you are saying is
that you want to break all the initialize() methods and pass a specific
configuration object instead of the Properties object.
That would indeed provide some more info, and it would be nice to have
such an object so we can add type-safe information later without
breaking the API. But this configuration object still needs to expose
the Properties, as you never know what a DirectoryProvider
implementation needs for its configuration.
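
As a rough sketch of what I have in mind (the class name
IndexConfiguration and all of its methods are made up for this email,
and it uses today's dotted keys), such an object could carry the index
and shard identity, resolve options from most to least specific, and
still hand out the raw Properties:

```java
import java.util.Properties;

// Hypothetical configuration object, not an existing Hibernate Search class.
public final class IndexConfiguration {

    private static final String PREFIX = "hibernate.search.";

    private final String indexName;
    private final String shardName; // null when the index is not sharded
    private final Properties raw;

    public IndexConfiguration(String indexName, String shardName, Properties raw) {
        this.indexName = indexName;
        this.shardName = shardName;
        this.raw = raw;
    }

    public String getIndexName() { return indexName; }

    public String getShardName() { return shardName; }

    /** Raw properties, for provider-specific options we cannot anticipate. */
    public Properties getRawProperties() { return raw; }

    /** Looks up an option: shard level first, then index level, then default. */
    public String getOption(String option) {
        if (shardName != null) {
            String v = raw.getProperty(PREFIX + indexName + "." + shardName + "." + option);
            if (v != null) {
                return v;
            }
        }
        String v = raw.getProperty(PREFIX + indexName + "." + option);
        if (v != null) {
            return v;
        }
        return raw.getProperty(PREFIX + "default." + option);
    }
}
```

Because the shard knows its parent index, the starships.gamma7 case would
pick up hibernate.search.starships.directory_provider instead of silently
falling back to the default.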

> 
> How does this look? :
> 
> # we give some out-of-the-box ways to handle FS path name generation
> hibernate.search.users.shards_pathmaker = from_template

What's the alternative to from_template?

> # which might have some options
> hibernate.search.users.shards_pathmaker.template = /var/data/index/{shard}/User
> # Activate sharding
> hibernate.search.users.sharding_strategy = my.custom.dynamic.strategy
> # Some override as usual
> hibernate.search.users.FR.locking_strategy = native
> 
> The tricky aspect is to recognize the last line as an element of the
> users index and, as such, to know that we need to apply the template
> for the FSDirectory path.

I think the [FR] syntax will help.
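
Once the shard name is known unambiguously, applying the template is the
easy part. A minimal sketch, assuming a hypothetical ShardPathTemplate
helper and that {shard} is the only placeholder the template supports:

```java
// Hypothetical helper, not part of Hibernate Search.
public class ShardPathTemplate {

    private static final String PLACEHOLDER = "{shard}";

    /** Replaces every {shard} placeholder in the template with the shard name. */
    public static String resolve(String template, String shardName) {
        if (!template.contains(PLACEHOLDER)) {
            // Without the placeholder all shards would share one directory.
            throw new IllegalArgumentException(
                    "Template must contain " + PLACEHOLDER + ": " + template);
        }
        return template.replace(PLACEHOLDER, shardName);
    }
}
```

For the FR shard and the template above this gives /var/data/index/FR/User.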

