[hibernate-dev] [HSEARCH] Dynamic Sharding and directory template

Thu Apr 11 12:47:37 EDT 2013

Man, your simple question is actually super complex.

Conclusion first: I think it's important we can always identify any
index just with a simple String, but you're very welcome to add some
kind of registry indexName -> StuffWeKnowAboutIt.

This has been biting us in several forms. Let's recap the different aspects:

In the left corner of the ring we have this user friendly element:

    @Indexed(index="customers")
    class Customer extends Person {
    ...

    @Indexed(index="person")
    class Person {
    ...

And I think we all will agree that this simplicity has to stay.

For each index name we map directly configuration properties, but
allow overrides in the more specific section:

hibernate.search.default.indexwriter.max_merge_docs=12
hibernate.search.customers.locking_strategy = native
hibernate.search.customers.sharding_strategy.nbr_of_shards = 4
hibernate.search.cusomters.1.locking_strategy = none
hibernate.search.customers.2.locking_strategy = simple
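
A minimal sketch of that override order, assuming the lookup goes from
the shard section to the index section to "default". The class and
method names are hypothetical, not part of the Hibernate Search API:

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not Hibernate Search code.
final class LayeredIndexOptions {
    private static final String PREFIX = "hibernate.search.";

    // Looks an option up for one shard, falling back to the index
    // section and then to the "default" section.
    static String resolve(Properties props, String indexName, String shardId, String option) {
        String[] scopes = { indexName + "." + shardId, indexName, "default" };
        for (String scope : scopes) {
            String value = props.getProperty(PREFIX + scope + "." + option);
            if (value != null) {
                return value.trim();
            }
        }
        return null; // not configured at any level
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.indexwriter.max_merge_docs", "12");
        props.setProperty("hibernate.search.customers.locking_strategy", "native");
        props.setProperty("hibernate.search.cusomters.1.locking_strategy", "none");
        props.setProperty("hibernate.search.customers.2.locking_strategy", "simple");

        System.out.println(resolve(props, "customers", "2", "locking_strategy")); // simple
        // The misspelled key is never consulted, so shard 1 silently gets "native".
        System.out.println(resolve(props, "customers", "1", "locking_strategy")); // native
        System.out.println(resolve(props, "customers", "1", "indexwriter.max_merge_docs")); // 12
    }
}
```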

1# Problem: mis-typed properties
The first problem is that the fourth line above is going to be
ignored: it has a typo in the index name, and the user will not even
see a warning because we never inspect the option.
For example, since I'm using an international layout keyboard, I often
mistakenly insert an invisible character with the double quotes:
total hell to figure out why my configuration is being ignored.
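
A rough sketch of the kind of check that would catch this, assuming we
already know the set of index names at the time of the check (which, as
discussed further down, is not always true). The class is hypothetical,
and a real check would also have to skip global options that are not
scoped to an index:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical validation helper; not part of Hibernate Search.
final class IndexNameCheck {
    private static final String PREFIX = "hibernate.search.";

    // Returns every key whose first segment after the prefix is neither
    // "default" nor one of the known index names, in sorted order.
    static List<String> unknownIndexKeys(Properties props, Set<String> knownIndexNames) {
        List<String> suspicious = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (!key.startsWith(PREFIX)) {
                continue;
            }
            String rest = key.substring(PREFIX.length());
            int dot = rest.indexOf('.');
            String scope = dot < 0 ? rest : rest.substring(0, dot);
            if (!scope.equals("default") && !knownIndexNames.contains(scope)) {
                suspicious.add(key);
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.indexwriter.max_merge_docs", "12");
        props.setProperty("hibernate.search.customers.locking_strategy", "native");
        props.setProperty("hibernate.search.cusomters.1.locking_strategy", "none");

        // prints [hibernate.search.cusomters.1.locking_strategy]
        System.out.println(unknownIndexKeys(props, Set.of("customers", "person")));
    }
}
```

An invisible character inside the index name would be flagged the same
way, since the scope would no longer equal any known name.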

2# Problem: Unnatural sharding support
This is valid today:

    @Indexed(index="customers.3")
    class EnglishCustomer extends Customer {
    ...

The above is going to automatically store all EnglishCustomer
instances in the 3rd shard - together with other Customer instances if
the strategy allows. I'm not endorsing it, but it does allow for some
interesting flexibility.

Q: Would we be ok with suddenly considering this illegal?

I ask because I don't think this could still work if we start
considering the identifier of the index as only a sub-part of the
string above.

One aspect of the problem is that in Infinispan Query we have to boot
the SearchFactory without knowing all the indexed types: there is no
classpath scanning (nor will there be in the foreseeable future).

So let's say the engine receives the following domain object - never
seen before:

    @Indexed(index="starships.gamma7")
    class DiskShapedUFO extends Classified {
    ....

we'll be looking for specific properties to boot - among others - the
DirectoryProvider:

hibernate.search.default.directory_provider = filesystem
[ hibernate.search.starships.gamma7.directory_provider ?undefined? ]

Currently we'd pick FSDirectory but maybe this existed:
hibernate.search.starships.directory_provider = infinispan

so we would be running on the wrong index, which is not acceptable.

Seems like we would need to drop support for such a sharding option,
or give a very specific meaning to the "." dot character for
sub-shard identification so that we could infer the need to look
for a "starships" index configuration.
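
To make the difference concrete, here is a sketch comparing the lookup
described above (exact index name, then "default") with one that treats
each "." as a parent separator. Both methods are hypothetical
illustrations, not existing Hibernate Search code:

```java
import java.util.Properties;

// Hypothetical comparison of two lookup strategies.
final class DotAwareLookup {
    private static final String PREFIX = "hibernate.search.";

    // Exact index name first, then "default".
    static String exactThenDefault(Properties props, String indexName, String option) {
        String value = props.getProperty(PREFIX + indexName + "." + option);
        return value != null ? value : props.getProperty(PREFIX + "default." + option);
    }

    // Tries the full name, then each ancestor obtained by dropping the
    // last dot-separated segment, and only then "default".
    static String dotAware(Properties props, String indexName, String option) {
        String scope = indexName;
        while (true) {
            String value = props.getProperty(PREFIX + scope + "." + option);
            if (value != null) {
                return value;
            }
            int dot = scope.lastIndexOf('.');
            if (dot <= 0) {
                break;
            }
            scope = scope.substring(0, dot);
        }
        return props.getProperty(PREFIX + "default." + option);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.directory_provider", "filesystem");
        props.setProperty("hibernate.search.starships.directory_provider", "infinispan");

        System.out.println(exactThenDefault(props, "starships.gamma7", "directory_provider")); // filesystem
        System.out.println(dotAware(props, "starships.gamma7", "directory_provider")); // infinispan
    }
}
```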

Let's not even try to tackle the complexities arising from shard names such as
    @Indexed(index="home.worker")
as worker is a prefix for one of the follow-up options.

So considering these tricky aspects I agree we need a better way to
group all configuration properties for a specific index, but also you
ultimately need to be able to identify an index using the simple name.

For HSEARCH-1295 I would expect a registry of properties using the
index name as key, something very similar to the IndexManagerHolder?

When we created the IndexManager the intention really was to group the
related services for a specific index; today sharding works in front
of the IndexManager, but we could also consider sharding to be a tree
of IndexManager(s), where the root instance holds the strategies and
options common to the shards, especially to be able to boot new ones
as needed.

If we could aggressively scan all property keys and from there
pre-construct the configuration metadata for each index, this would
pave the road for validation of the configuration. We would need to
recognize shard-index names and pre-build a tree for the shards; then
it would be easy for the DirectoryProvider to figure out its parent
options and look up an associated sharding strategy.
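
A very rough sketch of such a scan. The big assumption is that we can
tell a shard name from an option by checking the second key segment
against a fixed list of option roots; OPTION_ROOTS here is an
illustrative, incomplete list and all class names are hypothetical:

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical pre-built configuration tree; not Hibernate Search code.
final class IndexConfigTree {
    private static final String PREFIX = "hibernate.search.";

    // ASSUMPTION: illustrative list of option roots, not the real one.
    private static final Set<String> OPTION_ROOTS = Set.of(
            "directory_provider", "locking_strategy", "sharding_strategy",
            "shards_pathmaker", "indexwriter", "worker");

    static final class Node {
        final Map<String, String> options = new TreeMap<>();
        final Map<String, Node> shards = new TreeMap<>();
    }

    // Groups every "hibernate.search.<index>[.<shard>].<option>" key
    // under its index node, and under a shard child when one is named.
    static Map<String, Node> build(Properties props) {
        Map<String, Node> roots = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue;
            }
            String[] parts = key.substring(PREFIX.length()).split("\\.", 3);
            if (parts.length < 2) {
                continue; // not scoped to an index
            }
            Node node = roots.computeIfAbsent(parts[0], k -> new Node());
            String option;
            if (parts.length == 3 && !OPTION_ROOTS.contains(parts[1])) {
                // Second segment is not a known option: treat it as a shard.
                node = node.shards.computeIfAbsent(parts[1], k -> new Node());
                option = parts[2];
            } else {
                option = parts.length == 3 ? parts[1] + "." + parts[2] : parts[1];
            }
            node.options.put(option, props.getProperty(key).trim());
        }
        return roots;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hibernate.search.customers.locking_strategy", "native");
        props.setProperty("hibernate.search.customers.sharding_strategy.nbr_of_shards", "4");
        props.setProperty("hibernate.search.customers.2.locking_strategy", "simple");

        Node customers = build(props).get("customers");
        System.out.println(customers.options);            // {locking_strategy=native, sharding_strategy.nbr_of_shards=4}
        System.out.println(customers.shards.get("2").options); // {locking_strategy=simple}
    }
}
```

Note that a shard actually named "worker" would be misread as an option
here, which is the "home.worker" ambiguity mentioned earlier.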

How does this look? :

# we give some out of the box ways to handle FS path name generation
hibernate.search.users.shards_pathmaker = from_template
# which might have some options
hibernate.search.users.shards_pathmaker.template = /var/data/index/{shard}/User
# Activate sharding
hibernate.search.users.sharding_strategy = my.custom.dynamic.strategy
# Some override as usual
hibernate.search.users.FR.locking_strategy = native

The tricky aspect is to recognize the last line as an element of the
users index, and from that to know that we need to apply the template
for the FSDirectory path.
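
Once the shard is recognized, applying the template itself is the easy
part. A minimal sketch, assuming "{shard}" is the only placeholder and
that shard ids must not contain path separators (both are my
assumptions; the class is hypothetical):

```java
// Hypothetical "from_template" path maker; not Hibernate Search code.
final class ShardPathTemplate {
    private static final String PLACEHOLDER = "{shard}";

    // Replaces the {shard} placeholder in the template with the shard id.
    static String expand(String template, String shardId) {
        if (!template.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("template has no {shard} placeholder: " + template);
        }
        // ASSUMPTION: reject ids that could escape the intended directory.
        if (shardId.isEmpty() || shardId.contains("/") || shardId.contains("\\") || shardId.contains("..")) {
            throw new IllegalArgumentException("unsafe shard id: " + shardId);
        }
        return template.replace(PLACEHOLDER, shardId);
    }

    public static void main(String[] args) {
        // prints /var/data/index/FR/User
        System.out.println(expand("/var/data/index/{shard}/User", "FR"));
    }
}
```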

Cheers,
Sanne

On 11 April 2013 14:12, Nicolas Helleringer
<nicolas.helleringer at gmail.com> wrote:
> IMHO passing the shard identifier in the Properties entries is a weak
> solution in the long term.
>
> I would prefer breaking the SPI, but I have no rational arguments to
> back my out-of-the-box opinion.
>
> Niko
>
> 2013/4/11 Emmanuel Bernard <emmanuel at hibernate.org>:
>> I am currently working on a solution for dynamically adding new shards
>> to Hibernate Search (for example one per tenant with a list growing).
>>
>> https://hibernate.atlassian.net/browse/HSEARCH-472
>>
>> Things are going well but there is an interesting problem related to a
>> subsequent feature
>>
>> https://hibernate.atlassian.net/browse/HSEARCH-1295
>>
>> In short, EntityIndexBinders create additional IndexManagers when a not
>> yet created shard id is requested. The IndexManager is uniquely
>> identified by its indexName.
>>
>> In the old sharding approach, the index name was turned into indexName.n
>> (n being the shard number) and this new indexName.n is passed along to
>> DirectoryProviders etc
>>
>> To implement HSEARCH-1295 properly, you need the DirectoryProvider to
>> have access to the original index name and the shard identifier as
>> independent pieces of data.
>>
>> We can hack around the model and pass the original indexName and shard
>> identifier in specific Properties entries. That's backward compatible.
>>
>> An alternative is to replace String indexName in all these contracts with
>> a proper IndexName object pointing to the original indexName and to the
>> shard identifier. That one breaks a bunch of SPI and in particular
>> DirectoryProvider.
>>
>> Thoughts?
>>
>> Emmanuel
>> _______________________________________________
>> hibernate-dev mailing list
>> hibernate-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev