Hiding behind a long and deep email won't hide the fact that you have not
answered my question :)
More inline.
On Thu 2013-04-11 17:47, Sanne Grinovero wrote:
Man, your simple question is actually super complex.
Conclusion first: I think it's important that we can always identify any
index with just a simple String, but you're very welcome to add some
kind of registry indexName -> StuffWeKnowAboutIt.
This has been biting us in several forms. Let's recap the different aspects:
In the left corner of the ring we have this user friendly element:
@Indexed(index="customers")
class Customer extends Person {
...
}
@Indexed(index="person")
class Person {
...
}
And I think we will all agree that this simplicity has to stay.
For each index name we directly map configuration properties, while
allowing overrides in the more specific sections:
hibernate.search.default.indexwriter.max_merge_docs=12
hibernate.search.customers.locking_strategy = native
hibernate.search.customers.sharding_strategy.nbr_of_shards = 4
hibernate.search.cusomters.1.locking_strategy = none
hibernate.search.customers.2.locking_strategy = simple
1# Problem: mis-typed properties
The first problem is that the fourth line above (the one with the
"cusomters" typo) is going to be ignored: because of the typo in the
index name, the user will not even see a warning, as we never inspect
that option.
For example, since I use an international keyboard layout, I often
mistakenly insert an invisible character along with the double quotes...
total hell to figure out why my configuration is being ignored.
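A check that would have caught the "cusomters" line could look roughly like the sketch below. It is not an existing Hibernate Search API: the class and method names are hypothetical, it assumes the set of known index names is available at that point, and it assumes index names contain no dots (which is exactly what problem 2 below questions). The list of non-index segments ("default", "worker") is an illustrative, incomplete assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical check, not a Hibernate Search API: reports
// hibernate.search.<name>.* keys whose <name> matches no known index.
final class UnknownIndexPropertyCheck {
    private static final String PREFIX = "hibernate.search.";
    // Assumed, incomplete list of first segments that are not index names.
    private static final Set<String> NON_INDEX_SEGMENTS = Set.of("default", "worker");

    static List<String> findSuspiciousKeys(Properties props, Set<String> knownIndexNames) {
        List<String> suspicious = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue; // not a Hibernate Search option
            }
            String rest = key.substring(PREFIX.length());
            int dot = rest.indexOf('.');
            if (dot < 0) {
                continue; // global option with no index segment
            }
            // Assumes the index name is the first dot-separated segment.
            String name = rest.substring(0, dot);
            if (!NON_INDEX_SEGMENTS.contains(name) && !knownIndexNames.contains(name)) {
                suspicious.add(key);
            }
        }
        Collections.sort(suspicious);
        return suspicious;
    }
}
```

With the five properties listed above and the known indexes "customers" and "person", only the "cusomters" key is reported. An invisible character inside the index name would be flagged the same way, since the name no longer matches the known one exactly.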
2# Problem: Unnatural sharding support
This is valid today:
@Indexed(index="customers.3")
class EnglishCustomer extends Customer {
...
}
The above is going to automatically store all EnglishCustomer
instances in the shard identified as 3 of the customers index, together
with other Customer instances if the strategy allows. I'm not endorsing
it, but it does allow for some interesting flexibility.
Q: Would we be OK with suddenly considering this illegal?
I don't think this could keep working if we start treating only a
sub-part of the string above as the identifier of the index.
One aspect of the problem is that in Infinispan Query we have to boot
the SearchFactory without knowing all the indexed types: there is no
classpath scanning (nor will there be in the foreseeable future).
So let's say the engine receives the following domain object - never
seen before:
@Indexed(index="starships.gamma7")
class DiskShapedUFO extends Classified {
...
}
We'll be looking for the specific properties needed to boot, among other
things, the DirectoryProvider:
hibernate.search.default.directory_provider = filesystem
[ hibernate.search.starships.gamma7.directory_provider ?undefined? ]
Currently we'd pick FSDirectory, but maybe this existed:
hibernate.search.starships.directory_provider = infinispan
so we would be running on the wrong index, which is not acceptable.
It seems we would need to either drop support for such a sharding
option, or give the "." character a very specific meaning for
identifying sub-shards, so that we could infer the need to look
for a "starships" index configuration.
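Giving the dot that specific meaning would amount to a fallback lookup like the one sketched below: try the full index name, then strip one dot-separated suffix at a time, then fall back to the default section. This is a hypothetical helper, not how the engine resolves properties today.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical resolution rule: "starships.gamma7" -> "starships" -> "default".
final class DottedIndexPropertyLookup {
    private static final String PREFIX = "hibernate.search.";

    static Optional<String> resolve(Properties props, String indexName, String option) {
        String name = indexName;
        while (true) {
            String value = props.getProperty(PREFIX + name + "." + option);
            if (value != null) {
                return Optional.of(value);
            }
            int lastDot = name.lastIndexOf('.');
            if (lastDot < 0) {
                break; // no more parent segments to try
            }
            name = name.substring(0, lastDot); // treat the prefix as the parent index
        }
        return Optional.ofNullable(props.getProperty(PREFIX + "default." + option));
    }
}
```

With hibernate.search.starships.directory_provider = infinispan present, resolving "directory_provider" for "starships.gamma7" yields infinispan instead of the default filesystem. Note this only works in one direction: given an index name the lookup is well defined, but going from a raw key such as hibernate.search.home.worker.<option> back to an index name is still ambiguous.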
Maybe you have situations where directly mapping an entity's index to a
specific shard of another entity's index makes sense, but it seems to me
that we should not support or allow such situations.
Let's not even try to tackle the complexities arising from shard names such as
@Indexed(index="home.worker")
since "worker" is also the prefix of some of the options that can follow an index name.
So, considering these tricky aspects, I agree we need a better way to
group all configuration properties for a specific index, but you also
ultimately need to be able to identify an index by its simple name.
It seems that using the dot as a separator was a poor choice and that we
should move away from it. Something like
hibernate.search.starship[gamma].directory_provider ram
seems clearer and is less ambiguous. Of course that does not
prevent someone from having used such names already, but at least from
now on we can throw an exception if we find [] in an @Indexed(index=...) value.
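The bracket syntax is easy to parse unambiguously. The sketch below is hypothetical (class, record and method names are made up) and assumes that, under the new scheme, an index name contains no '.', '[' or ']'. It splits a key into index name, optional shard and option name, and shows the validation that would reject [] in an @Indexed(index=...) value.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for keys like hibernate.search.starship[gamma].directory_provider
final class BracketKeyParser {
    // group 1: index name, group 2: optional shard id, group 3: option name
    private static final Pattern KEY = Pattern.compile(
            "^hibernate\\.search\\.([^.\\[\\]]+)(?:\\[([^\\[\\]]+)\\])?\\.(.+)$");

    // shard is null when the key targets the whole index
    record ParsedKey(String index, String shard, String option) {}

    static Optional<ParsedKey> parse(String key) {
        Matcher m = KEY.matcher(key);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ParsedKey(m.group(1), m.group(2), m.group(3)));
    }

    // Reject index names that would collide with the bracket syntax.
    static void validateIndexName(String indexName) {
        if (indexName.indexOf('[') >= 0 || indexName.indexOf(']') >= 0) {
            throw new IllegalArgumentException(
                    "Index name must not contain '[' or ']': " + indexName);
        }
    }
}
```

For example, hibernate.search.users[FR].locking_strategy parses to index "users", shard "FR", option "locking_strategy", while hibernate.search.default.indexwriter.max_merge_docs parses to index "default" with no shard.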
If we move to the [] approach, we need to provide a migration path for
users, though.
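For the migration path, the numeric shard suffixes used today could be rewritten mechanically. This is a hypothetical sketch: it only recognizes numeric shard ids, so non-numeric ones, and keys belonging to index names that themselves end in .<number> (like "customers.3" above), would still need a human decision.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: hibernate.search.customers.2.locking_strategy
// becomes hibernate.search.customers[2].locking_strategy.
final class ShardKeyMigration {
    private static final Pattern LEGACY_SHARD_KEY =
            Pattern.compile("^hibernate\\.search\\.([^.\\[\\]]+)\\.(\\d+)\\.(.+)$");

    static String migrateKey(String key) {
        Matcher m = LEGACY_SHARD_KEY.matcher(key);
        if (!m.matches()) {
            return key; // not a legacy numeric shard key: leave untouched
        }
        return "hibernate.search." + m.group(1) + "[" + m.group(2) + "]." + m.group(3);
    }

    static Properties migrate(Properties legacy) {
        Properties migrated = new Properties();
        for (String key : legacy.stringPropertyNames()) {
            migrated.setProperty(migrateKey(key), legacy.getProperty(key));
        }
        return migrated;
    }
}
```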
For HSEARCH-1295 I would expect a registry of properties using the
index name as key, something very similar to the IndexManagerHolder?
Not sure I follow you. My pull request already does that, no?
When we created the IndexManager, the intention really was to group
the related services for a specific index; today sharding works in front
of the IndexManager, but we could also consider sharding to be a tree
of IndexManagers,
where the root instance holds the strategies and options common to all
the shards, especially so that it can boot new ones as needed.
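A rough sketch of that tree, with hypothetical types that are not the Hibernate Search IndexManager API: the root holds the options common to all shards, each shard holds only its overrides, and shards are created lazily on first access. Shard names follow today's index.shard convention.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of "sharding as a tree of IndexManagers".
final class IndexManagerNode {
    private final String name;
    private final IndexManagerNode parent;
    private final Map<String, String> options = new HashMap<>();
    private final Map<String, IndexManagerNode> shards = new TreeMap<>();

    IndexManagerNode(String name) {
        this(name, null);
    }

    private IndexManagerNode(String name, IndexManagerNode parent) {
        this.name = name;
        this.parent = parent;
    }

    String name() {
        return name;
    }

    void setOption(String key, String value) {
        options.put(key, value);
    }

    // Look up an option on this node, falling back to its ancestors.
    String option(String key) {
        for (IndexManagerNode node = this; node != null; node = node.parent) {
            String value = node.options.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Boot a shard on demand; it inherits the root's options.
    IndexManagerNode shard(String shardId) {
        return shards.computeIfAbsent(shardId, id -> new IndexManagerNode(name + "." + id, this));
    }

    Set<String> shardIds() {
        return shards.keySet();
    }
}
```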
We can, but what's the benefit? At the end of the day, the ability to
select a specific IM shard and the associated reader sub-selection is what
really matters.
If we could aggressively scan all property keys and from there
pre-construct the configuration metadata for each index, this would
pave the way for validating the configuration.
We would need to recognize shard-index names and pre-build a tree for
the shards; then it would be easy for the DirectoryProvider to figure
out its parent options and look up an associated sharding strategy.
I am not sure I entirely follow, but I imagine what you are saying is
that you want to break all the initialize() methods and pass a specific
configuration object instead of the Properties object.
That would indeed provide some more info, and it would be nice to have
such an object so we can add type-safe information later without breaking
the API, but this configuration object would still need to expose the
Properties, as you never know what a DirectoryProvider implementation
needs for its configuration.
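A possible shape for such a configuration object, as a hypothetical sketch (IndexConfiguration and ConfigurableDirectoryProvider are invented names, not existing types): typed accessors can be added over time, while rawProperties() keeps the escape hatch for options only a specific DirectoryProvider understands.

```java
import java.util.Properties;

// Hypothetical per-index configuration object passed to initialize().
final class IndexConfiguration {
    private final String indexName;
    private final Properties properties;

    IndexConfiguration(String indexName, Properties source) {
        this.indexName = indexName;
        this.properties = copyOf(source); // defensive copy, including defaults
    }

    String indexName() {
        return indexName;
    }

    // Typed accessor that can be added later without changing initialize() signatures.
    int intOption(String key, int defaultValue) {
        String raw = properties.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // Escape hatch: implementations can still read arbitrary options.
    Properties rawProperties() {
        return copyOf(properties);
    }

    private static Properties copyOf(Properties source) {
        Properties copy = new Properties();
        for (String key : source.stringPropertyNames()) {
            copy.setProperty(key, source.getProperty(key));
        }
        return copy;
    }
}

// Hypothetical replacement for initialize(String, Properties, ...) style methods.
interface ConfigurableDirectoryProvider {
    void initialize(IndexConfiguration configuration);
}
```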
How does this look?
# we give some out-of-the-box ways to handle FS path name generation
hibernate.search.users.shards_pathmaker = from_template
What's the alternative to from_template?
# which might have some options
hibernate.search.users.shards_pathmaker.template = /var/data/index/{shard}/User
# Activate sharding
hibernate.search.users.sharding_strategy = my.custom.dynamic.strategy
# Some override as usual
hibernate.search.users.FR.locking_strategy = native
The tricky aspect is recognizing the last line as belonging to the
users index, and therefore knowing that we need to apply the template
for the FSDirectory path.
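Applying the template is the easy part once the shard id has been recognized. A hypothetical from_template path maker could be as simple as the sketch below, which assumes a literal {shard} placeholder and a made-up class name.

```java
// Hypothetical implementation of the proposed "from_template" path maker.
final class TemplatePathMaker {
    private static final String PLACEHOLDER = "{shard}";
    private final String template;

    TemplatePathMaker(String template) {
        // Fail fast: without the placeholder all shards would share one directory.
        if (!template.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException(
                    "Template must contain " + PLACEHOLDER + ": " + template);
        }
        this.template = template;
    }

    // Directory path for one shard, e.g. "FR" -> /var/data/index/FR/User
    String pathFor(String shardId) {
        return template.replace(PLACEHOLDER, shardId);
    }
}
```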
I think the [FR] syntax will help.