[hibernate-dev] [Search] Sharding configured on types, not on indexes

Sanne Grinovero sanne at hibernate.org
Thu Aug 11 08:27:58 EDT 2011


2011/8/11 Hardy Ferentschik <hardy at hibernate.org>:
> On Thu, 11 Aug 2011 13:51:33 +0200, Sanne Grinovero <sanne at hibernate.org>
> wrote:
>
>> This also means we could change the sharding strategy interface to:
>>  a - deal with Entity instances instead of org.apache.lucene.Document
>> and string-encoded ids
>
> Are there not valid use-cases where I want to make the decision using the
> Document. What if I have a custom class bridge which I apply on all
> entities
> in order to add a field to the document ((partly)independent of the entity
> itself)
> which is then used for sharding. I guess I could always write the same
> logic
> in the ShardingStrategy!? Not sure.

I've thought about the same thing, and concluded that since the Document is
the output of a stateless transformation of the entity, anything you
could do with a custom bridge plus a sharding strategy could be
implemented in the sharding strategy alone, provided it gets
access to the entity. By definition the entity has more information,
not less, so it's definitely an added value.
Also, where you previously had to change code in both places to achieve
your desired sharding strategy, you would now only need to do it once, in
something properly named "ShardingStrategy", instead of hacking around and
possibly inserting extra, unneeded tokens into the Document
just for the ShardingStrategy to consume.
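
To make this concrete, here is a rough sketch of what an entity-based
strategy might look like. Everything in it is hypothetical: the
EntityShardingStrategy interface, its shardFor method, and the Customer
entity are made up for illustration and are not part of the Hibernate
Search API.

```java
import java.io.Serializable;

// Hypothetical interface, NOT the existing Hibernate Search contract:
// the strategy receives the entity itself rather than a Lucene Document.
interface EntityShardingStrategy<T> {
    // Returns the index of the shard that should hold the given entity.
    int shardFor(T entity, Serializable id);
}

// Hypothetical entity, used only for illustration.
final class Customer {
    final long id;
    final String region;

    Customer(long id, String region) {
        this.id = id;
        this.region = region;
    }
}

// Chooses the shard from entity state directly, so no class bridge has to
// add an extra field to the Document just to drive the sharding decision.
final class RegionShardingStrategy implements EntityShardingStrategy<Customer> {
    private final int shardCount;

    RegionShardingStrategy(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    @Override
    public int shardFor(Customer entity, Serializable id) {
        // floorMod keeps the result in [0, shardCount) even for negative hashes.
        return Math.floorMod(entity.region.hashCode(), shardCount);
    }
}
```

The sharding rule lives in one place, and entities with the same region
always land in the same shard.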

>> This would affect configuration: instead of configuring them on the
>> index name, an annotation should be placed on the type (or an optional
>> parameter for existing @Indexed).
>
> Are changing the sharding configuration and the ShardingStrategy really
> coupled? We could change the way shards are configured and still keep
> passing the Document to the sharding strategy, maybe in combination with
> the actual entity. Wouldn't that give most flexibility?

As above, I don't think that would give you more functional options,
but maybe you're right that some operations might be easier: I'm
thinking of those cases in which the decision depends on the
string-encoded form of some complex type. That said, I don't see a
practical use for it, and nothing prevents me from re-encoding it as
needed (at the cost of repeating the encoding).
Still, not passing the Document gives you more advantages:
1 - we have often felt the need to use a different representation, but
can't experiment with it as this is part of the public API
2 - we would not be tied to building the Document before applying the
sharding logic: building the full Document often entails a lot of
work, not least loading extra entities for the sake of indexing
relations, and the strategy could decide to skip it. (I know I'm
trespassing into the area of another popular feature request which I
wouldn't implement this way, but it's an example of the kind of things we
can do with this added flexibility!)
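
As an illustration of point 2 only, and not a proposal for how to
implement it, a sketch of a pipeline that consults the strategy before
building anything could look like this. ShardSelector and
LazyIndexingPipeline are hypothetical names, and the Function stands in
for the expensive entity-to-Document step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;
import java.util.function.Function;

// Hypothetical contract: an empty result means "do not index this entity".
interface ShardSelector<T> {
    OptionalInt selectShard(T entity);
}

// Hypothetical pipeline: asks the selector first and builds the document
// only when a shard was actually chosen.
final class LazyIndexingPipeline<T, D> {
    private final ShardSelector<T> selector;
    private final Function<T, D> documentBuilder; // stand-in for the costly build step
    private final Map<Integer, List<D>> shards = new HashMap<>();
    private int documentsBuilt = 0;

    LazyIndexingPipeline(ShardSelector<T> selector, Function<T, D> documentBuilder) {
        this.selector = selector;
        this.documentBuilder = documentBuilder;
    }

    // Returns true if the entity was indexed, false if the selector skipped it.
    boolean process(T entity) {
        OptionalInt shard = selector.selectShard(entity);
        if (!shard.isPresent()) {
            return false; // skipped: the document is never built
        }
        D document = documentBuilder.apply(entity);
        documentsBuilt++;
        shards.computeIfAbsent(shard.getAsInt(), k -> new ArrayList<>()).add(document);
        return true;
    }

    int documentsBuilt() {
        return documentsBuilt;
    }

    List<D> shard(int index) {
        return shards.getOrDefault(index, new ArrayList<>());
    }
}
```

With a Document-based strategy this ordering is not possible, since the
Document has to exist before the strategy can be asked anything.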

>> On top of a greater flexibility in sharding, but will also avoid
>> exposing the o.a.l.Document yet in another API.
>
> Is that such a bad thing?

Not extremely evil, but I hope the above answers this.

Sanne



