[hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards

Sanne Grinovero sanne.grinovero at gmail.com
Tue Jun 9 14:54:47 EDT 2009


Hi Chase,
sorry for the late answer. I've just looked at your code; it looks
very good, and I'd like to apply this patch if Emmanuel and Hardy
agree.

There are no tests in your patch to verify this is actually useful. Do
you have a good example of a ShardingProvider using it?
(Tests are not only used to test your code but also serve as examples
and concept demos.)
Don't you think the DirectoryProvider names should be exposed, or are
you able to create a nice sharding implementation without needing
that?
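
To make the naming question concrete, here is a minimal sketch of a
strategy that selects shards by label. It does not use the real
Hibernate Search types: NamedDirectoryProvider, LanguageShardSelector
and providersForQuery() are hypothetical stand-ins, and getName() is
only the method proposed earlier in this thread, not an existing one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for DirectoryProvider with the proposed getName().
interface NamedDirectoryProvider {
    String getName();
}

// Hypothetical helper a sharding strategy could delegate to.
final class LanguageShardSelector {
    // Insertion-ordered so the "all shards" result is deterministic.
    private final Map<String, NamedDirectoryProvider> byName =
            new LinkedHashMap<String, NamedDirectoryProvider>();

    LanguageShardSelector(List<NamedDirectoryProvider> providers) {
        for (NamedDirectoryProvider provider : providers) {
            byName.put(provider.getName(), provider);
        }
    }

    // Returns only the shards whose label was requested. With no labels
    // given, falls back to every shard, as an unfiltered query would.
    List<NamedDirectoryProvider> providersForQuery(Set<String> labels) {
        if (labels == null || labels.isEmpty()) {
            return new ArrayList<NamedDirectoryProvider>(byName.values());
        }
        List<NamedDirectoryProvider> selected =
                new ArrayList<NamedDirectoryProvider>();
        for (String label : labels) {
            NamedDirectoryProvider provider = byName.get(label);
            if (provider != null) { // unknown labels are skipped
                selected.add(provider);
            }
        }
        return selected;
    }
}
```

Without some kind of exposed name, the strategy would have to rely on
the position of each provider in the array to know which shard is which.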

If you could add a testcase and documentation updates it would be even
better and speed up the work ;-)

Sanne

2009/6/8 Chase Seibert <chase.seibert at gmail.com>:
> Sanne,
>
> Did you get a chance to look at this? Thanks,
>
>    -Chase
>
>
> On Wed, Jun 3, 2009 at 4:08 PM, Chase Seibert <chase.seibert at gmail.com>
> wrote:
>>
>> Sanne,
>>
>> I have implemented your suggestion for IndexShardingStrategy to optionally
>> provide a set of DirectoryProviders BEFORE the search based on one or more
>> FullTextFilters. Using this change, I was able to optimize my specific case
>> to search only hitting the relevant shards.
>>
>> I have not yet implemented your labeled shard idea, nor your shard on enum
>> idea. If we can agree on this change first, I think I can implement those on
>> top of this.
>>
>> Please see attached svn .patch (diff) file. I have tested the patch on
>> 3.1.1 and 3.2.0. Any feedback is welcome.
>>
>>    -Chase
>>
>>
>> On Wed, Jun 3, 2009 at 1:27 PM, Sanne Grinovero
>> <sanne.grinovero at gmail.com> wrote:
>>>
>>> I have a similar need these days; this should be a very useful
>>> feature, but I'd prefer something I could use with the existing API,
>>> like
>>>
>>> enableFullTextFilter( "MyShardsSelectionStrategy" ).setParameter( ... )
>>>
>>> a practical example:
>>> enableFullTextFilter( "LanguageFilter" ).setParameter( "IT-it" )
>>>
>>> The existing IndexShardingStrategy should be able to be smarter and
>>> have something like
>>>
>>> DirectoryProvider<?>[] getDirectoryProvidersForQuery( filters && options )
>>>
>>> So a smart ShardingStrategy could do some selections considering this.
>>>
>>> I'm currently using sharding to split my index across 25 different
>>> languages (using per-language stemmers), so this would be useful,
>>> but I'd especially need to be able to "label" my different
>>> DirectoryProviders using String identifiers. I'd suggest adding a
>>> getName() to the DirectoryProvider interface: I would use that to
>>> store country codes and keep a Map<String, DirectoryProvider> in my
>>> ShardingStrategy, so I can easily select the right DP when the
>>> LanguageFilter is enabled.
>>>
>>> Another usage would be to shard an entity on an Enumerated property.
>>> In this case an appropriate ShardingStrategy could be provided by
>>> Search and auto-configured by reading the possible enum values; that
>>> would be a very easy way to enable sharding on an entity.
>>>
>>> Sanne
>>>
>>> 2009/6/3 Emmanuel Bernard <emmanuel at hibernate.org>:
>>> >
>>> >
>>> > Begin forwarded message:
>>> >
>>> > From: chase.seibert+opensubscriber at gmail.com
>>> > Date:  June 3, 2009 09:21:21  PDT
>>> > To: emmanuel at hibernate.org
>>> > Subject: Re: Re: [hibernate-dev] HSearch: Using sharding and avoiding
>>> > query
>>> > on multiple shards
>>> > Reply-To: chase.seibert+opensubscriber at gmail.com
>>> >
>>> > Emmanuel,
>>> >
>>> > Regarding HSEARCH-251, and
>>> >
>>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/9770383.html
>>> >
>>> > Being able to query just a single shard or subset of shards would be
>>> > awesome. I was thinking of a similar API:
>>> >
>>> > IndexShardingStrategy:
>>> > public DirectoryProvider<?>[] getDirectoryProviderForShard(int shardNum);
>>> >
>>> > FullTextQuery:
>>> > public void enableShardFilter(int shardNum);
>>> > public void enableShardFilters(int[] shardNums);
>>> >
>>> > FullTextQuery.buildSearcher() would need to be modified to call
>>> > getDirectoryProviderForShard() for each shardNum if shardNums are set,
>>> > otherwise it should continue to use
>>> > getDirectoryProvidersForAllShards();
>>> >
>>> > Calling this API from a consumer's stand-point would look like:
>>> > FullTextQuery fullTextQuery =
>>> > fullTextSession.createFullTextQuery(luceneQuery, entityClass);
>>> > fullTextQuery.enableShardFilter(5);
>>> > fullTextQuery.list();
>>> >
>>> > This could be changed to pass named shards easily. I could
>>> > prototype this and submit a .patch if you are interested.
>>> >
>>> >  -Chase
>>> >
>>> >
>>
>
>
