[hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards

Emmanuel Bernard emmanuel at hibernate.org
Wed Jun 10 20:44:22 EDT 2009


Here is the patch I've applied. It is mostly the same as yours but
differs in a few subtle ways. I've also expanded the documentation.
I'll let you check it out.

What I have *not* done is write a "unit" test that actually indexes
data and then searches it with the filter enabled, making sure that
only the targeted shard(s) are used (i.e. a user-oriented test). Can
someone carry on and add that test?
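
For readers following the thread, here is a minimal, self-contained sketch of the behaviour the changes quoted below describe: shard-only filters are consulted before the search to pick the target shards, they are left out of the normal filter chain, and deletions still go to every shard. It is not the applied patch and it does not use the Hibernate Search classes. ActiveFilter, CustomerShards, the "customerID" parameter, the "shard-N" names and the modulo routing are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Self-contained model of the idea; none of these types are Hibernate Search classes.
final class ShardSelectionSketch {

    /** Hypothetical stand-in for an enabled full-text filter. */
    static final class ActiveFilter {
        final String name;
        final Map<String, Object> parameters;
        final boolean preSearchShardFilter;

        ActiveFilter(String name, Map<String, Object> parameters, boolean preSearchShardFilter) {
            this.name = name;
            this.parameters = parameters;
            this.preSearchShardFilter = preSearchShardFilter;
        }
    }

    /** Hypothetical strategy; shards are modelled as plain names "shard-0", "shard-1", ... */
    static final class CustomerShards {
        private final List<String> allShards;

        CustomerShards(int shardCount) {
            List<String> shards = new ArrayList<>();
            for (int i = 0; i < shardCount; i++) {
                shards.add("shard-" + i);
            }
            this.allShards = Collections.unmodifiableList(shards);
        }

        /** Queries: narrow to one shard when the "customer" shard filter is enabled. */
        List<String> shardsForQuery(List<ActiveFilter> filters) {
            for (ActiveFilter filter : filters) {
                // Filter names are compared case-sensitively.
                if (filter.preSearchShardFilter && "customer".equals(filter.name)) {
                    Object id = filter.parameters.get("customerID");
                    if (id instanceof Integer) {
                        int customerId = (Integer) id;
                        // Assumed routing rule: customer id modulo shard count.
                        int index = Math.floorMod(customerId, allShards.size());
                        return Collections.singletonList(allShards.get(index));
                    }
                }
            }
            return allShards; // no usable shard filter: fall back to every shard
        }

        /** Deletions: always target every shard. */
        List<String> shardsForDeletion() {
            return allShards;
        }
    }

    /** Filters that still take part in the search itself (shard-only filters are skipped). */
    static List<ActiveFilter> searchTimeFilters(List<ActiveFilter> filters) {
        List<ActiveFilter> result = new ArrayList<>();
        for (ActiveFilter filter : filters) {
            if (!filter.preSearchShardFilter) {
                result.add(filter);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put("customerID", 5);
        List<ActiveFilter> filters = Arrays.asList(
                new ActiveFilter("customer", params, true),
                new ActiveFilter("security", new HashMap<String, Object>(), false));

        CustomerShards strategy = new CustomerShards(4);
        System.out.println("query shards:    " + strategy.shardsForQuery(filters));
        System.out.println("deletion shards: " + strategy.shardsForDeletion());
        System.out.println("search filters:  " + searchTimeFilters(filters).size());
    }
}
```

A user-oriented test of the real feature would follow the same shape: index entities for several customers, enable the shard filter for one of them, and check that only the expected shard is opened.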



On  Jun 10, 2009, at 16:38, Chase Seibert wrote:

> Emmanuel,
>
> Here is the patch with the new changes:
>
> * If getDirectoryProvidersForQuery() returns zero providers, the
>   search will not execute.
> * Added methods String getName(), Map<String, Object> getParameters()
>   and isPreSearchShardFilter() to FullTextFilter.
> * Not sure what the value of a FullTextFilterImplementor would be, but
>   I can add that if you want.
> * Removed casting to FullTextFilterImpl.
> * CustomerShardingStrategy returns all shards for
>   getDirectoryProvidersForDeletion().
> * In CustomerShardingStrategy.getFilter(), changed filter name
>   comparison to be case sensitive.
> * Added a new Filter sub-class, ShardFilterImpl, where
>   isPreSearchShardFilter() == true.
> * FullTextQueryImpl.buildFilters() only gets filters where
>   isPreSearchShardFilter() == false.
> * FullTextQueryImpl only passes filters where isPreSearchShardFilter()
>   == true to getDirectoryProvidersForQuery().
>
>    -Chase
>
>
> On Wed, Jun 10, 2009 at 3:38 PM, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> I like the patch, we can apply a slightly modified version of it.
>
> A few comments:
>
> * In private DirectoryProvider[] getDirectoryProviders(DocumentBuilderIndexedEntity builder)
>
> if ( directoryProviders != null && directoryProviders.length > 0 ) return directoryProviders;
>
> What's the reasoning for returning all shards if this method returns
> 0 providers? It seems to me that in this situation you want to avoid
> executing the query altogether.
>
> * In the testcase, CustomerShardingStrategy casts FullTextFilter
> to FullTextFilterImpl.
> That sucks.
> We need to introduce a FullTextFilterImplementor interface with
> getName() (and maybe Map<String, Object> getParameters(); we can
> add this one later).
> We would then get
> public DirectoryProvider<?>[] getDirectoryProvidersForQuery(FullTextFilterImplementor[] filters) {
>
> * not sure why the CustomerShardingStrategy does not return all  
> shards for getDirectoryProvidersForDeletion
>
> * in CustomerShardingStrategy.getFilter
> equalsIgnoreCase is wrong. Filter names are case sensitive.
>
> * as I said in my reply to Sanne, I would like to get a way to  
> declare a filter to be shard sensitive only. Either by providing a  
> special class or by providing a flag, not sure.
>
> @FullTextFilterDef(name="customer", impl=ShardSensitiveOnly.class)
>
> Because I think we raise an exception if an unknown filter is  
> requested.
>
> In this case, the ShardSensitiveOnly impl can be ignored by the  
> filter chaining we do otherwise.
>
> Emmanuel
>
> On  Jun 3, 2009, at 16:08, Chase Seibert wrote:
>
>> Sanne,
>>
>> I have implemented your suggestion for IndexShardingStrategy to
>> optionally provide a set of DirectoryProviders BEFORE the search,
>> based on one or more FullTextFilters. Using this change, I was able
>> to optimize my specific case so that searches hit only the relevant
>> shards.
>>
>> I have not yet implemented your labeled shard idea, nor your shard  
>> on enum idea. If we can agree on this change first, I think I can  
>> implement those on top of this.
>>
>> Please see attached svn .patch (diff) file. I have tested the patch  
>> on 3.1.1 and 3.2.0. Any feedback is welcome.
>>
>>    -Chase
>>
>>
>> On Wed, Jun 3, 2009 at 1:27 PM, Sanne Grinovero <sanne.grinovero at gmail.com> wrote:
>> I have a similar need these days; this would be a very useful
>> feature, but I'd prefer something I could use with the existing API,
>> like
>>
>> enableFullTextFilter( "MyShardsSelectionStrategy" ).setParameter( ... )
>>
>> a practical example:
>> enableFullTextFilter( "LanguageFilter" ).setParameter( "IT-it" )
>>
>> The existing IndexShardingStrategy should be able to be smarter and
>> have something like
>>
>> DirectoryProvider<?>[] getDirectoryProvidersForQuery( filters && options )
>>
>> So a smart ShardingStrategy could make its selection based on this.
>>
>> I'm currently using sharding to split my index across 25 different
>> languages (using per-language stemmers), so this would be useful,
>> but I'd especially need to be able to "label" my different
>> DirectoryProviders using String identifiers.
>> I'd suggest adding a getName() to the DirectoryProvider interface: I
>> would use that to store country codes and keep a
>> Map<String,DirectoryProvider> in my ShardingStrategy, so I can
>> easily select the right DP when the LanguageFilter is enabled.
>>
>> Another usage would be to shard an entity on an Enumerated property:
>> in this case an appropriate ShardingStrategy
>> could be provided by Search and auto-configured by reading the
>> possible enum values: that would be a very easy way
>> to enable sharding on an entity.
>>
>> Sanne
>>
>> 2009/6/3 Emmanuel Bernard <emmanuel at hibernate.org>:
>> >
>> >
>> > Begin forwarded message:
>> >
>> > From: chase.seibert+opensubscriber at gmail.com
>> > Date:  June 3, 2009 09:21:21  PDT
>> > To: emmanuel at hibernate.org
>> > Subject: Re: Re: [hibernate-dev] HSearch: Using sharding and avoiding query
>> > on multiple shards
>> > Reply-To: chase.seibert+opensubscriber at gmail.com
>> > Emmanuel,
>> >
>> > Regarding HSEARCH-251, and
>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/9770383.html
>> >
>> > Being able to query just a single shard or subset of shards would be
>> > awesome. I was thinking of a similar API:
>> >
>> > IndexShardingStrategy:
>> > public DirectoryProvider<?>[]
>> > getDirectoryProviderForShard(int shardNum);
>> >
>> > FullTextQuery:
>> > public void enableShardFilter(int shardNum);
>> > public void enableShardFilters(int[] shardNums);
>> >
>> > FullTextQuery.buildSearcher() would need to be modified to call
>> > getDirectoryProviderForShard() for each shardNum if shardNums are set,
>> > otherwise it should continue to use getDirectoryProvidersForAllShards();
>> >
>> > Calling this API from a consumer's stand-point would look like:
>> > FullTextQuery fullTextQuery =
>> > fullTextSession.createFullTextQuery(luceneQuery, entityClass);
>> > fullTextQuery.enableShardFilter(5);
>> > fullTextQuery.list();
>> >
>> > This could be changed to pass named shards easily. I could prototype this
>> > and submit a .patch if you are interested.
>> >
>> >  -Chase
>> >
>> > --
>> > This message was sent on behalf of chase.seibert+opensubscriber at gmail.com at
>> > openSubscriber.com
>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/9800518.html
>> >
>> >
>> > _______________________________________________
>> > hibernate-dev mailing list
>> > hibernate-dev at lists.jboss.org
>> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>> >
>> >
>>
>> <getDirectoryProvidersForQuery.patch>
>
>
> <all.patch>

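Sanne's "labeled shard" suggestion quoted above (a getName() on each provider plus a map from label to provider inside the sharding strategy) can be illustrated with a small self-contained sketch. This is only one possible reading of the idea, not part of the patch: LabeledShard, LanguageShards and the language codes are hypothetical, and returning no shard for an unknown label, so that the search is skipped, is an assumption that borrows the "zero providers means the search does not execute" rule from the quoted list of changes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Self-contained model of label-based shard lookup; not Hibernate Search code.
final class LabeledShardsSketch {

    /** Hypothetical stand-in for a directory provider that exposes a name. */
    static final class LabeledShard {
        private final String name;

        LabeledShard(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    /** Hypothetical strategy keeping one shard per language code. */
    static final class LanguageShards {
        private final Map<String, LabeledShard> byLabel = new LinkedHashMap<>();

        LanguageShards(String... languageCodes) {
            for (String code : languageCodes) {
                byLabel.put(code, new LabeledShard(code));
            }
        }

        /** A null language means no language filter is enabled: use every shard. */
        List<LabeledShard> shardsForQuery(String requestedLanguage) {
            if (requestedLanguage == null) {
                return new ArrayList<>(byLabel.values());
            }
            LabeledShard shard = byLabel.get(requestedLanguage);
            if (shard == null) {
                // Assumption: an unknown label selects nothing.
                return Collections.emptyList();
            }
            return Collections.singletonList(shard);
        }
    }

    /** Returns the names of the shards a search would touch; empty means the search is skipped. */
    static List<String> search(LanguageShards shards, String requestedLanguage) {
        List<String> searched = new ArrayList<>();
        for (LabeledShard shard : shards.shardsForQuery(requestedLanguage)) {
            searched.add(shard.getName());
        }
        return searched;
    }

    public static void main(String[] args) {
        LanguageShards shards = new LanguageShards("en", "fr", "it");
        System.out.println("filter on it: " + search(shards, "it"));
        System.out.println("no filter:    " + search(shards, null));
        System.out.println("unknown:      " + search(shards, "xx"));
    }
}
```

Keeping the lookup in a map keyed by label means the strategy never has to inspect the shards themselves, which is the convenience the getName() proposal is after.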
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HSEARCH-251_Query_on_a_shard_subset_based_on_a_filter_activation.patch
Type: application/octet-stream
Size: 34724 bytes
Desc: not available
Url : http://lists.jboss.org/pipermail/hibernate-dev/attachments/20090610/d3a4611c/attachment.obj 