[hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards

Sanne Grinovero sanne.grinovero at gmail.com
Tue Jun 9 16:40:20 EDT 2009


Hi Chase,
the problem I see is that if you have 3 customers with ids 34, 35 and 202,
you'll have to define 202 indexes.
But you're right, named shards should be a separate change.
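
To make the concern concrete, here is a minimal sketch in plain Java of the
kind of lookup that named shards would allow. It is only an illustration:
CustomerShardRegistry and its methods are hypothetical names, not part of
Hibernate Search or of your patch, and it deliberately uses no Hibernate
Search types. It maps sparse customer ids to dense shard indexes, so three
customers need only three shards:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Hibernate Search class): assigns each known
 * customer id a dense shard index in registration order.
 */
final class CustomerShardRegistry {

    // customer id -> shard index (0, 1, 2, ...)
    private final Map<Integer, Integer> shardByCustomerId = new LinkedHashMap<>();

    CustomerShardRegistry(int... customerIds) {
        for (int id : customerIds) {
            // A repeated id keeps the shard it was given the first time.
            shardByCustomerId.putIfAbsent(id, shardByCustomerId.size());
        }
    }

    /** Number of shards needed: one per distinct customer. */
    int shardCount() {
        return shardByCustomerId.size();
    }

    /** Shard index for a customer; fails loudly for an unregistered id. */
    int shardFor(int customerId) {
        Integer shard = shardByCustomerId.get(customerId);
        if (shard == null) {
            throw new IllegalArgumentException("Unknown customer id: " + customerId);
        }
        return shard;
    }

    public static void main(String[] args) {
        CustomerShardRegistry registry = new CustomerShardRegistry(34, 35, 202);
        System.out.println(registry.shardCount());   // prints 3
        System.out.println(registry.shardFor(202));  // prints 2
    }
}
```

A sharding strategy could then index its DirectoryProvider array with
shardFor(customerId) instead of the raw customer id.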

The documentation of Hibernate Search is built from XML files
contained in the sources; you'll find them in
/src/main/docbook/en-US/modules
The Maven build also builds all the documentation, so you can see
how what you write will look in both PDF and HTML form.

Here are the detailed instructions:
https://www.hibernate.org/462.html#A8



2009/6/9 Chase Seibert <chase.seibert at gmail.com>:
> Sanne,
>
> See attached example of an IndexShardingStrategy and unit test. I was able to
> satisfy my sharding needs without named shards, though I would be open to
> implementing that separately.
>
> In CustomerShardingStrategy, the index is broken into one shard per
> customer. Each Document contains a customerID field, which is used as the
> index for a local DirectoryProvider array. The number of shards should be at
> least the max customerID. The implementation of
> getDirectoryProvidersForQuery() looks for a Filter called "customer" that
> also contains a customerID parameter, and returns the single
> DirectoryProvider for that customer.
>
> What can I do to aid the documentation effort? Thanks,
>
>    -Chase
>
>
> On Tue, Jun 9, 2009 at 2:54 PM, Sanne Grinovero <sanne.grinovero at gmail.com>
> wrote:
>>
>> Hi Chase,
>> sorry for the late answer; I've just looked at your code, it looks
>> very good and I'd like to apply this patch if Emmanuel and Hardy
>> agree?
>>
>> There are no tests in your patch to verify this is actually useful, do
>> you have a good example of a ShardingProvider using it?
>> (tests are not only used to test your code but also serve as examples
>> and concept demos).
>> Don't you think the DirectoryProvider names should be exposed, or are
>> you able to create a nice sharding implementation without needing
>> that?
>>
>> If you could add a testcase and documentation updates it would be even
>> better and speed up the work ;-)
>>
>> Sanne
>>
>> 2009/6/8 Chase Seibert <chase.seibert at gmail.com>:
>> > Sanne,
>> >
>> > Did you get a chance to look at this? Thanks,
>> >
>> >    -Chase
>> >
>> >
>> > On Wed, Jun 3, 2009 at 4:08 PM, Chase Seibert <chase.seibert at gmail.com>
>> > wrote:
>> >>
>> >> Sanne,
>> >>
>> >> I have implemented your suggestion for IndexShardingStrategy to
>> >> optionally provide a set of DirectoryProviders BEFORE the search,
>> >> based on one or more FullTextFilters. Using this change, I was able
>> >> to optimize my specific case so that a search hits only the relevant
>> >> shards.
>> >>
>> >> I have not yet implemented your labeled shard idea, nor your
>> >> shard-on-enum idea. If we can agree on this change first, I think I
>> >> can implement those on top of this.
>> >>
>> >> Please see attached svn .patch (diff) file. I have tested the patch on
>> >> 3.1.1 and 3.2.0. Any feedback is welcome.
>> >>
>> >>    -Chase
>> >>
>> >>
>> >> On Wed, Jun 3, 2009 at 1:27 PM, Sanne Grinovero
>> >> <sanne.grinovero at gmail.com> wrote:
>> >>>
>> >>> I have a similar need these days; this would be a very useful
>> >>> feature, but I'd prefer something I could use with the existing
>> >>> API, like
>> >>>
>> >>> enableFullTextFilter( "MyShardsSelectionStrategy" ).setParameter( ... )
>> >>>
>> >>> a practical example:
>> >>> enableFullTextFilter( "LanguageFilter" ).setParameter( "IT-it" )
>> >>>
>> >>> The existing IndexShardingStrategy should be able to be smarter and
>> >>> have something like
>> >>>
>> >>> DirectoryProvider<?>[] getDirectoryProvidersForQuery( filters && options )
>> >>>
>> >>> So a smart ShardingStrategy could do some selections considering this.
>> >>>
>> >>> I'm currently using sharding to split my index across 25 different
>> >>> languages (using per-language stemmers), so this would be useful,
>> >>> but I'd especially need to be able to "label" my different
>> >>> DirectoryProviders using String identifiers. I'd suggest adding a
>> >>> getName() to the DirectoryProvider interface: I would use that to
>> >>> store country codes and keep a Map<String,DirectoryProvider> in my
>> >>> ShardingStrategy, so I can easily select the right DP when the
>> >>> LanguageFilter is enabled.
>> >>>
>> >>> Another usage would be to shard an entity on an Enumerated property:
>> >>> in this case an appropriate ShardingStrategy could be provided by
>> >>> Search and auto-configured by reading the possible enum values. That
>> >>> would be a very easy way to enable sharding on an entity.
>> >>>
>> >>> Sanne
>> >>>
>> >>> 2009/6/3 Emmanuel Bernard <emmanuel at hibernate.org>:
>> >>> >
>> >>> >
>> >>> > Begin forwarded message:
>> >>> >
>> >>> > From: chase.seibert+opensubscriber at gmail.com
>> >>> > Date:  June 3, 2009 09:21:21  PDT
>> >>> > To: emmanuel at hibernate.org
>> >>> > Subject: Re: Re: [hibernate-dev] HSearch: Using sharding and
>> >>> > avoiding query on multiple shards
>> >>> > Reply-To: chase.seibert+opensubscriber at gmail.com
>> >>> > Emmanuel,
>> >>> >
>> >>> > Regarding HSEARCH-251, and
>> >>> >
>> >>> >
>> >>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/9770383.html
>> >>> >
>> >>> > Being able to query just a single shard or subset of shards would be
>> >>> > awesome. I was thinking of a similar API:
>> >>> >
>> >>> > IndexShardingStrategy:
>> >>> > public DirectoryProvider<?>[] getDirectoryProviderForShard(int shardNum);
>> >>> >
>> >>> > FullTextQuery:
>> >>> > public void enableShardFilter(int shardNum);
>> >>> > public void enableShardFilters(int[] shardNums);
>> >>> >
>> >>> > FullTextQuery.buildSearcher() would need to be modified to call
>> >>> > getDirectoryProviderForShard() for each shardNum if shardNums are
>> >>> > set; otherwise it should continue to use
>> >>> > getDirectoryProvidersForAllShards().
>> >>> >
>> >>> > Calling this API from a consumer's stand-point would look like:
>> >>> > FullTextQuery fullTextQuery =
>> >>> > fullTextSession.createFullTextQuery(luceneQuery, entityClass);
>> >>> > fullTextQuery.enableShardFilter(5);
>> >>> > fullTextQuery.list();
>> >>> >
>> >>> > This could easily be changed to pass named shards. I could prototype
>> >>> > this and submit a .patch if you are interested.
>> >>> >
>> >>> >  -Chase
>> >>> >
>> >>> > --
>> >>> > This message was sent on behalf of
>> >>> > chase.seibert+opensubscriber at gmail.com at
>> >>> > openSubscriber.com
>> >>> >
>> >>> >
>> >>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/9800518.html
>> >>> >
>> >>> >
>> >>> > _______________________________________________
>> >>> > hibernate-dev mailing list
>> >>> > hibernate-dev at lists.jboss.org
>> >>> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>> >>> >
>> >>> >
>> >>
>> >
>> >
>
>



