Re: [hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards
by Emmanuel Bernard
Here is the patch I've applied. It's mainly the same one but varies in
subtle ways. I've also expanded the doc. I'll let you check it out.
What I have *not* done is write a "unit" test that actually indexes
data and searches it with the filter, making sure only the targeted
shard(s) are used (i.e. a user-oriented test). Can someone carry on
and add the test?
On Jun 10, 2009, at 16:38, Chase Seibert wrote:
> Emmanuel,
>
> Here is the patch with the new changes:
> If getDirectoryProvidersForQuery() returns zero providers, the
> search will not execute.
> Added methods String getName(), Map<String, Object> getParameters()
> and isPreSearchShardFilter() to FullTextFilter.
> Not sure what the value of a FullTextFilterImplementor would be, but
> I can add that if you want.
> Removed casting to FullTextFilterImpl.
> CustomerShardingStrategy returns all shards for
> getDirectoryProvidersForDeletion().
> In CustomerShardingStrategy.getFilter(), changed filter name
> comparison to be case sensitive.
> Added a new Filter sub-class, ShardFilterImpl, where
> isPreSearchShardFilter() == true.
> FullTextQueryImpl.buildFilters() only gets isPreSearchShardFilter()
> == false filters.
> FullTextQueryImpl only passes isPreSearchShardFilter() == true
> filters to getDirectoryProvidersForQuery().
>
> -Chase
>
>
> On Wed, Jun 10, 2009 at 3:38 PM, Emmanuel Bernard <emmanuel(a)hibernate.org
> > wrote:
> I like the patch, we can apply a slightly modified version of it.
>
> A few comments:
>
> * In private DirectoryProvider[]
> getDirectoryProviders(DocumentBuilderIndexedEntity builder)
>
> if ( directoryProviders != null && directoryProviders.length > 0 )
> return directoryProviders;
>
> What's the reasoning for returning all shards if this method returns
> 0 providers. It seems to me that in this situation you want to avoid
> executing the query altogether.
>
> * In the testcase, CustomerShardingStrategy does cast FullTextFilter
> into FullTextFilterImpl
> That sucks.
> We need to introduce a FullTextFilterImplementor interface with
> getName() ( and maybe Map<String, Object> getParameters() - we can
> add this one later);
> We would then get
> public DirectoryProvider<?>[]
> getDirectoryProvidersForQuery(FullTextFilterImplementor[] filters) {
>
> * not sure why the CustomerShardingStrategy does not return all
> shards for getDirectoryProvidersForDeletion
>
> * in CustomerShardingStrategy.getFilter
> equalsIgnoreCase is wrong. Filter names are case sensitive.
>
> * as I said in my reply to Sanne, I would like to get a way to
> declare a filter to be shard sensitive only. Either by providing a
> special class or by providing a flag, not sure.
>
> @FullTextFilterDef(name="customer", impl=ShardSensitiveOnly.class)
>
> Because I think we raise an exception if an unknown filter is
> requested.
>
> In this case, the ShardSensitiveOnly impl can be ignored by the
> filter chaining we do otherwise.
>
> Emmanuel
>
> On Jun 3, 2009, at 16:08, Chase Seibert wrote:
>
>> Sanne,
>>
>> I have implemented your suggestion for IndexShardingStrategy to
>> optionally provide a set of DirectoryProviders BEFORE the search
>> based on one or more FullTextFilters. Using this change, I was able
>> to optimize my specific case to search only hitting the relevant
>> shards.
>>
>> I have not yet implemented your labeled shard idea, nor your shard
>> on enum idea. If we can agree on this change first, I think I can
>> implement those on top of this.
>>
>> Please see attached svn .patch (diff) file. I have tested the patch
>> on 3.1.1 and 3.2.0. Any feedback is welcome.
>>
>> -Chase
>>
>>
>> On Wed, Jun 3, 2009 at 1:27 PM, Sanne Grinovero <sanne.grinovero(a)gmail.com
>> > wrote:
>> I am having a similar need in these days; this should be a very
>> useful
>> feature, but I'd like more something I could use with the existing
>> API
>> like
>>
>> enableFullTextFilter
>> ( "MyShardsSelectionStrategy" ).setParameter( ... )
>>
>> a practical example:
>> enableFullTextFilter( "LanguageFilter" ).setParameter( "IT-it" )
>>
>> The existing IndexShardingStrategy should be able to be smarter and
>> have something like
>>
>> DirectoryProvider<?>[] getDirectoryProvidersForQuery( filters &&
>> options )
>>
>> So a smart ShardingStrategy could do some selections considering
>> this.
>>
>> I'm currently using sharding to shard my index on 25 different
>> languages (using per-language stemmers), so this would
>> be useful but I'd especially need to be able to "label" my different
>> DirectoryProviders using String identifiers,
>> I'd suggest to add a getName() to the DirectoryProvider interface: I
>> would use that to store countrycodes and
>> keep a map<String,DirectoryProvider> in my ShardingStrategy, so I can
>> easily select the right DP when
>> the LanguageFilter is enabled.
>>
>> Another usage would be to shard an entity on an Enumerated property:
>> in this case an appropriate ShardingStrategy
>> could be provided by Search and auto-configured by reading the
>> possible enum values: that would be a very easy way
>> to enable sharding on an entity.
>>
>> Sanne
>>
>> 2009/6/3 Emmanuel Bernard <emmanuel(a)hibernate.org>:
>> >
>> >
>> > Begin forwarded message:
>> >
>> > From: chase.seibert+opensubscriber(a)gmail.com
>> > Date: June 3, 2009 09:21:21 PDT
>> > To: emmanuel(a)hibernate.org
>> > Subject: Re: Re: [hibernate-dev] HSearch: Using sharding and
>> avoiding query
>> > on multiple shards
>> > Reply-To: chase.seibert+opensubscriber(a)gmail.com
>> > Emmanuel,
>> >
>> > Regarding HSEARCH-251, and
>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/97703...
>> >
>> > Being able to query just a single shard or subset of shards would
>> be
>> > awesome. I was thinking of a similar API:
>> >
>> > IndexShardingStrategy:
>> > public DirectoryProvider<?>[]
>> > getDirectoryProviderForShard(int shardNum);
>> >
>> > FullTextQuery:
>> > public void enableShardFilter(int shardNum);
>> > public void enableShardFilters(int[] shardNums);
>> >
>> > FullTextQuery.buildSearcher() would need to be modified to call
>> > getDirectoryProviderForShard() for each shardNum if shardNums are
>> set,
>> > otherwise it should continue to use
>> getDirectoryProvidersForAllShards();
>> >
>> > Calling this API from a consumer's stand-point would look like:
>> > FullTextQuery fullTextQuery =
>> > fullTextSession.createFullTextQuery(luceneQuery, entityClass);
>> > fullTextQuery.enableShardFilter(5);
>> > fullTextQuery.list();
>> >
>> > This could be changed to pass named shards easily. I could
>> prototype this
>> > and submit a .patch if you are interested.
>> >
>> > -Chase
>> >
>> > --
>> > This message was sent on behalf of chase.seibert+opensubscriber(a)gmail.com
>> at
>> > openSubscriber.com
>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/98005...
>> >
>> >
>> > _______________________________________________
>> > hibernate-dev mailing list
>> > hibernate-dev(a)lists.jboss.org
>> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>> >
>> >
>>
>> <getDirectoryProvidersForQuery.patch>
>
>
> <all.patch>
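
The isPreSearchShardFilter() split in Chase's change list can be sketched roughly as below. This is only an illustration under stated assumptions: SketchFilter and FilterPartition are hypothetical stand-ins, not the Hibernate Search classes touched by the patch, and the real FullTextQueryImpl may organise this differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an enabled filter; not the real FullTextFilter type.
final class SketchFilter {
    private final String name;
    private final boolean preSearchShardFilter;

    SketchFilter(String name, boolean preSearchShardFilter) {
        this.name = name;
        this.preSearchShardFilter = preSearchShardFilter;
    }

    String getName() {
        return name;
    }

    boolean isPreSearchShardFilter() {
        return preSearchShardFilter;
    }
}

// Hypothetical helper that separates the two kinds of filters.
final class FilterPartition {
    // Filters handed to the sharding strategy before the search runs.
    final List<SketchFilter> shardFilters = new ArrayList<SketchFilter>();
    // Filters applied to the results through the usual filter chain.
    final List<SketchFilter> resultFilters = new ArrayList<SketchFilter>();

    static FilterPartition of(List<SketchFilter> enabledFilters) {
        FilterPartition partition = new FilterPartition();
        for (SketchFilter filter : enabledFilters) {
            if (filter.isPreSearchShardFilter()) {
                partition.shardFilters.add(filter);
            } else {
                partition.resultFilters.add(filter);
            }
        }
        return partition;
    }
}
```

With this split, a shard-only filter never reaches the filter chain, and a regular filter never influences shard selection.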
15 years, 6 months
Re: [hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards
by Emmanuel Bernard
I like the patch, we can apply a slightly modified version of it.
A few comments:
* In private DirectoryProvider[]
getDirectoryProviders(DocumentBuilderIndexedEntity builder)
if ( directoryProviders != null && directoryProviders.length > 0 )
return directoryProviders;
What's the reasoning for returning all shards if this method returns 0
providers? It seems to me that in this situation you want to avoid
executing the query altogether.
* In the testcase, CustomerShardingStrategy does cast FullTextFilter
into FullTextFilterImpl
That sucks.
We need to introduce a FullTextFilterImplementor interface with
getName() ( and maybe Map<String, Object> getParameters() - we can add
this one later);
We would then get
public DirectoryProvider<?>[]
getDirectoryProvidersForQuery(FullTextFilterImplementor[] filters) {
* not sure why the CustomerShardingStrategy does not return all shards
for getDirectoryProvidersForDeletion
* in CustomerShardingStrategy.getFilter
equalsIgnoreCase is wrong. Filter names are case sensitive.
* as I said in my reply to Sanne, I would like to get a way to declare
a filter to be shard sensitive only. Either by providing a special
class or by providing a flag, not sure.
@FullTextFilterDef(name="customer", impl=ShardSensitiveOnly.class)
Because I think we raise an exception if an unknown filter is requested.
In this case, the ShardSensitiveOnly impl can be ignored by the filter
chaining we do otherwise.
Emmanuel
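
The review points above (an interface exposing getName() and getParameters(), case-sensitive filter names, and no query when zero shards are selected) might look roughly like this. It is a minimal sketch, not the applied patch: NamedFilter, SimpleNamedFilter and CustomerShardSelector are hypothetical stand-ins, shards are represented by plain int indexes instead of DirectoryProviders, and the "customer" filter with a customerID parameter follows the test case described elsewhere in this thread.

```java
import java.util.Map;

// Hypothetical stand-in for the proposed FullTextFilterImplementor interface.
interface NamedFilter {
    String getName();

    Map<String, Object> getParameters();
}

// Hypothetical trivial implementation, used only to exercise the selector.
final class SimpleNamedFilter implements NamedFilter {
    private final String name;
    private final Map<String, Object> parameters;

    SimpleNamedFilter(String name, Map<String, Object> parameters) {
        this.name = name;
        this.parameters = parameters;
    }

    public String getName() {
        return name;
    }

    public Map<String, Object> getParameters() {
        return parameters;
    }
}

// Hypothetical selector: one shard per customer, indexed by customerID.
final class CustomerShardSelector {
    private final int shardCount;

    CustomerShardSelector(int shardCount) {
        this.shardCount = shardCount;
    }

    // Returns the shard indexes to query.
    // No "customer" filter: all shards. A "customer" filter that does not
    // resolve to a valid shard: an empty array, so the caller can skip the query.
    int[] shardsForQuery(NamedFilter[] filters) {
        for (NamedFilter filter : filters) {
            // equals(), not equalsIgnoreCase(): filter names are case sensitive.
            if ("customer".equals(filter.getName())) {
                Object id = filter.getParameters().get("customerID");
                if (id instanceof Integer) {
                    int shard = (Integer) id;
                    if (shard >= 0 && shard < shardCount) {
                        return new int[] { shard };
                    }
                }
                return new int[0];
            }
        }
        int[] all = new int[shardCount];
        for (int i = 0; i < shardCount; i++) {
            all[i] = i;
        }
        return all;
    }
}
```

Because the strategy only sees the interface, no cast to an implementation class is needed.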
On Jun 3, 2009, at 16:08, Chase Seibert wrote:
> Sanne,
>
> I have implemented your suggestion for IndexShardingStrategy to
> optionally provide a set of DirectoryProviders BEFORE the search
> based on one or more FullTextFilters. Using this change, I was able
> to optimize my specific case to search only hitting the relevant
> shards.
>
> I have not yet implemented your labeled shard idea, nor your shard
> on enum idea. If we can agree on this change first, I think I can
> implement those on top of this.
>
> Please see attached svn .patch (diff) file. I have tested the patch
> on 3.1.1 and 3.2.0. Any feedback is welcome.
>
> -Chase
>
>
> On Wed, Jun 3, 2009 at 1:27 PM, Sanne Grinovero <sanne.grinovero(a)gmail.com
> > wrote:
> I am having a similar need in these days; this should be a very useful
> feature, but I'd like more something I could use with the existing API
> like
>
> enableFullTextFilter
> ( "MyShardsSelectionStrategy" ).setParameter( ... )
>
> a practical example:
> enableFullTextFilter( "LanguageFilter" ).setParameter( "IT-it" )
>
> The existing IndexShardingStrategy should be able to be smarter and
> have something like
>
> DirectoryProvider<?>[] getDirectoryProvidersForQuery( filters &&
> options )
>
> So a smart ShardingStrategy could do some selections considering this.
>
> I'm currently using sharding to shard my index on 25 different
> languages (using per-language stemmers), so this would
> be useful but I'd especially need to be able to "label" my different
> DirectoryProviders using String identifiers,
> I'd suggest to add a getName() to the DirectoryProvider interface: I
> would use that to store countrycodes and
> keep a map<String,DirectoryProvider> in my ShardingStrategy, so I can
> easily select the right DP when
> the LanguageFilter is enabled.
>
> Another usage would be to shard an entity on an Enumerated property:
> in this case an appropriate ShardingStrategy
> could be provided by Search and auto-configured by reading the
> possible enum values: that would be a very easy way
> to enable sharding on an entity.
>
> Sanne
>
> 2009/6/3 Emmanuel Bernard <emmanuel(a)hibernate.org>:
> >
> >
> > Begin forwarded message:
> >
> > From: chase.seibert+opensubscriber(a)gmail.com
> > Date: June 3, 2009 09:21:21 PDT
> > To: emmanuel(a)hibernate.org
> > Subject: Re: Re: [hibernate-dev] HSearch: Using sharding and
> avoiding query
> > on multiple shards
> > Reply-To: chase.seibert+opensubscriber(a)gmail.com
> > Emmanuel,
> >
> > Regarding HSEARCH-251, and
> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/97703...
> >
> > Being able to query just a single shard or subset of shards would be
> > awesome. I was thinking of a similar API:
> >
> > IndexShardingStrategy:
> > public DirectoryProvider<?>[]
> > getDirectoryProviderForShard(int shardNum);
> >
> > FullTextQuery:
> > public void enableShardFilter(int shardNum);
> > public void enableShardFilters(int[] shardNums);
> >
> > FullTextQuery.buildSearcher() would need to be modified to call
> > getDirectoryProviderForShard() for each shardNum if shardNums are
> set,
> > otherwise it should continue to use
> getDirectoryProvidersForAllShards();
> >
> > Calling this API from a consumer's stand-point would look like:
> > FullTextQuery fullTextQuery =
> > fullTextSession.createFullTextQuery(luceneQuery, entityClass);
> > fullTextQuery.enableShardFilter(5);
> > fullTextQuery.list();
> >
> > This could be changed to pass named shards easily. I could
> prototype this
> > and submit a .patch if you are interested.
> >
> > -Chase
> >
> > --
> > This message was sent on behalf of chase.seibert+opensubscriber(a)gmail.com
> at
> > openSubscriber.com
> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/98005...
> >
> >
> > _______________________________________________
> > hibernate-dev mailing list
> > hibernate-dev(a)lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
> >
> >
>
> <getDirectoryProvidersForQuery.patch>
15 years, 6 months
Fwd: [hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards
by Emmanuel Bernard
Begin forwarded message:
> From: chase.seibert+opensubscriber(a)gmail.com
> Date: June 3, 2009 09:21:21 PDT
> To: emmanuel(a)hibernate.org
> Subject: Re: Re: [hibernate-dev] HSearch: Using sharding and
> avoiding query on multiple shards
> Reply-To: chase.seibert+opensubscriber(a)gmail.com
>
> Emmanuel,
>
> Regarding HSEARCH-251, and http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/97703...
>
> Being able to query just a single shard or subset of shards would be
> awesome. I was thinking of a similar API:
>
> IndexShardingStrategy:
> public DirectoryProvider<?>[]
> getDirectoryProviderForShard(int shardNum);
>
> FullTextQuery:
> public void enableShardFilter(int shardNum);
> public void enableShardFilters(int[] shardNums);
>
> FullTextQuery.buildSearcher() would need to be modified to call
> getDirectoryProviderForShard() for each shardNum if shardNums are
> set, otherwise it should continue to use
> getDirectoryProvidersForAllShards();
>
> Calling this API from a consumer's stand-point would look like:
> FullTextQuery fullTextQuery =
> fullTextSession.createFullTextQuery(luceneQuery, entityClass);
> fullTextQuery.enableShardFilter(5);
> fullTextQuery.list();
>
> This could be changed to pass named shards easily. I could prototype
> this and submit a .patch if you are interested.
>
> -Chase
>
> --
> This message was sent on behalf of chase.seibert+opensubscriber(a)gmail.com
> at openSubscriber.com
> http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/98005...
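
The buildSearcher() rule in the proposal above (use the requested shards if any are set, otherwise all shards) can be sketched as follows. ShardFilteredQuery is a hypothetical stand-in that tracks shard numbers only; it is not FullTextQuery, and ignoring duplicate shard numbers is an assumption of this sketch rather than part of the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query that remembers which shards were requested.
final class ShardFilteredQuery {
    private final int totalShards;
    private final List<Integer> enabledShards = new ArrayList<Integer>();

    ShardFilteredQuery(int totalShards) {
        this.totalShards = totalShards;
    }

    void enableShardFilter(int shardNum) {
        if (shardNum < 0 || shardNum >= totalShards) {
            throw new IllegalArgumentException("No such shard: " + shardNum);
        }
        // Assumption of this sketch: requesting the same shard twice is a no-op.
        if (!enabledShards.contains(shardNum)) {
            enabledShards.add(shardNum);
        }
    }

    void enableShardFilters(int[] shardNums) {
        for (int shardNum : shardNums) {
            enableShardFilter(shardNum);
        }
    }

    // Requested shards if any were enabled, otherwise every shard.
    List<Integer> shardsToSearch() {
        if (!enabledShards.isEmpty()) {
            return new ArrayList<Integer>(enabledShards);
        }
        List<Integer> all = new ArrayList<Integer>();
        for (int i = 0; i < totalShards; i++) {
            all.add(i);
        }
        return all;
    }
}
```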
15 years, 6 months
Wishing to work on Hibernate's DDL capabilities
by Francis Galiegue
Hello everyone,
During the last two weeks, I've been developing a small framework
that creates mappings on the fly while letting existing sessions
"fail gracefully" (i.e., if their session fails to find a previously
mapped column or table, they retry with a newer session). I don't use
POJOs, so this makes it relatively easy.
This means I have to do DDL on the fly. Steven Ebersole has pointed me
to the Schema* classes in org.hibernate.hbm2ddl. The current abilities
include adding a table, adding columns to a table and removing a table
(a Configuration object with only the validated mapping of the table
to drop will do the trick nicely). But the DDL capabilities are
lacking:
* Hibernate won't drop a column, by choice, and it currently doesn't
give the user a choice because it cannot do so internally AFAICS;
* it won't add indices when requested, even with hbm2ddl=update;
And others.
Steven has pointed to a Jira task talking about an overhaul of the
Dialect abstract class and all its derivatives, because for one, the
Dialect doesn't provide "purpose oriented" capabilities, just one big
lump of methods. After looking at the code (3.3.1), I can see that
this is the case: for instance, there's no separation between DML and
DDL.
I wish to help. But I'd like to know what you, seasoned Hibernate
developers, would like to see. From then on, I'll put together
proposals and, if one is accepted, I'll start working on it. I don't
have one ready right now, mainly because I don't fully grasp the
internal Hibernate architecture yet. Is there a class diagram
available somewhere?
Thanks,
--
Francis Galiegue, fgaliegue(a)gmail.com
"It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries" (Stéphane Faroult, in "The
Art of SQL", ISBN 0-596-00894-5)
15 years, 6 months
JPA 2
by Adam Oren
I work for a firm that's *very* interested in the JPA 2.0 enhancements to identity support, caching and JPQL. We really hope the spec closes soon. In which version are you currently planning to have JPA 2.0 support GA, and do you have any idea when that will be?
Thanks!!
Adam
15 years, 6 months
Re: [hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards
by Sanne Grinovero
Hi Chase,
the problem I see is that if you have 3 customers with ids 34, 35 and 202,
you'll have to define 202 indexes, since the customer id is used directly
as the shard index. But you're right, it should be a separate change.
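
One way around the sparse-id concern is to keep a lookup from customer id to a dense shard index, in the spirit of the map-based labeling suggested earlier in the thread. The sketch below is only an illustration: CustomerShardMap is a hypothetical helper, not part of the patch or of Hibernate Search, and returning -1 for an unknown customer is an arbitrary choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assigns dense shard indexes to sparse customer ids.
final class CustomerShardMap {
    private final Map<Integer, Integer> shardByCustomer = new LinkedHashMap<Integer, Integer>();

    CustomerShardMap(int... customerIds) {
        for (int id : customerIds) {
            if (!shardByCustomer.containsKey(id)) {
                // The next free index is the current number of mapped customers.
                shardByCustomer.put(id, shardByCustomer.size());
            }
        }
    }

    // Number of shards actually needed: one per distinct customer.
    int shardCount() {
        return shardByCustomer.size();
    }

    // Shard index for the customer, or -1 if the customer is unknown.
    int shardFor(int customerId) {
        Integer shard = shardByCustomer.get(customerId);
        return shard == null ? -1 : shard;
    }
}
```

With the ids from the example, `new CustomerShardMap(34, 35, 202)` needs three shards instead of one per possible id.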
The documentation of Hibernate Search is built from XML files
contained in the sources; you'll find them in
/src/main/docbook/en-US/modules
The Maven build also builds all the documentation, so you can see
how what you write will look in both PDF and HTML form.
Here are the detailed instructions:
https://www.hibernate.org/462.html#A8
2009/6/9 Chase Seibert <chase.seibert(a)gmail.com>:
> Sanne,
>
> See attached example of an IndexShardingStrategy and unit test. I was able to
> satisfy my sharding needs without named shards, though I would be open to
> implementing that separately.
>
> In CustomerShardingStrategy, the index is broken into one shard per
> customer. Each Document contains a customerID field, which is used as the
> index for a local DirectoryProvider array. The number of shards should be at
> least the max customerID. The implementation of
> getDirectoryProvidersForQuery() looks for a Filter called "customer" that
> also contains a customerID parameter, and returns the single
> DirectoryProvider for that customer.
>
> What can I do to aid the documentation effort? Thanks,
>
> -Chase
>
>
> On Tue, Jun 9, 2009 at 2:54 PM, Sanne Grinovero <sanne.grinovero(a)gmail.com>
> wrote:
>>
>> Hi Chase,
>> sorry for the late answer; I've just looked at your code, it looks
>> very good and I'd like to apply this patch if Emmanuel and Hardy
>> agree?
>>
>> There are no tests in your patch to verify this is actually useful, do
>> you have a good example of a ShardingProvider using it?
>> (tests are not only used to test your code but also serve as examples
>> and concept demos).
>> Don't you think the DirectoryProvider names should be exposed, or are
>> you able to create a nice sharding implementation without needing
>> that?
>>
>> If you could add a testcase and documentation updates it would be even
>> better and speed up the work ;-)
>>
>> Sanne
>>
>> 2009/6/8 Chase Seibert <chase.seibert(a)gmail.com>:
>> > Sanne,
>> >
>> > Did you get a chance to look at this? Thanks,
>> >
>> > -Chase
>> >
>> >
>> > On Wed, Jun 3, 2009 at 4:08 PM, Chase Seibert <chase.seibert(a)gmail.com>
>> > wrote:
>> >>
>> >> Sanne,
>> >>
>> >> I have implemented your suggestion for IndexShardingStrategy to
>> >> optionally
>> >> provide a set of DirectoryProviders BEFORE the search based on one or
>> >> more
>> >> FullTextFilters. Using this change, I was able to optimize my specific
>> >> case
>> >> to search only hitting the relevant shards.
>> >>
>> >> I have not yet implemented your labeled shard idea, nor your shard on
>> >> enum
>> >> idea. If we can agree on this change first, I think I can implement
>> >> those on
>> >> top of this.
>> >>
>> >> Please see attached svn .patch (diff) file. I have tested the patch on
>> >> 3.1.1 and 3.2.0. Any feedback is welcome.
>> >>
>> >> -Chase
>> >>
>> >>
>> >> On Wed, Jun 3, 2009 at 1:27 PM, Sanne Grinovero
>> >> <sanne.grinovero(a)gmail.com> wrote:
>> >>>
>> >>> I am having a similar need in these days; this should be a very useful
>> >>> feature, but I'd like more something I could use with the existing API
>> >>> like
>> >>>
>> >>> enableFullTextFilter( "MyShardsSelectionStrategy" ).setParameter( ...
>> >>> )
>> >>>
>> >>> a practical example:
>> >>> enableFullTextFilter( "LanguageFilter" ).setParameter( "IT-it" )
>> >>>
>> >>> The existing IndexShardingStrategy should be able to be smarter and
>> >>> have something like
>> >>>
>> >>> DirectoryProvider<?>[] getDirectoryProvidersForQuery( filters &&
>> >>> options
>> >>> )
>> >>>
>> >>> So a smart ShardingStrategy could do some selections considering this.
>> >>>
>> >>> I'm currently using sharding to shard my index on 25 different
>> >>> languages (using per-language stemmers), so this would
>> >>> be useful but I'd especially need to be able to "label" my different
>> >>> DirectoryProviders using String identifiers,
>> >>> I'd suggest to add a getName() to the DirectoryProvider interface: I
>> >>> would use that to store countrycodes and
>> >>> keep a map<String,DirectoryProvider> in my ShardingStrategy, so I can
>> >>> easily select the right DP when
>> >>> the LanguageFilter is enabled.
>> >>>
>> >>> Another usage would be to shard an entity on an Enumerated property:
>> >>> in this case an appropriate ShardingStrategy
>> >>> could be provided by Search and auto-configured by reading the
>> >>> possible enum values: that would be a very easy way
>> >>> to enable sharding on an entity.
>> >>>
>> >>> Sanne
>> >>>
>> >>> 2009/6/3 Emmanuel Bernard <emmanuel(a)hibernate.org>:
>> >>> >
>> >>> >
>> >>> > Begin forwarded message:
>> >>> >
>> >>> > From: chase.seibert+opensubscriber(a)gmail.com
>> >>> > Date: June 3, 2009 09:21:21 PDT
>> >>> > To: emmanuel(a)hibernate.org
>> >>> > Subject: Re: Re: [hibernate-dev] HSearch: Using sharding and
>> >>> > avoiding
>> >>> > query
>> >>> > on multiple shards
>> >>> > Reply-To: chase.seibert+opensubscriber(a)gmail.com
>> >>> > Emmanuel,
>> >>> >
>> >>> > Regarding HSEARCH-251, and
>> >>> >
>> >>> >
>> >>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/97703...
>> >>> >
>> >>> > Being able to query just a single shard or subset of shards would be
>> >>> > awesome. I was thinking of a similar API:
>> >>> >
>> >>> > IndexShardingStrategy:
>> >>> > public DirectoryProvider<?>[]
>> >>> > getDirectoryProviderForShard(int shardNum);
>> >>> >
>> >>> > FullTextQuery:
>> >>> > public void enableShardFilter(int shardNum);
>> >>> > public void enableShardFilters(int[] shardNums);
>> >>> >
>> >>> > FullTextQuery.buildSearcher() would need to be modified to call
>> >>> > getDirectoryProviderForShard() for each shardNum if shardNums are
>> >>> > set,
>> >>> > otherwise it should continue to use
>> >>> > getDirectoryProvidersForAllShards();
>> >>> >
>> >>> > Calling this API from a consumer's stand-point would look like:
>> >>> > FullTextQuery fullTextQuery =
>> >>> > fullTextSession.createFullTextQuery(luceneQuery, entityClass);
>> >>> > fullTextQuery.enableShardFilter(5);
>> >>> > fullTextQuery.list();
>> >>> >
>> >>> > This could be changed to pass named shards easily. I could prototype
>> >>> > this
>> >>> > and submit a .patch if you are interested.
>> >>> >
>> >>> > -Chase
>> >>> >
>> >>> > --
>> >>> > This message was sent on behalf of
>> >>> > chase.seibert+opensubscriber(a)gmail.com at
>> >>> > openSubscriber.com
>> >>> >
>> >>> >
>> >>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/98005...
>> >>> >
>> >>> >
>> >>> > _______________________________________________
>> >>> > hibernate-dev mailing list
>> >>> > hibernate-dev(a)lists.jboss.org
>> >>> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>> >>> >
>> >>> >
>> >>
>> >
>> >
>
>
15 years, 6 months
Re: [hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards
by Sanne Grinovero
Hi Chase,
sorry for the late answer; I've just looked at your code, it looks
very good and I'd like to apply this patch if Emmanuel and Hardy
agree?
There are no tests in your patch to verify this is actually useful, do
you have a good example of a ShardingProvider using it?
(tests are not only used to test your code but also serve as examples
and concept demos).
Don't you think the DirectoryProvider names should be exposed, or are
you able to create a nice sharding implementation without needing
that?
If you could add a testcase and documentation updates it would be even
better and speed up the work ;-)
Sanne
2009/6/8 Chase Seibert <chase.seibert(a)gmail.com>:
> Sanne,
>
> Did you get a chance to look at this? Thanks,
>
> -Chase
>
>
> On Wed, Jun 3, 2009 at 4:08 PM, Chase Seibert <chase.seibert(a)gmail.com>
> wrote:
>>
>> Sanne,
>>
>> I have implemented your suggestion for IndexShardingStrategy to optionally
>> provide a set of DirectoryProviders BEFORE the search based on one or more
>> FullTextFilters. Using this change, I was able to optimize my specific case
>> to search only hitting the relevant shards.
>>
>> I have not yet implemented your labeled shard idea, nor your shard on enum
>> idea. If we can agree on this change first, I think I can implement those on
>> top of this.
>>
>> Please see attached svn .patch (diff) file. I have tested the patch on
>> 3.1.1 and 3.2.0. Any feedback is welcome.
>>
>> -Chase
>>
>>
>> On Wed, Jun 3, 2009 at 1:27 PM, Sanne Grinovero
>> <sanne.grinovero(a)gmail.com> wrote:
>>>
>>> I am having a similar need in these days; this should be a very useful
>>> feature, but I'd like more something I could use with the existing API
>>> like
>>>
>>> enableFullTextFilter( "MyShardsSelectionStrategy" ).setParameter( ... )
>>>
>>> a practical example:
>>> enableFullTextFilter( "LanguageFilter" ).setParameter( "IT-it" )
>>>
>>> The existing IndexShardingStrategy should be able to be smarter and
>>> have something like
>>>
>>> DirectoryProvider<?>[] getDirectoryProvidersForQuery( filters && options
>>> )
>>>
>>> So a smart ShardingStrategy could do some selections considering this.
>>>
>>> I'm currently using sharding to shard my index on 25 different
>>> languages (using per-language stemmers), so this would
>>> be useful but I'd especially need to be able to "label" my different
>>> DirectoryProviders using String identifiers,
>>> I'd suggest to add a getName() to the DirectoryProvider interface: I
>>> would use that to store countrycodes and
>>> keep a map<String,DirectoryProvider> in my ShardingStrategy, so I can
>>> easily select the right DP when
>>> the LanguageFilter is enabled.
>>>
>>> Another usage would be to shard an entity on an Enumerated property:
>>> in this case an appropriate ShardingStrategy
>>> could be provided by Search and auto-configured by reading the
>>> possible enum values: that would be a very easy way
>>> to enable sharding on an entity.
>>>
>>> Sanne
>>>
>>> 2009/6/3 Emmanuel Bernard <emmanuel(a)hibernate.org>:
>>> >
>>> >
>>> > Begin forwarded message:
>>> >
>>> > From: chase.seibert+opensubscriber(a)gmail.com
>>> > Date: June 3, 2009 09:21:21 PDT
>>> > To: emmanuel(a)hibernate.org
>>> > Subject: Re: Re: [hibernate-dev] HSearch: Using sharding and avoiding
>>> > query
>>> > on multiple shards
>>> > Reply-To: chase.seibert+opensubscriber(a)gmail.com
>>> > Emmanuel,
>>> >
>>> > Regarding HSEARCH-251, and
>>> >
>>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/97703...
>>> >
>>> > Being able to query just a single shard or subset of shards would be
>>> > awesome. I was thinking of a similar API:
>>> >
>>> > IndexShardingStrategy:
>>> > public DirectoryProvider<?>[]
>>> > getDirectoryProviderForShard(int shardNum);
>>> >
>>> > FullTextQuery:
>>> > public void enableShardFilter(int shardNum);
>>> > public void enableShardFilters(int[] shardNums);
>>> >
>>> > FullTextQuery.buildSearcher() would need to be modified to call
>>> > getDirectoryProviderForShard() for each shardNum if shardNums are set,
>>> > otherwise it should continue to use
>>> > getDirectoryProvidersForAllShards();
>>> >
>>> > Calling this API from a consumer's stand-point would look like:
>>> > FullTextQuery fullTextQuery =
>>> > fullTextSession.createFullTextQuery(luceneQuery, entityClass);
>>> > fullTextQuery.enableShardFilter(5);
>>> > fullTextQuery.list();
>>> >
>>> > This could be changed to pass named shards easily. I could prototype
>>> > this
>>> > and submit a .patch if you are interested.
>>> >
>>> > -Chase
>>> >
>>> > --
>>> > This message was sent on behalf of
>>> > chase.seibert+opensubscriber(a)gmail.com at
>>> > openSubscriber.com
>>> >
>>> > http://www.opensubscriber.com/message/hibernate-dev@lists.jboss.org/98005...
>>> >
>>> >
>>> > _______________________________________________
>>> > hibernate-dev mailing list
>>> > hibernate-dev(a)lists.jboss.org
>>> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>> >
>>> >
>>
>
>
15 years, 6 months
recommend hibernate tools version for bundling with Seam 2.2
by Sanne Grinovero
Hi Max,
I'm going to update the version of the Hibernate libs included in the Seam
distribution to line up with the versions bundled in JBoss 5.1.0.GA.
Which version of Hibernate Tools do you recommend for this?
Looking at www.hibernate.org/255.html, the latest version appears to be
3.2.4.CR2 (btw the homepage mentions only .CR1),
but the compatibility matrix at www.hibernate.org/6.html needs to be
updated, as it looks like there are no Hibernate Tools versions
compatible with core 3.3.1.GA.
Could you please clarify this?
thanks,
Sanne
15 years, 6 months