[hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards

Wed Jun 10 19:12:30 EDT 2009

You're right, we need some way to know which filters don't need to be
applied on the low-level IndexSearcher. But adding a flag breaks backwards
compatibility, and using a type will push us towards ugly "instanceof"
checks. They're not bad solutions, but what do you think of using a new
option on @FullTextFilterDef?
As an annotation it can have a new flag defaulting to the "apply to index"
behaviour; this flag would be read by Search internals, so we can keep
track of the option without affecting public interfaces.
My point is not that writing a "return true" is much work, but that most
filters will not use this feature, as you can define at most one
IndexShardingStrategy per entity.
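
To make the annotation idea a bit more concrete, here is a rough sketch.
FilterDef and applyToIndex are made-up names standing in for
@FullTextFilterDef and the proposed flag; nothing like this exists in
Search today, and the real internals would look different.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

class FilterFlagSketch {

    // Hypothetical stand-in for @FullTextFilterDef (not the real annotation).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface FilterDef {
        String name();

        // Proposed flag; the default keeps today's "apply to index" behaviour,
        // so existing definitions are unaffected.
        boolean applyToIndex() default true;
    }

    // An ordinary filter: no flag given, so it still reaches the IndexSearcher.
    @FilterDef(name = "security")
    static class Document {}

    // A shard-only filter: used to pick shards, skipped for Lucene filtering.
    @FilterDef(name = "customerShard", applyToIndex = false)
    static class Order {}

    // What the internals could do: read the flag by reflection and keep only
    // the filter names that must be applied on the low-level searcher.
    static List<String> filtersForIndexSearcher(Class<?>... entities) {
        List<String> names = new ArrayList<>();
        for (Class<?> entity : entities) {
            FilterDef def = entity.getAnnotation(FilterDef.class);
            if (def != null && def.applyToIndex()) {
                names.add(def.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [security]
        System.out.println(filtersForIndexSearcher(Document.class, Order.class));
    }
}
```

The filter implementations themselves are untouched here; only the
definition carries the extra information, which is the point of putting it
on the annotation.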

We could also consider that such a filter implementation is tightly
coupled to the ShardingStrategy itself; it could make sense to merge the
filter and shardingStrategy implementations into one class, and have a
keyword ("shardsFilter"?) to enable THE one sharding-based filter.
This would considerably reduce the amount of code to be written to use
this feature; the filter parameters would be delegated directly to the
shardingStrategy, which doesn't need to find out which filters it's
interested in. There would be no dump filter implementation at all: it
just looks like you use a filter from an API point of view, but you're
actually configuring the sharding strategy to return what you need.
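
Roughly what I have in mind for the merged variant, as a sketch only. The
class, its methods and the "customerId" parameter are all invented for
illustration, and shards are plain strings here instead of
DirectoryProviders; this is not the IndexShardingStrategy interface.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class ShardsFilterSketch {

    // Hypothetical merged class: it routes documents to shards at indexing
    // time and narrows the shard list at query time.
    static class CustomerShardingStrategy {
        private final List<String> shards;

        CustomerShardingStrategy(List<String> shards) {
            this.shards = shards;
        }

        // Indexing side: each customer always lands in the same shard.
        String shardForAddition(int customerId) {
            return shards.get(Math.floorMod(customerId, shards.size()));
        }

        // Query side: the parameters of the "shardsFilter" keyword are handed
        // over directly. With no usable parameter, every shard is searched.
        List<String> shardsForQuery(Map<String, Object> shardsFilterParams) {
            Object id = shardsFilterParams.get("customerId");
            if (id instanceof Integer) {
                return Collections.singletonList(shardForAddition((Integer) id));
            }
            return shards;
        }
    }

    public static void main(String[] args) {
        CustomerShardingStrategy strategy = new CustomerShardingStrategy(
                Arrays.asList("shard0", "shard1", "shard2"));

        // "shardsFilter" enabled with customerId=7: prints [shard1]
        System.out.println(strategy.shardsForQuery(
                Collections.<String, Object>singletonMap("customerId", 7)));

        // "shardsFilter" not enabled: prints [shard0, shard1, shard2]
        System.out.println(strategy.shardsForQuery(
                Collections.<String, Object>emptyMap()));
    }
}
```

Nothing is applied to the IndexSearcher in this variant, so there is no
filter to mark as ignorable in the first place.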

Sanne

2009/6/10 Emmanuel Bernard <emmanuel at hibernate.org>:
>
> On Jun 10, 2009, at 15:22, Chase Seibert wrote:
>
>> > That sounds like an elegant approach but we need a way to make it easy to
>> > declare a filter as dump/shard-sensitive only (ie not force the user to
>> > write some Filter implementation). With this knowledge, HSearch could
>> > ignore the dump filter for the actual Lucene filtering operation.
>>
>> I actually think it's rather elegant, because it's backwards compatible.
>> The filter will cause the correct data to return whether it's executed
>> before or after hitting the indexes, i.e. it returns the correct results
>> pre-patch.
>>
>> To avoid the extra overhead, would it be safe to assume that if the
>> IndexShardingStrategy returned a proper, non-empty subset of the
>> DirectoryProviders from getDirectoryProvidersForQuery(FullTextFilter[]
>> filters), then it's safe to ignore that filter post-search? Effectively,
>> we can delete that filter from the array in FullTextQueryImpl.
>
> No, that would not be safe.
> I'd rather get a special marker (flag or class) to specify that a filter
> should be ignored from the list post search.
>