[hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards
Emmanuel Bernard
emmanuel at hibernate.org
Sun Aug 3 13:04:51 EDT 2008
--
Emmanuel Bernard
http://in.relation.to/Bloggers/Emmanuel | http://blog.emmanuelbernard.com
| http://twitter.com/emmanuelbernard
Hibernate Search in Action (http://is.gd/Dl1)
On Aug 3, 2008, at 09:15, Sanne Grinovero wrote:
> 2008/8/1 Emmanuel Bernard <emmanuel at hibernate.org>:
>>
>> On Aug 1, 2008, at 13:42, Sanne Grinovero wrote:
>>
>>> Hello Emmanuel,
>>>
>>> 2008/7/31 Emmanuel Bernard <emmanuel at hibernate.org>:
>>>>
>>>> On Jul 31, 2008, at 09:22, Sanne Grinovero wrote:
>>>>>
>>>>> about the API, wouldn't it make more sense to have it look like a
>>>>> filter?
>>>>
>>>> can you give more details?
>>>
>>> I was just thinking about the name
>>> fullTextQuery.setShardHint("Sony"):
>>> I wouldn't call it a "hint" but a filter, as it could affect the
>>> results. A "hint" sounds like you are trying to improve performance
>>> in a way that shouldn't change the result, so:
>>>
>>> fullTextQuery.enableFullTextFilter("Sony")
>>>
>>> and it could differ from a normal FullTextFilter only by its
>>> concrete implementation.
>>> Just my 2 cents, as I think the effect is the same.
>>
>> Interesting concept and much more transparent. Not sure how easy it
>> is to do that though. A typical filter is cached per IndexReader. We
>> cannot do that for the "special" filter, as opening the index defeats
>> the purpose. Lucene filters are applied per IndexReader, so too late
>> in the game.
>
> You don't need to cache this, as it doesn't really contain the
> filtered data, so we can just avoid that. When opening the readers we
> could look at the enabled filters, and if there's one of this type we
> just affect the selection of indexes to actually open (delegating to
> the sharding impl to make the right choice); no need to apply a real
> Lucene filter afterwards.
> (It should perform like a cached filter which survives even an index
> reopening, nice!)
> We could look at the filter types by name, and put them in separate
> containers at startup to avoid the type-checking at runtime.
It's worth trying a prototype. We should open a JIRA issue to capture
that.
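As a starting point for such a prototype, here is a minimal sketch of the
selection step described above: the enabled filter names are checked
against a lookup built at startup, and only the matching shards are
opened. ShardSelector and shardsToOpen are hypothetical names, not
Hibernate Search API, and the sketch assumes one shard per filter name
and a fallback to all shards when no shard filter is enabled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the part of a sharding strategy that narrows
// the set of shards to open. Not the actual Hibernate Search API.
final class ShardSelector {

    // Shard-filter name -> shard index, built once at startup so that no
    // type checking is needed when a query runs.
    private final Map<String, Integer> shardByFilterName = new LinkedHashMap<>();
    private final int shardCount;

    // Assumption: shardFilterNames holds one distinct name per shard, in
    // shard order.
    ShardSelector(List<String> shardFilterNames) {
        this.shardCount = shardFilterNames.size();
        for (int i = 0; i < shardCount; i++) {
            shardByFilterName.put(shardFilterNames.get(i), i);
        }
    }

    // Returns the indexes of the shards that must be opened for a query
    // with the given enabled filter names.
    List<Integer> shardsToOpen(Set<String> enabledFilterNames) {
        List<Integer> selected = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : shardByFilterName.entrySet()) {
            if (enabledFilterNames.contains(entry.getKey())) {
                selected.add(entry.getValue());
            }
        }
        if (selected.isEmpty()) {
            // No shard filter enabled: fall back to opening every shard.
            for (int i = 0; i < shardCount; i++) {
                selected.add(i);
            }
        }
        return selected;
    }
}
```

With shards named "Sony", "Canon" and "Nikon" (the last two are made up
for the example), enabling only "Canon" selects shard 1, and enabling
nothing selects all three. No Lucene filter is involved; the narrowing
happens before any IndexReader is opened.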
>
>
>>
>>>
>>> the feature looks great, but in my case I would need the ability in
>>> the ShardingStrategy to create new indexes; what do you think about
>>> that? I mean the size of the arrays could need to grow.
>>
>> Yes, that's a feature I thought about, but it means we will run into
>> a lot of concurrency issues (the HSearch config is all done at init
>> time today). If we do that, it needs to be well thought out, and I am
>> not sure how feasible it is.
> Yes, that's why I think we should move away from identifying the
> shards with an index number, and give them "names" or some other way
> to identify them.
> Nothing stops your default sharding strategies from expecting names
> such as "1" and "2", but other implementations could prefer a
> different naming scheme, and it could be more readable in the
> configuration files to select different indexing parameters per shard.
I don't see how a different naming scheme helps solve the
concurrency issues.
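To illustrate: a naming scheme can be layered over the existing array
positions without changing when the shard set is decided. The sketch
below uses a hypothetical NamedShards class (not Hibernate Search API);
the name-to-index map is built once at init time and is read-only
afterwards, so adding a shard at runtime would still need the
synchronization discussed above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry: shard names are only labels over the positions
// of a fixed array of indexes. Not part of the Hibernate Search API.
final class NamedShards {

    // Immutable after construction, mirroring init-time-only configuration.
    private final Map<String, Integer> indexByName;

    NamedShards(List<String> names) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (String name : names) {
            // Each new name gets the next free array position.
            if (map.putIfAbsent(name, map.size()) != null) {
                throw new IllegalArgumentException("Duplicate shard name: " + name);
            }
        }
        this.indexByName = Collections.unmodifiableMap(map);
    }

    // Converts a shard name (for example a category name) to its index.
    int shardIndex(String name) {
        Integer index = indexByName.get(name);
        if (index == null) {
            throw new IllegalArgumentException("Unknown shard: " + name);
        }
        return index;
    }

    int shardCount() {
        return indexByName.size();
    }
}
```

A strategy using it would call shardIndex with the category name and
then address the usual array. Unknown names are rejected rather than
creating a shard on the fly, which is exactly the part that remains
unsolved.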
>
>>
>>>
>>> Basically all my content is "clustered" in some macrocategories, and
>>> usually the search is done after having selected the category, so it
>>> would be perfect to actually have different indexes per category.
>>> But eventually someone could need to add a new category, and the
>>> ShardingStrategy would then need to write a new empty index.
>>> I would also like the possibility to move away from array indexes to
>>> some other identifier for the shards;
>>
>> I am not sure what you gain from that. In any case, your
>> ShardingStrategy can do the conversion from your cat name to the
>> shard index.
>>
>>>
>>> in my specific case I would love to use something like the PK of the
>>> category: this could enable an easy filter selection (category could
>>> be the parameter of the filter) and enable something like "Cascade
>>> delete the index" on category removal.
>>> This could become a special implementation of ShardingStrategy, to
>>> be mandatory when using this kind of filtering?
>>>
>>> btw, I've committed some more fixes for HSEARCH-241
>>>
>>> kind regards,
>>> Sanne
>>
>>