[hibernate-dev] Catering for a new index + writer/reader

Hardy Ferentschik hardy at hibernate.org
Fri Aug 12 03:54:46 EDT 2011


On Fri, 12 Aug 2011 00:02:09 +0200, Sanne Grinovero <sanne at hibernate.org>  
wrote:

> I just read the document (nice doc! Where did you find it?)

It was attached to the original Lucene issue related to faceting.

> The commit requirements of this taxonomy index look like a mess, and
> it also concerns me that it's totally impossible to remove stuff.

Yeah, there are quite a few rules about when to commit relative to
the main index writer. Good that there is Hibernate Search, which can
handle this for the user :-)
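
To illustrate what "handling this for the user" could mean, here is a
minimal sketch. It assumes the rule boils down to "commit the taxonomy
first, then the main index", which is my reading of the docs and should
be double-checked. Committable and FacetedCommit are made-up names, not
Lucene or Hibernate Search API.

```java
// Hypothetical stand-in for anything that can be committed,
// e.g. the taxonomy writer or the main index writer.
interface Committable {
    void commit();
}

// Hides the commit ordering from the caller.
// Assumption: the taxonomy has to be committed before the main index.
final class FacetedCommit {
    private final Committable taxonomyWriter;
    private final Committable indexWriter;

    FacetedCommit(Committable taxonomyWriter, Committable indexWriter) {
        this.taxonomyWriter = taxonomyWriter;
        this.indexWriter = indexWriter;
    }

    void commit() {
        taxonomyWriter.commit(); // categories first ...
        indexWriter.commit();    // ... then the documents that refer to them
    }
}
```

The backend would own a single object like this, so user code never sees
two separate commits.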

Personally, I am surprised that they introduced this new taxonomy index.
Funnily enough, the actual indexed Documents also contain category
(faceting) information, hence the need for the DocumentBuilder. I am sure
there are good reasons for introducing this new index, but I am surprised
nevertheless.

> Yes generally the architecture supports it (as far as how we linked
> all components), but both the backend and the ReaderProvider would
> need a custom implementation; while it looks like the ReaderProvider
> needs an additional API method, I think we can avoid it on the
> backend.

I want to expose as little as possible of the underlying Lucene
functionality. For power users we might want to offer some way to access
the TaxonomyIndex/Reader directly. Not sure yet.

We will also need to extend the annotation side. Our current approach
allows faceting on any un-tokenized field. In the Lucene case we need to
know for which fields we have to create faceting information. We could do
this with an additional optional parameter to @Field, or we could
introduce a new @Faceted (or something like this) annotation. Obviously
Lucene goes a step further with its category paths than our current
faceting approach, but we don't have to extend our faceting DSL right
away.
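
To make the annotation idea concrete, here is a minimal sketch. Nothing
in it exists in Hibernate Search today: @Faceted, its name attribute, the
Car entity and the FacetMetadata helper are all made up for illustration.
It only shows how a marker annotation would tell us which fields need
faceting information.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation; not part of Hibernate Search.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface Faceted {
    // Hypothetical attribute: facet name, empty means "use the field name".
    String name() default "";
}

// Example entity, for illustration only.
class Car {
    @Faceted
    String make;

    @Faceted(name = "capacity")
    int cubicCapacity;

    String description; // not faceted
}

// Collects the facet names declared on the fields of a class.
final class FacetMetadata {
    static List<String> facetNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            Faceted faceted = field.getAnnotation(Faceted.class);
            if (faceted != null) {
                names.add(faceted.name().isEmpty() ? field.getName() : faceted.name());
            }
        }
        return names;
    }
}
```

An optional parameter on @Field would carry the same information; the
separate annotation just keeps the faceting concern apart from indexing.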


> Also, do you know what kind of data structure TaxonomyWriter and
> TaxonomyReader expect? We'll need clustering for that too; hopefully
> it's similar to a Map, or reuses the Directory API.

For clustering purposes I think we have to look at CategoryPath and how
to serialize it. It should be just a bunch of strings, but I haven't seen
the code yet.
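
If it really is just a bunch of strings, serializing it could be as
simple as the sketch below. This is written without having seen the
Lucene code: CategoryPathCodec is a made-up helper, and it works on a
plain List<String> rather than on the real CategoryPath class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or Hibernate Search.
// Assumes a category path is nothing more than an ordered list of strings.
final class CategoryPathCodec {

    // Writes the number of components, then each component as a UTF string.
    static byte[] encode(List<String> components) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(components.size());
            for (String component : components) {
                out.writeUTF(component);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the components back in the order they were written.
    static List<String> decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readInt();
            List<String> components = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                components.add(in.readUTF());
            }
            return components;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```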

It would have been nice to get this stuff into Search 4 as well, but of
course that depends on when the next version of Lucene (either 3.4 or 4)
becomes available.

A Hibernate Search 4 bundled with Hibernate Core 4 and Lucene 4 would
have been cool, but I don't think the timing will work out :-)

--Hardy


> 2011/8/11 Hardy Ferentschik <hibernate at ferentschik.de>:
>> Hi,
>>
>> I was just reading the docs for the new Lucene faceting, which makes
>> use of a new index called the taxonomy index. If we are going to use
>> these Lucene capabilities, we have to make sure we can plug this into
>> our current architecture.
>>
>> Reading the docs I can see quite a few similarities between our
>> terminology and theirs. That's good. However, the Lucene approach
>> takes it much further.
>>
>> We might get a new candidate for serialization as well - CategoryPath.
>>
>> I uploaded the faceting API documentation to our shared dropbox  
>> directory. Have a look in case you are interested.
>>
>> --hardy


