[infinispan-dev] Hibernate Search alternative Directory distribution
Manik Surtani
manik at jboss.org
Thu Jul 9 10:16:15 EDT 2009
On 9 Jul 2009, at 14:57, Emmanuel Bernard wrote:
> Here is the concall notes on how to cluster and copy Hibernate
> indexes using non file system approaches.
>
> Forget JBoss Cache, forget plain JGroups and focus on Infinispan
> Start with Infinispan in replication mode (the most stable code) and
> then try distribution. It should be interesting to test the dist
> algo and see how well L1 cache behaves in a search environment.
> For the architecture, we will try the following approaches in
> decreasing order of interest (if the first one works like a charm we
> stick with it):
> 1. share the same grid cache between the master and the slaves
> 2. have a local cache on the master where indexing is done and
> manually copy the changed chunks over to the grid
> This requires storing some metadata (namely the list of chunks for
> a given index and the last update for each chunk) to implement the
> same algorithm as the one implemented in FSMaster/
> SlaveDirectoryProvider (incremental copy).
> 3. have a local cache on the master where indexing is done and
> manually copy the changed chunks over to the grid. Each slave
> copies from the grid to a local version of the index and uses the
> local version for search.
>
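The incremental copy in options 2 and 3 could look something like the
sketch below. Plain maps stand in for the local index and the grid, and
the names (ChunkMeta, copyChangedChunks) are invented for illustration,
not anything we ship:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: plain maps stand in for the local index and the grid.
// ChunkMeta and copyChangedChunks are invented names, not shipped API.
class IncrementalCopySketch {

    // Per-chunk metadata: when the chunk was last written.
    static final class ChunkMeta {
        final long lastUpdate;
        ChunkMeta(long lastUpdate) { this.lastUpdate = lastUpdate; }
    }

    // Push every chunk that is missing or stale on the grid;
    // return the names of the chunks actually copied.
    static List<String> copyChangedChunks(Map<String, byte[]> localChunks,
                                          Map<String, ChunkMeta> localMeta,
                                          Map<String, byte[]> gridChunks,
                                          Map<String, ChunkMeta> gridMeta) {
        List<String> copied = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : localChunks.entrySet()) {
            String chunk = e.getKey();
            ChunkMeta local = localMeta.get(chunk);
            ChunkMeta remote = gridMeta.get(chunk);
            // Copy only if the grid has never seen the chunk, or its
            // copy is older than the master's.
            if (remote == null || remote.lastUpdate < local.lastUpdate) {
                gridChunks.put(chunk, e.getValue());
                gridMeta.put(chunk, local);
                copied.add(chunk);
            }
        }
        return copied;
    }
}
```

The point being that only the stale chunks cross the wire, which is the
same idea as the FSMasterDirectoryProvider's timestamp-based copy.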
> When writing the InfinispanDirectory (inspired by the RAMDirectory
> and the JBossCacheDirectory), one needs to consider that Infinispan
> has a flat structure. The key has to contain:
> - the index name
> - the chunk name
> Together these will essentially form the unique identifier.
> Each chunk should have its size limited (Lucene does that already
> AFAIK)
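Given the flat structure, the cache key could just be a small value
object wrapping the two names. A minimal sketch (ChunkKey is an
invented name; the real thing would want efficient externalization):

```java
import java.io.Serializable;
import java.util.Objects;

// Sketch of a flat-cache key combining index name and chunk name,
// since Infinispan has no hierarchical structure to lean on.
final class ChunkKey implements Serializable {
    private final String indexName;
    private final String chunkName;

    ChunkKey(String indexName, String chunkName) {
        this.indexName = indexName;
        this.chunkName = chunkName;
    }

    // equals/hashCode make the (index, chunk) pair usable as a
    // unique cache key.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ChunkKey)) return false;
        ChunkKey k = (ChunkKey) o;
        return indexName.equals(k.indexName) && chunkName.equals(k.chunkName);
    }
    @Override public int hashCode() { return Objects.hash(indexName, chunkName); }
    @Override public String toString() { return indexName + "|" + chunkName; }
}
```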
> Question on the metadata: one needs to keep the last update and the
> list of chunks. Because Infinispan is not queryable, we need to
> store that as metadata:
> - should it be on each chunk (i.e. the last update time and the size
> of each chunk)
> - on a dedicated metadata chunk, i.e. one metadata chunk per chunk +
> a chunk containing the list
> - on a single metadata chunk (I fear conflicts and inconsistencies)
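To make the second option concrete — one metadata entry per chunk plus
one entry holding the chunk list — here is a rough sketch, with a plain
map standing in for the flat grid; all the key shapes and names are
invented:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: a plain map stands in for the flat grid, and the key
// shapes ("index|chunk", "index|chunk|meta", "index|list") are invented.
class MetadataLayoutSketch {
    final Map<String, Object> cache = new HashMap<>();

    // Write a chunk, its own metadata entry, and register it in the
    // per-index chunk-list entry.
    void writeChunk(String index, String chunk, byte[] data, long now) {
        cache.put(index + "|" + chunk, data);          // the chunk itself
        cache.put(index + "|" + chunk + "|meta", now); // per-chunk last update
        @SuppressWarnings("unchecked")
        Set<String> list = (Set<String>) cache.computeIfAbsent(
                index + "|list", k -> new TreeSet<String>());
        list.add(chunk);
    }

    @SuppressWarnings("unchecked")
    Set<String> chunks(String index) {
        Set<String> list = (Set<String>) cache.get(index + "|list");
        return list == null ? Collections.<String>emptySet() : list;
    }
}
```

Note the list entry is still a single shared entry, so concurrent
writers would contend on it — less than a single metadata chunk holding
everything, but not zero.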
>
> On changes or reads, explore the use of Infinispan transactions to
> ensure repeatable-read (RR) semantics. Is it necessary? A file
> system does not guarantee that anyway.
>
> In the case of replication, make sure a FD back end can be activated
> in case the grid goes to the unreachable clouds of total inactivity.
FD backend? I presume you mean a cache store. Have a look at the
different cache stores we ship with, I reckon a FileCacheStore would
do the trick for you.
http://infinispan.sourceforge.net/4.0/apidocs/org/infinispan/loaders/CacheStore.html
http://infinispan.sourceforge.net/4.0/apidocs/org/infinispan/loaders/file/FileCacheStore.html
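For reference, wiring in a FileCacheStore looks roughly like the XML
below — I'm writing the attribute names from memory of the 4.0
configuration schema, and the location value is illustrative, so
double-check against the docs above:

```xml
<namedCache name="luceneIndex">
   <loaders passivation="false" preload="true">
      <loader class="org.infinispan.loaders.file.FileCacheStore"
              fetchPersistentState="true" purgeOnStartup="false">
         <properties>
            <!-- illustrative path; point this at real storage -->
            <property name="location" value="/var/infinispan/store"/>
         </properties>
      </loader>
   </loaders>
</namedCache>
```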
> Question to Manik: do you have a cluster to play with once we reach
> this stage?
The cluster team does have a set of lab servers used to test,
benchmark, etc. You will need to "book" time on this cluster though
since it is shared between JBC/Infinispan, JGroups and JBoss AS
clustering devs.
Cheers
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org