[hibernate-dev] Hibernate Search alternative Directory distribution

Thu Jul 9 09:57:04 EDT 2009

Here are the concall notes on how to cluster and copy Hibernate Search
indexes using non-file-system approaches.

Forget JBoss Cache, forget plain JGroups and focus on Infinispan.
Start with Infinispan in replication mode (the most stable code) and
then try distribution. It should be interesting to test the
distribution algorithm and see how well the L1 cache behaves in a
search environment.
For the architecture, we will try the following approaches in
decreasing order of interest (if the first one works like a charm, we
stick with it):
  1. share the same grid cache between the master and the slaves
  2. have a local cache on the master where indexing is done and
manually copy over the chunks of changed data to the grid.
This requires storing some metadata (namely the list of chunks for a
given index and the last update for each chunk) to implement the same
algorithm as the one implemented in FSMaster/SlaveDirectoryProvider
(incremental copy).
  3. have a local cache on the master where indexing is done and
manually copy over the chunks of changed data to the grid. Each slave
copies from the grid to a local version of the index and uses the
local version for search.
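
The incremental copy mentioned in approaches 2 and 3 could be sketched
roughly as below. This is only an illustration, not the
FSMaster/SlaveDirectoryProvider code: plain java.util maps stand in
for the master's local cache and the grid, and the names
IncrementalChunkCopy, Chunk and sync are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: maps stand in for the local cache and the grid.
class IncrementalChunkCopy {

    /** A chunk of index data plus the time it was last modified. */
    static final class Chunk {
        final byte[] data;
        final long lastUpdate;

        Chunk(byte[] data, long lastUpdate) {
            this.data = data;
            this.lastUpdate = lastUpdate;
        }
    }

    /**
     * Copies chunks that are new or newer in the source to the target,
     * then drops chunks the source no longer lists.
     * Returns the names of the chunks that were copied.
     */
    static List<String> sync(Map<String, Chunk> source, Map<String, Chunk> target) {
        List<String> copied = new ArrayList<>();
        for (Map.Entry<String, Chunk> entry : source.entrySet()) {
            Chunk existing = target.get(entry.getKey());
            if (existing == null || existing.lastUpdate < entry.getValue().lastUpdate) {
                // Chunk objects are immutable here, so sharing the reference is safe.
                target.put(entry.getKey(), entry.getValue());
                copied.add(entry.getKey());
            }
        }
        // Remove chunks that were deleted on the source side.
        target.keySet().retainAll(source.keySet());
        return copied;
    }

    public static void main(String[] args) {
        Map<String, Chunk> master = new HashMap<>();
        Map<String, Chunk> grid = new HashMap<>();
        master.put("segment-a", new Chunk(new byte[] {1}, 10L));
        master.put("segment-b", new Chunk(new byte[] {2}, 20L));
        grid.put("segment-a", new Chunk(new byte[] {1}, 10L));
        System.out.println("copied: " + sync(master, grid));
    }
}
```

Only the chunk list and the per-chunk last update are consulted, which
is why that metadata has to be stored somewhere reachable by both
sides.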

When writing the InfinispanDirectory (inspired by the RAMDirectory and
the JBossCacheDirectory), one needs to consider that Infinispan has a
flat structure. The key has to contain:
  - the index name
  - the chunk name
Both together will essentially be the unique identifier.
Each chunk should have its size limited (Lucene does that already AFAIK).
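
A minimal sketch of such a composite key, assuming nothing about the
final InfinispanDirectory design: ChunkKey is a hypothetical name, and
the class is made Serializable on the assumption that keys have to
travel between nodes. A ConcurrentHashMap can stand in for the cache
when trying it out.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical key for a flat cache: index name + chunk name.
final class ChunkKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String indexName;
    private final String chunkName;

    ChunkKey(String indexName, String chunkName) {
        this.indexName = Objects.requireNonNull(indexName, "indexName");
        this.chunkName = Objects.requireNonNull(chunkName, "chunkName");
    }

    String getIndexName() {
        return indexName;
    }

    String getChunkName() {
        return chunkName;
    }

    // Two keys are equal only if both parts match, so the same chunk
    // name can exist in several indexes without clashing.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ChunkKey)) {
            return false;
        }
        ChunkKey that = (ChunkKey) other;
        return indexName.equals(that.indexName) && chunkName.equals(that.chunkName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(indexName, chunkName);
    }

    @Override
    public String toString() {
        return indexName + "|" + chunkName;
    }
}
```

Keeping the two parts as separate fields rather than one concatenated
string avoids ambiguity if an index or chunk name happens to contain
the separator.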
Question on the metadata: one needs to keep the last update and the
list of chunks. Because Infinispan is not queryable, we need to store
that as metadata:
  - should it be on each chunk (i.e. last update time on each chunk,
the size of a chunk)
  - on dedicated metadata chunks, i.e. one metadata chunk per chunk + a
chunk containing the list
  - on a single metadata chunk (I fear conflicts and inconsistencies)
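
To make the second option concrete, here is a rough sketch of one
metadata entry per chunk plus one entry holding the chunk list, all in
a single flat map. Everything here is an assumption for illustration:
FlatIndexStore, the "|data|", "|meta|" and "|chunks" key layout, and
the ConcurrentHashMap standing in for the cache.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical flat store: data, per-chunk metadata and the chunk list
// all live as separate entries in one map.
final class FlatIndexStore {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    private final String indexName;

    FlatIndexStore(String indexName) {
        this.indexName = indexName;
    }

    private String dataKey(String chunk) {
        return indexName + "|data|" + chunk;
    }

    private String metaKey(String chunk) {
        return indexName + "|meta|" + chunk;
    }

    private String listKey() {
        return indexName + "|chunks";
    }

    /** Writes the data, its metadata entry, and updates the chunk list. */
    synchronized void writeChunk(String chunk, byte[] data, long now) {
        cache.put(dataKey(chunk), data.clone());
        // Metadata entry: { last update, size in bytes }.
        cache.put(metaKey(chunk), new long[] {now, data.length});
        Set<String> names = listChunks();
        names.add(chunk);
        cache.put(listKey(), names);
    }

    /** Returns a copy of the chunk list, empty if nothing was written yet. */
    @SuppressWarnings("unchecked")
    Set<String> listChunks() {
        Object stored = cache.get(listKey());
        if (stored == null) {
            return new TreeSet<String>();
        }
        return new TreeSet<String>((Set<String>) stored);
    }

    /** Last update of a chunk, or -1 if the chunk is unknown. */
    long lastUpdate(String chunk) {
        long[] meta = (long[]) cache.get(metaKey(chunk));
        return meta == null ? -1L : meta[0];
    }

    /** Size of a chunk in bytes, or -1 if the chunk is unknown. */
    long size(String chunk) {
        long[] meta = (long[]) cache.get(metaKey(chunk));
        return meta == null ? -1L : meta[1];
    }
}
```

Note that a single chunk write touches three entries. In this sketch a
local lock hides that, but on a grid another node could see the data
without the updated list, which is the kind of inconsistency the
options above have to weigh.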

On changes or reads, explore the use of Infinispan transactions to
ensure repeatable read (RR) semantics. Is it necessary? A file system
does not guarantee that anyway.

In the case of replication, make sure a FD back end can be activated  
in case the grid goes to the unreachable clouds of total inactivity.

Question to Manik: do you have a cluster to play with once we reach  
this stage?

Good luck Lukasz, and ask for help if you are stuck or unsure of a
decision :)