Here are the concall notes on how to cluster and copy Hibernate Search
indexes using non-file-system approaches.
Forget JBoss Cache, forget plain JGroups, and focus on Infinispan.
Start with Infinispan in replication mode (the most stable code) and
then try distribution. It should be interesting to test the
distribution algorithm and see how well the L1 cache behaves in a
search environment.
For the architecture, we will try the following approaches in
decreasing order of interest (if the first one works like a charm, we
stick with it):
1. share the same grid cache between the master and the slaves
2. have a local cache on the master where indexing is done and
manually copy the chunks of changed data over to the grid.
This requires storing some metadata (namely the list of chunks for a
given index and the last update time of each chunk) to implement the
same incremental-copy algorithm as the one implemented in
FSMaster/SlaveDirectoryProvider.
3. have a local cache on the master where indexing is done and
manually copy the chunks of changed data over to the grid. Each slave
copies from the grid to a local version of the index and uses that
local version for search.
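A minimal sketch of the incremental copy behind options 2 and 3,
assuming each chunk carries its own last-update stamp: only new or
changed chunks are copied, and chunks that disappeared on the source
are removed from the target. Plain `Map`s stand in for the master's
local cache and the grid; `Chunk`, `IncrementalCopy` and `synchronize`
are hypothetical names, and this is not the actual
FSMasterDirectoryProvider code. Iterating over the source map stands in
for reading the chunk-list metadata, since Infinispan cannot be queried.

```java
import java.util.Map;

/** Hypothetical chunk representation: its bytes plus a last-update stamp. */
final class Chunk {
    final byte[] data;
    final long lastUpdate;

    Chunk(byte[] data, long lastUpdate) {
        this.data = data;
        this.lastUpdate = lastUpdate;
    }
}

final class IncrementalCopy {
    private IncrementalCopy() {}

    /**
     * Copies new or changed chunks from source to target and removes chunks
     * that no longer exist on the source. Returns the number of chunks copied.
     */
    static int synchronize(Map<String, Chunk> source, Map<String, Chunk> target) {
        int copied = 0;
        for (Map.Entry<String, Chunk> e : source.entrySet()) {
            Chunk current = target.get(e.getKey());
            // Copy only when the chunk is missing or its stamp differs.
            if (current == null || current.lastUpdate != e.getValue().lastUpdate) {
                target.put(e.getKey(), e.getValue());
                copied++;
            }
        }
        // Chunks deleted on the source (e.g. after a Lucene segment merge)
        // must disappear from the target too.
        target.keySet().retainAll(source.keySet());
        return copied;
    }
}
```

For option 2 the master's local cache is the source and the grid the
target; for option 3 each slave could run the same call with the grid
as source and its local copy as target.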
When writing the InfinispanDirectory (inspired by RAMDirectory and
JBossCacheDirectory), one needs to consider that Infinispan has a
flat structure. The key has to contain:
- the index name
- the chunk name
Together, they form the unique identifier.
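A sketch of such a composite key, assuming the index name and the
chunk name are both plain strings (`ChunkKey` is a hypothetical name).
`equals`/`hashCode` cover both fields so that the same chunk name in two
indexes maps to two distinct entries; it implements `Serializable` on
the assumption that keys have to be marshalled across the cluster.

```java
import java.io.Serializable;
import java.util.Objects;

/** Hypothetical composite key for a flat cache: index name + chunk name. */
final class ChunkKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String indexName;
    private final String chunkName;

    ChunkKey(String indexName, String chunkName) {
        this.indexName = Objects.requireNonNull(indexName, "indexName");
        this.chunkName = Objects.requireNonNull(chunkName, "chunkName");
    }

    String indexName() { return indexName; }

    String chunkName() { return chunkName; }

    // Both fields take part in equality, so "_0.cfs" in index "Book" and
    // "_0.cfs" in index "Author" are two different cache entries.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ChunkKey)) return false;
        ChunkKey other = (ChunkKey) o;
        return indexName.equals(other.indexName) && chunkName.equals(other.chunkName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(indexName, chunkName);
    }

    @Override
    public String toString() {
        return "ChunkKey[" + indexName + ", " + chunkName + "]";
    }
}
```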
Each chunk should have a bounded size (Lucene already does that,
AFAIK).
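In case the directory has to bound the size itself, a hypothetical
helper could split a file's bytes into chunks of at most `chunkSize`
bytes and join them back. The `chunkSize` parameter and the
`fileName#n` chunk naming are assumptions for illustration, not
something Lucene or Infinispan prescribes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a file's bytes into bounded chunks and back. */
final class Chunker {
    private Chunker() {}

    /** Splits data into pieces of at most chunkSize bytes; the last may be shorter. */
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be > 0");
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, data.length);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    /** Concatenates the chunks back into the original byte sequence. */
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    /** Hypothetical chunk naming: file name plus chunk number. */
    static String chunkName(String fileName, int chunkNumber) {
        return fileName + "#" + chunkNumber;
    }
}
```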
Question on the metadata: one needs to keep the last update time and
the list of chunks. Because Infinispan is not queryable, we need to
store that as metadata. Where should it live?
- on each chunk (i.e. the last update time and the size stored with
each chunk)
- on dedicated metadata chunks, i.e. one metadata chunk per chunk plus
a chunk containing the list
- on a single metadata chunk (I fear conflicts and inconsistencies)
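A sketch of the second option (one metadata entry per chunk plus one
entry holding the chunk list), with a `ConcurrentHashMap` standing in
for the flat Infinispan cache. `MetadataStore`, `ChunkMetadata` and the
`index|meta|chunk` key layout are hypothetical. The three writes in
`write` are each atomic but not atomic as a group, which is exactly
where the inconsistency worry comes from.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-chunk metadata: last update time and size in bytes. */
final class ChunkMetadata {
    final long lastUpdate;
    final long size;

    ChunkMetadata(long lastUpdate, long size) {
        this.lastUpdate = lastUpdate;
        this.size = size;
    }
}

/** Sketch: one metadata entry per chunk + one entry with the chunk list. */
final class MetadataStore {
    // Stand-in for the flat Infinispan cache: every entry lives side by side.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    private static String dataKey(String index, String chunk) { return index + "|data|" + chunk; }
    private static String metaKey(String index, String chunk) { return index + "|meta|" + chunk; }
    private static String listKey(String index) { return index + "|list"; }

    /** Stores a chunk, its metadata entry, and adds it to the index's chunk list. */
    void write(String index, String chunk, byte[] data, long now) {
        cache.put(dataKey(index, chunk), data);
        cache.put(metaKey(index, chunk), new ChunkMetadata(now, data.length));
        // Not atomic with the two puts above: a reader may see the chunk's
        // data and metadata before the chunk appears in the list.
        cache.compute(listKey(index), (key, old) -> {
            Set<String> names = new TreeSet<>(asNameSet(old));
            names.add(chunk);
            return Collections.unmodifiableSet(names);
        });
    }

    /** The list entry is the only way to enumerate chunks: the cache is not queryable. */
    Set<String> listChunks(String index) {
        return asNameSet(cache.get(listKey(index)));
    }

    ChunkMetadata metadata(String index, String chunk) {
        return (ChunkMetadata) cache.get(metaKey(index, chunk));
    }

    @SuppressWarnings("unchecked")
    private static Set<String> asNameSet(Object value) {
        return value == null ? Collections.<String>emptySet() : (Set<String>) value;
    }
}
```

Every writer touches the single list entry, so it remains a contention
point even with per-chunk metadata.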
On changes or reads, explore the use of Infinispan transactions to
ensure repeatable-read (RR) semantics. Is it necessary? A file system
does not guarantee that anyway.
In the case of replication, make sure an FD back end can be activated
in case the grid goes to the unreachable clouds of total inactivity.
Question to Manik: do you have a cluster to play with once we reach
this stage?
Good luck, Lukasz, and ask for help if you are stuck or unsure of a
decision :)