On 9 Jul 2009, at 14:57, Emmanuel Bernard wrote:
Here are the concall notes on how to cluster and copy Hibernate
indexes using non-file-system approaches.
Forget JBoss Cache, forget plain JGroups, and focus on Infinispan.
Start with Infinispan in replication mode (the most stable code) and
then try distribution. It should be interesting to test the dist
algo and see how well L1 cache behaves in a search environment.
For the architecture, we will try the following approaches in
decreasing order of interest (if the first one works like a charm,
we stick with it):
1. share the same grid cache between the master and the slaves
2. have a local cache on the master where indexing is done and
manually copy over the chunks of changed data to the grid.
This requires storing some metadata (namely the list of chunks for
a given index and the last update for each chunk) to implement the
same algorithm as the one implemented in
FSMaster/SlaveDirectoryProvider (incremental copy).
3. have a local cache on the master where indexing is done and
manually copy over the chunks of changed data to the grid. Each
slave copies from the grid to a local version of the index and uses
the local version for search.
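
For options 2 and 3, the incremental copy could be sketched roughly as follows. This is only an illustration under assumptions, not the FSMaster/SlaveDirectoryProvider code: plain Maps stand in for the local cache and the grid, and the Chunk class and sync method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class IncrementalCopy {

    // Hypothetical value type: chunk payload plus its last-update timestamp.
    static final class Chunk {
        final long lastUpdate;
        final byte[] data;

        Chunk(long lastUpdate, byte[] data) {
            this.lastUpdate = lastUpdate;
            this.data = data;
        }
    }

    // Copies chunks that are missing or whose timestamp differs, then drops
    // chunks that no longer exist in the source. Returns the number copied.
    static int sync(Map<String, Chunk> source, Map<String, Chunk> target) {
        int copied = 0;
        for (Map.Entry<String, Chunk> e : source.entrySet()) {
            Chunk existing = target.get(e.getKey());
            if (existing == null || existing.lastUpdate != e.getValue().lastUpdate) {
                target.put(e.getKey(), e.getValue());
                copied++;
            }
        }
        target.keySet().retainAll(source.keySet());
        return copied;
    }

    public static void main(String[] args) {
        Map<String, Chunk> local = new HashMap<>(); // stand-in for the master's local cache
        Map<String, Chunk> grid = new HashMap<>();  // stand-in for the shared grid
        local.put("_0.cfs", new Chunk(1L, new byte[] {1}));
        local.put("segments_1", new Chunk(1L, new byte[] {2}));
        System.out.println("first pass copied " + sync(local, grid));

        // Only the changed chunk is copied on the second pass.
        local.put("segments_1", new Chunk(2L, new byte[] {3}));
        System.out.println("second pass copied " + sync(local, grid));
    }
}
```

The same sync call would serve the slave side of option 3, with the grid as source and the slave's local copy as target.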
When writing the InfinispanDirectory (inspired by the RAMDirectory
and the JBossCacheDirectory), one needs to consider that Infinispan
has a flat structure. The key has to contain:
- the index name
- the chunk name
Together they will essentially form the unique identifier.
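
A composite key along these lines might look like the sketch below. ChunkKey is a hypothetical class, not something that exists in Infinispan or Hibernate Search, and a ConcurrentHashMap stands in for the flat cache.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical key type combining the index name and the chunk name.
final class ChunkKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String indexName;
    private final String chunkName;

    ChunkKey(String indexName, String chunkName) {
        this.indexName = Objects.requireNonNull(indexName);
        this.chunkName = Objects.requireNonNull(chunkName);
    }

    // Two keys are equal only if both parts match, so the same chunk name
    // can be reused across different indexes without collisions.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ChunkKey)) return false;
        ChunkKey other = (ChunkKey) o;
        return indexName.equals(other.indexName) && chunkName.equals(other.chunkName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(indexName, chunkName);
    }

    @Override
    public String toString() {
        return indexName + "|" + chunkName;
    }
}

final class ChunkKeyDemo {
    public static void main(String[] args) {
        Map<ChunkKey, byte[]> cache = new ConcurrentHashMap<>(); // stand-in for the flat cache
        cache.put(new ChunkKey("books", "_0.cfs"), new byte[] {1});
        cache.put(new ChunkKey("authors", "_0.cfs"), new byte[] {2});
        System.out.println(cache.keySet());
    }
}
```

The key is Serializable on the assumption that it would need to travel between nodes.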
Each chunk should have its size limited (Lucene does that already
AFAIK).
Question on the metadata: one needs to keep the last update and the
list of chunks. Because Infinispan is not queryable, we need to
store that as metadata:
- should it be on each chunk (i.e. the last update time and the size
stored with each chunk)
- on a dedicated metadata chunk, i.e. one metadata chunk per chunk + a
chunk containing the list
- on a single metadata chunk (I fear conflicts and inconsistencies)
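
To make the second option concrete, here is a minimal sketch with one metadata entry per chunk plus one entry holding the list of chunks. The key conventions and method names are made up for illustration, and a plain Map stands in for the cache. The "|" separator assumes index and chunk names never contain that character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class MetadataLayout {

    // Hypothetical key conventions for a flat, String-keyed cache.
    static String dataKey(String index, String chunk) { return index + "|data|" + chunk; }
    static String metaKey(String index, String chunk) { return index + "|meta|" + chunk; }
    static String listKey(String index) { return index + "|chunks"; }

    // Stores the chunk, its last-update entry, and the updated chunk list.
    // The three puts are NOT atomic in this sketch.
    static void writeChunk(Map<String, Object> cache, String index,
                           String chunk, byte[] data, long now) {
        cache.put(dataKey(index, chunk), data);
        cache.put(metaKey(index, chunk), now);
        List<String> names = new ArrayList<String>(listChunks(cache, index));
        if (!names.contains(chunk)) {
            names.add(chunk);
        }
        cache.put(listKey(index), names);
    }

    // Returns the chunk names recorded for the index, or an empty list.
    @SuppressWarnings("unchecked")
    static List<String> listChunks(Map<String, Object> cache, String index) {
        Object v = cache.get(listKey(index));
        return v == null ? new ArrayList<String>() : (List<String>) v;
    }

    // Returns the recorded last update for a chunk, or -1 if unknown.
    static long lastUpdate(Map<String, Object> cache, String index, String chunk) {
        Object v = cache.get(metaKey(index, chunk));
        return v == null ? -1L : (Long) v;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new ConcurrentHashMap<>();
        writeChunk(cache, "books", "_0.cfs", new byte[] {1}, 100L);
        writeChunk(cache, "books", "segments_1", new byte[] {2}, 100L);
        System.out.println(listChunks(cache, "books"));
        System.out.println(lastUpdate(cache, "books", "_0.cfs"));
    }
}
```

Because each write touches several entries, a reader can observe a chunk whose list or timestamp has not been updated yet. The single-metadata-chunk option would shrink that window but concentrates all writes on one entry.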
On changes or reads, explore the use of Infinispan transactions to
ensure RR (repeatable read) semantics. Is it necessary? A file system
does not guarantee that anyway.
In the case of replication, make sure an FD back end can be activated
in case the grid goes to the unreachable clouds of total inactivity.
FD backend? I presume you mean a cache store. Have a look at the
different cache stores we ship with; I reckon a FileCacheStore would
do the trick for you.
http://infinispan.sourceforge.net/4.0/apidocs/org/infinispan/loaders/Cach...
http://infinispan.sourceforge.net/4.0/apidocs/org/infinispan/loaders/file...
Question to Manik: do you have a cluster to play with once we reach
this stage?
The cluster team does have a set of lab servers used to test,
benchmark, etc. You will need to "book" time on this cluster though
since it is shared between JBC/Infinispan, JGroups and JBoss AS
clustering devs.
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org