On 19 May 2011, at 09:51, Bela Ban wrote:
> As someone mentioned, the biggest issue would be to make sure the data
> read from the file system isn't stale. If a node has been down for an
> extended period of time, then this process might slow things down.
> We'd have to implement some rsync-like algorithm, which checks the local
> data against the cluster data, and this might be costly if the data set
> is small. If it's big, then that cost would be amortized over the
> smaller deltas sent over the network to update a local cache.
> I don't think this makes sense as (1) data sets in replicated mode are
> usually small and (2) Infinispan's focus is on distributed data.

I think it may still make sense in some cases, for both repl and dist. E.g., in
dist, if a node joins, existing owners could, rather than push data to the joiner, just
push a list of {key: version} tuples, which may be significantly smaller than the values.
The joiner can then load matching entries from its local cache loader based on
key/version. We'd need a new API on the CacheLoader, like load(Set<KeyVersionPair> keys),
which can be implemented pretty efficiently in many cache stores such as JDBC. Any keys
the cache loader can't supply would need to be pulled back across the network.
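
To make the joiner-side flow above a bit more concrete, here is a rough sketch. Everything in it is hypothetical: KeyVersionPair, VersionedCacheLoader, InMemoryStore and keysToFetchRemotely are made-up names for illustration, not existing Infinispan APIs, and it assumes String keys and values, plain long versions, and that only an exact version match counts as fresh.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class VersionedJoinSketch {

    /** Hypothetical {key: version} tuple that existing owners would send to the joiner. */
    static final class KeyVersionPair {
        final String key;
        final long version;

        KeyVersionPair(String key, long version) {
            this.key = key;
            this.version = version;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof KeyVersionPair)) return false;
            KeyVersionPair other = (KeyVersionPair) o;
            return key.equals(other.key) && version == other.version;
        }

        @Override public int hashCode() {
            return Objects.hash(key, version);
        }
    }

    /** Hypothetical loader extension; NOT the actual Infinispan CacheLoader interface. */
    interface VersionedCacheLoader {
        /** Returns values only for keys stored locally at exactly the requested version. */
        Map<String, String> load(Set<KeyVersionPair> keys);
    }

    /** Toy in-memory store standing in for a JDBC or file-based cache store. */
    static final class InMemoryStore implements VersionedCacheLoader {
        private final Map<String, Long> versions = new HashMap<>();
        private final Map<String, String> values = new HashMap<>();

        void put(String key, long version, String value) {
            versions.put(key, version);
            values.put(key, value);
        }

        @Override public Map<String, String> load(Set<KeyVersionPair> keys) {
            Map<String, String> hits = new HashMap<>();
            for (KeyVersionPair kv : keys) {
                Long local = versions.get(kv.key);
                // Stale or missing entries are skipped, not returned.
                if (local != null && local == kv.version) {
                    hits.put(kv.key, values.get(kv.key));
                }
            }
            return hits;
        }
    }

    /** Keys the local store could not supply; these would be pulled over the network. */
    static Set<String> keysToFetchRemotely(Set<KeyVersionPair> advertised,
                                           Map<String, String> loadedLocally) {
        Set<String> missing = new HashSet<>();
        for (KeyVersionPair kv : advertised) {
            if (!loadedLocally.containsKey(kv.key)) {
                missing.add(kv.key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("a", 3, "fresh");   // matches the cluster version
        store.put("b", 1, "stale");   // cluster has moved on to version 2

        Set<KeyVersionPair> advertised = new HashSet<>();
        advertised.add(new KeyVersionPair("a", 3));
        advertised.add(new KeyVersionPair("b", 2));
        advertised.add(new KeyVersionPair("c", 1)); // never stored locally

        Map<String, String> local = store.load(advertised);
        System.out.println("loaded locally: " + local.keySet());
        System.out.println("fetch remotely: " + keysToFetchRemotely(advertised, local));
    }
}
```

In a JDBC store the same lookup could presumably be a single batched query on (key, version) rather than a per-key loop, which is where the efficiency would come from.
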
Certainly not high prio, but something to think about for Infinispan.next().
Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org