On 7/12/11 2:58 AM, Yuri de Wit wrote:
Hi Galder,
Thanks for your reply. Let me continue this discussion here first to
validate my thinking before I create any issues in JIRA (forgive me
for the lengthy follow-up).
First of all, thanks for this wonderful project! I started looking
into Ehcache as the default caching implementation, but found it
lacking in some key features when using JGroups. My guess is that all
the development there is going towards the Terracotta distribution
instead of JGroups. Terracotta does seem like a wonderful product,
but I was hoping to stick to JGroups based caching impl. So I was
happy to have found Infinispan.
Yes, I guess Terracotta (the company) has no interest in using something
other than Terracotta (the product) in ehcache, let alone supporting a
competitor...
However, I heard that recently the JGroups plugin for ehcache (which
used to be terrible) was updated by some outside contributor...
I need to create a distributed cache that loads data from the file
system. It's a tree of folders/files containing mostly metadata info
that changes seldom, but does change. Our mid-term goal is to move the
metadata away from the file system and into a database, but that is
not feasible now due to a tight deadline and the risks of refactoring
too much of the code base.
So I was happy to see the GridFilesystem implementation in Infinispan
and the fact that clustered caches can be lazily populated (the
metadata tree in the FS can be large and having all nodes in the
cluster preloaded with all the data would not work for us).
Note that I wrote GridFS as a prototype in JGroups, and then Manik
copied it over to Infinispan. Code quality is beta and not all methods
have been implemented. So, in short, this is to say that GridFS needs
some work before it can be used in production!
However, it defines its own persistence scheme with specific file names and
serialized buckets, which would require us to have a cache-aside
strategy to read our metadata tree and populate the GridFilesystem
with it.
What I am looking for is to be able to plug into the GridFilesystem a
new FileCacheStore that can load directly from an existing directory
tree, transparently. This would automatically and lazily load FS
content across the cluster without our having to pre-populate the
GridFilesystem programmatically.
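A minimal sketch of the "load on first miss, straight from the existing directory tree" behaviour described above. It is a standalone, hypothetical class (DirectoryBackedCache, get, isCached are illustrative names); it does not implement Infinispan's CacheLoader/CacheStore interfaces and does no clustering, it only shows the read-through idea for a single node.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical read-through cache over an existing directory tree.
// NOT Infinispan's CacheLoader/CacheStore API: it only illustrates
// "read the file system on the first miss, then serve from memory".
class DirectoryBackedCache {
    private final Path root;
    private final Map<String, byte[]> local = new ConcurrentHashMap<>();

    DirectoryBackedCache(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    // Returns the file's bytes, touching the file system only on the first miss.
    // Returns null (and caches nothing) if the file does not exist.
    public byte[] get(String relativePath) {
        return local.computeIfAbsent(relativePath, this::loadFromDisk);
    }

    public boolean isCached(String relativePath) {
        return local.containsKey(relativePath);
    }

    private byte[] loadFromDisk(String relativePath) {
        Path file = root.resolve(relativePath).normalize();
        // Refuse paths that escape the root (e.g. "../x") and anything that is not a regular file.
        if (!file.startsWith(root) || !Files.isRegularFile(file)) {
            return null;
        }
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the design is that nothing is preloaded: a node only pays the file-system read for the metadata it actually asks for.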
Interesting... I guess that loader would have to know the mapping of
files to chunks, e.g. if a file is 10K, and the chunk size 2k, then a
get("/home/bela/dump.txt.#3") would mean 'read the 3rd chunk from
/home/bela/dump.txt' from the file system and return it, unless it's in
the local cache.
This requires that your loader knows the chunk size and the
mapping/naming between files and chunks...
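A minimal sketch of the mapping such a loader would need, assuming keys of the form "path.#N", 0-based chunk numbers, and a fixed chunk size known to the loader. The 0-based assumption is what makes "#3" line up with byte 6000 at a chunk size of 2000, as in the "#6000" suggestion. The class and method names are hypothetical, not GridFS code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper (not part of GridFS or Infinispan): maps a chunk key such as
// "/home/bela/dump.txt.#3" to the bytes of that chunk in the real file.
// Assumes 0-based chunk numbers and a chunk size known to the loader.
final class ChunkNumberLoader {
    private static final String SEPARATOR = ".#";
    private final int chunkSize;

    ChunkNumberLoader(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    // "path.#N" -> "path"
    static String pathOf(String key) {
        int i = key.lastIndexOf(SEPARATOR);
        if (i < 0) throw new IllegalArgumentException("not a chunk key: " + key);
        return key.substring(0, i);
    }

    // "path.#N" -> N
    static long chunkNumberOf(String key) {
        int i = key.lastIndexOf(SEPARATOR);
        if (i < 0) throw new IllegalArgumentException("not a chunk key: " + key);
        return Long.parseLong(key.substring(i + SEPARATOR.length()));
    }

    // Byte offset where chunk N starts: N * chunkSize.
    long offsetOf(String key) {
        return chunkNumberOf(key) * chunkSize;
    }

    // Reads the chunk from the real file; returns null if the chunk starts past end of file.
    // The last chunk may be shorter than chunkSize.
    byte[] load(String key) throws IOException {
        long offset = offsetOf(key);
        try (RandomAccessFile raf = new RandomAccessFile(pathOf(key), "r")) {
            long length = raf.length();
            if (offset >= length) return null;
            byte[] buf = new byte[(int) Math.min(chunkSize, length - offset)];
            raf.seek(offset);
            raf.readFully(buf);
            return buf;
        }
    }
}
```

With a 10K file and a chunk size of 2000, keys #0 to #4 return data and #5 returns null, which is where the loader has to agree with GridFS on where a file ends.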
Hmm. Perhaps the mapping can be more intuitive? Maybe instead of the
chunk number, the suffix should incorporate the index (in bytes), e.g.
/home/bela/dump.txt.#6000 ?
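A small sketch comparing the two naming schemes, assuming the ".#" separator from the examples and a chunk size of 2000 so that "#3" and "#6000" name the same chunk. These helpers are hypothetical. Note that an offset key tells the loader where to seek without knowing the chunk size, but it still needs some length to know how many bytes to return.

```java
// Hypothetical helpers comparing two key schemes for the same chunk:
//   chunk-number key: "/home/bela/dump.txt.#3"    (needs the chunk size to find the bytes)
//   byte-offset key:  "/home/bela/dump.txt.#6000" (start offset is in the key itself)
// Neither helper is part of GridFS.
final class ChunkKeyNaming {
    private static final String SEPARATOR = ".#";

    private ChunkKeyNaming() {}

    // Returns {path, number} for "path.#number".
    private static String[] split(String key) {
        int i = key.lastIndexOf(SEPARATOR);
        if (i < 0) throw new IllegalArgumentException("not a chunk key: " + key);
        return new String[] {key.substring(0, i), key.substring(i + SEPARATOR.length())};
    }

    static String toOffsetKey(String chunkKey, int chunkSize) {
        String[] parts = split(chunkKey);
        long chunk = Long.parseLong(parts[1]);
        return parts[0] + SEPARATOR + (chunk * chunkSize);
    }

    static String toChunkKey(String offsetKey, int chunkSize) {
        String[] parts = split(offsetKey);
        long offset = Long.parseLong(parts[1]);
        if (offset % chunkSize != 0) {
            throw new IllegalArgumentException("offset " + offset + " is not a multiple of " + chunkSize);
        }
        return parts[0] + SEPARATOR + (offset / chunkSize);
    }

    // With byte-offset keys a loader can seek to this position without knowing the chunk size.
    static long startOffset(String offsetKey) {
        return Long.parseLong(split(offsetKey)[1]);
    }
}
```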
Also, a put() on the cache loader would have to update the real file,
and *not* store a chunk named "/home/bela/dump.txt.#3"...
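A minimal sketch of that write-through put(), assuming chunk-number keys ("path.#N"), 0-based chunk numbers and a fixed chunk size. The class is hypothetical and not an Infinispan or GridFS API; it only shows the idea of seeking to N * chunkSize in the real file and overwriting the bytes there instead of storing a separate chunk entry.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical write-through for put(): rather than storing a cache entry named
// "path.#N", write the chunk's bytes into the real file at offset N * chunkSize.
// Assumes 0-based chunk numbers and a fixed chunk size; not a GridFS/Infinispan API.
final class ChunkWriteThrough {
    private final int chunkSize;

    ChunkWriteThrough(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    void put(String chunkKey, byte[] chunk) throws IOException {
        int i = chunkKey.lastIndexOf(".#");
        if (i < 0) throw new IllegalArgumentException("not a chunk key: " + chunkKey);
        if (chunk.length > chunkSize) throw new IllegalArgumentException("chunk exceeds chunk size");
        String path = chunkKey.substring(0, i);
        long offset = Long.parseLong(chunkKey.substring(i + 2)) * chunkSize;
        // "rw" opens (or creates) the real file; seek + write overwrites bytes in place
        // and extends the file when the offset is at or past its current end.
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            raf.seek(offset);
            raf.write(chunk);
        }
    }
}
```

This sketch never truncates, so shrinking a file and coordinating writes from several cluster nodes to the same file are left open.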
--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat