Re: [infinispan-dev] Extend GridFS

Tuesday, 12 July 2011

On 7/12/11 2:58 AM, Yuri de Wit wrote:
...
 Hi Galder,

 Thanks for your reply. Let me continue this discussion here first to
 validate my thinking before I create any issues in JIRA (forgive me
 for the lengthy follow up).

 First of all, thanks for this wonderful project! I started looking
 into Ehcache as the default caching implementation, but found it
 lacking on some key features when using JGroups. My guess is that all
 the development there is going towards the Terracotta distribution
 instead of JGroups. Terracotta does seem like a wonderful product,
 but I was hoping to stick to JGroups based caching impl. So I was
 happy to have found Infinispan. 

Yes, I guess Terracotta (the company) has no interest in using something 
other than Terracotta (the product) in ehcache, let alone supporting a 
competitor...

However, I heard that recently the JGroups plugin for ehcache (which 
used to be terrible) was updated by some outside contributor...

...
 I need to create a distributed cache that loads data from the file
 system. It's a tree of folders/files containing mostly metadata info
 that changes seldom, but changes. Our mid-term goal is to move the
 metadata away from the file system and into a database, but that is
 not feasible now due to a tight deadline and the risks of refactoring
 too much of the code base.

 So I was happy to see the GridFilesystem implementation in Infinispan
 and the fact that clustered caches can be lazily populated (the
 metadata tree in the FS can be large and having all nodes in the
 cluster preloaded with all the data would not work for us). 

Note that I wrote GridFS as a prototype in JGroups, and then Manik 
copied it over to Infinispan. Code quality is beta and not all methods 
have been implemented. In short, GridFS needs some work before it can 
be used in production!

...
  However,
 it defines its own persistence scheme with specific file names and
 serialized buckets, which would require us to have a cache-aside
 strategy to read our metadata tree and populate the GridFilesystem
 with it.

 What I am looking for is to be able to plug into the GridFilesystem a
 new FileCacheStore that can load directly from an existing directory
 tree, transparently. This will basically automatically lazy load FS
 content across the cluster without having to pre-populate the
 GridFilesystem programmatically. 

Interesting... I guess that loader would have to know the mapping of 
files to chunks, e.g. if a file is 10K and the chunk size is 2K, then a 
get("/home/bela/dump.txt.#3") would mean 'read the 3rd chunk from 
/home/bela/dump.txt' from the file system and return it, unless it's in 
the local cache.

This requires that your loader knows the chunk size and the 
mapping/naming between files and chunks...
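
A rough sketch of what such a loader would have to do on a get(). The 
class and method names (ChunkReader, readChunk, ...) are made up for 
illustration and are not part of the GridFS or Infinispan API. It also 
assumes the ".#N" suffix is a zero-based chunk index; if "#3" is meant 
as the third chunk counting from one, subtract one before multiplying.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not an Infinispan class: maps a chunk key such as
// "/home/bela/dump.txt.#3" to a byte range in the real file.
public class ChunkReader {
    static final String SEP = ".#";

    // Real file path: everything before the last ".#"
    static String pathOf(String key) {
        int i = key.lastIndexOf(SEP);
        if (i < 0) throw new IllegalArgumentException("not a chunk key: " + key);
        return key.substring(0, i);
    }

    // Chunk number: the digits after the last ".#" (assumed zero-based)
    static int chunkNumberOf(String key) {
        int i = key.lastIndexOf(SEP);
        if (i < 0) throw new IllegalArgumentException("not a chunk key: " + key);
        return Integer.parseInt(key.substring(i + SEP.length()));
    }

    // Byte offset of the chunk within the file
    static long offsetOf(String key, int chunkSize) {
        return (long) chunkNumberOf(key) * chunkSize;
    }

    // Reads at most chunkSize bytes starting at the chunk's offset;
    // returns null if the offset is at or past the end of the file.
    static byte[] readChunk(String key, int chunkSize) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(pathOf(key), "r")) {
            long offset = offsetOf(key, chunkSize);
            long remaining = f.length() - offset;
            if (remaining <= 0) return null;
            byte[] buf = new byte[(int) Math.min(chunkSize, remaining)];
            f.seek(offset);
            f.readFully(buf);
            return buf;
        }
    }
}
```

The last chunk of a file is usually shorter than the chunk size, which 
is why the buffer is sized from the remaining length.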

Hmm. Perhaps the mapping can be more intuitive? Maybe instead of the 
chunk number, the suffix should incorporate the index (in bytes), e.g. 
/home/bela/dump.txt.#6000?

Also, a put() on the cache loader would have to update the real file, 
and *not* store a chunk named "/home/bela/dump.txt.#3"...
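
Something along these lines for the write side, again only a sketch: 
ChunkWriter and writeChunk are hypothetical names, not existing 
Infinispan or GridFS code. It assumes the byte-offset style of key 
(e.g. "/home/bela/dump.txt.#6000"), so the suffix is used directly as 
the position in the real file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not an Infinispan class: a put() of a chunk is
// applied to the underlying file instead of being stored as a separate
// file named after the chunk key.
public class ChunkWriter {
    // Assumes keys of the form "<path>.#<byte offset>"
    static void writeChunk(String key, byte[] data) throws IOException {
        int i = key.lastIndexOf(".#");
        if (i < 0) throw new IllegalArgumentException("not a chunk key: " + key);
        String path = key.substring(0, i);
        long offset = Long.parseLong(key.substring(i + 2));
        try (RandomAccessFile f = new RandomAccessFile(path, "rw")) {
            f.seek(offset);   // position at the chunk's start in the real file
            f.write(data);    // overwrite (or extend) the file in place
        }
    }
}
```

This ignores truncation, i.e. a file that shrinks would need its length 
adjusted separately.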

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
