[infinispan-dev] Extend GridFS

Mon Jul 11 20:58:27 EDT 2011

Hi Galder,

Thanks for your reply. Let me continue this discussion here first to
validate my thinking before I create any issues in JIRA (forgive me
for the lengthy follow up).

First of all, thanks for this wonderful project! I started looking
into Ehcache as the default caching implementation, but found it
lacking on some key features when using JGroups. My guess is that all
the development there is going towards the Terracotta distribution
instead of JGroups. Terracotta does seem like a wonderful product,
but I was hoping to stick to JGroups based caching impl. So I was
happy to have found Infinispan.

I need to create a distributed cache that loads data from the file
system. It's a tree of folders/files containing mostly metadata info
that changes seldom, but changes. Our mid-term goal is to move the
metadata away from the file system and into a database, but that is
not feasible now due to a tight deadline and the risks of refactoring
too much of the code base.

So I was happy to see the GridFilesystem implementation in Infinispan
and the fact that clustered caches can be lazily populated (the
metadata tree in the FS can be large and having all nodes in the
cluster preloaded with all the data would not work for us). However,
it defines its own persistence scheme with specific file names and
serialized buckets, which would require us to have a cache-aside
strategy to read our metadata tree and populate the GridFilesystem
with it.

What I am looking for is to be able to plug into the GridFilesystem a
new FileCacheStore that can load directly from an existing directory
tree, transparently. This will basically automatically lazy load FS
content across the cluster without having to pre-populate the
GridFilesystem programmatically.
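
To make the idea concrete, here is a minimal sketch of the lazy,
read-through lookup described above. It is plain Java and does not use
the Infinispan cache store API; the class name DirectoryTreeLoader and
its methods are hypothetical, and the in-memory map only stands in for
the data cache. A real store would also have to serve the metadata
cache.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical read-through loader over an existing directory tree (not an Infinispan class). */
final class DirectoryTreeLoader {
    private final Path root;
    // Stand-in for the data cache: entries appear only after the first request.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    DirectoryTreeLoader(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    /** Returns the content for a grid path such as "/meta/a.txt", or null if no such file exists. */
    byte[] load(String gridPath) throws IOException {
        byte[] cached = cache.get(gridPath);
        if (cached != null) {
            return cached;
        }
        Path file = resolve(gridPath);
        if (!Files.isRegularFile(file)) {
            return null;
        }
        byte[] data = Files.readAllBytes(file);
        cache.put(gridPath, data);
        return data;
    }

    /** Maps a grid path onto the real tree and rejects paths that escape the root. */
    private Path resolve(String gridPath) {
        String relative = gridPath.startsWith("/") ? gridPath.substring(1) : gridPath;
        Path file = root.resolve(relative).normalize();
        if (!file.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes root: " + gridPath);
        }
        return file;
    }

    /** Number of entries loaded so far. */
    int cachedEntries() {
        return cache.size();
    }
}
```

Nothing is read from disk until a path is first requested, which is
the property I am after for a large metadata tree.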

At first I was hoping to extend the existing FileCacheStore to support
this (which is why I was asking for a GridInputStream.skip()/available()
implementation and for the constructors to be made protected instead
of package-level access), but I later realized that what I needed was
an entirely new implementation, since the buckets abstraction there is
not really appropriate.

The good news is that I am close to 75% complete with the impl here.
It is working, with a few caveats, beautifully for a single node, but
I am facing some issues trying to launch a second node in the cluster
(most of it my ignorance, I am sure).

** Do you see any issues with this approach that I am not aware of?

In addition, I am having a couple of issues launching the second node
in the cluster: a couple of NPEs and an exception,
"java.net.NoRouteToHostException: No route to host". I will send the
details of these exceptions in a follow-up email.

This is where I am stuck at the moment. In my setup I have two
configuration files:
* cache-master.xml
* cache-slave.xml
Both define data and the metadata caches required by GridFilesystem
but -master.xml configures the custom FileCacheStore I implemented and
-slave.xml uses the ClusterCacheLoader.

These are some of the to-do items for this custom FileCacheStore impl:
** Implement chunked writes with a special chunking protocol that
signals when the last chunk has been delivered
** Custom configuration to simplify its use with GridFilesystem.
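
As a rough illustration of the chunked-write item, the sketch below
buffers chunks per file and only touches the underlying file once a
chunk flagged as the last one arrives. The class ChunkedFileWriter and
the "last" flag are hypothetical; GridFS has no such protocol today,
and the sketch assumes chunks for a given file arrive in order.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical chunk assembler: writes a file only when its last chunk is delivered. */
final class ChunkedFileWriter {
    private final Path root;
    // Chunks received so far, keyed by path relative to the root.
    private final Map<String, ByteArrayOutputStream> pending = new HashMap<>();

    ChunkedFileWriter(Path root) {
        this.root = root;
    }

    /** Buffers one chunk; returns true once the whole file has been written to disk. */
    synchronized boolean writeChunk(String relativePath, byte[] chunk, boolean last) throws IOException {
        ByteArrayOutputStream buffer =
                pending.computeIfAbsent(relativePath, k -> new ByteArrayOutputStream());
        buffer.write(chunk, 0, chunk.length);
        if (!last) {
            return false; // keep waiting; the real file is untouched
        }
        pending.remove(relativePath);
        Path target = root.resolve(relativePath);
        Files.createDirectories(target.getParent());
        // Write to a temporary file first, then replace the target in one move.
        Path temp = Files.createTempFile(target.getParent(), "chunk", ".tmp");
        Files.write(temp, buffer.toByteArray());
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
```

Buffering in memory is only reasonable for small metadata files; for
larger content the chunks would have to be spooled to the temporary
file as they arrive.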

regards,
-- yuri

The exception is support for a safe chunked write. For now I am
sending the whole file content when writing to the cache, since a
chunked write would require additional changes to GridFS, such as a
protocol to let the loader know that the current chunk is the last one
so that it can finally update the underlying file as a whole.
 can parsed and translate into file read on the real

Any chance of implementing the skip() and available() methods in
GridInputStream, or of making the constructors in the GridFilesystem
package public so I can easily extend them?

I am trying to plug in custom FileMetadataCacheStore and
FileDataCacheStore implementations under the metadata/data caches used
by the GridFS so that loading from an existing FS is completely
transparent and lazy (I'll be happy to contribute if it makes sense).
The problem is that any BufferedInputStream wrapped around the
GridInputStream calls available()/skip(), but they are not implemented
in GridFS.
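
For illustration, this is roughly what I mean by skip() and
available() on a chunk-backed stream. It is a standalone sketch, not
the GridInputStream code: the class ChunkedInputStream is hypothetical
and keeps all chunks in memory, whereas the real stream would fetch
each chunk from the data cache as needed.

```java
import java.io.InputStream;
import java.util.List;

/** Hypothetical chunk-backed stream showing skip() and available() (not GridInputStream). */
final class ChunkedInputStream extends InputStream {
    private final List<byte[]> chunks;
    private int chunkIndex; // chunk currently being read
    private int offset;     // position inside the current chunk
    private long remaining; // unread bytes across all chunks

    ChunkedInputStream(List<byte[]> chunks) {
        this.chunks = chunks;
        for (byte[] chunk : chunks) {
            remaining += chunk.length;
        }
    }

    @Override
    public int read() {
        while (chunkIndex < chunks.size()) {
            byte[] chunk = chunks.get(chunkIndex);
            if (offset < chunk.length) {
                remaining--;
                return chunk[offset++] & 0xFF;
            }
            chunkIndex++; // current chunk exhausted, move to the next one
            offset = 0;
        }
        return -1;
    }

    @Override
    public int available() {
        return (int) Math.min(remaining, Integer.MAX_VALUE);
    }

    @Override
    public long skip(long n) {
        long toSkip = Math.max(0, Math.min(n, remaining));
        long left = toSkip;
        while (left > 0) {
            byte[] chunk = chunks.get(chunkIndex);
            int inChunk = chunk.length - offset;
            if (inChunk == 0) {
                chunkIndex++;
                offset = 0;
                continue;
            }
            int step = (int) Math.min(left, inChunk);
            offset += step; // jump within the chunk without copying bytes
            left -= step;
        }
        remaining -= toSkip;
        return toSkip;
    }
}
```

The point of skip() here is that it advances the chunk index and
offset directly instead of reading and discarding bytes, so skipped
chunks would never need to be fetched.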

Do you also see any issues with the above approach?

regards,

On Mon, Jul 11, 2011 at 12:06 PM, galderz
<reply+m-9778632-9f501737dc4435143baf6908afcd349935f887a8 at reply.github.com>
wrote:
> Hey Yuri,
>
> Why do you need two new file based stores? Can't you plug Infinispan with a file based cache store to give you FS persistence?
>
> Anyway, I'd suggest you discuss it in the Infinispan dev list (http://lists.jboss.org/pipermail/infinispan-dev/) and in parallel, create an issue in https://issues.jboss.org/browse/ISPN
>
> Cheers,
> Galder
>
> --
> Reply to this email directly or view it on GitHub:
> http://github.com/inbox/9770320#reply
>