[infinispan-dev] Extend GridFS

Yuri de Wit ydewit at gmail.com
Tue Jul 12 10:30:42 EDT 2011


Hi Bela,

Thanks for the note re: status. I read the experimental note in
the docs and also noticed that some of the methods were not implemented
(e.g. InputStream.skip() and InputStream.available()). Refactoring our
code base is the bigger risk at this point, and what I have been able
to get working so far is promising.

> Interesting... I guess that loader would have to know the mapping of
> files to chunks, e.g. if a file is 10K, and the chunk size 2k, then a
> get("/home/bela/dump.txt.#3") would mean 'read the 3rd chunk from
> /home/bela/dump.txt' from the file system and return it, unless it's in
> the local cache.

Correct. This is exactly how I implemented the store after looking
into the existing FileCacheStore. It is basically a base
FileCacheStore extending LockSupportCacheStore and two subclasses:
FileMetadataCacheStore and FileDataCacheStore. The first subclass
returns Metadata entries and the second one returns byte[] chunks.
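
For what it's worth, the load path in FileDataCacheStore boils down to
parsing the chunk suffix off the key and doing a positioned read; here
is a minimal sketch of that logic (class and method names are
illustrative, not the actual store code):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    class ChunkLoading {
        // A key like "/home/bela/dump.txt.#3" maps to bytes
        // [3 * chunkSize, 4 * chunkSize) of /home/bela/dump.txt.
        static byte[] loadChunk(String key, int chunkSize) throws IOException {
            int sep = key.lastIndexOf(".#");
            String path = key.substring(0, sep);
            int index = Integer.parseInt(key.substring(sep + 2));
            RandomAccessFile raf = new RandomAccessFile(path, "r");
            try {
                long offset = (long) index * chunkSize;
                if (offset >= raf.length())
                    return null;  // past EOF: no such chunk on disk
                int len = (int) Math.min(chunkSize, raf.length() - offset);
                byte[] chunk = new byte[len];
                raf.seek(offset);
                raf.readFully(chunk);
                return chunk;
            } finally {
                raf.close();
            }
        }
    }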

> This requires that your loader knows the chunk size and the
> mapping/naming between files and chunks...

Right now I am setting the preferred chunk size in the
FileDataCacheStore properties in my config file, and when I instantiate
the GridFilesystem (traversing the configs via
getCacheLoaderManagerConfig) I pass in the same value as the default
chunk size.
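
The wiring looks roughly like this (the hard-coded chunk size stands in
for the value my code actually reads back from the cache loader config;
the cache names come from my config file):

    import org.infinispan.Cache;
    import org.infinispan.io.GridFile;
    import org.infinispan.io.GridFilesystem;
    import org.infinispan.manager.DefaultCacheManager;

    public class GridFsWiring {
        public static void main(String[] args) throws Exception {
            DefaultCacheManager cm = new DefaultCacheManager("infinispan.xml");
            Cache<String, byte[]> data = cm.getCache("data");
            Cache<String, GridFile.Metadata> metadata = cm.getCache("metadata");

            // Must match the preferred chunk size configured on the
            // FileDataCacheStore; hard-coded here, but read back by
            // traversing getCacheLoaderManagerConfig() in the real code.
            int chunkSize = 8192;

            GridFilesystem fs = new GridFilesystem(data, metadata, chunkSize);
        }
    }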

> Hmm. Perhaps the mapping can be more intuitive ? Maybe instead of the
> chunk number, the suffix should incorporate the index (in bytes), e.g.
> /home/bela/dump.txt.#6000 ?

Interesting. This could be a bit more reliable, but it wouldn't
eliminate the need to define the chunk size. In theory, the
OutputStream chunk size could differ from the input chunking: the
former could be client-driven and the latter loader-driven. However, I
am not sure of the actual benefits, and a single chunk size for the
whole cluster may well be good enough.
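
To make that concrete: as long as one fixed chunk size is agreed on,
the two suffix schemes convert trivially into each other, which is why
the byte-offset naming doesn't remove the chunk-size dependency (a
trivial illustration):

    class ChunkNaming {
        // With a fixed chunk size the two suffix schemes are equivalent:
        // dump.txt.#3 and dump.txt.#6000 name the same chunk when
        // chunkSize is 2000.
        static int indexFromOffset(long byteOffset, int chunkSize) {
            return (int) (byteOffset / chunkSize);
        }
        static long offsetFromIndex(int chunkIndex, int chunkSize) {
            return (long) chunkIndex * chunkSize;
        }
    }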

Chunking writes is a bit more complex, since you don't want to write
chunk #5 and then have the client node stop writing to the
OutputStream, for instance (or multiple clients writing at the same
time). For now I have disabled write chunking (worst case, the slaves
are read-only and writes go only through the master), but I could
envision a protocol where the chunks are written to a temp file keyed
by a unique client stream id and committed when OutputStream.close() is
called. The close would push a 'closing' chunk, with or without actual
data, that would replace the original file on disk. The locking scheme
in LockSupportCacheStore would provide some FS-level protection, and
the last client to close its stream would win if multiple clients write
to the same file at once (or maybe an explicit lock using the
Cache API?).
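
One way such a protocol could look on the store side (a sketch of the
idea only; the per-stream temp file, the closing chunk and the rename
commit are my assumptions, not existing Infinispan API):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.UUID;

    // Sketch: each open stream appends chunks to its own temp file; the
    // 'closing' chunk commits it over the original. Last close wins.
    class ChunkedWrite {
        private final File target;
        private final File temp;
        private final FileOutputStream out;

        ChunkedWrite(File target) throws IOException {
            this.target = target;
            // Unique per client stream, so concurrent writers don't collide.
            this.temp = new File(target.getPath() + ".tmp." + UUID.randomUUID());
            this.out = new FileOutputStream(temp);
        }

        void writeChunk(byte[] chunk) throws IOException {
            out.write(chunk);  // chunk #n goes to the temp file only
        }

        // Triggered by OutputStream.close(): flush the 'closing' chunk
        // (possibly empty), then replace the original file on disk.
        void close(byte[] closingChunk) throws IOException {
            if (closingChunk != null)
                out.write(closingChunk);
            out.close();
            target.delete();             // renameTo is not guaranteed to
            if (!temp.renameTo(target))  // replace an existing file
                throw new IOException("commit failed for " + target);
        }
    }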

Another issue I found is with creating directories, though it is most
likely in my rewrite. A new GridFile could become either a folder or a
file, so its Metadata must start with no flags set, both to (1) mimic
the behavior of a real File and to (2) let the impl properly implement
mkdir(), mkdirs() and exists().
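
A trivial illustration of what I mean (the flag constants are made up
for the example, not the actual GridFile.Metadata ones):

    // The point: freshly created Metadata has neither flag set, so the
    // path can still become a file or a directory, and exists() must
    // return false until one of the two is chosen.
    class Flags {
        static final int FILE = 1;
        static final int DIR  = 2;

        static boolean exists(int flags) { return (flags & (FILE | DIR)) != 0; }
        static boolean mkdirAllowed(int flags) { return flags == 0; }
    }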

>
> Also, a put() on the cache loader would have to update the real file,
> and *not* store a chunk named "/home/bela/dump.txt.#3"...
>
This is already taken care of by the new implementation I created. The
only caveat is the chunking-on-write issue described above.

I have cloned the infinispan project on github and would be happy to
commit the changes somewhere there so you could take a peek, if
interested.

One last note regarding configuration: it seems that the metadata cache
has to use full replication (or at least that would make the most
sense) and the data cache has to use distribution mode. It took me a
few rounds to get it somewhat working (again, I am still learning the
product and JGroups), but it seems that some of this configuration
could be hidden from the user. Food for thought, and the least of my
concerns right now; a sketch of the two cache modes is below.
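
A minimal sketch of the two modes using the programmatic config API
(transport/GlobalConfiguration setup omitted, so this may not match my
XML exactly):

    import org.infinispan.config.Configuration;
    import org.infinispan.config.Configuration.CacheMode;
    import org.infinispan.manager.DefaultCacheManager;

    public class GridFsCaches {
        public static void main(String[] args) {
            DefaultCacheManager cm = new DefaultCacheManager();

            // Metadata is small and read on every node: full replication.
            Configuration meta = new Configuration();
            meta.setCacheMode(CacheMode.REPL_SYNC);
            cm.defineConfiguration("metadata", meta);

            // Chunk data can be large: distribute instead of replicating.
            Configuration data = new Configuration();
            data.setCacheMode(CacheMode.DIST_SYNC);
            cm.defineConfiguration("data", data);
        }
    }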

Let me know what you think.

- yuri


On Tue, Jul 12, 2011 at 4:17 AM, Bela Ban <bban at redhat.com> wrote:
>
>
> On 7/12/11 2:58 AM, Yuri de Wit wrote:
>> Hi Galder,
>>
>> Thanks for your reply. Let me continue this discussion here first to
>> validate my thinking before I create any issues in JIRA (forgive me
>> for the lengthy follow up).
>>
>> First of all, thanks for this wonderful project! I started looking
>> into Ehcache as the default caching implementation, but found it
>> lacking on some key features when using JGroups. My guess is that all
>> the development there is going towards the Terracotta distribution
>> instead of JGroups. Terracotta does seems like a wonderful product,
>> but I was hoping to stick to JGroups based caching impl. So I was
>> happy to have found Infinispan.
>
>
> Yes, I guess Terracotta (the company) has no interest in using something
> other than Terracotta (the product) in ehcache, let alone supporting a
> competitor...
>
> However, I heard that recently the JGroups plugin for ehcache (which
> used to be terrible) was updated by some outside contributor...
>
>
>> I need to create a distributed cache that loads data from the file
>> system. It's a tree of folders/files containing mostly metadata info
>> that changes seldom, but changes. Our mid-term goal is to move the
>> metadata away from the file system and into a database, but that is
>> not feasible now due to a tight deadline and the risks of refactoring
>> too much of the code base.
>>
>> So I was happy to see the GridFilesystem implementation in Infinispan
>> and the fact that clustered caches can be lazily populated (the
>> metadata tree in the FS can be large and having all nodes in the
>> cluster preloaded with all the data would not work for us).
>
>
> Note that I wrote GridFS as a prototype in JGroups, and then Manik
> copied it over to Infinispan. Code quality is beta and not all methods
> have been implemented. So, in short, this is to say that GridFS needs
> some work before it can be used in production !
>
>
>>  However,
>> it defines it's own persistence scheme with specific file names and
>> serialized buckets, which would require us to have a cache-aside
>> strategy to read our metadata tree and populate the GridFilesystem
>> with it.
>>
>> What I am looking for is to be able to plug into the GridFilesystem a
>> new FileCacheStore that can load directly from an existing directory
>> tree, transparently. This will basically automatically lazy load FS
>> content across the cluster without having to pre-populate the
>> GridFilesystem programatically.
>
>
> Interesting... I guess that loader would have to know the mapping of
> files to chunks, e.g. if a file is 10K, and the chunk size 2k, then a
> get("/home/bela/dump.txt.#3") would mean 'read the 3rd chunk from
> /home/bela/dump.txt' from the file system and return it, unless it's in
> the local cache.
>
> This requires that your loader knows the chunk size and the
> mapping/naming between files and chunks...
>
> Hmm. Perhaps the mapping can be more intuitive ? Maybe instead of the
> chunk number, the suffix should incorporate the index (in bytes), e.g.
> /home/bela/dump.txt.#6000 ?
>
> Also, a put() on the cache loader would have to update the real file,
> and *not* store a chunk named "/home/bela/dump.txt.#3"...
>
>
>
>
> --
> Bela Ban
> Lead JGroups (http://www.jgroups.org)
> JBoss / Red Hat
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>


