[infinispan-dev] Extend GridFS

Yuri de Wit ydewit at gmail.com
Tue Jul 12 22:24:45 EDT 2011


Ok, I made good progress today: I have close to 100% of my existing
app running on top of GridFS, loading chunked data directly from our
existing data files. I made bug fixes and enhancements to the
org.infinispan.io package, and the rest of the implementation went into
the three new classes I mentioned before: FileCacheStore,
FileDataCacheStore and FileMetadataCacheStore. I basically got the
functionality right running on a single node, and now I want to start
testing in a cluster.

One of the to-dos is to implement chunked writes, and I will see if I
can tackle that later this week. Another to-do that seems worth
exploring is GZipping the chunks, either as soon as they are read from
the file system or only when marshalling and sending them over the
wire. Are there ways to plug that in? I saw something like
StreamingMarshaller that could decouple this from the actual
GridFilesystem.
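
To make the GZip idea concrete, here is the kind of transformation I have in
mind, written against plain java.util.zip for now rather than any existing
Infinispan extension point (the ChunkCodec class is just a placeholder name):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ChunkCodec {

    // compress a chunk right after it is read from the file system,
    // before it goes into the data cache or over the wire
    static byte[] compress(byte[] chunk) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(chunk.length);
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(chunk);
        gz.close(); // flushes the GZIP trailer
        return bos.toByteArray();
    }

    // inflate a chunk on the receiving node before handing it to GridInputStream
    static byte[] decompress(byte[] compressed) throws IOException {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gz.read(buf)) != -1) {
            bos.write(buf, 0, n);
        }
        gz.close();
        return bos.toByteArray();
    }
}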

-- yuri

On Tue, Jul 12, 2011 at 10:30 AM, Yuri de Wit <ydewit at gmail.com> wrote:
> Hi Bela,
>
> Thanks for the note re: status. I saw the experimental note in the
> docs and also noticed that some of the methods are not implemented
> (e.g. InputStream.skip() and InputStream.available()). Refactoring our
> code base is a bigger risk at this point, and what I have been able to
> get working so far is promising.
>
>> Interesting... I guess that loader would have to know the mapping of
>> files to chunks, e.g. if a file is 10K and the chunk size is 2K, then a
>> get("/home/bela/dump.txt.#3") would mean 'read the 3rd chunk of
>> /home/bela/dump.txt from the file system and return it, unless it's in
>> the local cache'.
>
> Correct. This is exactly how I implemented the store after looking
> into the existing FileCacheStore. It is basically a base
> FileCacheStore extending LockSupportCacheStore, plus two subclasses:
> FileMetadataCacheStore and FileDataCacheStore. The first returns
> Metadata entries and the second returns byte[] chunks.
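>
> Roughly, this is the key-to-chunk mapping the data store relies on (a
> simplified, self-contained sketch; the class and method names below are
> mine, not the actual CacheStore SPI):
>
> import java.io.File;
> import java.io.IOException;
> import java.io.RandomAccessFile;
>
> class ChunkReader {
>     // GridFS keys look like "<path>.#<chunk number>", e.g. "/home/bela/dump.txt.#3"
>     private static final String SEP = ".#";
>
>     static byte[] readChunk(File root, String key, int chunkSize) throws IOException {
>         int i = key.lastIndexOf(SEP);
>         String path = key.substring(0, i);
>         int chunkNumber = Integer.parseInt(key.substring(i + SEP.length()));
>
>         RandomAccessFile raf = new RandomAccessFile(new File(root, path), "r");
>         try {
>             raf.seek((long) chunkNumber * chunkSize);
>             byte[] buf = new byte[chunkSize];
>             int n = raf.read(buf);
>             if (n <= 0) {
>                 return null; // chunk is past EOF
>             }
>             if (n < chunkSize) { // last, partial chunk
>                 byte[] last = new byte[n];
>                 System.arraycopy(buf, 0, last, 0, n);
>                 return last;
>             }
>             return buf;
>         } finally {
>             raf.close();
>         }
>     }
> }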
>
>> This requires that your loader knows the chunk size and the
>> mapping/naming between files and chunks...
>
> Right now I set the preferred chunk size in the FileDataCacheStore
> properties in my config file, and when I instantiate the
> GridFilesystem (traversing the configs from getCacheLoaderManagerConfig)
> I pass the same value in as the default chunk size.
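>
> In code, the wiring looks more or less like this (the cache names are
> from my own config, so treat them as placeholders):
>
> import org.infinispan.Cache;
> import org.infinispan.io.GridFile;
> import org.infinispan.io.GridFilesystem;
> import org.infinispan.manager.EmbeddedCacheManager;
>
> class GridFsBootstrap {
>     // chunkSize should match the value configured on the FileDataCacheStore
>     static GridFilesystem create(EmbeddedCacheManager cm, int chunkSize) {
>         Cache<String, byte[]> data = cm.getCache("gridfs-data");                    // distributed
>         Cache<String, GridFile.Metadata> metadata = cm.getCache("gridfs-metadata"); // replicated
>         return new GridFilesystem(data, metadata, chunkSize);
>     }
> }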
>
>> Hmm. Perhaps the mapping can be more intuitive ? Maybe instead of the
>> chunk number, the suffix should incorporate the index (in bytes), e.g.
>> /home/bela/dump.txt.#6000 ?
>
> Interesting. That could be a bit more reliable, but it wouldn't
> eliminate the need to define the chunk size. In theory, the
> OutputStream chunk size could differ from the input chunk size, with
> the former client-driven and the latter loader-driven. However, I am
> not sure of the actual benefits, and a single chunk size for the
> cluster may be good enough.
>
> Chunking writes is a bit more complex, since you don't want to write
> chunk #5 and then have the client node stop writing to the
> OutputStream, for instance (or multiple clients writing at the same
> time). For now I have disabled write chunking (worst case, the slaves
> are read-only and writes go only through the master), but I could
> envision a protocol where the chunks are written to a temp file based
> on a unique client stream id and triggered by an OutputStream.close().
> A close would push a 'closing' chunk, with or without actual data,
> that would replace the original file on disk. The locking scheme in
> LockSupportCacheStore would provide some FS protection, and the last
> client closing the stream would win if multiple clients are writing to
> the same file at once (or maybe an explicit lock using the Cache
> API?).
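>
> To make that idea concrete, the close-driven protocol could look
> something like this on the store side (all names here are hypothetical,
> and the temp file stands in for the per-stream-id staging area):
>
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.IOException;
>
> class ChunkedWrite {
>     private final File target; // the real file on disk
>     private final File temp;   // staging file keyed by a unique client stream id
>
>     ChunkedWrite(File target, String streamId) {
>         this.target = target;
>         this.temp = new File(target.getParentFile(), target.getName() + "." + streamId + ".tmp");
>     }
>
>     // called for every chunk the client pushes through the OutputStream
>     void appendChunk(byte[] chunk) throws IOException {
>         FileOutputStream out = new FileOutputStream(temp, true); // append mode
>         try {
>             out.write(chunk);
>         } finally {
>             out.close();
>         }
>     }
>
>     // called when the 'closing' chunk arrives, i.e. OutputStream.close() on the client
>     void commit() throws IOException {
>         // last close wins; a real implementation would hold the
>         // LockSupportCacheStore lock (or an explicit Cache lock) around this
>         if (!temp.renameTo(target)) {
>             target.delete();
>             if (!temp.renameTo(target)) {
>                 throw new IOException("could not replace " + target);
>             }
>         }
>     }
> }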
>
> Another issue I found is with creating directories, though most likely
> it is in my rewrite. A new GridFile could become either a folder or a
> file, so the Metadata must have no flags set, to (1) mimic the
> behavior of a real File and (2) make sure the impl can properly
> implement mkdir(), mkdirs() and exists().
>
>>
>> Also, a put() on the cache loader would have to update the real file,
>> and *not* store a chunk named "/home/bela/dump.txt.#3"...
>>
> This is already taken care of by the new implementation I created. The
> only caveat is the chunked-write issue described above.
>
> I have cloned the infinispan project on github and would be happy to
> commit the changes somewhere there so you could take a peek, if
> interested.
>
> One last note regarding configuration: it seems that the metadata
> cache has to use full replication (or at least that would make the
> most sense), while the data cache has to use distribution mode. It
> took me a few rounds to get it somewhat working (again, I am still
> learning the product and JGroups), but it seems that some of this
> configuration could be hidden from the user. Food for thought, and the
> least of my concerns right now.
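>
> For the record, the programmatic equivalent of what I ended up with is
> roughly the following (cache names are mine, and this is the pre-5.1
> Configuration API as I understand it, so double-check the details):
>
> import org.infinispan.config.Configuration;
> import org.infinispan.manager.EmbeddedCacheManager;
>
> class GridFsCaches {
>     static void define(EmbeddedCacheManager cm) {
>         // metadata: relatively small, needed on every node -> full replication
>         Configuration metadata = new Configuration();
>         metadata.setCacheMode(Configuration.CacheMode.REPL_SYNC);
>
>         // data: potentially large -> distribution, each chunk held by a subset of nodes
>         Configuration data = new Configuration();
>         data.setCacheMode(Configuration.CacheMode.DIST_SYNC);
>
>         cm.defineConfiguration("gridfs-metadata", metadata);
>         cm.defineConfiguration("gridfs-data", data);
>     }
> }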
>
> Let me know what you think.
>
> - yuri
>
>
> On Tue, Jul 12, 2011 at 4:17 AM, Bela Ban <bban at redhat.com> wrote:
>>
>>
>> On 7/12/11 2:58 AM, Yuri de Wit wrote:
>>> Hi Galder,
>>>
>>> Thanks for your reply. Let me continue this discussion here first to
>>> validate my thinking before I create any issues in JIRA (forgive me
>>> for the lengthy follow up).
>>>
>>> First of all, thanks for this wonderful project! I started looking
>>> into Ehcache as the default caching implementation, but found it
>>> lacking on some key features when using JGroups. My guess is that all
>>> the development there is going towards the Terracotta distribution
>>> instead of JGroups. Terracotta does seem like a wonderful product,
>>> but I was hoping to stick to a JGroups-based caching impl. So I was
>>> happy to have found Infinispan.
>>
>>
>> Yes, I guess Terracotta (the company) has no interest in using something
>> other than Terracotta (the product) in ehcache, let alone supporting a
>> competitor...
>>
>> However, I heard that recently the JGroups plugin for ehcache (which
>> used to be terrible) was updated by some outside contributor...
>>
>>
>>> I need to create a distributed cache that loads data from the file
>>> system. It's a tree of folders/files containing mostly metadata info
>>> that seldom changes, but does change. Our mid-term goal is to move the
>>> metadata away from the file system and into a database, but that is
>>> not feasible now due to a tight deadline and the risks of refactoring
>>> too much of the code base.
>>>
>>> So I was happy to see the GridFilesystem implementation in Infinispan
>>> and the fact that clustered caches can be lazily populated (the
>>> metadata tree in the FS can be large and having all nodes in the
>>> cluster preloaded with all the data would not work for us).
>>
>>
>> Note that I wrote GridFS as a prototype in JGroups, and then Manik
>> copied it over to Infinispan. Code quality is beta and not all methods
>> have been implemented. So, in short, this is to say that GridFS needs
>> some work before it can be used in production!
>>
>>
>>>  However,
>>> it defines its own persistence scheme with specific file names and
>>> serialized buckets, which would require us to have a cache-aside
>>> strategy to read our metadata tree and populate the GridFilesystem
>>> with it.
>>>
>>> What I am looking for is to be able to plug into the GridFilesystem a
>>> new FileCacheStore that can load directly from an existing directory
>>> tree, transparently. This will basically lazy-load FS content across
>>> the cluster automatically, without having to pre-populate the
>>> GridFilesystem programmatically.
>>
>>
>> Interesting... I guess that loader would have to know the mapping of
>> files to chunks, e.g. if a file is 10K and the chunk size is 2K, then a
>> get("/home/bela/dump.txt.#3") would mean 'read the 3rd chunk of
>> /home/bela/dump.txt from the file system and return it, unless it's in
>> the local cache'.
>>
>> This requires that your loader knows the chunk size and the
>> mapping/naming between files and chunks...
>>
>> Hmm. Perhaps the mapping can be more intuitive ? Maybe instead of the
>> chunk number, the suffix should incorporate the index (in bytes), e.g.
>> /home/bela/dump.txt.#6000 ?
>>
>> Also, a put() on the cache loader would have to update the real file,
>> and *not* store a chunk named "/home/bela/dump.txt.#3"...
>>
>>
>>
>>
>> --
>> Bela Ban
>> Lead JGroups (http://www.jgroups.org)
>> JBoss / Red Hat
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>


