[infinispan-dev] Extend GridFS

Bela Ban bban at redhat.com
Wed Jul 13 06:02:13 EDT 2011



On 7/12/11 4:30 PM, Yuri de Wit wrote:
> Hi Bela,

>> Interesting... I guess that loader would have to know the mapping of
>> files to chunks, e.g. if a file is 10K, and the chunk size 2k, then a
>> get("/home/bela/dump.txt.#3") would mean 'read the 3rd chunk from
>> /home/bela/dump.txt' from the file system and return it, unless it's in
>> the local cache.
>
> Correct. This is exactly how I implemented the store after looking
> into the existing FileCacheStore. It is basically a base
> FileCacheStore extending LockSupportCacheStore, plus two subclasses:
> FileMetadataCacheStore and FileDataCacheStore. The first subclass
> returns Metadata entries and the second returns byte[] chunks.


OK
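
Something like the following is what I had in mind; a rough sketch
only, where ChunkKey is made up and not part of any code base:

// Sketch of the key convention discussed above: "<path>.#<chunkNumber>".
final class ChunkKey {
    static final String SEPARATOR = ".#";

    final String path;      // e.g. "/home/bela/dump.txt"
    final int chunkNumber;  // e.g. 3

    ChunkKey(String path, int chunkNumber) {
        this.path = path;
        this.chunkNumber = chunkNumber;
    }

    // Parses "/home/bela/dump.txt.#3" into path and chunk number
    static ChunkKey parse(String key) {
        int idx = key.lastIndexOf(SEPARATOR);
        if (idx < 0)
            throw new IllegalArgumentException("not a chunk key: " + key);
        String path = key.substring(0, idx);
        int chunkNumber = Integer.parseInt(key.substring(idx + SEPARATOR.length()));
        return new ChunkKey(path, chunkNumber);
    }

    // Byte offset of this chunk in the underlying file
    long offsetInFile(int chunkSize) {
        return (long) chunkNumber * chunkSize;
    }
}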


>> This requires that your loader knows the chunk size and the
>> mapping/naming between files and chunks...
>
> Right now I am setting the preferred chunk size in the
> FileDataCacheStore properties in my config file, and when I
> instantiate the GridFilesystem (traversing the configs in
> getCacheLoaderManagerConfig) I pass the same value in as the default
> chunk size there.


OK, makes sense.
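
For reference, the wiring would then look roughly like this. The cache
names and the 2k chunk size are made up, and I'm assuming the
GridFilesystem constructor that takes (data, metadata, defaultChunkSize):

import org.infinispan.Cache;
import org.infinispan.io.GridFile;
import org.infinispan.io.GridFilesystem;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class GridFsSetup {
    public static void main(String[] args) throws Exception {
        EmbeddedCacheManager cm = new DefaultCacheManager("infinispan.xml");
        Cache<String, GridFile.Metadata> metadata = cm.getCache("metadata");
        Cache<String, byte[]> data = cm.getCache("data");

        // Must match the chunk size configured on the FileDataCacheStore
        int chunkSize = 2048;
        GridFilesystem fs = new GridFilesystem(data, metadata, chunkSize);
    }
}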


>> Hmm. Perhaps the mapping can be more intuitive? Maybe instead of the
>> chunk number, the suffix should incorporate the index (in bytes), e.g.
>> /home/bela/dump.txt.#6000?
>
> Interesting. This could be a bit more reliable, but it wouldn't
> eliminate the need to define the chunk size. In theory, the
> OutputStream chunk size could differ from the input chunk size: the
> former could be client-driven and the latter loader-driven. However,
> I am not sure of the actual benefits, and maybe a single chunk size
> for the cluster is good enough.


Yes, I agree. If you wanted different chunk sizes for different files,
you could always store the chunk size in the metadata for a given file.
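
E.g. a sketch of the offset-based naming, assuming the per-file chunk
size would be read from the file's Metadata (that field doesn't exist
today):

// Builds an offset-based chunk key, e.g. "/home/bela/dump.txt.#6000"
// for chunk #3 of a file with a 2000-byte chunk size.
static String chunkKey(String path, int chunkNumber, int chunkSize) {
    long offset = (long) chunkNumber * chunkSize; // index in bytes
    return path + ".#" + offset;
}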


> Chunking writes is a bit more complex, since you don't want a client
> node to write chunk #5 and then stop writing to the OutputStream, for
> instance (or multiple clients writing at the same time). For now I
> have disabled write chunking (in the worst case the slaves are
> read-only and writes go only through the master), but I could
> envision a protocol where the chunks are written to a temp file based
> on a unique client stream id and triggered by an OutputStream.close().
> A close would push a 'closing' chunk, with or without actual data,
> that would replace the original file on disk. The locking scheme in
> LockSupportCacheStore would make sure there is some FS protection,
> and the last client closing the stream would win if multiple clients
> are writing to the same file at once (or maybe an explicit lock using
> the Cache API?).


OK
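
Your write protocol could look roughly like this; everything below
(class name, temp-key scheme) is hypothetical:

import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.UUID;
import org.infinispan.Cache;

// Chunks are written under a unique stream id; close() pushes a final
// "closing" chunk (possibly empty) which tells the store to replace
// the original file on disk with the accumulated temp file.
class ChunkedOutputStream extends OutputStream {
    private final Cache<String, byte[]> data;
    private final String path;
    private final int chunkSize;
    private final String streamId = UUID.randomUUID().toString();
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    private int chunkNumber;

    ChunkedOutputStream(Cache<String, byte[]> data, String path, int chunkSize) {
        this.data = data;
        this.path = path;
        this.chunkSize = chunkSize;
    }

    @Override
    public void write(int b) {
        buf.write(b);
        if (buf.size() >= chunkSize)
            putChunk(false);
    }

    @Override
    public void close() {
        putChunk(true); // the 'closing' chunk, with or without data
    }

    private void putChunk(boolean closing) {
        String key = path + ".tmp." + streamId + ".#" + chunkNumber++
                + (closing ? ".close" : "");
        data.put(key, buf.toByteArray());
        buf.reset();
    }
}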


> Another issue I found is with creating directories, though most
> likely it is a problem with my rewrite. A new GridFile could become
> either a folder or a file, so its Metadata must initially have no
> flags set, to (1) mimic the behavior of a real File and (2) make sure
> the impl can properly implement mkdir(), mkdirs() and exists().


Yep, as I said the impl is incomplete...
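
For what it's worth, this is how I picture the flag handling; Metadata
below is a stand-in for GridFile.Metadata, so treat it as a sketch only:

import java.util.concurrent.ConcurrentMap;

// A fresh entry carries no type flag, so it can still become either a
// file or a directory; exists() returns false until a flag is set.
class Metadata {
    static final byte FILE = 1 << 0;
    static final byte DIR  = 1 << 1;
    byte flags; // 0 = neither file nor dir yet

    boolean isFile() { return (flags & FILE) != 0; }
    boolean isDir()  { return (flags & DIR) != 0; }
    boolean exists() { return flags != 0; }
}

class Dirs {
    // mkdir() succeeds only if no file or dir exists under the path yet
    // (ignoring races for brevity)
    static boolean mkdir(ConcurrentMap<String, Metadata> cache, String path) {
        Metadata md = new Metadata();
        md.flags = Metadata.DIR;
        Metadata prev = cache.putIfAbsent(path, md);
        if (prev == null)
            return true;           // nothing there yet: created
        if (prev.exists())
            return false;          // already a file or a directory
        prev.flags = Metadata.DIR; // flagless placeholder becomes a dir
        return true;
    }
}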


> I have cloned the infinispan project on github and would be happy to
> commit the changes somewhere there so you could take a peek, if
> interested.


Interested yes, but I have no time to look at this... :-( I'm busy
working on JGroups 3.0, which should reach beta1 soon.

I hope though that your changes make it into the Infinispan Git repo,
and maybe you should publish an article about this on InfoQ?

> One last note regarding configuration: it seems that the metadata
> cache has to use full replication (or at least that would make the
> most sense) and the data cache has to use distribution mode.


Yes, that's the idea. Metadata should be small(ish), so full replication 
is warranted. This of course also depends on what we cram into metadata; 
if it becomes too big, or we have many small files, then it might make 
sense to switch to distribution. Anyway, at the end of the day, this is 
a configuration issue and doesn't require code changes.
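
To illustrate, using the programmatic Configuration API; the cache
names are illustrative and the exact setters may differ in your version:

import org.infinispan.config.Configuration;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class GridFsConfig {
    public static void main(String[] args) {
        EmbeddedCacheManager cm = new DefaultCacheManager();

        // Metadata is small(ish): replicate it everywhere
        Configuration metaCfg = new Configuration();
        metaCfg.setCacheMode(Configuration.CacheMode.REPL_SYNC);
        cm.defineConfiguration("metadata", metaCfg);

        // Data chunks can be large: distribute them
        Configuration dataCfg = new Configuration();
        dataCfg.setCacheMode(Configuration.CacheMode.DIST_SYNC);
        dataCfg.setNumOwners(2);
        cm.defineConfiguration("data", dataCfg);
    }
}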


-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat

