[infinispan-dev] grouping and GridFS

Wed Mar 5 11:50:48 EST 2014

On 5 March 2014 16:29, Mircea Markus <mmarkus at redhat.com> wrote:
>
> On Mar 5, 2014, at 3:04 PM, Ales Justin <ales.justin at gmail.com> wrote:
>
>> But yeah, the moment I start chunking, I would still like to have them grouped -- on the same node.
>> Or that doesn't make sense?
>> (hence having this discussion ;-)
>>
>> -Ales
>>
>> On 05 Mar 2014, at 16:01, Sanne Grinovero <sanne at infinispan.org> wrote:
>>
>>> On 5 March 2014 14:54, Ales Justin <ales.justin at gmail.com> wrote:
>>>> Why do you chunk at all if you want them stored together?
>>>>
>>>> I only use chunking if I can't avoid it, to spread large files.
>>>>
>>>> That's what GridFS is all about -- store very large files.
>>>> Hence chunking.
>>>>
>>>> So you're saying we should know the limit of what we can store on 1 node,
>>>> if bigger, spread, therefore no grouping.
>>>
>>> Yes, but a very conservative approximation would be good enough: you
>>> don't need hardware specifications to figure out a reasonable
>>> threshold.
>>> If I had to make up a number out of thin air, I'd pick something
>>> around 10MB: any file below that threshold would not use chunking and
>>> be nicely stored together to be retrieved efficiently; beyond that
>>> start distributing.
>
> I don't think that if they are collocated, fetching all the segments to another node brings better performance. Might be quite the opposite actually, as having the segments distributed allows fetching them in parallel.

+1, although we don't do parallel fetching yet.
My opinion came from the angle of spreading the data better among the
nodes: many small segments are better than, say, 2 files of one
terabyte each, either of which would blow up any single node.
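
As a rough illustration of the difference between spreading and grouping,
here is a toy model in plain Java. It is not how Infinispan actually
computes ownership: the class PlacementSketch, the "path#index" key format
and the hash-modulo owner function are all assumptions made up for this
sketch. It only shows that hashing each chunk key separately spreads the
chunks, while hashing a shared group (here the file path) puts them all
on one node.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical toy model, not Infinispan code. */
final class PlacementSketch {

    /** Toy owner function: hash modulo node count (real consistent hashing is more involved). */
    static int owner(String hashInput, int nodes) {
        return Math.floorMod(hashInput.hashCode(), nodes);
    }

    /**
     * Returns the distinct nodes that hold the chunks of one file.
     * When grouped, only the file path is hashed, so every chunk gets the same owner.
     */
    static Set<Integer> ownersOf(String path, int chunks, int nodes, boolean grouped) {
        Set<Integer> owners = new HashSet<>();
        for (int i = 0; i < chunks; i++) {
            String chunkKey = path + "#" + i; // assumed key format
            owners.add(owner(grouped ? path : chunkKey, nodes));
        }
        return owners;
    }
}
```

Calling ownersOf with grouped=false shows the chunks landing on several
nodes, while grouped=true always yields a single owner.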

But this advice obviously depends on the application. If you know that
you will have many files, and you want to use other locality tricks
(like running an executor to process all the content of a file), then
you would obviously benefit from keeping them on the same node. In
that case though I'd question the use of chunking altogether.
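
The threshold idea quoted above could be sketched roughly like this, in
plain Java. Nothing here is GridFS or Infinispan API: the class
ChunkPlanner, the 1MB chunk size and the "path#index" key format are
assumptions for illustration, and the 10MB threshold is the untested
"out of thin air" figure from this thread. Treating a file of exactly
10MB as unchunked is also an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Infinispan: plans how a file maps to cache entries. */
final class ChunkPlanner {
    // Assumed numbers: 10MB is the rough threshold floated in the thread,
    // the 1MB chunk size is invented for this example.
    static final long THRESHOLD_BYTES = 10L * 1024 * 1024;
    static final int CHUNK_BYTES = 1024 * 1024;

    /** Files at or below the threshold stay in one entry; larger ones are chunked. */
    static boolean shouldChunk(long fileSize) {
        return fileSize > THRESHOLD_BYTES;
    }

    /** Number of cache entries needed for a file of the given size. */
    static long entryCount(long fileSize) {
        if (!shouldChunk(fileSize)) {
            return 1;
        }
        return (fileSize + CHUNK_BYTES - 1) / CHUNK_BYTES; // ceiling division
    }

    /** Keys for the entries: the bare path when unchunked, "path#i" per chunk otherwise. */
    static List<String> entryKeys(String path, long fileSize) {
        List<String> keys = new ArrayList<>();
        if (!shouldChunk(fileSize)) {
            keys.add(path);
            return keys;
        }
        long n = entryCount(fileSize);
        for (long i = 0; i < n; i++) {
            keys.add(path + "#" + i);
        }
        return keys;
    }
}
```

With these numbers a 3MB file ends up as a single entry under its path,
and a 12MB file becomes 12 entries that are free to be distributed.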

Sanne

>
>>> (this figure could probably use some testing if you're looking into performance)
>>>
>>> Sanne
>>>
>>>>
>>>> -Ales
>>>>
>>>> On 5 Mar 2014 11:22, "Ales Justin" <ales.justin at gmail.com> wrote:
>>>>>
>>>>> Just having a discussion with Bela about this.
>>>>>
>>>>> I guess having "grouping" on GridFS' content would make sense.
>>>>> e.g. put all chunks on the same node
>>>>>
>>>>> Is this doable?
>>>>> Afaiu, we would need to have some sort of "similarity" function for
>>>>> content's metadata?
>>>>>
>>>>> -Ales
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>