On 5 March 2014 16:29, Mircea Markus <mmarkus(a)redhat.com> wrote:
On Mar 5, 2014, at 3:04 PM, Ales Justin <ales.justin(a)gmail.com> wrote:
> But yeah, the moment I start chunking, I would still like to have them grouped -- same node.
> Or that doesn't make sense?
> (hence having this discussion ;-)
>
> -Ales
>
> On 05 Mar 2014, at 16:01, Sanne Grinovero <sanne(a)infinispan.org> wrote:
>
>> On 5 March 2014 14:54, Ales Justin <ales.justin(a)gmail.com> wrote:
>>> Why do you chunk at all if you want them stored together?
>>>
>>> I only use chunking if I can't avoid it, to spread large files.
>>>
>>> That's what GridFS is all about -- store very large files.
>>> Hence chunking.
>>>
>>> So you're saying we should know the limit of what we can store on 1 node,
>>> if bigger, spread, therefore no grouping.
>>
>> Yes, but a very conservative approximation would be good enough: you
>> don't need hardware specifications to figure out a reasonable
>> threshold.
>> If I had to make up a number out of thin air, I'd pick something
>> around 10MB: any file below that threshold would not use chunking and
>> be nicely stored together to be retrieved efficiently; beyond that
>> start distributing.
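The threshold rule described above could be sketched roughly as follows. This is only an illustration, not GridFS code: the class name ChunkPlanner, the method plan and the chunkSize parameter are made up for the example, and the 10MB constant is just the untested guess from the message above.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkPlanner {
    // Hypothetical threshold: the "around 10MB" figure from the thread, not a tested value.
    static final long THRESHOLD_BYTES = 10L * 1024 * 1024;

    /**
     * Returns the sizes of the pieces a file of the given length would be stored as.
     * Files up to the threshold stay whole (one piece, one node); larger files are
     * cut into chunkSize pieces that can be spread over the cluster.
     */
    static List<Long> plan(long fileLength, long chunkSize) {
        if (fileLength < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and chunkSize > 0");
        }
        List<Long> pieces = new ArrayList<>();
        if (fileLength <= THRESHOLD_BYTES) {
            pieces.add(fileLength); // small file: no chunking
            return pieces;
        }
        for (long offset = 0; offset < fileLength; offset += chunkSize) {
            // the last piece holds whatever is left over
            pieces.add(Math.min(chunkSize, fileLength - offset));
        }
        return pieces;
    }
}
```

With this sketch a 25MB file planned with 10MB chunks comes out as three pieces (10MB, 10MB, 5MB), while a 1KB file stays a single piece.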
I don't think that if they are collocated, fetching all the segments to another node
brings better performance. Might be quite the opposite actually, as having the segments
distributed allows fetching them in parallel.
+1, although we don't do parallel fetching yet.
My opinion came from the angle of spreading the data better among the
nodes: multiple small segments are better than, say, 2 files of one
terabyte each, which would blow up any single node.
But this advice obviously depends on the application. If you know that
you will have many files, and you want to use other locality tricks
(like running an executor to process all the content of a file), then you
would obviously have an advantage in keeping them on the same node. In
that case, though, I'd question the use of chunking altogether.
Sanne
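To make the parallel fetching idea above concrete, here is a minimal sketch of what it might look like. It is not something GridFS does today (as noted above), and everything in it is an assumption: a plain Map stands in for the cache, the "path#n" chunk key scheme is made up, and ParallelChunkReader / readAll are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

class ParallelChunkReader {
    /**
     * Requests every chunk of a file concurrently and concatenates them in order.
     * The store is any Map; the key scheme path + "#" + index is hypothetical.
     */
    static byte[] readAll(Map<String, byte[]> store, String path,
                          int chunkCount, ExecutorService pool) {
        List<CompletableFuture<byte[]>> pending = new ArrayList<>();
        for (int i = 0; i < chunkCount; i++) {
            final String key = path + "#" + i;
            // each lookup runs on the pool, so chunks held by different nodes
            // could be fetched at the same time
            pending.add(CompletableFuture.supplyAsync(() -> store.get(key), pool));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < pending.size(); i++) {
            byte[] chunk = pending.get(i).join(); // wait in chunk order
            if (chunk == null) {
                throw new IllegalStateException("missing chunk " + i + " of " + path);
            }
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

Whether this beats fetching collocated chunks from a single node would need measuring; the sketch only shows that the requests do not have to be sequential.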
>> (this figure could probably use some testing if you're looking into performance)
>>
>> Sanne
>>
>>>
>>> -Ales
>>>
>>> On 5 Mar 2014 11:22, "Ales Justin" <ales.justin(a)gmail.com> wrote:
>>>>
>>>> Just having a discussion with Bela about this.
>>>>
>>>> I guess having "grouping" on GridFS' content would make sense.
>>>> e.g. put all chunks on the same node
>>>>
>>>> Is this doable?
>>>> Afaiu, we would need to have some sort of "similarity" function for content's metadata?
>>>>
>>>> -Ales
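One way to picture the "similarity" function asked about above: derive a group name from each chunk key, so that all chunks of one file share a group and so land on the same node. The sketch below is plain Java and makes several assumptions: the "path#n" key scheme and the names ChunkGrouping, chunkKey, groupOf and ownerIndex are invented for illustration, and the modulo placement is only a stand-in for the real consistent hash. Infinispan's grouping API (a Grouper, or a @Group-annotated method on the key) might be fed by something like groupOf, but that wiring is not shown here.

```java
class ChunkGrouping {
    // Hypothetical separator between file path and chunk number.
    static final String SEPARATOR = "#";

    /** Builds the (hypothetical) key under which chunk number n of a file is stored. */
    static String chunkKey(String path, int n) {
        return path + SEPARATOR + n;
    }

    /** The "similarity" function: every chunk of a file maps to the file's path. */
    static String groupOf(String chunkKey) {
        int idx = chunkKey.lastIndexOf(SEPARATOR);
        return idx < 0 ? chunkKey : chunkKey.substring(0, idx);
    }

    /**
     * Stand-in for key placement: picks a node from the group instead of the key,
     * so keys with the same group always get the same node.
     */
    static int ownerIndex(String group, int nodeCount) {
        return Math.floorMod(group.hashCode(), nodeCount);
    }
}
```

Because placement is computed from the group rather than from the full key, chunk 0 and chunk 7 of the same file resolve to the same node index, which is the collocation being asked for.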
>>>>
>>>>
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev(a)lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)