grouping and GridFS - infinispan-dev - Jboss List Archives

grouping and GridFS

Ales Justin

Wednesday, 5 March 2014

5:21 a.m.

Just having a discussion with Bela about this.

I guess having "grouping" on GridFS content would make sense, e.g. putting all chunks on the same node.

Is this doable? AFAIU, we would need some sort of "similarity" function for the content's metadata?

-Ales
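To illustrate what "grouping" would mean here: a group is derived from each key, and the group rather than the key is hashed to pick the owner node, so every chunk of one file lands on the same node. In Infinispan that derivation is done by an `@Group`-annotated method on the key class or by a `Grouper`; the Python sketch below only mimics the idea. The `path#index` chunk-key format and the CRC32-modulo placement are assumptions standing in for GridFS's real key layout and Infinispan's consistent hash.

```python
import zlib


def chunk_key(path: str, index: int) -> str:
    """Hypothetical chunk-key format "<file path>#<chunk index>"."""
    return f"{path}#{index}"


def group_of(key: str) -> str:
    """Derive the group from the key: every chunk of one file maps to its path."""
    path, _, _ = key.rpartition("#")
    return path


def owner(key: str, nodes: list[str], grouped: bool) -> str:
    """Pick an owner node. CRC32 modulo N stands in for a consistent hash;
    with grouping enabled, the group is hashed instead of the full key."""
    hashed = group_of(key) if grouped else key
    return nodes[zlib.crc32(hashed.encode("utf-8")) % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]
keys = [chunk_key("/videos/demo.mp4", i) for i in range(6)]
print({owner(k, nodes, grouped=True) for k in keys})   # always a single node
print({owner(k, nodes, grouped=False) for k in keys})  # usually several nodes
```

In this sketch the "similarity" function is simply the key-to-group mapping: any two chunks that share a path are considered part of the same group.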

Sanne Grinovero

Wednesday, 5 March

6:31 a.m.

Why do you chunk at all if you want them stored together?

I only use chunking if I can't avoid it, to spread large files.

_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Ales Justin

8:54 a.m.

> Why do you chunk at all if you want them stored together?
> I only use chunking if I can't avoid it, to spread large files.

That's what GridFS is all about: storing very large files. Hence chunking.

So you're saying we should know the limit of what we can store on one node; if a file is bigger, spread it, and therefore no grouping.

-Ales

Sanne Grinovero

9:01 a.m.

On 5 March 2014 14:54, Ales Justin <ales.justin(a)gmail.com> wrote:

>> Why do you chunk at all if you want them stored together?
>> I only use chunking if I can't avoid it, to spread large files.
>
> That's what's GridFS all about -- store very large files.
> Hence chunking.
>
> So you're saying we should know the limit of what we can store on 1 node,
> if bigger, spread, therefore no grouping.

Yes, but a very conservative approximation would be good enough: you don't need hardware specifications to figure out a reasonable threshold. If I had to make up a number out of thin air, I'd pick something around 10MB: any file below that threshold would not use chunking and would be nicely stored together, to be retrieved efficiently; beyond that, start distributing. (This figure could probably use some testing if you're looking into performance.)

Sanne
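Sanne's rule of thumb amounts to a simple size check before storing. This is a sketch under stated assumptions: the threshold is his off-the-cuff 10MB figure (taken here as 10 MiB), and the 1 MiB chunk size is a hypothetical value; neither comes from GridFS itself.

```python
# Sanne's rough threshold (~10MB, to be validated by testing), taken as 10 MiB.
CHUNKING_THRESHOLD = 10 * 1024 * 1024
# Hypothetical chunk size, chosen only for illustration.
CHUNK_SIZE = 1024 * 1024


def split_for_storage(data: bytes,
                      threshold: int = CHUNKING_THRESHOLD,
                      chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Return the entries to store: one entry for small files, chunks otherwise."""
    if len(data) <= threshold:
        # Small file: a single entry, kept together and retrieved in one go.
        return [data]
    # Large file: fixed-size chunks (the last one may be shorter),
    # which the cache is then free to distribute across nodes.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


small = bytes(5 * 1024 * 1024)
large = bytes(25 * 1024 * 1024)
print(len(split_for_storage(small)))  # 1
print(len(split_for_storage(large)))  # 25
```

Below the threshold the grouping question disappears, since there is only one entry per file.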


Ales Justin

9:04 a.m.

But yeah, the moment I start chunking, I would still like to have them grouped, on the same node. Or does that not make sense? (Hence this discussion ;-)

-Ales

Mircea Markus

10:29 a.m.

On Mar 5, 2014, at 3:04 PM, Ales Justin <ales.justin(a)gmail.com> wrote:

> But yeah, the moment I start chunking, I would still like to have the grouped -- same node.
> Or that doesn't make sense?
> (hence having this discussion ;-)
>
> -Ales
>
> On 05 Mar 2014, at 16:01, Sanne Grinovero <sanne(a)infinispan.org> wrote:
>
>> If I had to make up a number out of thin air, I'd pick something
>> around 10MB: any file below that threshold would not use chunking and
>> be nicely stored together to be retrieved efficiently; beyond that
>> start distributing.

I don't think collocating the segments brings better performance if they all have to be fetched to another node anyway. It might be quite the opposite, actually, as having the segments distributed allows fetching them in parallel.


Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
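Mircea's point about parallel fetching can be sketched with a thread pool. `fetch_chunk` is a hypothetical stand-in for a remote get, with `time.sleep` modelling network latency. As Sanne notes in his reply, Infinispan did not fetch chunks in parallel at the time, so this only shows why distributed chunks could help, assuming the reading client's own network link is not the bottleneck.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunks of one file, imagined as living on different nodes.
CHUNKS = {i: bytes([i]) * 4 for i in range(8)}


def fetch_chunk(index: int, latency: float = 0.05) -> bytes:
    """Simulate a remote get of one chunk; the sleep models network latency."""
    time.sleep(latency)
    return CHUNKS[index]


def read_sequential(count: int) -> bytes:
    """Fetch chunks one after another: total time grows with the chunk count."""
    return b"".join(fetch_chunk(i) for i in range(count))


def read_parallel(count: int, workers: int = 8) -> bytes:
    """Fetch chunks concurrently; pool.map yields results in input order,
    so the file is reassembled correctly even though fetches overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(fetch_chunk, range(count)))


t0 = time.perf_counter()
sequential = read_sequential(len(CHUNKS))
t1 = time.perf_counter()
parallel = read_parallel(len(CHUNKS))
t2 = time.perf_counter()
assert sequential == parallel
print(f"sequential: {t1 - t0:.2f}s, parallel: {t2 - t1:.2f}s")
```

If all chunks sat on one node, the overlapping requests would still queue behind that node's single network interface, which is the bottleneck Dennis mentions later in the thread.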

Sanne Grinovero

10:50 a.m.

On 5 March 2014 16:29, Mircea Markus <mmarkus(a)redhat.com> wrote:

> On Mar 5, 2014, at 3:04 PM, Ales Justin <ales.justin(a)gmail.com> wrote:
>> On 05 Mar 2014, at 16:01, Sanne Grinovero <sanne(a)infinispan.org> wrote:
>>> If I had to make up a number out of thin air, I'd pick something
>>> around 10MB: any file below that threshold would not use chunking and
>>> be nicely stored together to be retrieved efficiently; beyond that
>>> start distributing.
>
> I don't think that if they are collocated, fetching all the segments to another node brings better performance.
> Might be quite the opposite actually, as having the segments distributed allows fetching them in parallel.

+1, although we don't do parallel fetching yet.

My opinion came from the angle of spreading the data better among the nodes: many small segments are better than, say, two files of one terabyte each, which would blow up any single node.

But this advice obviously depends on the application. If you know you will have many files, and you want to use other locality tricks (like running an executor to process all the content of a file), then you would obviously benefit from keeping them on the same node. In that case, though, I'd question using chunking at all.

Sanne
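Sanne's "two files of one terabyte each" example can be made concrete by computing per-node storage under each layout. This is a sketch with simplified, deterministic placement: whole files are assigned to nodes in turn (standing in for hashing a file's group) and chunks are dealt round-robin (standing in for hashing each chunk key). The 64 GiB chunk size and the four-node cluster are hypothetical.

```python
from collections import Counter

GiB = 1024 ** 3
TiB = 1024 ** 4
CHUNK_SIZE = 64 * GiB  # hypothetical chunk size, for illustration only


def load_grouped(files: dict[str, int], nodes: list[str]) -> Counter:
    """Each whole file lands on one node; file i -> node i mod N stands in
    for hashing the file's group key."""
    load = Counter()
    for i, size in enumerate(files.values()):
        load[nodes[i % len(nodes)]] += size
    return load


def load_chunked(files: dict[str, int], nodes: list[str]) -> Counter:
    """Chunks are dealt round-robin across nodes, a stand-in for hashing
    each chunk key independently."""
    load = Counter()
    slot = 0
    for size in files.values():
        for offset in range(0, size, CHUNK_SIZE):
            load[nodes[slot % len(nodes)]] += min(CHUNK_SIZE, size - offset)
            slot += 1
    return load


nodes = ["node-0", "node-1", "node-2", "node-3"]
files = {"/data/a.bin": 1 * TiB, "/data/b.bin": 1 * TiB}
for name, load in (("grouped", load_grouped(files, nodes)),
                   ("chunked", load_chunked(files, nodes))):
    print(name, {n: load[n] // GiB for n in nodes}, "GiB per node")
```

With two 1 TiB files on four nodes, grouping leaves two nodes holding 1024 GiB each and two holding nothing, while the chunked layout puts 512 GiB on every node. The trade-off Sanne describes is the reverse case: if an executor needs a whole file locally, the grouped layout avoids moving data.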


Dennis Reed

10:57 a.m.

It doesn't make sense. :)

The reason grid file systems exist is to distribute the file around the cluster, both for performance (so the network interface of a single server isn't a bottleneck) and for disk space (so the available space on a single server isn't a bottleneck).

If you don't want to distribute the file, a grid file system probably isn't the right choice.

-Dennis

On 03/05/2014 09:04 AM, Ales Justin wrote:
> But yeah, the moment I start chunking, I would still like to have the grouped -- same node.
> Or that doesn't make sense?
> (hence having this discussion ;-)

infinispan-dev@lists.jboss.org

participants (4)

Ales Justin
Dennis Reed
Mircea Markus
Sanne Grinovero