[infinispan-dev] About size()

Mircea Markus mmarkus at redhat.com
Wed Oct 8 10:13:55 EDT 2014


On Oct 8, 2014, at 15:11, Dan Berindei <dan.berindei at gmail.com> wrote:

> 
> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <mmarkus at redhat.com> wrote:
> On Oct 3, 2014, at 9:30, Radim Vansa <rvansa at redhat.com> wrote:
> 
> > Hi,
> >
> > recently we had a discussion about what size() returns, but I've
> > realized there are more things that users would like to know. My
> > question is whether you think that they would really appreciate it, or
> > whether it's just my QA point of view where I sometimes compute the
> > 'checksums' of the cache to see whether I have lost anything.
> >
> > There are these sizes:
> > A) number of owned entries
> > B) number of entries stored locally in memory
> > C) number of entries stored in each local cache store
> > D) number of entries stored in each shared cache store
> > E) total number of entries in cache
> >
> > So far, we can get
> > B via withFlags(SKIP_CACHE_LOAD).size()
> > (passivation ? B : 0) + firstNonZero(C, D) via size()
> > E via distributed iterators / MR
> > A via data container iteration + distribution manager query, but only
> > without cache store
> > C or D through
> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
> >
> > I think that it would go along with users' expectations if size()
> > returned E and for the rest we should have special methods on
> > AdvancedCache. That would of course change the meaning of size(), but
> > I'd say it would finally change to something that has a firm meaning.
> >
> > WDYT?
> 
> There were a lot of arguments in the past about whether size() and other methods that operate over all the elements (keySet, values) are useful, because:
> - they are approximate (data changes during iteration)
> - they are very resource-consuming and might be misused (this is the reason we chose to implement size() with its current local semantics)
> 
> These methods (size, keys, values) are useful for people, and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. It has also created a lot of confusion among our users, with questions like "size() doesn't return the correct value" being asked regularly. I totally agree that size() should return E (i.e. everything that is stored within the grid, including persistence), with its performance implications documented accordingly. As for keySet and values, we should stop implementing them (throw an exception) and point users to Will's distributed iterator, which is a nicer way to achieve the desired behavior.
> 
> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better.

Yes, that's what I meant as well.
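
To make the difference between B and E concrete, here is a rough sketch in plain Java. It is only a toy model and does not use the Infinispan API: SizeSketch, Node, clusterWideKeyIterator() and example() are all made-up names, and clusterWideKeyIterator() merely stands in for the distributed entry iterator by collecting every distinct key in one place. It shows a per-node size (B), a total size (E), and a keySet() view built on top of the iterator.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a clustered cache; hypothetical, not the Infinispan API. */
class SizeSketch {
    /** One hypothetical node: in-memory entries plus a local cache store. */
    static final class Node {
        final Map<String, String> memory = new HashMap<>();     // counted by B
        final Map<String, String> localStore = new HashMap<>(); // counted by C
    }

    final List<Node> nodes = new ArrayList<>();
    final Map<String, String> sharedStore = new HashMap<>();    // counted by D

    /** B: number of entries stored in memory on a single node. */
    int localInMemorySize(int nodeIndex) {
        return nodes.get(nodeIndex).memory.size();
    }

    /** Stand-in for a distributed entry iterator: yields each distinct key once. */
    Iterator<String> clusterWideKeyIterator() {
        Set<String> seen = new HashSet<>();
        for (Node node : nodes) {
            seen.addAll(node.memory.keySet());
            seen.addAll(node.localStore.keySet());
        }
        seen.addAll(sharedStore.keySet());
        return seen.iterator();
    }

    /** E: total number of distinct entries, including persistence. */
    int totalSize() {
        int count = 0;
        Iterator<String> it = clusterWideKeyIterator();
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    /** keySet() as a view on top of the cluster-wide iterator. */
    Set<String> keySet() {
        return new AbstractSet<String>() {
            @Override public Iterator<String> iterator() { return clusterWideKeyIterator(); }
            @Override public int size() { return totalSize(); }
        };
    }

    /** Builds a small two-node example with overlapping copies of k2. */
    static SizeSketch example() {
        SizeSketch cache = new SizeSketch();
        Node first = new Node();
        first.memory.put("k1", "v1");
        first.memory.put("k2", "v2");
        first.localStore.put("k2", "v2"); // same entry, also persisted locally
        Node second = new Node();
        second.memory.put("k2", "v2");    // backup copy of k2
        second.memory.put("k3", "v3");
        cache.nodes.add(first);
        cache.nodes.add(second);
        cache.sharedStore.put("k4", "v4"); // present only in the shared store
        return cache;
    }
}
```

In this model every call to totalSize() or keySet().size() walks the whole data set, which is the cost that would need to be documented; iterating once and doing the work per entry avoids walking it repeatedly.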

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)






