[infinispan-dev] About size()

Dan Berindei dan.berindei at gmail.com
Wed Oct 8 10:57:58 EDT 2014


On Wed, Oct 8, 2014 at 5:42 PM, William Burns <mudokonman at gmail.com> wrote:

> So it seems we would want to change this for 7.0 if possible since it
> would be a bigger change for something like 7.1 and 8.0 would be even
> further out.  I should be able to put this together for CR2.
>

I'm not 100% convinced that we need it for 7.x. For 8.0 I would recommend
removing the size() method altogether, and providing some looser
"statistics" instead.


>
> It seems that we want to implement keySet, values and entrySet methods
> using the entry iterator approach.
>
> It is however unclear for the size method if we want to use MR entry
> counting and not worry about the rehash and passivation issues since
> it is just an estimation anyways.  Or if we want to also use the entry
> iterator which should be closer approximation but will require more
> network overhead and memory usage.
>

+1 to use the entry iterator from me, ignoring state transfer we can get
some pretty wild fluctuations in the size of the cache.
We could use a distributed task for Cache.isEmpty() instead of size() == 0,
though.


>
> Also we didn't really talk about the fact that these methods would
> ignore ongoing transactions and if that is a concern or not.
>
>
It might be a concern for the Hibernate 2LC impl, it was their TCK that
prompted the last round of discussions about clear().

We haven't talked about what size(), keySet() and values() should return
for an invalidation cache either... I forget, does the distributed entry
iterator work with invalidation caches?



>  - Will
>
> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <mmarkus at redhat.com> wrote:
> >
> > On Oct 8, 2014, at 15:11, Dan Berindei <dan.berindei at gmail.com> wrote:
> >
> >>
> >> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <mmarkus at redhat.com>
> wrote:
> >> On Oct 3, 2014, at 9:30, Radim Vansa <rvansa at redhat.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > recently we had a discussion about what size() returns, but I've
> >> > realized there are more things that users would like to know. My
> >> > question is whether you think that they would really appreciate it, or
> >> > whether it's just my QA point of view where I sometimes compute the
> >> > 'checksums' of cache to see if I didn't lost anything.
> >> >
> >> > There are those sizes:
> >> > A) number of owned entries
> >> > B) number of entries stored locally in memory
> >> > C) number of entries stored in each local cache store
> >> > D) number of entries stored in each shared cache store
> >> > E) total number of entries in cache
> >> >
> >> > So far, we can get
> >> > B via withFlags(SKIP_CACHE_LOAD).size()
> >> > (passivation ? B : 0) + firstNonZero(C, D) via size()
> >> > E via distributed iterators / MR
> >> > A via data container iteration + distribution manager query, but only
> >> > without cache store
> >> > C or D through
> >> >
> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
> >> >
> >> > I think that it would go along with users' expectations if size()
> >> > returned E and for the rest we should have special methods on
> >> > AdvancedCache. That would of course change the meaning of size(), but
> >> > I'd say that finally to something that has firm meaning.
> >> >
> >> > WDYT?
> >>
> >> There was a lot of arguments in past whether size() and other methods
> that operate over all the elements (keySet, values) are useful because:
> >> - they are approximate (data changes during iteration)
> >> - they are very resource consuming and might be miss-used (this is the
> reason we chosen to use size() with its current local semantic)
> >>
> >> These methods (size, keys, values) are useful for people and I think we
> were not wise to implement them only on top of the local data: this is like
> preferring efficiency over correctness. This also created a lot of
> confusion with our users, question like size() doesn't return the correct
> value being asked regularly. I totally agree that size() returns E (i.e.
> everything that is stored within the grid, including persistence) and it's
> performance implications to be documented accordingly. For keySet and
> values - we should stop implementing them (throw exception) and point users
> to Will's distributed iterator which is a nicer way to achieve the desired
> behavior.
> >>
> >> We can also implement keySet() and values() on top of the distributed
> entry iterator and document that using the iterator directly is better.
> >
> > Yes, that's what I meant as well.
> >
> > Cheers,
> > --
> > Mircea Markus
> > Infinispan lead (www.infinispan.org)
> >
> >
> >
> >
> >
> > _______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141008/870e35b8/attachment.html 


More information about the infinispan-dev mailing list