[infinispan-dev] About size()

Dan Berindei dan.berindei at gmail.com
Wed Oct 8 12:23:06 EDT 2014


On Wed, Oct 8, 2014 at 6:14 PM, William Burns <mudokonman at gmail.com> wrote:

> On Wed, Oct 8, 2014 at 10:57 AM, Dan Berindei <dan.berindei at gmail.com>
> wrote:
> >
> >
> > On Wed, Oct 8, 2014 at 5:42 PM, William Burns <mudokonman at gmail.com>
> wrote:
> >>
> >> So it seems we would want to change this for 7.0 if possible since it
> >> would be a bigger change for something like 7.1 and 8.0 would be even
> >> further out.  I should be able to put this together for CR2.
> >
> >
> > I'm not 100% convinced that we need it for 7.x. For 8.0 I would recommend
> > removing the size() method altogether, and providing some looser
> > "statistics" instead.
>
> Yeah I guess I don't know enough about the demand for these methods or
> what people wanted to use them for to know what kind of priority they
> should be given.
>
> It sounds like you are talking about decoupling from the
> Map/ConcurrentMap interface completely then, right?  So we would also
> eliminate the other bulk methods (keySet, values, entrySet)?
>

Yes, I would base the Cache interface on JSR-107's Cache, which doesn't
have size() or the other methods.


>
> >
> >>
> >>
> >> It seems that we want to implement keySet, values and entrySet methods
> >> using the entry iterator approach.
> >>
> >> It is however unclear for the size method if we want to use MR entry
> >> counting and not worry about the rehash and passivation issues since
> >> it is just an estimation anyways.  Or if we want to also use the entry
> >> iterator which should be closer approximation but will require more
> >> network overhead and memory usage.
> >
> >
> > +1 to use the entry iterator from me, ignoring state transfer we can get
> > some pretty wild fluctuations in the size of the cache.
>
> That is personally my feeling as well, but I tend to err more on the
> side of correctness to begin with.
>
> > We could use a distributed task for Cache.isEmpty() instead of size() ==
> 0,
> > though.
>
> Yes that should be a good optimization either way.
>
> >
> >>
> >>
> >> Also we didn't really talk about the fact that these methods would
> >> ignore ongoing transactions and if that is a concern or not.
> >>
> >
> > It might be a concern for the Hibernate 2LC impl, it was their TCK that
> > prompted the last round of discussions about clear().
>
> Although I wonder how much these methods are even used since they only
> work for Local, Replication or Invalidation caches in their current
> state (and didn't even use loaders until 6.0).
>

There is some more information about the test in the mailing list
discussion [1]
There's also a JIRA for clear() [2]

I think 2LC almost never uses distribution, so size() being local-only
didn't matter, but making it non-tx could cause problems - at least for
that particular test.

[1] http://lists.jboss.org/pipermail/infinispan-dev/2013-October/013914.html
[2] https://issues.jboss.org/browse/ISPN-3656


> >
> > We haven't talked about what size(), keySet() and values() should return
> for
> > an invalidation cache either... I forget, does the distributed entry
> > iterator work with invalidation caches?
>
> It works the same as a local cache so only the local node contents are
> returned.  Replicated does the same thing, distributed is the only
> special case.  This was the only thing that made sense to me, but if
> you have any ideas that would  be great to hear for possibly enhancing
> Invalidation iteration.
>

Sounds good to me.

cache.get(k) will search on all the nodes via ClusterLoader, so there is a
certain appeal in making the entry iterator do the same. But invalidation
caches are used with an external (non-CacheLoader) source of data anyway,
so we can never return "all the entries".


> >
> >
> >>
> >>  - Will
> >>
> >> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <mmarkus at redhat.com>
> wrote:
> >> >
> >> > On Oct 8, 2014, at 15:11, Dan Berindei <dan.berindei at gmail.com>
> wrote:
> >> >
> >> >>
> >> >> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <mmarkus at redhat.com>
> >> >> wrote:
> >> >> On Oct 3, 2014, at 9:30, Radim Vansa <rvansa at redhat.com> wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > recently we had a discussion about what size() returns, but I've
> >> >> > realized there are more things that users would like to know. My
> >> >> > question is whether you think that they would really appreciate it,
> >> >> > or
> >> >> > whether it's just my QA point of view where I sometimes compute the
> >> >> > 'checksums' of cache to see if I didn't lost anything.
> >> >> >
> >> >> > There are those sizes:
> >> >> > A) number of owned entries
> >> >> > B) number of entries stored locally in memory
> >> >> > C) number of entries stored in each local cache store
> >> >> > D) number of entries stored in each shared cache store
> >> >> > E) total number of entries in cache
> >> >> >
> >> >> > So far, we can get
> >> >> > B via withFlags(SKIP_CACHE_LOAD).size()
> >> >> > (passivation ? B : 0) + firstNonZero(C, D) via size()
> >> >> > E via distributed iterators / MR
> >> >> > A via data container iteration + distribution manager query, but
> only
> >> >> > without cache store
> >> >> > C or D through
> >> >> >
> >> >> >
> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
> >> >> >
> >> >> > I think that it would go along with users' expectations if size()
> >> >> > returned E and for the rest we should have special methods on
> >> >> > AdvancedCache. That would of course change the meaning of size(),
> but
> >> >> > I'd say that finally to something that has firm meaning.
> >> >> >
> >> >> > WDYT?
> >> >>
> >> >> There was a lot of arguments in past whether size() and other methods
> >> >> that operate over all the elements (keySet, values) are useful
> because:
> >> >> - they are approximate (data changes during iteration)
> >> >> - they are very resource consuming and might be miss-used (this is
> the
> >> >> reason we chosen to use size() with its current local semantic)
> >> >>
> >> >> These methods (size, keys, values) are useful for people and I think
> we
> >> >> were not wise to implement them only on top of the local data: this
> is like
> >> >> preferring efficiency over correctness. This also created a lot of
> confusion
> >> >> with our users, question like size() doesn't return the correct
> value being
> >> >> asked regularly. I totally agree that size() returns E (i.e.
> everything that
> >> >> is stored within the grid, including persistence) and it's
> performance
> >> >> implications to be documented accordingly. For keySet and values -
> we should
> >> >> stop implementing them (throw exception) and point users to Will's
> >> >> distributed iterator which is a nicer way to achieve the desired
> behavior.
> >> >>
> >> >> We can also implement keySet() and values() on top of the distributed
> >> >> entry iterator and document that using the iterator directly is
> better.
> >> >
> >> > Yes, that's what I meant as well.
> >> >
> >> > Cheers,
> >> > --
> >> > Mircea Markus
> >> > Infinispan lead (www.infinispan.org)
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > infinispan-dev mailing list
> >> > infinispan-dev at lists.jboss.org
> >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >>
> >> _______________________________________________
> >> infinispan-dev mailing list
> >> infinispan-dev at lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >
> >
> >
> > _______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141008/e3158960/attachment.html 


More information about the infinispan-dev mailing list