[infinispan-dev] About size()

Dan Berindei dan.berindei at gmail.com
Fri Oct 10 10:25:55 EDT 2014


On Fri, Oct 10, 2014 at 5:20 PM, Mircea Markus <mmarkus at redhat.com> wrote:

>
> On Oct 10, 2014, at 15:18, Radim Vansa <rvansa at redhat.com> wrote:
>
> > That we should expose that as one method, not forcing people to
> > implement the sum() themselves.
>
> Hmm, isn't the method you mention cache.size() ? :-)
>

Nope, because we decided to make cache.size() precise-but-slow :)


>
> >
> > And possibly cachestores, again.
>
> AdvancedCacheLoader has size.
>
> >
> > Radim
> >
> > On 10/10/2014 04:06 PM, Tristan Tarrant wrote:
> >> What's wrong with sum(Datacontainer.size())/numOwners ?
> >>
> >> Tristan
> >>
> >> On 10/10/14 16:03, Radim Vansa wrote:
> >>> On 10/10/2014 02:38 PM, William Burns wrote:
> >>>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa <rvansa at redhat.com>
> wrote:
> >>>>> Users expect that size() will be constant-time (or linear to cluster
> >>>>> size), and generally fast operation. I'd prefer to keep it that way.
> >>>>> Though, even the MR way (used for HotRod size() now) needs to crawl
> >>>>> through all the entries locally.
> >>>> Many in memory collections require O(n) to do size such as
> >>>> ConcurrentLinkedQueue, so I wouldn't say size should always be
> >>>> expected to be constant time or O(c) where c is # of nodes.  Granted a
> >>>> user can expect anything they want.
> >>> OK, I stand corrected. Moreover, I was generalizing myself to all
> users,
> >>> a common mistake :)
> >>>
> >>> Anyway, monitoring tools love nice charts, and I can imagine monitoring
> >>> software polling every 1 second to update that cool chart with cache
> >>> size. Do we want a fast but imprecise variant of this operation in some
> >>> statistics class?
> >>>
> >>> Radim
> >>>
> >>>>> 'Heretic, not very well though of and changing too many things' idea:
> >>>>> what about having data container segment-aware? Then you'd just bcast
> >>>>> SizeCommand with given topologyId and sum up sizes of primary-owned
> >>>>> segments... It's not a complete solution, but at least that would
> enable
> >>>>> to get the number of locally owned entries quite fast. Though, you
> can't
> >>>>> do that easily with cache stores (without changing SPI).
> >>>>>
> >>>>> Regarding cache stores, IMO we're damned anyway: when calling
> >>>>> cacheStore.size(), it can report more entries as those haven't been
> >>>>> expired yet, it can report less entries as those can be expired due
> to
> >>>>> [1]. Or, we'll enumerate all the entries, and that's going to be slow
> >>>>> (btw., [1] reminded me that we should enumerate both datacontainer
> AND
> >>>>> cachestores even if passivation is not enabled).
> >>>> This is precisely what the distributed iterator does.  And also
> >>>> support for expired entries was recently integrated as I missed that
> >>>> in the original implementation [a]
> >>>>
> >>>> [a] https://issues.jboss.org/browse/ISPN-4643
> >>>>
> >>>>> Radim
> >>>>>
> >>>>> [1] https://issues.jboss.org/browse/ISPN-3202
> >>>>>
> >>>>> On 10/08/2014 04:42 PM, William Burns wrote:
> >>>>>> So it seems we would want to change this for 7.0 if possible since
> it
> >>>>>> would be a bigger change for something like 7.1 and 8.0 would be
> even
> >>>>>> further out.  I should be able to put this together for CR2.
> >>>>>>
> >>>>>> It seems that we want to implement keySet, values and entrySet
> methods
> >>>>>> using the entry iterator approach.
> >>>>>>
> >>>>>> It is however unclear for the size method if we want to use MR entry
> >>>>>> counting and not worry about the rehash and passivation issues since
> >>>>>> it is just an estimation anyways.  Or if we want to also use the
> entry
> >>>>>> iterator which should be closer approximation but will require more
> >>>>>> network overhead and memory usage.
> >>>>>>
> >>>>>> Also we didn't really talk about the fact that these methods would
> >>>>>> ignore ongoing transactions and if that is a concern or not.
> >>>>>>
> >>>>>>     - Will
> >>>>>>
> >>>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <mmarkus at redhat.com>
> wrote:
> >>>>>>> On Oct 8, 2014, at 15:11, Dan Berindei <dan.berindei at gmail.com>
> wrote:
> >>>>>>>
> >>>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <mmarkus at redhat.com>
> wrote:
> >>>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa <rvansa at redhat.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> recently we had a discussion about what size() returns, but I've
> >>>>>>>>> realized there are more things that users would like to know. My
> >>>>>>>>> question is whether you think that they would really appreciate
> it, or
> >>>>>>>>> whether it's just my QA point of view where I sometimes compute
> the
> >>>>>>>>> 'checksums' of cache to see if I didn't lost anything.
> >>>>>>>>>
> >>>>>>>>> There are those sizes:
> >>>>>>>>> A) number of owned entries
> >>>>>>>>> B) number of entries stored locally in memory
> >>>>>>>>> C) number of entries stored in each local cache store
> >>>>>>>>> D) number of entries stored in each shared cache store
> >>>>>>>>> E) total number of entries in cache
> >>>>>>>>>
> >>>>>>>>> So far, we can get
> >>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size()
> >>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()
> >>>>>>>>> E via distributed iterators / MR
> >>>>>>>>> A via data container iteration + distribution manager query, but
> only
> >>>>>>>>> without cache store
> >>>>>>>>> C or D through
> >>>>>>>>>
> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
> >>>>>>>>>
> >>>>>>>>> I think that it would go along with users' expectations if size()
> >>>>>>>>> returned E and for the rest we should have special methods on
> >>>>>>>>> AdvancedCache. That would of course change the meaning of
> size(), but
> >>>>>>>>> I'd say that finally to something that has firm meaning.
> >>>>>>>>>
> >>>>>>>>> WDYT?
> >>>>>>>> There was a lot of arguments in past whether size() and other
> methods that operate over all the elements (keySet, values) are useful
> because:
> >>>>>>>> - they are approximate (data changes during iteration)
> >>>>>>>> - they are very resource consuming and might be miss-used (this
> is the reason we chosen to use size() with its current local semantic)
> >>>>>>>>
> >>>>>>>> These methods (size, keys, values) are useful for people and I
> think we were not wise to implement them only on top of the local data:
> this is like preferring efficiency over correctness. This also created a
> lot of confusion with our users, question like size() doesn't return the
> correct value being asked regularly. I totally agree that size() returns E
> (i.e. everything that is stored within the grid, including persistence) and
> it's performance implications to be documented accordingly. For keySet and
> values - we should stop implementing them (throw exception) and point users
> to Will's distributed iterator which is a nicer way to achieve the desired
> behavior.
> >>>>>>>>
> >>>>>>>> We can also implement keySet() and values() on top of the
> distributed entry iterator and document that using the iterator directly is
> better.
> >>>>>>> Yes, that's what I meant as well.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> --
> >>>>>>> Mircea Markus
> >>>>>>> Infinispan lead (www.infinispan.org)
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> infinispan-dev mailing list
> >>>>>>> infinispan-dev at lists.jboss.org
> >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >>>>>> _______________________________________________
> >>>>>> infinispan-dev mailing list
> >>>>>> infinispan-dev at lists.jboss.org
> >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >>>>> --
> >>>>> Radim Vansa <rvansa at redhat.com>
> >>>>> JBoss DataGrid QA
> >>>>>
> >>>>> _______________________________________________
> >>>>> infinispan-dev mailing list
> >>>>> infinispan-dev at lists.jboss.org
> >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >>>> _______________________________________________
> >>>> infinispan-dev mailing list
> >>>> infinispan-dev at lists.jboss.org
> >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >> _______________________________________________
> >> infinispan-dev mailing list
> >> infinispan-dev at lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >
> >
> > --
> > Radim Vansa <rvansa at redhat.com>
> > JBoss DataGrid QA
> >
> > _______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141010/cb48fd01/attachment.html 


More information about the infinispan-dev mailing list