Re: [infinispan-dev] About size()

Friday, 10 October 2014

I meant as a quick implementation of that, not that we want to tell 
users to do it.

Tristan

On 10/10/14 16:18, Radim Vansa wrote:
...
 That we should expose that as one method, not forcing people to
 implement the sum() themselves.

 And possibly cachestores, again.

 Radim

 On 10/10/2014 04:06 PM, Tristan Tarrant wrote:
> What's wrong with sum(Datacontainer.size())/numOwners ?
>
> Tristan
>
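A minimal sketch of the quick estimate suggested above, assuming each node can report its local DataContainer.size() and that every entry is held by exactly numOwners nodes. The class and method names are hypothetical, not Infinispan API. Cache stores and entries moving during a rehash are ignored, so the result is only an approximation.

```java
import java.util.Arrays;
import java.util.List;

class ClusterSizeEstimate {
    // perNodeSizes: the local data container size reported by each node (assumed input).
    // numOwners: how many nodes hold a copy of each entry.
    static long estimate(List<Integer> perNodeSizes, int numOwners) {
        if (numOwners <= 0) {
            throw new IllegalArgumentException("numOwners must be positive");
        }
        long total = 0;
        for (int size : perNodeSizes) {
            total += size;
        }
        // Each entry is counted once per owner, so divide the sum by numOwners.
        return total / numOwners;
    }

    public static void main(String[] args) {
        // Three nodes holding 40, 35 and 45 entries with numOwners = 2: 120 / 2 = 60
        System.out.println(estimate(Arrays.asList(40, 35, 45), 2));
    }
}
```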
> On 10/10/14 16:03, Radim Vansa wrote:
>> On 10/10/2014 02:38 PM, William Burns wrote:
>>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa <rvansa(a)redhat.com> wrote:
>>>> Users expect that size() will be constant-time (or linear in cluster
>>>> size), and a generally fast operation. I'd prefer to keep it that way.
>>>> Though, even the MR way (used for HotRod size() now) needs to crawl
>>>> through all the entries locally.
>>> Many in memory collections require O(n) to do size such as
>>> ConcurrentLinkedQueue, so I wouldn't say size should always be
>>> expected to be constant time or O(c) where c is # of nodes.  Granted a
>>> user can expect anything they want.
>> OK, I stand corrected. Moreover, I was generalizing myself to all users,
>> a common mistake :)
>>
>> Anyway, monitoring tools love nice charts, and I can imagine monitoring
>> software polling every 1 second to update that cool chart with cache
>> size. Do we want a fast but imprecise variant of this operation in some
>> statistics class?
>>
>> Radim
>>
>>>> 'Heretic, not very well thought of and changing too many things' idea:
>>>> what about making the data container segment-aware? Then you'd just
>>>> broadcast a SizeCommand with the given topologyId and sum up the sizes
>>>> of primary-owned segments... It's not a complete solution, but at least
>>>> it would let us get the number of locally owned entries quite fast.
>>>> Though, you can't do that easily with cache stores (without changing SPI).
>>>>
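The segment-aware idea above could look roughly like the following sketch. It is an illustration under assumptions, not the Infinispan data container: the class name, the hashCode-modulo segment mapping, and the way primary-owned segments are passed in are all hypothetical (a real cluster would derive them from the consistent hash for the given topologyId).

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SegmentedContainer<K, V> {
    // One map per segment, so a per-segment size is available without iterating entries.
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    SegmentedContainer(int numSegments) {
        segments = new Map[numSegments];
        for (int i = 0; i < numSegments; i++) {
            segments[i] = new ConcurrentHashMap<>();
        }
    }

    // Assumption: keys map to segments by hashCode modulo the segment count.
    int segmentOf(Object key) {
        return Math.floorMod(key.hashCode(), segments.length);
    }

    void put(K key, V value) {
        segments[segmentOf(key)].put(key, value);
    }

    void remove(K key) {
        segments[segmentOf(key)].remove(key);
    }

    // Sum the sizes of the segments this node primary-owns; each node would
    // return this number in response to the broadcast size request.
    long sizeOf(Set<Integer> primarySegments) {
        long total = 0;
        for (int segment : primarySegments) {
            total += segments[segment].size();
        }
        return total;
    }
}
```

The caller would then add up the per-node results. Since only primary-owned segments are counted, no division by numOwners is needed.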
>>>> Regarding cache stores, IMO we're damned anyway: when calling
>>>> cacheStore.size(), it can report more entries as those haven't been
>>>> expired yet, it can report fewer entries as those can be expired due to
>>>> [1]. Or, we'll enumerate all the entries, and that's going to be slow
>>>> (btw., [1] reminded me that we should enumerate both datacontainer AND
>>>> cachestores even if passivation is not enabled).
>>> This is precisely what the distributed iterator does.  And also
>>> support for expired entries was recently integrated as I missed that
>>> in the original implementation [a]
>>>
>>> [a] https://issues.jboss.org/browse/ISPN-4643
>>>
>>>> Radim
>>>>
>>>> [1] https://issues.jboss.org/browse/ISPN-3202
>>>>
>>>> On 10/08/2014 04:42 PM, William Burns wrote:
>>>>> So it seems we would want to change this for 7.0 if possible since it
>>>>> would be a bigger change for something like 7.1 and 8.0 would be even
>>>>> further out.  I should be able to put this together for CR2.
>>>>>
>>>>> It seems that we want to implement keySet, values and entrySet methods
>>>>> using the entry iterator approach.
>>>>>
>>>>> It is however unclear for the size method if we want to use MR entry
>>>>> counting and not worry about the rehash and passivation issues since
>>>>> it is just an estimation anyways.  Or if we want to also use the entry
>>>>> iterator, which should be a closer approximation but will require more
>>>>> network overhead and memory usage.
>>>>>
>>>>> Also we didn't really talk about the fact that these methods would
>>>>> ignore ongoing transactions and if that is a concern or not.
>>>>>
>>>>>       - Will
>>>>>
>>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>>>>> On Oct 8, 2014, at 15:11, Dan Berindei <dan.berindei(a)gmail.com> wrote:
>>>>>>
>>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa <rvansa(a)redhat.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> recently we had a discussion about what size() returns, but I've
>>>>>>>> realized there are more things that users would like to know. My
>>>>>>>> question is whether you think that they would really appreciate it, or
>>>>>>>> whether it's just my QA point of view where I sometimes compute the
>>>>>>>> 'checksums' of cache to see if I didn't lose anything.
>>>>>>>>
>>>>>>>> There are those sizes:
>>>>>>>> A) number of owned entries
>>>>>>>> B) number of entries stored locally in memory
>>>>>>>> C) number of entries stored in each local cache store
>>>>>>>> D) number of entries stored in each shared cache store
>>>>>>>> E) total number of entries in cache
>>>>>>>>
>>>>>>>> So far, we can get
>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size()
>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()
>>>>>>>> E via distributed iterators / MR
>>>>>>>> A via data container iteration + distribution manager query, but only
>>>>>>>> without cache store
>>>>>>>> C or D through
>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
>>>>>>>>
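To make the second item in the list above concrete, here is a small sketch of the expression "(passivation ? B : 0) + firstNonZero(C, D)" as plain arithmetic. The class and the sample numbers are made up for illustration; B, C and D are the counts defined in the list above.

```java
class CurrentSizeSemantics {
    // Returns the first argument that is not zero, or 0 if all are zero.
    static long firstNonZero(long... values) {
        for (long v : values) {
            if (v != 0) {
                return v;
            }
        }
        return 0;
    }

    // b: entries stored locally in memory, c: entries in a local cache store,
    // d: entries in a shared cache store.
    static long currentSize(boolean passivation, long b, long c, long d) {
        return (passivation ? b : 0) + firstNonZero(c, d);
    }

    public static void main(String[] args) {
        // Without passivation the in-memory count is not added: 0 + 250 = 250
        System.out.println(currentSize(false, 100, 250, 0));
        // With passivation and only a shared store: 100 + 400 = 500
        System.out.println(currentSize(true, 100, 0, 400));
    }
}
```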
>>>>>>>> I think that it would go along with users' expectations if size()
>>>>>>>> returned E and for the rest we should have special methods on
>>>>>>>> AdvancedCache. That would of course change the meaning of size(), but
>>>>>>>> I'd say that finally to something that has firm meaning.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>> There were a lot of arguments in the past about whether size() and other
>>>>>>> methods that operate over all the elements (keySet, values) are useful,
>>>>>>> because:
>>>>>>> - they are approximate (data changes during iteration)
>>>>>>> - they are very resource consuming and might be misused (this is the
>>>>>>>   reason we chose to use size() with its current local semantic)
>>>>>>>
>>>>>>> These methods (size, keys, values) are useful for people and I think we
>>>>>>> were not wise to implement them only on top of the local data: this is
>>>>>>> like preferring efficiency over correctness. This also created a lot of
>>>>>>> confusion with our users, with questions like "size() doesn't return the
>>>>>>> correct value" being asked regularly. I totally agree that size() should
>>>>>>> return E (i.e. everything that is stored within the grid, including
>>>>>>> persistence) and that its performance implications should be documented
>>>>>>> accordingly. For keySet and values - we should stop implementing them
>>>>>>> (throw exception) and point users to Will's distributed iterator, which
>>>>>>> is a nicer way to achieve the desired behavior.
>>>>>>>
>>>>>>> We can also implement keySet() and values() on top of the distributed
>>>>>>> entry iterator and document that using the iterator directly is better.
>>>>>> Yes, that's what I meant as well.
>>>>>>
>>>>>> Cheers,
>>>>>> --
>>>>>> Mircea Markus
>>>>>> Infinispan lead (www.infinispan.org)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> infinispan-dev mailing list
>>>>>> infinispan-dev(a)lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>> --
>>>> Radim Vansa <rvansa(a)redhat.com>
>>>> JBoss DataGrid QA
>>>>
