Exactly, in a monitoring application you wouldn't need the exact number of
key-value mappings in the cache.
The number of entries in memory and/or on disk should be much more
interesting, and we wouldn't have to worry about duplicated/missing/expired
entries to show that.
On Fri, Oct 10, 2014 at 5:06 PM, Tristan Tarrant <ttarrant(a)redhat.com>
wrote:
What's wrong with sum(Datacontainer.size())/numOwners ?
Tristan
On 10/10/14 16:03, Radim Vansa wrote:
> On 10/10/2014 02:38 PM, William Burns wrote:
>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa <rvansa(a)redhat.com> wrote:
>>> Users expect that size() will be constant-time (or linear to cluster
>>> size), and generally fast operation. I'd prefer to keep it that way.
>>> Though, even the MR way (used for HotRod size() now) needs to crawl
>>> through all the entries locally.
>> Many in memory collections require O(n) to do size such as
>> ConcurrentLinkedQueue, so I wouldn't say size should always be
>> expected to be constant time or O(c) where c is # of nodes. Granted a
>> user can expect anything they want.
> OK, I stand corrected. Moreover, I was generalizing myself to all users,
> a common mistake :)
>
> Anyway, monitoring tools love nice charts, and I can imagine monitoring
> software polling every 1 second to update that cool chart with cache
> size. Do we want a fast but imprecise variant of this operation in some
> statistics class?
>
> Radim
>
>>> 'Heretic, not very well though of and changing too many things'
idea:
>>> what about having data container segment-aware? Then you'd just bcast
>>> SizeCommand with given topologyId and sum up sizes of primary-owned
>>> segments... It's not a complete solution, but at least that would
enable
>>> to get the number of locally owned entries quite fast. Though, you
can't
>>> do that easily with cache stores (without changing SPI).
>>>
>>> Regarding cache stores, IMO we're damned anyway: when calling
>>> cacheStore.size(), it can report more entries as those haven't been
>>> expired yet, it can report less entries as those can be expired due to
>>> [1]. Or, we'll enumerate all the entries, and that's going to be
slow
>>> (btw., [1] reminded me that we should enumerate both datacontainer AND
>>> cachestores even if passivation is not enabled).
>> This is precisely what the distributed iterator does. And also
>> support for expired entries was recently integrated as I missed that
>> in the original implementation [a]
>>
>> [a]
https://issues.jboss.org/browse/ISPN-4643
>>
>>> Radim
>>>
>>> [1]
https://issues.jboss.org/browse/ISPN-3202
>>>
>>> On 10/08/2014 04:42 PM, William Burns wrote:
>>>> So it seems we would want to change this for 7.0 if possible since it
>>>> would be a bigger change for something like 7.1 and 8.0 would be even
>>>> further out. I should be able to put this together for CR2.
>>>>
>>>> It seems that we want to implement keySet, values and entrySet methods
>>>> using the entry iterator approach.
>>>>
>>>> It is however unclear for the size method if we want to use MR entry
>>>> counting and not worry about the rehash and passivation issues since
>>>> it is just an estimation anyways. Or if we want to also use the entry
>>>> iterator which should be closer approximation but will require more
>>>> network overhead and memory usage.
>>>>
>>>> Also we didn't really talk about the fact that these methods would
>>>> ignore ongoing transactions and if that is a concern or not.
>>>>
>>>> - Will
>>>>
>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus
<mmarkus(a)redhat.com>
wrote:
>>>>> On Oct 8, 2014, at 15:11, Dan Berindei
<dan.berindei(a)gmail.com>
wrote:
>>>>>
>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus
<mmarkus(a)redhat.com>
wrote:
>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa <rvansa(a)redhat.com>
wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> recently we had a discussion about what size() returns, but
I've
>>>>>>> realized there are more things that users would like to
know. My
>>>>>>> question is whether you think that they would really
appreciate
it, or
>>>>>>> whether it's just my QA point of view where I sometimes
compute the
>>>>>>> 'checksums' of cache to see if I didn't lost
anything.
>>>>>>>
>>>>>>> There are those sizes:
>>>>>>> A) number of owned entries
>>>>>>> B) number of entries stored locally in memory
>>>>>>> C) number of entries stored in each local cache store
>>>>>>> D) number of entries stored in each shared cache store
>>>>>>> E) total number of entries in cache
>>>>>>>
>>>>>>> So far, we can get
>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size()
>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()
>>>>>>> E via distributed iterators / MR
>>>>>>> A via data container iteration + distribution manager query,
but
only
>>>>>>> without cache store
>>>>>>> C or D through
>>>>>>>
getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
>>>>>>>
>>>>>>> I think that it would go along with users' expectations
if size()
>>>>>>> returned E and for the rest we should have special methods
on
>>>>>>> AdvancedCache. That would of course change the meaning of
size(),
but
>>>>>>> I'd say that finally to something that has firm
meaning.
>>>>>>>
>>>>>>> WDYT?
>>>>>> There was a lot of arguments in past whether size() and other
methods that operate over all the elements (keySet, values) are useful
because:
>>>>>> - they are approximate (data changes during iteration)
>>>>>> - they are very resource consuming and might be miss-used (this
is
the reason we chosen to use size() with its current local semantic)
>>>>>>
>>>>>> These methods (size, keys, values) are useful for people and I
think we were not wise to implement them only on top of the local data:
this is like preferring efficiency over correctness. This also created a
lot of confusion with our users, question like size() doesn't return the
correct value being asked regularly. I totally agree that size() returns E
(i.e. everything that is stored within the grid, including persistence) and
it's performance implications to be documented accordingly. For keySet and
values - we should stop implementing them (throw exception) and point users
to Will's distributed iterator which is a nicer way to achieve the desired
behavior.
>>>>>>
>>>>>> We can also implement keySet() and values() on top of the
distributed entry iterator and document that using the iterator directly is
better.
>>>>> Yes, that's what I meant as well.
>>>>>
>>>>> Cheers,
>>>>> --
>>>>> Mircea Markus
>>>>> Infinispan lead (
www.infinispan.org)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev(a)lists.jboss.org
>>>>>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev(a)lists.jboss.org
>>>>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> --
>>> Radim Vansa <rvansa(a)redhat.com>
>>> JBoss DataGrid QA
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev(a)lists.jboss.org
>>>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev(a)lists.jboss.org
>>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev