On Fri, Oct 10, 2014 at 5:20 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
On Oct 10, 2014, at 15:18, Radim Vansa <rvansa(a)redhat.com> wrote:
> That we should expose it as one method, rather than forcing people to
> implement the sum() themselves.
Hmm, isn't the method you mention cache.size()? :-)
>
> And possibly cachestores, again.
AdvancedCacheLoader has size.
>
> Radim
>
> On 10/10/2014 04:06 PM, Tristan Tarrant wrote:
>> What's wrong with sum(DataContainer.size())/numOwners?
>>
>> Tristan
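Tristan's formula is simple arithmetic, and a small sketch shows both why it works and why it stays an estimate. This is a minimal Java illustration, assuming each node reports the raw number of entries in its local data container (primary and backup copies alike) and that every entry is stored exactly numOwners times; the class and method names are hypothetical, not Infinispan API.

```java
import java.util.List;

// Hypothetical helper: estimates the cluster-wide entry count from per-node
// data container sizes, assuming every entry is held by exactly numOwners nodes.
final class OwnerAdjustedSize {
    private OwnerAdjustedSize() {}

    static long estimate(List<Long> localContainerSizes, int numOwners) {
        if (numOwners <= 0) {
            throw new IllegalArgumentException("numOwners must be positive");
        }
        long sum = 0;
        for (long size : localContainerSizes) {
            sum += size;
        }
        // Integer division: the result is only exact if no node holds extra
        // copies (e.g. L1 entries or state left behind by a rehash).
        return sum / numOwners;
    }

    public static void main(String[] args) {
        // 3 nodes, numOwners = 2, 100 distinct keys -> 200 copies in total.
        System.out.println(estimate(List.of(70L, 60L, 70L), 2)); // 100
        // During a rehash an old owner may still hold 10 extra copies.
        System.out.println(estimate(List.of(80L, 60L, 70L), 2)); // 105
    }
}
```

The second call shows where the imprecision comes from: copies that have not yet been dropped after a topology change inflate the sum, and dividing by numOwners cannot tell them apart from real entries.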
>>
>> On 10/10/14 16:03, Radim Vansa wrote:
>>> On 10/10/2014 02:38 PM, William Burns wrote:
>>>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa <rvansa(a)redhat.com> wrote:
>>>>> Users expect that size() will be constant-time (or linear in cluster
>>>>> size) and a generally fast operation. I'd prefer to keep it that way.
>>>>> Though, even the MR way (used for HotRod size() now) needs to crawl
>>>>> through all the entries locally.
>>>> Many in-memory collections, such as ConcurrentLinkedQueue, require O(n)
>>>> for size(), so I wouldn't say size() should always be expected to be
>>>> constant time or O(c), where c is the number of nodes. Granted, a user
>>>> can expect anything they want.
>>> OK, I stand corrected. Moreover, I was generalizing from myself to all
>>> users, a common mistake :)
>>>
>>> Anyway, monitoring tools love nice charts, and I can imagine monitoring
>>> software polling every 1 second to update that cool chart with cache
>>> size. Do we want a fast but imprecise variant of this operation in some
>>> statistics class?
>>>
>>> Radim
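One way to give monitoring tools a cheap, approximate number is a counter maintained on every write instead of a value computed on demand. The sketch below is a hypothetical local statistics class built on java.util.concurrent.atomic.LongAdder; it is not an existing Infinispan component, and it assumes every insertion and removal goes through the two methods shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical statistics component: O(1) to read, but only approximately
// consistent with the data while concurrent writes are in flight.
final class ApproximateSizeStats {
    private final LongAdder entries = new LongAdder();

    void onInsert() { entries.increment(); }
    void onRemove() { entries.decrement(); }

    // LongAdder.sum() is not an atomic snapshot; good enough for a chart.
    long approximateSize() { return entries.sum(); }
}

// Hypothetical local store that keeps the counter up to date.
final class CountingStore<K, V> {
    private final ConcurrentMap<K, V> data = new ConcurrentHashMap<>();
    final ApproximateSizeStats stats = new ApproximateSizeStats();

    void put(K key, V value) {
        // Only count keys that were not present before (no null values allowed).
        if (data.put(key, value) == null) {
            stats.onInsert();
        }
    }

    void remove(K key) {
        if (data.remove(key) != null) {
            stats.onRemove();
        }
    }

    public static void main(String[] args) {
        CountingStore<String, Integer> store = new CountingStore<>();
        store.put("a", 1);
        store.put("b", 2);
        store.put("a", 3);   // overwrite: size unchanged
        store.remove("b");
        System.out.println(store.stats.approximateSize()); // 1
    }
}
```

Entries that expire or are evicted without passing through remove() would also have to decrement the counter; any path that is missed makes the number drift, which is the price of a fast read.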
>>>
>>>>> 'Heretic, not very well thought out and changing too many things'
>>>>> idea: what about making the data container segment-aware? Then you'd
>>>>> just broadcast a SizeCommand with the given topologyId and sum up the
>>>>> sizes of the primary-owned segments... It's not a complete solution,
>>>>> but at least it would let us get the number of locally owned entries
>>>>> quite fast. Though, you can't do that easily with cache stores (without
>>>>> changing the SPI).
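Radim's segment-aware idea can be sketched as a container that keeps one map per segment, so counting the entries of a given set of segments costs one size() call per segment instead of a full scan. This is a hypothetical single-node Java model: the class, the hash-to-segment mapping and the set of primary-owned segments are illustrative assumptions, not Infinispan's DataContainer or SizeCommand.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical segment-aware container: one map per segment.
final class SegmentedContainer<K, V> {
    private final ConcurrentMap<K, V>[] segments;

    @SuppressWarnings("unchecked")
    SegmentedContainer(int numSegments) {
        segments = new ConcurrentMap[numSegments];
        for (int i = 0; i < numSegments; i++) {
            segments[i] = new ConcurrentHashMap<>();
        }
    }

    // Assumed mapping; in a real cluster the consistent hash decides this.
    int segmentOf(K key) {
        return Math.floorMod(key.hashCode(), segments.length);
    }

    void put(K key, V value) {
        segments[segmentOf(key)].put(key, value);
    }

    // Sums the sizes of the segments this node is primary owner of:
    // the cost grows with the number of segments, not the number of entries.
    long sizeOf(Set<Integer> primarySegments) {
        long total = 0;
        for (int segment : primarySegments) {
            total += segments[segment].size();
        }
        return total;
    }

    public static void main(String[] args) {
        SegmentedContainer<Integer, String> container = new SegmentedContainer<>(4);
        for (int i = 0; i < 10; i++) {
            container.put(i, "v" + i);
        }
        // Integer keys hash to themselves: segment 0 holds 0,4,8 and segment 1 holds 1,5,9.
        System.out.println(container.sizeOf(Set.of(0, 1))); // 6
    }
}
```

In a cluster, each node would answer the broadcast with sizeOf(its primary segments) for the requested topology and the caller would add up the replies; entries that live only in cache stores are still not counted, as noted above.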
>>>>>
>>>>> Regarding cache stores, IMO we're damned anyway: when calling
>>>>> cacheStore.size(), it can report more entries, because some of them
>>>>> haven't been expired yet, or fewer entries, as those can be expired
>>>>> due to [1]. Otherwise we enumerate all the entries, and that's going
>>>>> to be slow (btw., [1] reminded me that we should enumerate both the
>>>>> data container AND the cache stores even if passivation is not
>>>>> enabled).
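Enumerating both the data container and the cache stores amounts to counting distinct, non-expired keys across all of them, because without passivation the same key can sit in memory and in a store at once. A minimal sketch over plain maps, assuming each stored value carries an absolute expiration timestamp (-1 for immortal) and that the in-memory copy is the most recent one; StoredEntry and countLive are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: counts distinct live keys across memory and stores.
final class UnionCount {
    // Hypothetical stored value with an absolute expiration time (-1 = immortal).
    static final class StoredEntry {
        final Object value;
        final long expiresAt;

        StoredEntry(Object value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }

        boolean isExpired(long now) {
            return expiresAt >= 0 && expiresAt <= now;
        }
    }

    // Assumes the data container holds the most recent copy of a key and that
    // stores are listed in priority order, so the first source seen wins.
    static long countLive(Map<String, StoredEntry> dataContainer,
                          List<Map<String, StoredEntry>> stores, long now) {
        Map<String, StoredEntry> merged = new HashMap<>(dataContainer);
        for (Map<String, StoredEntry> store : stores) {
            // putIfAbsent keeps the in-memory (or earlier store's) copy
            store.forEach(merged::putIfAbsent);
        }
        // A key found in several places is counted once; expired copies
        // that have not been purged yet are skipped.
        return merged.values().stream().filter(e -> !e.isExpired(now)).count();
    }

    public static void main(String[] args) {
        long now = 1_000;
        Map<String, StoredEntry> memory = Map.of("a", new StoredEntry("1", -1));
        Map<String, StoredEntry> store = Map.of(
                "a", new StoredEntry("1", -1),    // also in memory: counted once
                "b", new StoredEntry("2", 500),   // expired, not yet purged
                "c", new StoredEntry("3", 2_000));
        System.out.println(countLive(memory, List.of(store), now)); // 2
    }
}
```

A plain cacheStore.size() on the same data would report 3 (it sees "b"), and adding the data container size on top would double-count "a"; de-duplicating by key and checking expiry avoids both, at the cost of a full enumeration.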
>>>> This is precisely what the distributed iterator does. Support for
>>>> expired entries was also integrated recently, as I had missed that in
>>>> the original implementation [a].
>>>>
>>>> [a] https://issues.jboss.org/browse/ISPN-4643
>>>>
>>>>> Radim
>>>>>
>>>>> [1] https://issues.jboss.org/browse/ISPN-3202
>>>>>
>>>>> On 10/08/2014 04:42 PM, William Burns wrote:
>>>>>> So it seems we would want to change this for 7.0 if possible, since
>>>>>> it would be a bigger change for something like 7.1, and 8.0 would be
>>>>>> even further out. I should be able to put this together for CR2.
>>>>>>
>>>>>> It seems that we want to implement the keySet, values and entrySet
>>>>>> methods using the entry iterator approach.
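Implementing keySet(), values() and entrySet() with the entry iterator approach essentially means returning lazy views whose iterator() pulls from a fresh entry iterator and whose size() has to walk it. The sketch below models that with java.util.AbstractSet and a Supplier of iterators standing in for the distributed entry iterator; the class name is hypothetical, and a local TreeMap plays the role of the cluster.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical read-only key view backed by an entry iterator.
final class IteratorBackedKeySet<K, V> extends AbstractSet<K> {
    // Stand-in for "open a new distributed entry iterator".
    private final Supplier<Iterator<Map.Entry<K, V>>> entryIterators;

    IteratorBackedKeySet(Supplier<Iterator<Map.Entry<K, V>>> entryIterators) {
        this.entryIterators = entryIterators;
    }

    @Override
    public Iterator<K> iterator() {
        Iterator<Map.Entry<K, V>> entries = entryIterators.get();
        // remove() is inherited from Iterator and throws: the view is read-only.
        return new Iterator<K>() {
            @Override public boolean hasNext() { return entries.hasNext(); }
            @Override public K next() { return entries.next().getKey(); }
        };
    }

    // Every call re-iterates: O(n), and no snapshot guarantee when entries
    // change concurrently.
    @Override
    public int size() {
        int count = 0;
        for (Iterator<K> it = iterator(); it.hasNext(); it.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Integer> grid = new TreeMap<>(Map.of("x", 1, "y", 2, "z", 3));
        Set<String> keys = new IteratorBackedKeySet<String, Integer>(() -> grid.entrySet().iterator());
        System.out.println(keys);                // [x, y, z]
        System.out.println(keys.size());         // 3
        System.out.println(keys.contains("y"));  // true
    }
}
```

values() and entrySet() follow the same pattern over the same iterator. Because AbstractSet implements contains() and toString() by iterating, each such call is a full pass, which is why documentation could still steer users to the iterator directly.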
>>>>>>
>>>>>> It is unclear, however, whether for the size method we want to use MR
>>>>>> entry counting and not worry about the rehash and passivation issues,
>>>>>> since it is just an estimate anyway, or whether we want to also use
>>>>>> the entry iterator, which should be a closer approximation but will
>>>>>> require more network overhead and memory usage.
>>>>>>
>>>>>> Also, we didn't really discuss the fact that these methods would
>>>>>> ignore ongoing transactions, and whether that is a concern or not.
>>>>>>
>>>>>> - Will
>>>>>>
>>>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>>>>>> On Oct 8, 2014, at 15:11, Dan Berindei <dan.berindei(a)gmail.com> wrote:
>>>>>>>
>>>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa <rvansa(a)redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> recently we had a discussion about what size() returns, but I've
>>>>>>>>> realized there are more things that users would like to know. My
>>>>>>>>> question is whether you think that they would really appreciate it,
>>>>>>>>> or whether it's just my QA point of view, where I sometimes compute
>>>>>>>>> 'checksums' of the cache to check that I didn't lose anything.
>>>>>>>>>
>>>>>>>>> There are these sizes:
>>>>>>>>> A) number of owned entries
>>>>>>>>> B) number of entries stored locally in memory
>>>>>>>>> C) number of entries stored in each local cache store
>>>>>>>>> D) number of entries stored in each shared cache store
>>>>>>>>> E) total number of entries in cache
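To make the five sizes concrete, here is a toy single-JVM model of a cluster snapshot. It is a hypothetical illustration (plain sets for memory, per-node local stores and one shared store, plus a fixed key-to-owners map), not Infinispan code, and it assumes passivation is disabled, so a key may be in memory and in a store at the same time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy cluster model illustrating sizes A-E (hypothetical, not Infinispan code).
final class ClusterSizes {
    static final class Node {
        final Set<String> memory = new HashSet<>();     // what B counts
        final Set<String> localStore = new HashSet<>(); // what C counts
    }

    final List<Node> nodes = new ArrayList<>();
    final Set<String> sharedStore = new HashSet<>();    // what D counts
    final Map<String, Set<Integer>> owners;             // key -> owning node indexes

    ClusterSizes(int numNodes, Map<String, Set<Integer>> owners) {
        for (int i = 0; i < numNodes; i++) {
            nodes.add(new Node());
        }
        this.owners = owners;
    }

    // A) entries this node owns, wherever they currently live
    long owned(int node) {
        return owners.values().stream().filter(o -> o.contains(node)).count();
    }

    // B) entries held in this node's memory
    int inMemory(int node) { return nodes.get(node).memory.size(); }

    // C) entries in this node's local (non-shared) store
    int inLocalStore(int node) { return nodes.get(node).localStore.size(); }

    // D) entries in the shared store
    int inSharedStore() { return sharedStore.size(); }

    // E) distinct entries anywhere in the cluster
    int total() {
        Set<String> all = new HashSet<>(sharedStore);
        for (Node n : nodes) {
            all.addAll(n.memory);
            all.addAll(n.localStore);
        }
        return all.size();
    }

    public static void main(String[] args) {
        // 3 keys with 2 owners each, spread over 3 nodes; no shared store.
        ClusterSizes c = new ClusterSizes(3, Map.of(
                "k1", Set.of(0, 1), "k2", Set.of(1, 2), "k3", Set.of(2, 0)));
        c.nodes.get(0).memory.add("k1");
        c.nodes.get(0).localStore.add("k3");
        c.nodes.get(1).memory.addAll(Set.of("k1", "k2"));
        c.nodes.get(2).localStore.addAll(Set.of("k2", "k3"));
        // A, B, C for node 0, then D and E
        System.out.println(c.owned(0) + " " + c.inMemory(0) + " "
                + c.inLocalStore(0) + " " + c.inSharedStore() + " " + c.total()); // 2 1 1 0 3
    }
}
```

In this snapshot, summing B + C over all nodes counts 6 copies of only 3 distinct keys, so E cannot be obtained by a plain sum of the per-node numbers; it needs de-duplication by key.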
>>>>>>>>>
>>>>>>>>> So far, we can get
>>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size()
>>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()
>>>>>>>>> E via distributed iterators / MR
>>>>>>>>> A via data container iteration + distribution manager query, but only
>>>>>>>>> without a cache store
>>>>>>>>> C or D through
>>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
>>>>>>>>>
>>>>>>>>> I think that it would match users' expectations if size() returned
>>>>>>>>> E, and for the rest we should have special methods on AdvancedCache.
>>>>>>>>> That would of course change the meaning of size(), but I'd say
>>>>>>>>> finally to something that has a firm meaning.
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>> There have been a lot of arguments in the past about whether size()
>>>>>>>> and other methods that operate over all the elements (keySet, values)
>>>>>>>> are useful, because:
>>>>>>>> - they are approximate (data changes during iteration)
>>>>>>>> - they are very resource-consuming and might be misused (this is the
>>>>>>>> reason we chose to give size() its current local semantics)
>>>>>>>>
>>>>>>>> These methods (size, keys, values) are useful for people, and I think
>>>>>>>> we were not wise to implement them only on top of the local data: this
>>>>>>>> is like preferring efficiency over correctness. This also created a
>>>>>>>> lot of confusion among our users, with questions like "size() doesn't
>>>>>>>> return the correct value" being asked regularly. I totally agree that
>>>>>>>> size() should return E (i.e. everything that is stored within the
>>>>>>>> grid, including persistence) and that its performance implications
>>>>>>>> should be documented accordingly. For keySet and values, we should
>>>>>>>> stop implementing them (throw an exception) and point users to Will's
>>>>>>>> distributed iterator, which is a nicer way to achieve the desired
>>>>>>>> behavior.
>>>>>>>>
>>>>>>>> We can also implement keySet() and values() on top of the distributed
>>>>>>>> entry iterator and document that using the iterator directly is
>>>>>>>> better.
>>>>>>> Yes, that's what I meant as well.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> --
>>>>>>> Mircea Markus
>>>>>>> Infinispan lead (www.infinispan.org)
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> infinispan-dev mailing list
>>>>>>> infinispan-dev(a)lists.jboss.org
>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>> --
>>>>> Radim Vansa <rvansa(a)redhat.com>
>>>>> JBoss DataGrid QA
>>>>>
>
>
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)