<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 10, 2014 at 5:20 PM, Mircea Markus <span dir="ltr"><<a href="mailto:mmarkus@redhat.com" target="_blank">mmarkus@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class=""><br>
On Oct 10, 2014, at 15:18, Radim Vansa <<a href="mailto:rvansa@redhat.com">rvansa@redhat.com</a>> wrote:<br>
<br>
> That we should expose that as one method, not forcing people to<br>
> implement the sum() themselves.<br>
<br>
</span>Hmm, isn't the method you mention cache.size() ? :-)<br></blockquote><div><br></div><div>Nope, because we decided to make cache.size() precise-but-slow :)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
><br>
> And possibly cachestores, again.<br>
<br>
AdvancedCacheLoader has size.<br>
<div class=""><div class="h5"><br>
><br>
> Radim<br>
><br>
> On 10/10/2014 04:06 PM, Tristan Tarrant wrote:<br>
>> What's wrong with sum(Datacontainer.size())/numOwners ?<br>
>><br>
>> Tristan<br>
>><br>
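A minimal sketch of the estimate Tristan describes, in plain Java. The class and method names are hypothetical and not part of the Infinispan API; perNodeCounts stands in for the DataContainer.size() value collected from each node. It assumes every entry is held by exactly numOwners nodes, which does not hold during a rehash, so the result is only an approximation.<br>

```java
// Hypothetical helper for illustration only; not an Infinispan class.
final class ApproximateSize {

    // perNodeCounts[i] stands in for DataContainer.size() reported by node i.
    // Assumes each entry is stored on exactly numOwners nodes.
    static long estimate(long[] perNodeCounts, int numOwners) {
        if (numOwners <= 0) {
            throw new IllegalArgumentException("numOwners must be positive");
        }
        long sum = 0;
        for (long count : perNodeCounts) {
            sum += count;
        }
        return sum / numOwners;
    }

    public static void main(String[] args) {
        // Three nodes, numOwners = 2, 30 distinct entries: each entry is counted twice.
        System.out.println(estimate(new long[] {20, 19, 21}, 2)); // prints 30
    }
}
```

The division is only as good as the assumption behind it: while state transfer is in progress some entries have more or fewer copies than numOwners, so the figure can drift in either direction.<br>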
>> On 10/10/14 16:03, Radim Vansa wrote:<br>
>>> On 10/10/2014 02:38 PM, William Burns wrote:<br>
>>>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa <<a href="mailto:rvansa@redhat.com">rvansa@redhat.com</a>> wrote:<br>
>>>>> Users expect that size() will be constant-time (or linear in cluster<br>
>>>>> size), and generally a fast operation. I'd prefer to keep it that way.<br>
>>>>> Though, even the MR way (used for HotRod size() now) needs to crawl<br>
>>>>> through all the entries locally.<br>
>>>> Many in-memory collections, such as ConcurrentLinkedQueue, require O(n)<br>
>>>> to compute size, so I wouldn't say size should always be<br>
>>>> expected to be constant time or O(c) where c is # of nodes. Granted a<br>
>>>> user can expect anything they want.<br>
>>> OK, I stand corrected. Moreover, I was generalizing myself to all users,<br>
>>> a common mistake :)<br>
>>><br>
>>> Anyway, monitoring tools love nice charts, and I can imagine monitoring<br>
>>> software polling every 1 second to update that cool chart with cache<br>
>>> size. Do we want a fast but imprecise variant of this operation in some<br>
>>> statistics class?<br>
>>><br>
>>> Radim<br>
>>><br>
>>>>> 'Heretic, not very well thought of and changing too many things' idea:<br>
>>>>> what about having data container segment-aware? Then you'd just bcast<br>
>>>>> SizeCommand with given topologyId and sum up sizes of primary-owned<br>
>>>>> segments... It's not a complete solution, but at least that would let us<br>
>>>>> get the number of locally owned entries quite fast. Though, you can't<br>
>>>>> do that easily with cache stores (without changing SPI).<br>
>>>>><br>
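A rough sketch of the segment-aware idea above, in plain Java. Everything here is hypothetical: Infinispan's data container is modelled as an array of per-segment entry counts, and the topology as an array mapping each segment to the index of its primary owner. Each node sums only the segments it primary-owns, and the caller adds up the per-node answers, which is what a broadcast SizeCommand would have to do.<br>

```java
// Hypothetical model for illustration only; not Infinispan classes.
final class SegmentAwareSize {

    // entriesPerSegment[s]: number of entries this node holds in segment s.
    // primaryOwner[s]: index of the node that primary-owns segment s in the given topology.
    static long primaryOwnedSize(long[] entriesPerSegment, int[] primaryOwner, int self) {
        long total = 0;
        for (int s = 0; s < entriesPerSegment.length; s++) {
            if (primaryOwner[s] == self) {
                total += entriesPerSegment[s];
            }
        }
        return total;
    }

    // Sums the primary-owned counts over all nodes, so every segment is counted once.
    static long clusterSize(long[][] entriesPerNodeAndSegment, int[] primaryOwner) {
        long total = 0;
        for (int node = 0; node < entriesPerNodeAndSegment.length; node++) {
            total += primaryOwnedSize(entriesPerNodeAndSegment[node], primaryOwner, node);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two nodes, four segments, numOwners = 2: both nodes hold every segment.
        long[][] counts = { {3, 5, 2, 4}, {3, 5, 2, 4} };
        int[] primaryOwner = {0, 1, 0, 1};
        System.out.println(clusterSize(counts, primaryOwner)); // prints 14
    }
}
```

Because each segment has exactly one primary owner in a given topology, no division by numOwners is needed; the sketch says nothing about cache stores, which would still need an SPI change as noted above.<br>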
>>>>> Regarding cache stores, IMO we're damned anyway: when calling<br>
>>>>> cacheStore.size(), it can report more entries, as some haven't been<br>
>>>>> expired yet, or it can report fewer entries, as some can be expired due to<br>
>>>>> [1]. Or, we'll enumerate all the entries, and that's going to be slow<br>
>>>>> (btw., [1] reminded me that we should enumerate both datacontainer AND<br>
>>>>> cachestores even if passivation is not enabled).<br>
>>>> This is precisely what the distributed iterator does. And also<br>
>>>> support for expired entries was recently integrated as I missed that<br>
>>>> in the original implementation [a]<br>
>>>><br>
>>>> [a] <a href="https://issues.jboss.org/browse/ISPN-4643" target="_blank">https://issues.jboss.org/browse/ISPN-4643</a><br>
>>>><br>
>>>>> Radim<br>
>>>>><br>
>>>>> [1] <a href="https://issues.jboss.org/browse/ISPN-3202" target="_blank">https://issues.jboss.org/browse/ISPN-3202</a><br>
>>>>><br>
>>>>> On 10/08/2014 04:42 PM, William Burns wrote:<br>
>>>>>> So it seems we would want to change this for 7.0 if possible since it<br>
>>>>>> would be a bigger change for something like 7.1 and 8.0 would be even<br>
>>>>>> further out. I should be able to put this together for CR2.<br>
>>>>>><br>
>>>>>> It seems that we want to implement keySet, values and entrySet methods<br>
>>>>>> using the entry iterator approach.<br>
>>>>>><br>
>>>>>> It is however unclear for the size method if we want to use MR entry<br>
>>>>>> counting and not worry about the rehash and passivation issues since<br>
>>>>>> it is just an estimation anyways. Or if we want to also use the entry<br>
>>>>>> iterator which should be a closer approximation but will require more<br>
>>>>>> network overhead and memory usage.<br>
>>>>>><br>
>>>>>> Also we didn't really talk about the fact that these methods would<br>
>>>>>> ignore ongoing transactions and if that is a concern or not.<br>
>>>>>><br>
>>>>>> - Will<br>
>>>>>><br>
>>>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <<a href="mailto:mmarkus@redhat.com">mmarkus@redhat.com</a>> wrote:<br>
>>>>>>> On Oct 8, 2014, at 15:11, Dan Berindei <<a href="mailto:dan.berindei@gmail.com">dan.berindei@gmail.com</a>> wrote:<br>
>>>>>>><br>
>>>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <<a href="mailto:mmarkus@redhat.com">mmarkus@redhat.com</a>> wrote:<br>
>>>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa <<a href="mailto:rvansa@redhat.com">rvansa@redhat.com</a>> wrote:<br>
>>>>>>>><br>
>>>>>>>>> Hi,<br>
>>>>>>>>><br>
>>>>>>>>> recently we had a discussion about what size() returns, but I've<br>
>>>>>>>>> realized there are more things that users would like to know. My<br>
>>>>>>>>> question is whether you think that they would really appreciate it, or<br>
>>>>>>>>> whether it's just my QA point of view where I sometimes compute the<br>
>>>>>>>>> 'checksums' of cache to see if I haven't lost anything.<br>
>>>>>>>>><br>
>>>>>>>>> There are those sizes:<br>
>>>>>>>>> A) number of owned entries<br>
>>>>>>>>> B) number of entries stored locally in memory<br>
>>>>>>>>> C) number of entries stored in each local cache store<br>
>>>>>>>>> D) number of entries stored in each shared cache store<br>
>>>>>>>>> E) total number of entries in cache<br>
>>>>>>>>><br>
>>>>>>>>> So far, we can get<br>
>>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size()<br>
>>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()<br>
>>>>>>>>> E via distributed iterators / MR<br>
>>>>>>>>> A via data container iteration + distribution manager query, but only<br>
>>>>>>>>> without cache store<br>
>>>>>>>>> C or D through<br>
>>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()<br>
>>>>>>>>><br>
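To make the second item in the list above concrete, here is a small plain-Java sketch of the value size() is described as returning today, (passivation ? B : 0) + firstNonZero(C, D). The class and parameter names are hypothetical; B, C and D are passed in as plain numbers rather than read from a real cache.<br>

```java
// Hypothetical helper for illustration only; not an Infinispan class.
final class CurrentSizeSemantics {

    // Returns the first argument that is not zero, or 0 if all are zero.
    static long firstNonZero(long... values) {
        for (long v : values) {
            if (v != 0) {
                return v;
            }
        }
        return 0;
    }

    // inMemory = B, localStore = C, sharedStore = D from the list above.
    static long currentSize(boolean passivation, long inMemory, long localStore, long sharedStore) {
        return (passivation ? inMemory : 0) + firstNonZero(localStore, sharedStore);
    }

    public static void main(String[] args) {
        // Passivation on, no local store: in-memory entries plus the shared store.
        System.out.println(currentSize(true, 5, 0, 100));   // prints 105
        // Passivation off: the local store count wins and memory is ignored.
        System.out.println(currentSize(false, 5, 40, 100)); // prints 40
    }
}
```

The two calls show why the number is hard to interpret: the same cache contents yield different values depending on the passivation setting and on which store happens to be non-empty.<br>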
>>>>>>>>> I think that it would go along with users' expectations if size()<br>
>>>>>>>>> returned E and for the rest we should have special methods on<br>
>>>>>>>>> AdvancedCache. That would of course change the meaning of size(), but<br>
>>>>>>>>> I'd say finally to something that has a firm meaning.<br>
>>>>>>>>><br>
>>>>>>>>> WDYT?<br>
>>>>>>>> There were a lot of arguments in the past about whether size() and other methods that operate over all the elements (keySet, values) are useful because:<br>
>>>>>>>> - they are approximate (data changes during iteration)<br>
>>>>>>>> - they are very resource consuming and might be misused (this is the reason we chose to use size() with its current local semantics)<br>
>>>>>>>><br>
>>>>>>>> These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, with questions like "size() doesn't return the correct value" being asked regularly. I totally agree that size() should return E (i.e. everything that is stored within the grid, including persistence) and that its performance implications should be documented accordingly. For keySet and values, we should stop implementing them (throw an exception) and point users to Will's distributed iterator, which is a nicer way to achieve the desired behavior.<br>
>>>>>>>><br>
>>>>>>>> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better.<br>
>>>>>>> Yes, that's what I meant as well.<br>
>>>>>>><br>
>>>>>>> Cheers,<br>
>>>>>>> --<br>
>>>>>>> Mircea Markus<br>
>>>>>>> Infinispan lead (<a href="http://www.infinispan.org" target="_blank">www.infinispan.org</a>)<br>
>>>>>>><br>
>>>>>>><br>
>>>>>>><br>
>>>>>>><br>
>>>>>>><br>
>>>>>>> _______________________________________________<br>
>>>>>>> infinispan-dev mailing list<br>
>>>>>>> <a href="mailto:infinispan-dev@lists.jboss.org">infinispan-dev@lists.jboss.org</a><br>
>>>>>>> <a href="https://lists.jboss.org/mailman/listinfo/infinispan-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/infinispan-dev</a><br>
>>>>> --<br>
>>>>> Radim Vansa <<a href="mailto:rvansa@redhat.com">rvansa@redhat.com</a>><br>
>>>>> JBoss DataGrid QA<br>
>>>>><br>
><br>
><br>
> --<br>
> Radim Vansa <<a href="mailto:rvansa@redhat.com">rvansa@redhat.com</a>><br>
> JBoss DataGrid QA<br>
><br>
<br>
Cheers,<br>
--<br>
Mircea Markus<br>
Infinispan lead (<a href="http://www.infinispan.org" target="_blank">www.infinispan.org</a>)<br>
<br>
<br>
<br>
<br>
<br>
</div></div></blockquote></div><br></div></div>