<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 8, 2014 at 7:53 PM, William Burns <span dir="ltr">&lt;<a href="mailto:mudokonman@gmail.com" target="_blank">mudokonman@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Wed, Oct 8, 2014 at 12:41 PM, Dan Berindei &lt;<a href="mailto:dan.berindei@gmail.com">dan.berindei@gmail.com</a>&gt; wrote:<br>

&gt;<br>

&gt;<br>

&gt; On Wed, Oct 8, 2014 at 6:19 PM, Radim Vansa &lt;<a href="mailto:rvansa@redhat.com">rvansa@redhat.com</a>&gt; wrote:<br>

&gt;&gt;<br>

&gt;&gt; Users expect size() to be constant-time (or linear in the cluster<br>

&gt;&gt; size) and a generally fast operation. I&#39;d prefer to keep it that way.<br>

&gt;&gt; Though, even the MR way (used for HotRod size() now) needs to crawl<br>

&gt;&gt; through all the entries locally.<br>

&gt;<br>

&gt;<br>

&gt; They might expect that, but there is nothing in the Map API suggesting it.<br>

&gt;<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; &#39;Heretic, not very well thought out and changing too many things&#39; idea:<br>

&gt;&gt; what about making the data container segment-aware? Then you&#39;d just<br>

&gt;&gt; broadcast a SizeCommand with a given topologyId and sum up the sizes of the<br>

&gt;&gt; primary-owned segments... It&#39;s not a complete solution, but at least it<br>

&gt;&gt; would let us get the number of locally owned entries quite fast. Though, you<br>

&gt;&gt; can&#39;t do that easily with cache stores (without changing the SPI).<br>
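As a rough illustration of the segment-aware data container idea, here is a minimal standalone Java sketch. It is hypothetical and not the Infinispan DataContainer or SizeCommand API: the class name, the modulo hash (the real consistent hash works differently) and the hard-coded primary-owner assignment are all assumptions. Each node keeps one map per segment, so answering a broadcast size request only means adding up the sizes of the segments it owns as primary, and each key is counted once cluster-wide. Per-segment access also lets an iterator finish one segment before opening the next.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, NOT Infinispan's DataContainer API: one map per segment.
class SegmentedDataContainer<K, V> {
    private final List<Map<K, V>> segments;

    SegmentedDataContainer(int numSegments) {
        segments = new ArrayList<>(numSegments);
        for (int i = 0; i < numSegments; i++) {
            segments.add(new ConcurrentHashMap<>());
        }
    }

    // Assumption: a plain modulo of hashCode() stands in for the real consistent hash.
    int segmentOf(K key) {
        return Math.floorMod(key.hashCode(), segments.size());
    }

    void put(K key, V value) {
        segments.get(segmentOf(key)).put(key, value);
    }

    V get(K key) {
        return segments.get(segmentOf(key)).get(key);
    }

    // Local reply to a broadcast size request: add up the per-segment map sizes
    // for the segments this node owns as primary, without visiting each entry.
    int sizeOfSegments(Set<Integer> primarySegments) {
        int total = 0;
        for (int s : primarySegments) {
            total += segments.get(s).size();
        }
        return total;
    }

    // Per-segment iteration: a caller can finish one segment before opening the next.
    Iterator<Map.Entry<K, V>> entriesInSegment(int segment) {
        return segments.get(segment).entrySet().iterator();
    }

    public static void main(String[] args) {
        // Two nodes with two owners per key, so both nodes hold every entry.
        SegmentedDataContainer<Integer, String> nodeA = new SegmentedDataContainer<>(4);
        SegmentedDataContainer<Integer, String> nodeB = new SegmentedDataContainer<>(4);
        for (int k = 0; k < 10; k++) {
            nodeA.put(k, "v" + k);
            nodeB.put(k, "v" + k);
        }
        // Assumed topology: nodeA is primary owner of segments 0-1, nodeB of 2-3.
        int total = nodeA.sizeOfSegments(Set.of(0, 1)) + nodeB.sizeOfSegments(Set.of(2, 3));
        System.out.println(total); // 10 (adding the two full local sizes would give 20)
    }
}
```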

&gt;<br>

&gt;<br>

&gt; We could create a separate DataContainer for each segment. But would it<br>

&gt; really be worth the trouble? I don&#39;t know of anyone using size() for<br>

&gt; something other than checking that their data was properly loaded into the<br>

&gt; cache, and they don&#39;t need a super-fast size() for that.<br>

<br>

</span>Having a DataContainer per segment would actually reduce the memory<br>

required by the distributed iterator as well, since we could query<br>

data by segment much more efficiently and close out segments one by<br>

one per node instead of having to keep several open at once.  When I<br>

asked about this before, it was more of a &quot;we can deal with it later&quot;<br>

kind of thing.  I would think this would increase ST operation time as<br>

well.<br></blockquote><div><br></div><div>You mean it would improve ST performance, because it wouldn&#39;t have to compute the hash of each key in the data container?</div><div><br></div><div>I don&#39;t think we have ever considered splitting the data container for ST, as it didn&#39;t seem worth the trouble. OTOH we have wanted to add a segment-based query to the cache loader SPI ever since we started designing NBST :)</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5"><br>

&gt;<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; Regarding cache stores, IMO we&#39;re damned anyway: when calling<br>

&gt;&gt; cacheStore.size(), it can report more entries because some of them haven&#39;t<br>

&gt;&gt; been expired yet, or fewer entries because some may already have been expired<br>

&gt;&gt; due to [1]. Or we enumerate all the entries, and that&#39;s going to be slow<br>

&gt;&gt; (btw., [1] reminded me that we should enumerate both the data container AND<br>

&gt;&gt; the cache stores even if passivation is not enabled).<br>
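A minimal sketch of counting across both the data container and a store, assuming plain maps stand in for each and that every entry carries an absolute expiry timestamp. The Stored class and countLive method are hypothetical, not the Infinispan persistence SPI. It enumerates both sources even without passivation, counts a key held in both places once, and skips expired entries the store still reports, which is why it has to visit every entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ExpirationAwareCount {
    // Hypothetical entry: a value plus an absolute expiry time in ms (-1 means immortal).
    static final class Stored {
        final String value;
        final long expiresAt;

        Stored(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }

        boolean isExpired(long now) {
            return expiresAt >= 0 && expiresAt <= now;
        }
    }

    // Counts distinct, non-expired keys across memory and a store. A key held in
    // both places (write-through, no passivation) is counted once, using the
    // in-memory copy; expired entries not yet purged from the store are skipped.
    static int countLive(Map<String, Stored> memory, Map<String, Stored> store, long now) {
        Set<String> seen = new HashSet<>();
        int count = 0;
        for (Map.Entry<String, Stored> e : memory.entrySet()) {
            seen.add(e.getKey());
            if (!e.getValue().isExpired(now)) {
                count++;
            }
        }
        for (Map.Entry<String, Stored> e : store.entrySet()) {
            if (seen.add(e.getKey()) && !e.getValue().isExpired(now)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long now = 1_000L;
        Map<String, Stored> memory = new HashMap<>();
        Map<String, Stored> store = new HashMap<>();
        memory.put("a", new Stored("1", -1));   // live, also written through to the store
        store.put("a", new Stored("1", -1));
        store.put("b", new Stored("2", 500));   // expired but still present in the store
        store.put("c", new Stored("3", 2_000)); // live, only in the store (evicted from memory)
        System.out.println(countLive(memory, store, now)); // 2, while memory + store sizes add up to 4
    }
}
```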

&gt;<br>

&gt;<br>

&gt; Exactly, we need to iterate over all the entries from the stores if we want<br>

&gt; something remotely accurate (although I had forgotten about expiration also<br>

&gt; being a problem). Otherwise we could just leave size() as it is now, it&#39;s<br>

&gt; pretty fast :)<br>

&gt;<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; Radim<br>

&gt;&gt;<br>

&gt;&gt; [1] <a href="https://issues.jboss.org/browse/ISPN-3202" target="_blank">https://issues.jboss.org/browse/ISPN-3202</a><br>

&gt;&gt;<br>

&gt;&gt; On 10/08/2014 04:42 PM, William Burns wrote:<br>

&gt;&gt; &gt; So it seems we would want to change this for 7.0 if possible, since it<br>

&gt;&gt; &gt; would be a bigger change for something like 7.1, and 8.0 is even<br>

&gt;&gt; &gt; further out.  I should be able to put this together for CR2.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; It seems that we want to implement keySet, values and entrySet methods<br>

&gt;&gt; &gt; using the entry iterator approach.<br>
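Implementing keySet(), values() and entrySet() on top of the entry iterator could look roughly like the read-only view below. It is a hypothetical sketch: the class name is made up, and a Supplier of plain Iterator&lt;Map.Entry&lt;K, V&gt;&gt; stands in for the distributed entry iterator, whose actual API is not shown in this thread. Note that size() on such a view has to walk every entry on each call.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical read-only keySet() view backed by an entry iterator (not Infinispan's API).
class IteratorBackedKeySet<K, V> extends AbstractSet<K> {
    // Each get() starts a fresh pass over all entries, like opening a new iterator.
    private final Supplier<Iterator<Map.Entry<K, V>>> entryIterators;

    IteratorBackedKeySet(Supplier<Iterator<Map.Entry<K, V>>> entryIterators) {
        this.entryIterators = entryIterators;
    }

    @Override
    public Iterator<K> iterator() {
        Iterator<Map.Entry<K, V>> entries = entryIterators.get();
        // remove() is not overridden, so it throws UnsupportedOperationException: the view is read-only.
        return new Iterator<K>() {
            @Override
            public boolean hasNext() {
                return entries.hasNext();
            }

            @Override
            public K next() {
                return entries.next().getKey();
            }
        };
    }

    // No cached count: every call walks all entries again (a full scan in a cluster).
    @Override
    public int size() {
        int n = 0;
        for (Iterator<K> it = iterator(); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, Integer> data = Map.of("a", 1, "b", 2, "c", 3);
        Set<String> keys = new IteratorBackedKeySet<String, Integer>(() -> data.entrySet().iterator());
        System.out.println(keys.size());        // 3
        System.out.println(keys.contains("b")); // true
    }
}
```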

&gt;&gt; &gt;<br>

&gt;&gt; &gt; It is however unclear whether, for the size method, we want to use MR entry<br>

&gt;&gt; &gt; counting and not worry about the rehash and passivation issues, since<br>

&gt;&gt; &gt; it is just an estimate anyway, or whether we also want to use the entry<br>

&gt;&gt; &gt; iterator, which should be a closer approximation but will require more<br>

&gt;&gt; &gt; network overhead and memory usage.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; Also, we didn&#39;t really talk about the fact that these methods would<br>

&gt;&gt; &gt; ignore ongoing transactions, and whether that is a concern or not.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;   - Will<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus &lt;<a href="mailto:mmarkus@redhat.com">mmarkus@redhat.com</a>&gt; wrote:<br>

&gt;&gt; &gt;&gt; On Oct 8, 2014, at 15:11, Dan Berindei &lt;<a href="mailto:dan.berindei@gmail.com">dan.berindei@gmail.com</a>&gt; wrote:<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt; On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus &lt;<a href="mailto:mmarkus@redhat.com">mmarkus@redhat.com</a>&gt; wrote:<br>

&gt;&gt; &gt;&gt;&gt; On Oct 3, 2014, at 9:30, Radim Vansa &lt;<a href="mailto:rvansa@redhat.com">rvansa@redhat.com</a>&gt; wrote:<br>

&gt;&gt; &gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt; Hi,<br>

&gt;&gt; &gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt; recently we had a discussion about what size() returns, but I&#39;ve<br>

&gt;&gt; &gt;&gt;&gt;&gt; realized there are more things that users would like to know. My<br>

&gt;&gt; &gt;&gt;&gt;&gt; question is whether you think that they would really appreciate it, or<br>

&gt;&gt; &gt;&gt;&gt;&gt; whether it&#39;s just my QA point of view, where I sometimes compute the<br>

&gt;&gt; &gt;&gt;&gt;&gt; &#39;checksums&#39; of the cache to see whether I lost anything.<br>

&gt;&gt; &gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt; There are those sizes:<br>

&gt;&gt; &gt;&gt;&gt;&gt; A) number of owned entries<br>

&gt;&gt; &gt;&gt;&gt;&gt; B) number of entries stored locally in memory<br>

&gt;&gt; &gt;&gt;&gt;&gt; C) number of entries stored in each local cache store<br>

&gt;&gt; &gt;&gt;&gt;&gt; D) number of entries stored in each shared cache store<br>

&gt;&gt; &gt;&gt;&gt;&gt; E) total number of entries in cache<br>

&gt;&gt; &gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt; So far, we can get<br>

&gt;&gt; &gt;&gt;&gt;&gt; B via withFlags(SKIP_CACHE_LOAD).size()<br>

&gt;&gt; &gt;&gt;&gt;&gt; (passivation ? B : 0) + firstNonZero(C, D) via size()<br>

&gt;&gt; &gt;&gt;&gt;&gt; E via distributed iterators / MR<br>

&gt;&gt; &gt;&gt;&gt;&gt; A via data container iteration + distribution manager query, but only<br>

&gt;&gt; &gt;&gt;&gt;&gt; without a cache store<br>

&gt;&gt; &gt;&gt;&gt;&gt; C or D through getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()<br>

&gt;&gt; &gt;&gt;&gt;&gt;<br>
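The breakdown of what size() currently reports can be written out as a small function. The class and method names are hypothetical, and the no-store branch is an assumption: the (passivation ? B : 0) + firstNonZero(C, D) shorthand only covers caches that have a store configured.

```java
class CurrentSizeSemantics {
    // Hypothetical helper matching the firstNonZero(C, D) shorthand.
    static int firstNonZero(int... counts) {
        for (int c : counts) {
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // inMemory = B, localStore = C, sharedStore = D in the list above.
    // Assumption: with no store configured, size() is just B.
    static int currentSize(boolean hasStores, boolean passivation,
                           int inMemory, int localStore, int sharedStore) {
        if (!hasStores) {
            return inMemory;
        }
        return (passivation ? inMemory : 0) + firstNonZero(localStore, sharedStore);
    }

    public static void main(String[] args) {
        // Write-through local store, no passivation: the store already holds every local entry.
        System.out.println(currentSize(true, false, 40, 100, 0)); // 100
        // Passivation: an entry is either in memory or in the store, so both are added.
        System.out.println(currentSize(true, true, 40, 60, 0));   // 100
    }
}
```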

&gt;&gt; &gt;&gt;&gt;&gt; I think that it would match users&#39; expectations if size()<br>

&gt;&gt; &gt;&gt;&gt;&gt; returned E, and for the rest we should have special methods on<br>

&gt;&gt; &gt;&gt;&gt;&gt; AdvancedCache. That would of course change the meaning of size(), but<br>

&gt;&gt; &gt;&gt;&gt;&gt; I&#39;d say it would finally change to something that has a firm meaning.<br>

&gt;&gt; &gt;&gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt;&gt; WDYT?<br>

&gt;&gt; &gt;&gt;&gt; There were a lot of arguments in the past about whether size() and other methods<br>

&gt;&gt; &gt;&gt;&gt; that operate over all the elements (keySet, values) are useful, because:<br>

&gt;&gt; &gt;&gt;&gt; - they are approximate (data changes during iteration)<br>

&gt;&gt; &gt;&gt;&gt; - they are very resource-consuming and might be misused (this is the<br>

&gt;&gt; &gt;&gt;&gt; reason we chose to give size() its current local semantics)<br>

&gt;&gt; &gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt; These methods (size, keys, values) are useful for people, and I think<br>

&gt;&gt; &gt;&gt;&gt; it was not wise of us to implement them only on top of the local data: this is<br>

&gt;&gt; &gt;&gt;&gt; like preferring efficiency over correctness. It also created a lot of<br>

&gt;&gt; &gt;&gt;&gt; confusion among our users, with questions like &quot;size() doesn&#39;t return the correct<br>

&gt;&gt; &gt;&gt;&gt; value&quot; being asked regularly. I totally agree that size() should return E (i.e.<br>

&gt;&gt; &gt;&gt;&gt; everything that is stored within the grid, including persistence), with its<br>

&gt;&gt; &gt;&gt;&gt; performance implications documented accordingly. For keySet and values,<br>

&gt;&gt; &gt;&gt;&gt; we should stop implementing them (throw an exception) and point users to<br>

&gt;&gt; &gt;&gt;&gt; Will&#39;s distributed iterator, which is a nicer way to achieve the desired<br>

&gt;&gt; &gt;&gt;&gt; behavior.<br>

&gt;&gt; &gt;&gt;&gt;<br>

&gt;&gt; &gt;&gt;&gt; We can also implement keySet() and values() on top of the distributed<br>

&gt;&gt; &gt;&gt;&gt; entry iterator and document that using the iterator directly is better.<br>

&gt;&gt; &gt;&gt; Yes, that&#39;s what I meant as well.<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; Cheers,<br>

&gt;&gt; &gt;&gt; --<br>

&gt;&gt; &gt;&gt; Mircea Markus<br>

&gt;&gt; &gt;&gt; Infinispan lead (<a href="http://www.infinispan.org" target="_blank">www.infinispan.org</a>)<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt;<br>

&gt;&gt; &gt;&gt; _______________________________________________<br>

&gt;&gt; &gt;&gt; infinispan-dev mailing list<br>

&gt;&gt; &gt;&gt; <a href="mailto:infinispan-dev@lists.jboss.org">infinispan-dev@lists.jboss.org</a><br>

&gt;&gt; &gt;&gt; <a href="https://lists.jboss.org/mailman/listinfo/infinispan-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/infinispan-dev</a><br>


&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; --<br>

&gt;&gt; Radim Vansa &lt;<a href="mailto:rvansa@redhat.com">rvansa@redhat.com</a>&gt;<br>

&gt;&gt; JBoss DataGrid QA<br>

&gt;&gt;<br>


&gt;<br>

&gt;<br>

&gt;<br>


</div></div></blockquote></div><br></div></div>