<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Oct 7, 2014 at 4:17 PM, William Burns <span dir="ltr"><<a href="mailto:mudokonman@gmail.com" target="_blank">mudokonman@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>On Tue, Oct 7, 2014 at 8:43 AM, Radim Vansa <<a href="mailto:rvansa@redhat.com" target="_blank">rvansa@redhat.com</a>> wrote:<br>
> On 10/07/2014 02:21 PM, William Burns wrote:<br>
>> On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa <<a href="mailto:rvansa@redhat.com" target="_blank">rvansa@redhat.com</a>> wrote:<br>
>>> If you have one local and one shared cache store, how should the command<br>
>>> behave?<br>
>>><br>
>>> a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP,<br>
>>> SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no<br>
>>> SKIP_BACKUP_ENTRIES flag right now), where this method returns<br>
>>> localStore.size() for the first non-shared cache store + (passivation ?<br>
>>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)<br>
>> Calling the size method in either distexec or MR will give you<br>
>> inflated numbers, as you need to pay attention to numOwners to get<br>
>> a proper count.<br>
><br>
> That's what I meant by the SKIP_BACKUP_ENTRIES - dataContainer should be<br>
> able to report only primary-owned entries, or we have to iterate and<br>
> apply the filtering outside.<br>
<br>
</span>If we added this functionality then yes, it would be promoted to the<br>
status of MR counting entries, though it would still have issues with<br>
rehashes, as well as with concurrent activations and passivations.<br></blockquote><div><br></div><div>I think we can use something like OutdatedTopologyException to make sure we count each segment once, on the primary owner. But in order to verify that a particular node is the primary owner we'd have to load each cache store entry, so performance with cache stores will be pretty bad.</div><div><br></div><div>Dealing with concurrent activations/passivations is even trickier.</div><div> </div>
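<div>For illustration, a rough and entirely hypothetical sketch (not code that exists in Infinispan today) of counting each entry once on its primary owner with the distributed executor. The class name is made up, it only looks at the in-memory container, and a real version would also have to consult the stores and re-submit when the topology changes; API details are sketched against the 7.x era and may vary:</div><div><br></div><pre>
import java.io.Serializable;
import java.util.Set;
import java.util.concurrent.Future;

import org.infinispan.AdvancedCache;
import org.infinispan.Cache;
import org.infinispan.container.entries.InternalCacheEntry;
import org.infinispan.distexec.DefaultExecutorService;
import org.infinispan.distexec.DistributedCallable;
import org.infinispan.distribution.ch.ConsistentHash;
import org.infinispan.remoting.transport.Address;

public class PrimaryOwnedSizeTask
      implements DistributedCallable<Object, Object, Long>, Serializable {

   private transient Cache<Object, Object> cache;

   @Override
   public void setEnvironment(Cache<Object, Object> cache, Set<Object> inputKeys) {
      this.cache = cache;
   }

   @Override
   public Long call() throws Exception {
      AdvancedCache<Object, Object> ac = cache.getAdvancedCache();
      ConsistentHash ch = ac.getDistributionManager().getReadConsistentHash();
      Address self = ac.getRpcManager().getAddress();
      long count = 0;
      // Count only the entries this node primarily owns, so that summing
      // the per-node results does not multiply the total by numOwners.
      for (InternalCacheEntry<Object, Object> entry : ac.getDataContainer()) {
         if (self.equals(ch.locatePrimaryOwner(entry.getKey()))) {
            count++;
         }
      }
      return count;
   }
}

// Usage on the originator, summing the per-node results:
// DefaultExecutorService des = new DefaultExecutorService(cache);
// long total = 0;
// for (Future<Long> f : des.submitEverywhere(new PrimaryOwnedSizeTask()))
//    total += f.get();
</pre><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">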
<span><br>
><br>
>><br>
>>> b) distexec/MR sum of sharedStore.size() + (passivation ? sum of<br>
>>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)<br>
>> Calling size() on a shared cache store actually should work somewhat<br>
>> well (assuming all entries are stored in the shared store). The problem<br>
>> is if passivation is enabled, as you point out, because you also have to<br>
>> check the data container, which means you can also have an issue with<br>
>> concurrent activations and passivations (which you can't verify<br>
>> properly in either case without knowing the keys).<br>
>><br>
>>> c) MR that would count the entries<br>
>> This is the only reliable way to do this with MR. And unfortunately,<br>
>> if a rehash occurs I am not sure whether you would get inconsistent<br>
>> numbers or an Exception. In the latter case you should at least be able<br>
>> to trust the number when the task does return without an exception. I<br>
>> can't say how it works with multiple loaders, though; my guess is that<br>
>> it may process an entry more than once, so it depends on whether your<br>
>> mapper is smart enough to realize it.<br>
><br>
> I don't think that reporting an incorrect size is *that* harmful - even<br>
> the ConcurrentMap interface says that it's just a wild guess, and when<br>
> things are changing you can't rely on it.<br>
<br>
</span>ConcurrentMap doesn't actually say anything about the size method.<br>
ConcurrentHashMap does have some verbiage saying that it might not be<br>
completely correct under concurrent modification, though.<br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>
It isn't really a wild guess for ConcurrentHashMap, though. The worst<br>
case is that you count a value that was there but has since been<br>
removed, or you miss a value that was recently added. Really, the<br>
guarantee from CHM is that it counts each individual segment properly<br>
at a single point in time for that segment; the problem is that each<br>
segment could change (since they are counted at different times). But<br>
the values missing in ConcurrentHashMap are totally different from<br>
losing an entire segment due to a rehash. You could theoretically<br>
have a rehash occur right after MR started iterating and see no values<br>
for that segment, or only a very small subset. There is a much larger<br>
margin of error in this case for which values are seen and which are<br>
not.<br>
<span><br></span></blockquote><div><br></div><div><div>Interesting... the Map javadoc seems to assume linearizability, maybe because the original implementation was Hashtable :)</div><div><br></div><div>So there is precedent for relaxing the definition of size(). But of course some users will still expect a zero error margin when there are no concurrent writes, so I agree we don't get a free pass to ignore rehashes and activations during get().</div></div><div> </div>
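<div>As a quick, self-contained illustration of that contract (plain JDK, nothing Infinispan-specific): under concurrent writes size() may return anything between the old and the new count, but it becomes exact once the writers are quiescent.</div><div><br></div><pre>
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ChmSizeDemo {
   public static void main(String[] args) throws InterruptedException {
      final ConcurrentHashMap<Integer, Integer> map =
            new ConcurrentHashMap<Integer, Integer>();
      ExecutorService pool = Executors.newFixedThreadPool(2);
      pool.submit(new Runnable() {
         public void run() {
            for (int i = 0; i < 100000; i++) map.put(i, i);
         }
      });
      pool.submit(new Runnable() {
         public void run() {
            // May print any value between 0 and 100000: a point-in-time
            // estimate, never an exception.
            for (int i = 0; i < 10; i++) System.out.println("size ~ " + map.size());
         }
      });
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.MINUTES);
      // With no concurrent writers the count is exact.
      System.out.println("final size = " + map.size()); // 100000
   }
}
</pre><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>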
><br>
>><br>
>>> d) wrapper on distributed entry iteration with converters set to return<br>
>>> 0-sized entries<br>
>> The entry iterator can't return 0-sized entries (it can only elide the<br>
>> values). The keys are required to make sure that the count is correct,<br>
>> and also to ensure that if a rehash happens in the middle it can<br>
>> properly continue to operate without having to start over. The entry<br>
>> iterator should work properly irrespective of the number of<br>
>> stores/loaders that are configured, since it keeps track of<br>
>> already-seen keys (so duplicates are ignored).<br>
><br>
> Ok, I was simplifying that a bit. And by the way, I don't really like<br>
> the fact that for distributed entry iteration you need to be able to<br>
> keep all the keys from one segment in memory at the same time. But<br>
> fine - distributed entry iteration is probably not the right way.<br>
<br>
</span>I agree it is annoying to have to keep the keys, but it is one of the<br>
few ways to reliably get all the values without losing any. Actually,<br>
this approach provides a much closer approximation to what<br>
ConcurrentHashMap provides for its size implementation, since it can't<br>
drop a segment. It is pretty much required to do it this way for<br>
keySet, entrySet, and values, where you don't have the luxury of<br>
dropping whole swaths of entries like you do when calling the size()<br>
method (even if the value(s) were there the entire time).<br></blockquote><div><br></div><div>If we decide to improve size(), I'd vote to use distributed entry iterators.</div><div><br></div><div>We may be able to avoid sending all the keys to the originator when the cache doesn't have any stores. But with a store it looks like we can't avoid reading all the keys from the store, so skipping the transfer of the keys wouldn't help that much.</div><div> </div>
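<div>A minimal sketch of what that could look like, assuming the 7.x filterEntries API (the AcceptAll filter below is our own illustrative class, not something shipped with Infinispan; a converter could additionally strip the values so that mostly keys travel):</div><div><br></div><pre>
import java.io.Serializable;

import org.infinispan.filter.KeyValueFilter;
import org.infinispan.metadata.Metadata;

// Matches every entry; it must be Serializable because it is shipped
// to the other nodes.
public class AcceptAll implements KeyValueFilter<Object, Object>, Serializable {
   @Override
   public boolean accept(Object key, Object value, Metadata metadata) {
      return true;
   }
}

// On the originator (sketch):
// long size = 0;
// EntryIterable<Object, Object> iterable =
//       cache.getAdvancedCache().filterEntries(new AcceptAll());
// try {
//    for (CacheEntry<Object, Object> ignored : iterable) {
//       size++; // already-seen keys are tracked, so multiple loaders don't double-count
//    }
// } finally {
//    iterable.close();
// }
</pre><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">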
<span><br>
><br>
>><br>
>><br>
>>> And what about nodes with different configuration?<br>
>> Hard to know without knowing what the differences are.<br>
><br>
> I had in mind different loaders and passivation configurations (e.g.<br>
> some nodes could use a shared store and some might not - do we want to<br>
> handle such obscure configs? Can we design this without needing<br>
> complicated decision trees for what to include and what not?).<br>
<br>
</span>Well, the last sentence means we have to use MR or the entry iterator,<br>
since we can't call size on the shared loader. I would think that it<br>
should still work irrespective of the loader configuration (except for<br>
MR with multiple loaders). The main issue I can think of is that if not<br>
every node uses the shared loader, you could end up with stale values in<br>
it whenever no node that uses the shared loader is up (assuming purge at<br>
startup isn't enabled).<br></blockquote><div><br></div><div>We really shouldn't support different store/loader configurations on each node, except for minor stuff like paths.</div><div><br></div><div> </div>
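<div>In other words, something like the following (a hypothetical sketch using the 7.x programmatic configuration API; the JDBC store is just an example of a store that can be shared, and its connection/table setup is elided) would have to be identical on every node:</div><div><br></div><pre>
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.persistence.jdbc.configuration.JdbcStringBasedStoreConfigurationBuilder;

public class SharedStoreConfig {
   public static Configuration build() {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.clustering().cacheMode(CacheMode.DIST_SYNC);
      builder.persistence()
             .passivation(false)  // must match on every node
             .addStore(JdbcStringBasedStoreConfigurationBuilder.class)
                .shared(true)           // every node must agree the store is shared
                .purgeOnStartup(false); // a joining node must never purge a shared store
      // Connection factory and table setup omitted for brevity. Only
      // node-local details (paths, pool sizes, ...) should differ.
      return builder.build();
   }
}
</pre><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">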
<div><div><br>
><br>
> Radim<br>
><br>
>><br>
>>> Radim<br>
>>><br>
>>> On 10/06/2014 01:57 PM, Sanne Grinovero wrote:<br>
>>>> On 6 October 2014 12:44, Tristan Tarrant <<a href="mailto:ttarrant@redhat.com" target="_blank">ttarrant@redhat.com</a>> wrote:<br>
>>>>> I think we should provide correct implementations of size() (and others)<br>
>>>>> and provide shortcut implementations using our usual Flag API (e.g.<br>
>>>>> SKIP_REMOTE_LOOKUP).<br>
>>>> Right that would be very nice. Same for CacheStore interaction: all<br>
>>>> cachestores should be included unless skipped explicitly.<br>
>>>><br>
>>>> Sanne<br>
>>>><br>
>>>>> Tristan<br>
>>>>><br>
>>>>> On 06/10/14 12:57, Sanne Grinovero wrote:<br>
>>>>>> On 3 October 2014 18:38, Dennis Reed <<a href="mailto:dereed@redhat.com" target="_blank">dereed@redhat.com</a>> wrote:<br>
>>>>>>> Since size() is defined by the ConcurrentMap interface, it already has a<br>
>>>>>>> precisely defined meaning. The only "correct" implementation is E.<br>
>>>>>> +1<br>
>> This is one of the things I have been wanting to do: actually<br>
>> implement the other Map methods across the entire cache. However, to<br>
>> do a lot of these in a memory-conscious way they would need to be run<br>
>> ignoring any ongoing transactions. Actually, having this requirement<br>
>> allows these methods to be implemented quite easily, especially in<br>
>> conjunction with the EntryIterator. I almost made a PR for it a while<br>
>> back, but it seemed a little zealous to do at the same time, and it<br>
>> didn't seem that people were pushing for it very hard (maybe that was<br>
>> a wrong assumption). Also, I wasn't quite sure whether the<br>
>> transactional part no longer being functional would be a deterrent.<br>
>><br>
>>>>>>> The current non-correct implementation exists just because it's expensive<br>
>>>>>>> to calculate correctly. I'm not sure the current impl is really that<br>
>>>>>>> useful for anything.<br>
>>>>>> +1<br>
>>>>>><br>
>>>>>> And not just size() but many others from ConcurrentMap.<br>
>>>>>> The question is if we should drop the interface and all the methods<br>
>>>>>> which aren't efficiently implementable, or fix all those methods.<br>
>>>>>><br>
>>>>>> In the past I loved that I could inject "Infinispan superpowers" into<br>
>>>>>> an application making extensive use of Map and ConcurrentMap without<br>
>>>>>> changes, but that has been deceptive and required great care, such as<br>
>>>>>> verifying that these features would not be used anywhere in the code.<br>
>>>>>> I ended up wrapping the Cache implementation in a custom adapter which<br>
>>>>>> would also implement ConcurrentMap but would throw a RuntimeException<br>
>>>>>> if any of the "unallowed" methods was called; that way I would at<br>
>>>>>> least detect violations safely.<br>
>>>>>><br>
>>>>>> I still think that for the time being - until a better solution is<br>
>>>>>> planned - we should throw exceptions... alas, that's an old<br>
>>>>>> conversation and it was never done.<br>
>>>>>><br>
>>>>>> Sanne<br>
>>>>>><br>
>>>>>><br>
>>>>>>> -Dennis<br>
>>>>>>><br>
>>>>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote:<br>
>>>>>>>> Hi,<br>
>>>>>>>><br>
>>>>>>>> recently we had a discussion about what size() returns, but I've<br>
>>>>>>>> realized there are more things that users would like to know. My<br>
>>>>>>>> question is whether you think they would really appreciate it, or<br>
>>>>>>>> whether it's just my QA point of view, where I sometimes compute<br>
>>>>>>>> 'checksums' of the cache to see if I didn't lose anything.<br>
>>>>>>>><br>
>>>>>>>> There are those sizes:<br>
>>>>>>>> A) number of owned entries<br>
>>>>>>>> B) number of entries stored locally in memory<br>
>>>>>>>> C) number of entries stored in each local cache store<br>
>>>>>>>> D) number of entries stored in each shared cache store<br>
>>>>>>>> E) total number of entries in cache<br>
>>>>>>>><br>
>>>>>>>> So far, we can get<br>
>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size()<br>
>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()<br>
>>>>>>>> E via distributed iterators / MR<br>
>>>>>>>> A via data container iteration + distribution manager query, but only<br>
>>>>>>>> without cache store<br>
>>>>>>>> C or D through<br>
>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()<br>
>>>>>>>><br>
>>>>>>>> I think that it would go along with users' expectations if size()<br>
>>>>>>>> returned E, and for the rest we should have special methods on<br>
>>>>>>>> AdvancedCache. That would of course change the meaning of size(),<br>
>>>>>>>> but I'd say it would finally be something with a firm meaning.<br>
>>>>>>>><br>
>>>>>>>> WDYT?<br>
>>>>>>>><br>
>>>>>>>> Radim<br>
>>>>>>>><br>
>>><br>
>>> --<br>
>>> Radim Vansa <<a href="mailto:rvansa@redhat.com" target="_blank">rvansa@redhat.com</a>><br>
>>> JBoss DataGrid QA<br>
>>><br>
><br>
><br>
> --<br>
> Radim Vansa <<a href="mailto:rvansa@redhat.com" target="_blank">rvansa@redhat.com</a>><br>
> JBoss DataGrid QA<br>
><br>
_______________________________________________<br>
infinispan-dev mailing list<br>
<a href="mailto:infinispan-dev@lists.jboss.org" target="_blank">infinispan-dev@lists.jboss.org</a><br>
<a href="https://lists.jboss.org/mailman/listinfo/infinispan-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/infinispan-dev</a><br>
</div></div></blockquote></div><br></div></div>