On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <mmarkus@redhat.com> wrote:

On Oct 3, 2014, at 9:30, Radim Vansa <rvansa@redhat.com> wrote:

> Hi,
>
> recently we had a discussion about what size() returns, but I've
> realized there are more things that users would like to know. My
> question is whether you think that they would really appreciate it, or
> whether it's just my QA point of view where I sometimes compute the
> 'checksums' of the cache to see whether I've lost anything.
>
> There are these sizes:
> A) number of owned entries
> B) number of entries stored locally in memory
> C) number of entries stored in each local cache store
> D) number of entries stored in each shared cache store
> E) total number of entries in cache
>
> So far, we can get
> B via withFlags(SKIP_CACHE_LOAD).size()
> (passivation ? B : 0) + firstNonZero(C, D) via size()
> E via distributed iterators / MR
> A via data container iteration + distribution manager query, but only
> without cache store
> C or D through
> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
>
> I think that it would go along with users' expectations if size()
> returned E and for the rest we should have special methods on
> AdvancedCache. That would of course change the meaning of size(), but
> I'd say it would finally become something that has a firm meaning.
>
> WDYT?
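
To make the size() arithmetic quoted above concrete, here is a minimal sketch in plain Java. It is a toy model, not Infinispan code: the class name CacheSizeSketch, the method currentSize and the sample counts are made up for illustration, and it assumes that B, C and D can each be treated as a single count on one node.

```java
import java.util.Arrays;

/** Toy model of the quoted size() formula; hypothetical names, not Infinispan API. */
class CacheSizeSketch {

    /** Returns the first non-zero argument, or 0 if every argument is zero. */
    static long firstNonZero(long... counts) {
        return Arrays.stream(counts).filter(c -> c != 0).findFirst().orElse(0L);
    }

    /**
     * (passivation ? B : 0) + firstNonZero(C, D), where
     * B = entries in memory, C = entries in a local store, D = entries in a shared store.
     */
    static long currentSize(boolean passivation, long inMemory, long localStore, long sharedStore) {
        return (passivation ? inMemory : 0) + firstNonZero(localStore, sharedStore);
    }

    public static void main(String[] args) {
        // Passivation on: the in-memory count is added to the store count.
        System.out.println(currentSize(true, 10, 25, 0));   // prints 35
        // Passivation off: only the first non-empty store is counted.
        System.out.println(currentSize(false, 10, 25, 40)); // prints 25
    }
}
```

Neither result is E: the formula never looks at other nodes and counts only the first non-empty store.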

There have been a lot of arguments in the past about whether size() and other methods that operate over all the elements (keySet, values) are useful, because:
- they are approximate (data changes during iteration)
- they are very resource-consuming and might be misused (this is the reason we chose to give size() its current local semantics)

These methods (size, keySet, values) are useful for people, and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. It has also created a lot of confusion among our users, with questions like "size() doesn't return the correct value" being asked regularly. I totally agree that size() should return E (i.e. everything that is stored within the grid, including persistence), with its performance implications documented accordingly. For keySet and values, we should stop implementing them (throw an exception) and point users to Will's distributed iterator, which is a nicer way to achieve the desired behavior.

We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better.
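
A rough sketch of what a keySet() view layered on an entry iterator could look like, in plain Java. This is only an illustration under assumptions: IteratorBackedKeySet is a made-up name, and the Supplier stands in for whatever would hand out a fresh distributed entry iterator. It is not the actual Infinispan implementation.

```java
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Supplier;

/** Read-only key view over an entry iterator source (hypothetical, for illustration). */
class IteratorBackedKeySet<K, V> extends AbstractSet<K> {
    // Stand-in for "give me a fresh distributed entry iterator".
    private final Supplier<Iterator<Map.Entry<K, V>>> entries;

    IteratorBackedKeySet(Supplier<Iterator<Map.Entry<K, V>>> entries) {
        this.entries = entries;
    }

    @Override
    public Iterator<K> iterator() {
        Iterator<Map.Entry<K, V>> it = entries.get();
        // remove() is not overridden, so the view stays read-only.
        return new Iterator<K>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public K next() { return it.next().getKey(); }
        };
    }

    @Override
    public int size() {
        // Every call walks the whole data set, which is the cost to document.
        int n = 0;
        for (Iterator<Map.Entry<K, V>> it = entries.get(); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new HashMap<>();
        data.put("a", 1);
        data.put("b", 2);
        IteratorBackedKeySet<String, Integer> keys =
                new IteratorBackedKeySet<>(() -> data.entrySet().iterator());
        System.out.println(keys.size());        // prints 2
        System.out.println(keys.contains("a")); // prints true
    }
}
```

Both size() and contains() do a full iteration here, which is why using the iterator directly is usually the better choice.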

>
> Radim
>
> --
> Radim Vansa <rvansa@redhat.com>
> JBoss DataGrid QA
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
