[infinispan-dev] About size()

Mircea Markus mmarkus at redhat.com
Wed Oct 8 10:03:13 EDT 2014


On Oct 3, 2014, at 9:30, Radim Vansa <rvansa at redhat.com> wrote:

> Hi,
> 
> recently we had a discussion about what size() returns, but I've 
> realized there are more things that users would like to know. My 
> question is whether you think that they would really appreciate it, or 
> whether it's just my QA point of view where I sometimes compute the 
> 'checksums' of the cache to see if I didn't lose anything.
> 
> These are the sizes in question:
> A) number of owned entries
> B) number of entries stored locally in memory
> C) number of entries stored in each local cache store
> D) number of entries stored in each shared cache store
> E) total number of entries in cache
> 
> So far, we can get
> B via withFlags(SKIP_CACHE_LOAD).size()
> (passivation ? B : 0) + firstNonZero(C, D) via size()
> E via distributed iterators / MR
> A via data container iteration + distribution manager query, but only 
> without cache store
> C or D through 
> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
> 
> I think that it would go along with users' expectations if size() 
> returned E and for the rest we should have special methods on 
> AdvancedCache. That would of course change the meaning of size(), but 
> I'd say that it finally changes to something that has a firm meaning.
> 
> WDYT?

There were a lot of arguments in the past about whether size() and other methods that operate over all the elements (keySet, values) are useful, because:
- they are approximate (data changes during iteration)
- they are very resource-consuming and might be misused (this is the reason we chose to give size() its current local semantics)

These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion among our users; questions like "size() doesn't return the correct value" are asked regularly. I totally agree that size() should return E (i.e. everything that is stored within the grid, including persistence), with its performance implications documented accordingly. For keySet and values, we should stop implementing them (throw an exception) and point users to Will's distributed iterator, which is a nicer way to achieve the desired behavior.
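
To make the difference between the sizes in Radim's list concrete, here is a toy model in plain Java. It deliberately does not use the Infinispan API: the SizeModel class, its Node type and the two-node demo cluster are all hypothetical and only assume that each node knows which keys it owns, which keys it holds in memory, and which keys sit in its local store. It just illustrates why adding up per-node local counts is not the same as E.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; none of these types exist in Infinispan.
class SizeModel {

    /** One node of the hypothetical grid. */
    static final class Node {
        final Set<String> owned = new HashSet<>();      // keys this node is an owner of
        final Set<String> inMemory = new HashSet<>();   // keys held in the data container
        final Set<String> localStore = new HashSet<>(); // keys in the node's private store
    }

    static int ownedEntries(Node n)      { return n.owned.size(); }      // A
    static int inMemoryEntries(Node n)   { return n.inMemory.size(); }   // B
    static int localStoreEntries(Node n) { return n.localStore.size(); } // C
    static int sharedStoreEntries(Set<String> shared) { return shared.size(); } // D

    /** E: number of distinct keys stored anywhere in the grid. */
    static int totalEntries(List<Node> nodes, Set<String> sharedStore) {
        Set<String> all = new HashSet<>(sharedStore);
        for (Node n : nodes) {
            all.addAll(n.inMemory);
            all.addAll(n.localStore);
        }
        return all.size();
    }

    /** Naive estimate: sum of (B + C) over all nodes; counts replicated keys twice. */
    static int naiveSum(List<Node> nodes) {
        int sum = 0;
        for (Node n : nodes) {
            sum += inMemoryEntries(n) + localStoreEntries(n);
        }
        return sum;
    }

    /** Two nodes; k2 has two owners, k3 has been passivated to node 1's local store. */
    static List<Node> demoCluster() {
        Node n1 = new Node();
        n1.owned.addAll(List.of("k1", "k2", "k3"));
        n1.inMemory.addAll(List.of("k1", "k2"));
        n1.localStore.add("k3");

        Node n2 = new Node();
        n2.owned.addAll(List.of("k2", "k4"));
        n2.inMemory.addAll(List.of("k2", "k4"));

        List<Node> nodes = new ArrayList<>();
        nodes.add(n1);
        nodes.add(n2);
        return nodes;
    }

    public static void main(String[] args) {
        List<Node> nodes = demoCluster();
        System.out.println("E (distinct keys) = " + totalEntries(nodes, new HashSet<>()));
        System.out.println("sum of local sizes = " + naiveSum(nodes));
    }
}
```

In this made-up cluster the naive sum is larger than E because k2 lives on both nodes, which is the kind of mismatch users run into when they expect size() to mean "everything in the grid".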


> 
> Radim
> 
> -- 
> Radim Vansa <rvansa at redhat.com>
> JBoss DataGrid QA
> 
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)






