[infinispan-dev] About size()

Radim Vansa rvansa at redhat.com
Tue Oct 7 09:42:10 EDT 2014


Considering the frequency of "How do I get the number of entries in 
cache", "How do I get all keys" on all forums, I think that backing to 
runtime exception would not satisfy the users.

On 10/07/2014 03:16 PM, Sanne Grinovero wrote:
> Considering all these very valid concerns I'd return on my proposal
> for throwing runtime exceptions via an (optional) decorator.
>
> I'd have such a decorator in place by default, so that we make it very
> clear that - while you can remove it - the behaviour of such methods
> is "unusual" and that a user would be better off avoiding them unless
> he's into the advanced stuff.
>
> As said before, that worked very well for me in the past and it was
> great that - even while I did know - I had a safety guard to highlight
> unintended refactorings by others on my team who didn't know the black
> art of using Infinispan correctly.
>
> Sanne
>
>
>
> On 7 October 2014 13:43, Radim Vansa <rvansa at redhat.com> wrote:
>> On 10/07/2014 02:21 PM, William Burns wrote:
>>> On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa <rvansa at redhat.com> wrote:
>>>> If you have one local and one shared cache store, how should the command
>>>> behave?
>>>>
>>>> a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP,
>>>> SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no
>>>> SKIP_BACKUP_ENTRIES flag right now), where this method returns
>>>> localStore.size() for first non-shared cache store + passivation ?
>>>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)
>>> Calling the size method in either distexec or MR will give you
>>> inflated numbers as you need to pay attention to the numOwners to get
>>> a proper count.
>> That's what I meant by the SKIP_BACKUP_ENTRIES - dataContainer should be
>> able to report only primary-owned entries, or we have to iterate and
>> apply the filtering outside.
>>
>>>> b) distexec/MR sum of sharedStore.size() + passivation ? sum of
>>>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0
>>> Calling the size on a shared cache actually should work somewhat well
>>> (assuming all entries are stored in the shared cache).  The problem is
>>> if passivation is enabled as you point out because you also have to
>>> check the data container which means you can also have an issue with
>>> concurrent activations and passivations (which you can't verify
>>> properly in either case without knowing the keys).
>>>
>>>> c) MR that would count the entries
>>> This is the only reliable way to do this with MR.  And unfortunately
>>> if a rehash occurs I am not sure if you would get inconsistent numbers
>>> or an Exception.  In the latter at least you should be able to make
>>> sure that you have the proper number when it does return without
>>> exception.  I can't say how it works with multiple loaders though, my
>>> guess is that it may process the entry more than once so it depends on
>>> if your mapper is smart enough to realize it.
>> I don't think that reporting incorrect size is *that* harmful - even
>> ConcurrentMap interface says that it's just a wild guess and when things
>> are changing, you can't rely on that.
>>
>>>> d) wrapper on distributed entry iteration with converters set to return
>>>> 0-sized entries
>>> Entry iterator can't return 0 sized entries (just the values).  The
>>> keys are required to make sure that the count is correct and also to
>>> ensure that if a rehash happens in the middle it can properly continue
>>> to operate without having to start over.  Entry iterator should work
>>> properly irrespective of the number of stores/loaders that are
>>> configured, since it keep track of already seen keys (so duplicates
>>> are ignored).
>> Ok, I was simplifying that a bit. And by the way, I don't really like
>> the fact that for distributed entry iteration you need to be able to
>> keep all keys from one segment at one moment in memory. But fine -
>> distributed entry iteration is probably not the right way.
>>
>>>
>>>> And what about nodes with different configuration?
>>> Hard to know without knowing what the differences are.
>> I had in my mind different loaders and passivation configuration (e.g.
>> some node could use shared store and some don't - do we want to handle
>> such obscure configs? Can we design that without the need to have
>> complicated decision trees what to include and what not?).
>>
>> Radim
>>
>>>> Radim
>>>>
>>>> On 10/06/2014 01:57 PM, Sanne Grinovero wrote:
>>>>> On 6 October 2014 12:44, Tristan Tarrant <ttarrant at redhat.com> wrote:
>>>>>> I think we should provide correct implementations of size() (and others)
>>>>>> and provide shortcut implementations using our usual Flag API (e.g.
>>>>>> SKIP_REMOTE_LOOKUP).
>>>>> Right that would be very nice. Same for CacheStore interaction: all
>>>>> cachestores should be included unless skipped explicitly.
>>>>>
>>>>> Sanne
>>>>>
>>>>>> Tristan
>>>>>>
>>>>>> On 06/10/14 12:57, Sanne Grinovero wrote:
>>>>>>> On 3 October 2014 18:38, Dennis Reed <dereed at redhat.com> wrote:
>>>>>>>> Since size() is defined by the ConcurrentMap interface, it already has a
>>>>>>>> precisely defined meaning.  The only "correct" implementation is E.
>>>>>>> +1
>>> This is one of the things I have been wanting to do is actually
>>> implement the other Map methods across the entire cache.  However to
>>> do a lot of these in a memory conscious way they would need to be ran
>>> ignoring any ongoing transactions.  Actually having this requirement
>>> allows these methods to be implemented quite easily especially in
>>> conjunction with the EntryIterator.  I almost made a PR for it a while
>>> back, but it seemed a little zealous to do at the same time and it
>>> didn't seem that people were pushing for it very hard (maybe that was
>>> a wrong assumption).  Also I wasn't quite sure the transactional part
>>> not being functional anymore would be a deterrent.
>>>
>>>>>>>> The current non-correct implementation was just because it's expensive
>>>>>>>> to calculate correctly.  I'm not sure the current impl is really that
>>>>>>>> useful for anything.
>>>>>>> +1
>>>>>>>
>>>>>>> And not just size() but many others from ConcurrentMap.
>>>>>>> The question is if we should drop the interface and all the methods
>>>>>>> which aren't efficiently implementable, or fix all those methods.
>>>>>>>
>>>>>>> In the past I loved that I could inject "Infinispan superpowers" into
>>>>>>> an application making extensive use of Map and ConcurrentMap without
>>>>>>> changes, but that has been deceiving and required great care such as
>>>>>>> verifying that these features would not be used anywhere in the code.
>>>>>>> I ended up wrapping the Cache implementation in a custom adapter which
>>>>>>> would also implement ConcurrentMap but would throw a RuntimeException
>>>>>>> if any of the "unallowed" methods was called, at least I would detect
>>>>>>> violations safely.
>>>>>>>
>>>>>>> I still think that for the time being - until a better solution is
>>>>>>> planned - we should throw exceptions.. alas that's an old conversation
>>>>>>> and it was never done.
>>>>>>>
>>>>>>> Sanne
>>>>>>>
>>>>>>>
>>>>>>>> -Dennis
>>>>>>>>
>>>>>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> recently we had a discussion about what size() returns, but I've
>>>>>>>>> realized there are more things that users would like to know. My
>>>>>>>>> question is whether you think that they would really appreciate it, or
>>>>>>>>> whether it's just my QA point of view where I sometimes compute the
>>>>>>>>> 'checksums' of cache to see if I didn't lost anything.
>>>>>>>>>
>>>>>>>>> There are those sizes:
>>>>>>>>> A) number of owned entries
>>>>>>>>> B) number of entries stored locally in memory
>>>>>>>>> C) number of entries stored in each local cache store
>>>>>>>>> D) number of entries stored in each shared cache store
>>>>>>>>> E) total number of entries in cache
>>>>>>>>>
>>>>>>>>> So far, we can get
>>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size()
>>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()
>>>>>>>>> E via distributed iterators / MR
>>>>>>>>> A via data container iteration + distribution manager query, but only
>>>>>>>>> without cache store
>>>>>>>>> C or D through
>>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
>>>>>>>>>
>>>>>>>>> I think that it would go along with users' expectations if size()
>>>>>>>>> returned E and for the rest we should have special methods on
>>>>>>>>> AdvancedCache. That would of course change the meaning of size(), but
>>>>>>>>> I'd say that finally to something that has firm meaning.
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>> Radim
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> infinispan-dev mailing list
>>>>>>>> infinispan-dev at lists.jboss.org
>>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>>> _______________________________________________
>>>>>>> infinispan-dev mailing list
>>>>>>> infinispan-dev at lists.jboss.org
>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> infinispan-dev mailing list
>>>>>> infinispan-dev at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>> --
>>>> Radim Vansa <rvansa at redhat.com>
>>>> JBoss DataGrid QA
>>>>
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> --
>> Radim Vansa <rvansa at redhat.com>
>> JBoss DataGrid QA
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


-- 
Radim Vansa <rvansa at redhat.com>
JBoss DataGrid QA



More information about the infinispan-dev mailing list