Considering the frequency of "How do I get the number of entries in the
cache" and "How do I get all keys" questions on all the forums, I think
that falling back to a runtime exception would not satisfy users.
On 10/07/2014 03:16 PM, Sanne Grinovero wrote:
Considering all these very valid concerns, I'd return to my proposal
of throwing runtime exceptions via an (optional) decorator.
I'd have such a decorator in place by default, so that we make it very
clear that - while you can remove it - the behaviour of such methods
is "unusual" and that a user would be better off avoiding them unless
he's into the advanced stuff.
As said before, that worked very well for me in the past, and it was
great that - even though I knew the caveats - I had a safety guard to highlight
unintended refactorings by others on my team who didn't know the black
art of using Infinispan correctly.
Sanne
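
A minimal sketch of such a decorator, assuming only the JDK and the
org.infinispan.Cache interface; the class name GuardedCache and the exception
message are invented for the example, and the class is left abstract so the
remaining per-key ConcurrentMap methods can delegate in the same way:

    import java.util.Collection;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentMap;

    import org.infinispan.Cache;

    // Illustrative only: GuardedCache is not an existing Infinispan class.
    public abstract class GuardedCache<K, V> implements ConcurrentMap<K, V> {

        protected final Cache<K, V> delegate;

        protected GuardedCache(Cache<K, V> delegate) {
            this.delegate = delegate;
        }

        // Per-key operations keep their usual semantics and simply delegate.
        @Override public V get(Object key)     { return delegate.get(key); }
        @Override public V put(K key, V value) { return delegate.put(key, value); }
        @Override public V remove(Object key)  { return delegate.remove(key); }

        // Bulk/global methods are "unusual" in a distributed cache: fail fast
        // so that accidental use is caught during testing.
        @Override public int size()                       { throw disallowed("size"); }
        @Override public boolean isEmpty()                { throw disallowed("isEmpty"); }
        @Override public boolean containsValue(Object v)  { throw disallowed("containsValue"); }
        @Override public Set<K> keySet()                  { throw disallowed("keySet"); }
        @Override public Collection<V> values()           { throw disallowed("values"); }
        @Override public Set<Map.Entry<K, V>> entrySet()  { throw disallowed("entrySet"); }

        private static UnsupportedOperationException disallowed(String method) {
            return new UnsupportedOperationException(method
                  + "() is disabled on this wrapper; use the AdvancedCache API instead");
        }

        // putIfAbsent, replace, putAll, clear, containsKey, ... would delegate
        // just like get/put/remove; omitted for brevity.
    }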
On 7 October 2014 13:43, Radim Vansa <rvansa(a)redhat.com> wrote:
> On 10/07/2014 02:21 PM, William Burns wrote:
>> On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa <rvansa(a)redhat.com> wrote:
>>> If you have one local and one shared cache store, how should the command
>>> behave?
>>>
>>> a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP,
>>> SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no
>>> SKIP_BACKUP_ENTRIES flag right now), where this method returns
>>> localStore.size() for the first non-shared cache store +
>>> (passivation ? dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)
>> Calling the size method in either distexec or MR will give you
>> inflated numbers as you need to pay attention to the numOwners to get
>> a proper count.
> That's what I meant by the SKIP_BACKUP_ENTRIES - dataContainer should be
> able to report only primary-owned entries, or we have to iterate and
> apply the filtering outside.
>
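
A rough sketch of that "filter outside" approach, counting only the in-memory
entries this node primarily owns. It assumes the Infinispan 7.x-era
AdvancedCache.getDataContainer()/getRpcManager()/getDistributionManager() and
DistributionManager.getPrimaryLocation() methods (there is no SKIP_BACKUP_ENTRIES
flag), and it deliberately ignores cache stores and concurrent rehashes:

    import org.infinispan.AdvancedCache;
    import org.infinispan.container.entries.InternalCacheEntry;
    import org.infinispan.distribution.DistributionManager;
    import org.infinispan.remoting.rpc.RpcManager;
    import org.infinispan.remoting.transport.Address;

    public final class PrimaryOwnedCounter {

        // Counts in-memory entries for which this node is the primary owner.
        public static long primaryOwnedInMemory(AdvancedCache<?, ?> cache) {
            DistributionManager dm = cache.getDistributionManager(); // null if not distributed
            RpcManager rpc = cache.getRpcManager();                  // null for local caches
            Address self = rpc != null ? rpc.getAddress() : null;
            long count = 0;
            for (InternalCacheEntry<?, ?> entry : cache.getDataContainer()) {
                // Without distribution, every in-memory entry counts.
                if (dm == null || self.equals(dm.getPrimaryLocation(entry.getKey()))) {
                    count++;
                }
            }
            return count;
        }

        private PrimaryOwnedCounter() {
        }
    }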
>>> b) distexec/MR sum of sharedStore.size() + (passivation ? sum of
>>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)
>> Calling size() on a shared cache store actually should work somewhat well
>> (assuming all entries are stored in the shared cache). The problem is
>> if passivation is enabled, as you point out, because you also have to
>> check the data container, which means you can also have an issue with
>> concurrent activations and passivations (which you can't verify
>> properly in either case without knowing the keys).
>>
>>> c) MR that would count the entries
>> This is the only reliable way to do this with MR. Unfortunately,
>> if a rehash occurs I am not sure whether you would get inconsistent
>> numbers or an exception. In the latter case you should at least be able
>> to make sure that you have the proper number when it does return without
>> an exception. I can't say how it works with multiple loaders though; my
>> guess is that it may process an entry more than once, so it depends on
>> whether your mapper is smart enough to realize it.
> I don't think that reporting an incorrect size is *that* harmful - even
> the ConcurrentMap interface says that it's just a rough estimate and,
> when things are changing, you can't rely on it.
>
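
As an illustration of option (c), a minimal Mapper/Reducer pair that counts
entries, written against the Infinispan 7.x MapReduceTask API. It is only a
sketch: it does not address the rehash caveat above, and entries loaded from
multiple stores may be counted more than once:

    import java.util.Iterator;
    import java.util.Map;

    import org.infinispan.Cache;
    import org.infinispan.distexec.mapreduce.Collector;
    import org.infinispan.distexec.mapreduce.MapReduceTask;
    import org.infinispan.distexec.mapreduce.Mapper;
    import org.infinispan.distexec.mapreduce.Reducer;

    public final class MapReduceEntryCount {

        // Emits ("count", 1) for every entry the mapper sees on each node.
        static class CountMapper<K, V> implements Mapper<K, V, String, Long> {
            @Override
            public void map(K key, V value, Collector<String, Long> collector) {
                collector.emit("count", 1L);
            }
        }

        // Sums the partial counts produced by the mappers.
        static class SumReducer implements Reducer<String, Long> {
            @Override
            public Long reduce(String reducedKey, Iterator<Long> iter) {
                long sum = 0;
                while (iter.hasNext()) {
                    sum += iter.next();
                }
                return sum;
            }
        }

        public static <K, V> long countEntries(Cache<K, V> cache) {
            Map<String, Long> result = new MapReduceTask<K, V, String, Long>(cache)
                    .mappedWith(new CountMapper<K, V>())
                    .reducedWith(new SumReducer())
                    .execute();
            Long count = result.get("count");
            return count == null ? 0L : count;
        }

        private MapReduceEntryCount() {
        }
    }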
>>> d) wrapper on distributed entry iteration with converters set to return
>>> 0-sized entries
>> The entry iterator can't return 0-sized entries (it can only omit the
>> values). The keys are required to make sure that the count is correct
>> and also to ensure that if a rehash happens in the middle it can
>> properly continue to operate without having to start over. The entry
>> iterator should work properly irrespective of the number of
>> stores/loaders that are configured, since it keeps track of
>> already-seen keys (so duplicates are ignored).
> Ok, I was simplifying that a bit. And by the way, I don't really like
> the fact that for distributed entry iteration you need to be able to
> keep all keys from one segment in memory at the same time. But fine -
> distributed entry iteration is probably not the right way.
>
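
For completeness, a hedged sketch of option (d) on top of the Infinispan 7.0
distributed entry iterator. filterEntries(), EntryIterable and
AcceptAllKeyValueFilter are the 7.x names and may differ in other versions;
note that the keys still have to travel to the invoking node even if a
converter strips the values:

    import org.infinispan.AdvancedCache;
    import org.infinispan.container.entries.CacheEntry;
    import org.infinispan.filter.AcceptAllKeyValueFilter;
    import org.infinispan.iteration.EntryIterable;

    public final class EntryIteratorCount {

        // Streams every entry (keys included) to this node and counts them.
        // Duplicates from multiple stores/loaders are filtered by the iterator itself.
        public static <K, V> long countWithEntryIterator(AdvancedCache<K, V> cache) {
            long count = 0;
            try (EntryIterable<K, V> entries =
                     cache.filterEntries(AcceptAllKeyValueFilter.getInstance())) {
                for (CacheEntry<K, V> ignored : entries) {
                    count++;
                }
            }
            return count;
        }

        private EntryIteratorCount() {
        }
    }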
>>
>>> And what about nodes with different configuration?
>> Hard to know without knowing what the differences are.
> I had in mind different loaders and passivation configurations (e.g.
> some nodes could use a shared store and some don't - do we want to
> handle such obscure configs? Can we design that without needing
> complicated decision trees for what to include and what not?).
>
> Radim
>
>>> Radim
>>>
>>> On 10/06/2014 01:57 PM, Sanne Grinovero wrote:
>>>> On 6 October 2014 12:44, Tristan Tarrant <ttarrant(a)redhat.com> wrote:
>>>>> I think we should provide correct implementations of size() (and others)
>>>>> and provide shortcut implementations using our usual Flag API (e.g.
>>>>> SKIP_REMOTE_LOOKUP).
>>>> Right that would be very nice. Same for CacheStore interaction: all
>>>> cachestores should be included unless skipped explicitly.
>>>>
>>>> Sanne
>>>>
>>>>> Tristan
>>>>>
>>>>> On 06/10/14 12:57, Sanne Grinovero wrote:
>>>>>> On 3 October 2014 18:38, Dennis Reed <dereed(a)redhat.com> wrote:
>>>>>>> Since size() is defined by the ConcurrentMap interface, it already has a
>>>>>>> precisely defined meaning. The only "correct" implementation is E.
>>>>>> +1
>> This is one of the things I have been wanting to do: actually
>> implement the other Map methods across the entire cache. However, to
>> do a lot of these in a memory-conscious way they would need to be run
>> ignoring any ongoing transactions. Actually, having this requirement
>> allows these methods to be implemented quite easily, especially in
>> conjunction with the EntryIterator. I almost made a PR for it a while
>> back, but it seemed a little zealous to do at the same time and it
>> didn't seem that people were pushing for it very hard (maybe that was
>> a wrong assumption). Also I wasn't quite sure whether the transactional
>> part no longer being functional would be a deterrent.
>>
>>>>>>> The current non-correct implementation was just because it's expensive
>>>>>>> to calculate correctly. I'm not sure the current impl is really that
>>>>>>> useful for anything.
>>>>>> +1
>>>>>>
>>>>>> And not just size() but many others from ConcurrentMap.
>>>>>> The question is whether we should drop the interface and all the
>>>>>> methods which aren't efficiently implementable, or fix all those
>>>>>> methods.
>>>>>>
>>>>>> In the past I loved that I could inject "Infinispan superpowers" into
>>>>>> an application making extensive use of Map and ConcurrentMap without
>>>>>> changes, but that has been deceptive and required great care, such as
>>>>>> verifying that these features would not be used anywhere in the code.
>>>>>> I ended up wrapping the Cache implementation in a custom adapter which
>>>>>> would also implement ConcurrentMap but would throw a RuntimeException
>>>>>> if any of the "unallowed" methods was called, so at least I would
>>>>>> detect violations safely.
>>>>>>
>>>>>> I still think that for the time being - until a better solution is
>>>>>> planned - we should throw exceptions... alas, that's an old
>>>>>> conversation and it was never done.
>>>>>>
>>>>>> Sanne
>>>>>>
>>>>>>
>>>>>>> -Dennis
>>>>>>>
>>>>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> recently we had a discussion about what size() returns, but I've
>>>>>>>> realized there are more things that users would like to know. My
>>>>>>>> question is whether you think that they would really appreciate it,
>>>>>>>> or whether it's just my QA point of view, where I sometimes compute
>>>>>>>> the 'checksums' of caches to see if I didn't lose anything.
>>>>>>>>
>>>>>>>> There are those sizes:
>>>>>>>> A) number of owned entries
>>>>>>>> B) number of entries stored locally in memory
>>>>>>>> C) number of entries stored in each local cache store
>>>>>>>> D) number of entries stored in each shared cache store
>>>>>>>> E) total number of entries in cache
>>>>>>>>
>>>>>>>> So far, we can get
>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size()
>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()
>>>>>>>> E via distributed iterators / MR
>>>>>>>> A via data container iteration + distribution manager query, but
>>>>>>>> only without a cache store
>>>>>>>> C or D through
>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
>>>>>>>>
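
To make the list above concrete, a sketch of how B and C/D can be read today.
SKIP_CACHE_LOAD is a public flag; getComponentRegistry(), PersistenceManager
and AdvancedCacheLoader are internal components/SPI whose exact signatures
(e.g. getStores(Class)) are assumed from the 7.x codebase and may change:

    import org.infinispan.AdvancedCache;
    import org.infinispan.context.Flag;
    import org.infinispan.persistence.manager.PersistenceManager;
    import org.infinispan.persistence.spi.AdvancedCacheLoader;

    public final class SizeVariants {

        // B: entries currently held in local memory, skipping any cache store.
        public static int inMemorySize(AdvancedCache<?, ?> cache) {
            return cache.withFlags(Flag.SKIP_CACHE_LOAD).size();
        }

        // C/D: per-store sizes, via the internal PersistenceManager component.
        public static void printStoreSizes(AdvancedCache<?, ?> cache) {
            PersistenceManager pm = cache.getComponentRegistry()
                    .getLocalComponent(PersistenceManager.class);
            for (AdvancedCacheLoader<?, ?> loader : pm.getStores(AdvancedCacheLoader.class)) {
                System.out.println(loader.getClass().getSimpleName() + ": " + loader.size());
            }
        }

        private SizeVariants() {
        }
    }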
>>>>>>>> I think that it would go along with users' expectations if size()
>>>>>>>> returned E, and for the rest we should have special methods on
>>>>>>>> AdvancedCache. That would of course change the meaning of size(),
>>>>>>>> but I'd say finally to something that has a firm meaning.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> Radim
>>>>>>>>
>>> --
>>> Radim Vansa <rvansa(a)redhat.com>
>>> JBoss DataGrid QA
>>>
>
> --
> Radim Vansa <rvansa(a)redhat.com>
> JBoss DataGrid QA
>
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev