On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa <rvansa(a)redhat.com> wrote:
If you have one local and one shared cache store, how should the
command
behave?
a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP,
SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no
SKIP_BACKUP_ENTRIES flag right now), where this method returns
localStore.size() for first non-shared cache store + passivation ?
dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)
Calling the size method in either distexec or MR will give you
inflated numbers as you need to pay attention to the numOwners to get
a proper count.
b) distexec/MR sum of sharedStore.size() + passivation ? sum of
dataContainer.size(SKIP_BACKUP_ENTRIES) : 0
Calling the size on a shared cache actually should work somewhat well
(assuming all entries are stored in the shared cache). The problem is
if passivation is enabled as you point out because you also have to
check the data container which means you can also have an issue with
concurrent activations and passivations (which you can't verify
properly in either case without knowing the keys).
c) MR that would count the entries
This is the only reliable way to do this with MR. And unfortunately
if a rehash occurs I am not sure if you would get inconsistent numbers
or an Exception. In the latter at least you should be able to make
sure that you have the proper number when it does return without
exception. I can't say how it works with multiple loaders though, my
guess is that it may process the entry more than once so it depends on
if your mapper is smart enough to realize it.
d) wrapper on distributed entry iteration with converters set to
return
0-sized entries
Entry iterator can't return 0 sized entries (just the values). The
keys are required to make sure that the count is correct and also to
ensure that if a rehash happens in the middle it can properly continue
to operate without having to start over. Entry iterator should work
properly irrespective of the number of stores/loaders that are
configured, since it keep track of already seen keys (so duplicates
are ignored).
And what about nodes with different configuration?
Hard to know without knowing what the differences are.
Radim
On 10/06/2014 01:57 PM, Sanne Grinovero wrote:
> On 6 October 2014 12:44, Tristan Tarrant <ttarrant(a)redhat.com> wrote:
>> I think we should provide correct implementations of size() (and others)
>> and provide shortcut implementations using our usual Flag API (e.g.
>> SKIP_REMOTE_LOOKUP).
> Right that would be very nice. Same for CacheStore interaction: all
> cachestores should be included unless skipped explicitly.
>
> Sanne
>
>> Tristan
>>
>> On 06/10/14 12:57, Sanne Grinovero wrote:
>>> On 3 October 2014 18:38, Dennis Reed <dereed(a)redhat.com> wrote:
>>>> Since size() is defined by the ConcurrentMap interface, it already has a
>>>> precisely defined meaning. The only "correct" implementation
is E.
>>> +1
This is one of the things I have been wanting to do is actually
implement the other Map methods across the entire cache. However to
do a lot of these in a memory conscious way they would need to be ran
ignoring any ongoing transactions. Actually having this requirement
allows these methods to be implemented quite easily especially in
conjunction with the EntryIterator. I almost made a PR for it a while
back, but it seemed a little zealous to do at the same time and it
didn't seem that people were pushing for it very hard (maybe that was
a wrong assumption). Also I wasn't quite sure the transactional part
not being functional anymore would be a deterrent.
>>>
>>>> The current non-correct implementation was just because it's
expensive
>>>> to calculate correctly. I'm not sure the current impl is really
that
>>>> useful for anything.
>>> +1
>>>
>>> And not just size() but many others from ConcurrentMap.
>>> The question is if we should drop the interface and all the methods
>>> which aren't efficiently implementable, or fix all those methods.
>>>
>>> In the past I loved that I could inject "Infinispan superpowers"
into
>>> an application making extensive use of Map and ConcurrentMap without
>>> changes, but that has been deceiving and required great care such as
>>> verifying that these features would not be used anywhere in the code.
>>> I ended up wrapping the Cache implementation in a custom adapter which
>>> would also implement ConcurrentMap but would throw a RuntimeException
>>> if any of the "unallowed" methods was called, at least I would
detect
>>> violations safely.
>>>
>>> I still think that for the time being - until a better solution is
>>> planned - we should throw exceptions.. alas that's an old conversation
>>> and it was never done.
>>>
>>> Sanne
>>>
>>>
>>>> -Dennis
>>>>
>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote:
>>>>> Hi,
>>>>>
>>>>> recently we had a discussion about what size() returns, but I've
>>>>> realized there are more things that users would like to know. My
>>>>> question is whether you think that they would really appreciate it,
or
>>>>> whether it's just my QA point of view where I sometimes compute
the
>>>>> 'checksums' of cache to see if I didn't lost anything.
>>>>>
>>>>> There are those sizes:
>>>>> A) number of owned entries
>>>>> B) number of entries stored locally in memory
>>>>> C) number of entries stored in each local cache store
>>>>> D) number of entries stored in each shared cache store
>>>>> E) total number of entries in cache
>>>>>
>>>>> So far, we can get
>>>>> B via withFlags(SKIP_CACHE_LOAD).size()
>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()
>>>>> E via distributed iterators / MR
>>>>> A via data container iteration + distribution manager query, but
only
>>>>> without cache store
>>>>> C or D through
>>>>>
getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
>>>>>
>>>>> I think that it would go along with users' expectations if
size()
>>>>> returned E and for the rest we should have special methods on
>>>>> AdvancedCache. That would of course change the meaning of size(),
but
>>>>> I'd say that finally to something that has firm meaning.
>>>>>
>>>>> WDYT?
>>>>>
>>>>> Radim
>>>>>
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev(a)lists.jboss.org
>>>>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev(a)lists.jboss.org
>>>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev(a)lists.jboss.org
>>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Radim Vansa <rvansa(a)redhat.com>
JBoss DataGrid QA
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev