Considering all these very valid concerns I'd return to my proposal
for throwing runtime exceptions via an (optional) decorator.
I'd have such a decorator in place by default, so that we make it very
clear that - while you can remove it - the behaviour of such methods
is "unusual" and that users are better off avoiding them unless
they're into the advanced stuff.
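
To make this concrete, here's a minimal sketch of the kind of decorator
I mean (the class name is made up, and UnsupportedOperationException
just stands in for whatever runtime exception we'd pick):

  import java.util.concurrent.ConcurrentMap;

  // Illustrative only: keeps the cheap operations and fails fast on the
  // "unusual" ones. The remaining ConcurrentMap methods would delegate
  // the same way (omitted here, hence the class is abstract).
  public abstract class StrictCacheDecorator<K, V> implements ConcurrentMap<K, V> {

     private final ConcurrentMap<K, V> delegate;

     protected StrictCacheDecorator(ConcurrentMap<K, V> delegate) {
        this.delegate = delegate;
     }

     @Override
     public V get(Object key) {
        return delegate.get(key); // cheap: plain delegation
     }

     @Override
     public int size() {
        // fail fast instead of silently returning a misleading number
        throw new UnsupportedOperationException(
              "size() is unreliable on this cache; unwrap the decorator if you really mean it");
     }
  }
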
As I said before, that worked very well for me in the past, and it was
great that - even though I knew the pitfalls - I had a safety guard to
highlight unintended refactorings by others on my team who didn't know
the black art of using Infinispan correctly.
Sanne
On 7 October 2014 13:43, Radim Vansa <rvansa(a)redhat.com> wrote:
On 10/07/2014 02:21 PM, William Burns wrote:
> On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa <rvansa(a)redhat.com> wrote:
>> If you have one local and one shared cache store, how should the command
>> behave?
>>
>> a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP,
>> SKIP_BACKUP_ENTRIES).size() from all nodes (note that there's no
>> SKIP_BACKUP_ENTRIES flag right now), where this method returns
>> localStore.size() for the first non-shared cache store + (passivation ?
>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)
> Calling the size method in either distexec or MR will give you
> inflated numbers, as you need to pay attention to numOwners to get
> a proper count.
That's what I meant by SKIP_BACKUP_ENTRIES - the dataContainer should
be able to report only primary-owned entries, or we have to iterate and
apply the filtering outside.
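
Roughly like this - just a sketch, PrimaryEntryCounter is made up and I
emulate the missing flag by asking the DistributionManager for the
primary owner (I'm not sure this is the cheapest way):

  import java.io.Serializable;
  import java.util.Set;
  import org.infinispan.Cache;
  import org.infinispan.container.entries.InternalCacheEntry;
  import org.infinispan.distexec.DistributedCallable;
  import org.infinispan.distribution.DistributionManager;
  import org.infinispan.remoting.transport.Address;

  public class PrimaryEntryCounter
        implements DistributedCallable<Object, Object, Integer>, Serializable {

     private transient Cache<Object, Object> cache;

     @Override
     public void setEnvironment(Cache<Object, Object> cache, Set<Object> inputKeys) {
        this.cache = cache;
     }

     @Override
     public Integer call() throws Exception {
        DistributionManager dm = cache.getAdvancedCache().getDistributionManager();
        Address self = cache.getCacheManager().getAddress();
        int count = 0;
        // count only entries this node primary-owns, so that summing the
        // per-node results doesn't multiply the total by numOwners
        for (InternalCacheEntry e : cache.getAdvancedCache().getDataContainer()) {
           if (self.equals(dm.getPrimaryLocation(e.getKey())))
              count++;
        }
        return count;
     }
  }

and then sum the futures from new DefaultExecutorService(cache)
.submitEverywhere(new PrimaryEntryCounter()) on the caller.
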
>
>> b) distexec/MR sum of sharedStore.size() + (passivation ? sum of
>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)
> Calling size() on a shared cache store actually should work somewhat
> well (assuming all entries are stored in the shared store). The
> problem is if passivation is enabled, as you point out, because you
> also have to check the data container, which means you can also have
> an issue with concurrent activations and passivations (which you
> can't verify properly in either case without knowing the keys).
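
In code, (b) would amount to something like this sketch (sharedStore
and passivationEnabled are placeholders for the store reference and the
config lookup; the race you describe lives exactly in the '+'):

  // the shared store holds everything written through; with passivation,
  // in-memory entries exist *only* in memory, so they must be added
  long estimate = sharedStore.size();
  if (passivationEnabled) {
     // racy: entries being activated/passivated right now may be
     // counted twice or missed, and without the keys we can't dedupe
     estimate += dataContainer.size();
  }
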
>
>> c) MR that would count the entries
> This is the only reliable way to do this with MR. And unfortunately
> if a rehash occurs I am not sure if you would get inconsistent numbers
> or an Exception. In the latter case at least you should be able to
> make sure that you have the proper number when it does return without
> an exception. I can't say how it works with multiple loaders though;
> my guess is that it may process the entry more than once, so it
> depends on whether your mapper is smart enough to realize it.
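
For concreteness, the counting MR I had in mind is just this kind of
sketch - no de-duplication at all, so it inherits exactly the
multiple-loaders caveat you describe:

  import java.util.Iterator;
  import org.infinispan.distexec.mapreduce.Collector;
  import org.infinispan.distexec.mapreduce.Mapper;
  import org.infinispan.distexec.mapreduce.Reducer;

  class CountMapper implements Mapper<Object, Object, String, Integer> {
     @Override
     public void map(Object key, Object value, Collector<String, Integer> c) {
        c.emit("count", 1); // one unit per entry seen - duplicates inflate this
     }
  }

  class CountReducer implements Reducer<String, Integer> {
     @Override
     public Integer reduce(String reducedKey, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext())
           sum += values.next();
        return sum;
     }
  }

  // Integer total = new MapReduceTask<Object, Object, String, Integer>(cache)
  //       .mappedWith(new CountMapper()).reducedWith(new CountReducer())
  //       .execute().get("count");
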
I don't think that reporting an incorrect size is *that* harmful - even
the ConcurrentMap interface says that it's just a wild guess, and that
when things are changing you can't rely on it.
>
>> d) wrapper on distributed entry iteration with converters set to return
>> 0-sized entries
> Entry iterator can't return 0-sized entries (just the values). The
> keys are required to make sure that the count is correct and also to
> ensure that if a rehash happens in the middle it can properly continue
> to operate without having to start over. Entry iterator should work
> properly irrespective of the number of stores/loaders that are
> configured, since it keeps track of already seen keys (so duplicates
> are ignored).
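
For reference, counting through the entry iterator is then just this
(a sketch - I'm assuming the AcceptAllKeyValueFilter helper and that
the iterable must be closed):

  long count = 0;
  try (EntryIterable<Object, Object> it = cache.getAdvancedCache()
           .filterEntries(AcceptAllKeyValueFilter.getInstance())) {
     for (CacheEntry e : it)   // keys travel with the entries, see above
        count++;
  }
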
Ok, I was simplifying that a bit. And by the way, I don't really like
the fact that distributed entry iteration needs to keep all keys from
one segment in memory at the same time. But fine - distributed entry
iteration is probably not the right way.
>
>
>> And what about nodes with different configuration?
> Hard to know without knowing what the differences are.
I had in mind different loader and passivation configurations (e.g.
some nodes could use a shared store and some don't - do we want to
handle such obscure configs? Can we design this without needing
complicated decision trees for what to include and what not?).
Radim
>
>> Radim
>>
>> On 10/06/2014 01:57 PM, Sanne Grinovero wrote:
>>> On 6 October 2014 12:44, Tristan Tarrant <ttarrant(a)redhat.com> wrote:
>>>> I think we should provide correct implementations of size() (and others)
>>>> and provide shortcut implementations using our usual Flag API (e.g.
>>>> SKIP_REMOTE_LOOKUP).
>>> Right, that would be very nice. Same for CacheStore interaction: all
>>> cache stores should be included unless skipped explicitly.
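
From the caller's side that proposal would read something like this
(a sketch - today's size() doesn't honour these semantics yet):

  // proposed default: correct and global, all stores included
  int total = cache.size();

  // explicit opt-out shortcuts via the Flag API
  int localInMemory = cache.getAdvancedCache()
        .withFlags(Flag.SKIP_CACHE_LOAD, Flag.SKIP_REMOTE_LOOKUP)
        .size();
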
>>>
>>> Sanne
>>>
>>>> Tristan
>>>>
>>>> On 06/10/14 12:57, Sanne Grinovero wrote:
>>>>> On 3 October 2014 18:38, Dennis Reed <dereed(a)redhat.com> wrote:
>>>>>> Since size() is defined by the ConcurrentMap interface, it already
>>>>>> has a precisely defined meaning. The only "correct" implementation
>>>>>> is E.
>>>>> +1
> This is one of the things I have been wanting to do: actually
> implement the other Map methods across the entire cache. However, to
> do a lot of these in a memory-conscious way they would need to be run
> ignoring any ongoing transactions. Actually, having this requirement
> allows these methods to be implemented quite easily, especially in
> conjunction with the EntryIterator. I almost made a PR for it a while
> back, but it seemed a little zealous to do at the same time, and it
> didn't seem that people were pushing for it very hard (maybe that was
> a wrong assumption). Also I wasn't quite sure whether the
> transactional part not being functional anymore would be a deterrent.
>
>>>>>> The current non-correct implementation was just because it's
>>>>>> expensive to calculate correctly. I'm not sure the current impl
>>>>>> is really that useful for anything.
>>>>> +1
>>>>>
>>>>> And not just size() but many others from ConcurrentMap.
>>>>> The question is if we should drop the interface and all the methods
>>>>> which aren't efficiently implementable, or fix all those methods.
>>>>>
>>>>> In the past I loved that I could inject "Infinispan superpowers"
>>>>> into an application making extensive use of Map and ConcurrentMap
>>>>> without changes, but that has been deceiving and required great
>>>>> care, such as verifying that these features would not be used
>>>>> anywhere in the code.
>>>>> I ended up wrapping the Cache implementation in a custom adapter
>>>>> which would also implement ConcurrentMap but would throw a
>>>>> RuntimeException if any of the "unallowed" methods was called, so
>>>>> that at least I would detect violations safely.
>>>>>
>>>>> I still think that for the time being - until a better solution is
>>>>> planned - we should throw exceptions... alas, that's an old
>>>>> conversation and it was never done.
>>>>>
>>>>> Sanne
>>>>>
>>>>>
>>>>>> -Dennis
>>>>>>
>>>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> recently we had a discussion about what size() returns, but I've
>>>>>>> realized there are more things that users would like to know. My
>>>>>>> question is whether you think that they would really appreciate
>>>>>>> it, or whether it's just my QA point of view, where I sometimes
>>>>>>> compute the 'checksums' of a cache to see if I didn't lose
>>>>>>> anything.
>>>>>>>
>>>>>>> There are those sizes:
>>>>>>> A) number of owned entries
>>>>>>> B) number of entries stored locally in memory
>>>>>>> C) number of entries stored in each local cache store
>>>>>>> D) number of entries stored in each shared cache store
>>>>>>> E) total number of entries in cache
>>>>>>>
>>>>>>> So far, we can get
>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size()
>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()
>>>>>>> E via distributed iterators / MR
>>>>>>> A via data container iteration + distribution manager query, but
>>>>>>> only without a cache store
>>>>>>> C or D through
>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
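
Spelled out, that is (a sketch - assuming PersistenceManager.getStores(Class)
and AdvancedCacheLoader.size() behave as I expect):

  PersistenceManager pm = cache.getAdvancedCache().getComponentRegistry()
        .getLocalComponent(PersistenceManager.class);
  // each store reports its own size: shared stores give D, local ones C
  for (AdvancedCacheLoader loader : pm.getStores(AdvancedCacheLoader.class)) {
     int storeSize = loader.size();
  }
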
>>>>>>>
>>>>>>> I think that it would go along with users' expectations if size()
>>>>>>> returned E, and for the rest we should have special methods on
>>>>>>> AdvancedCache. That would of course change the meaning of size(),
>>>>>>> but I'd say it would finally be something with a firm meaning.
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>> Radim
>>>>>>>
>>
>> --
>> Radim Vansa <rvansa(a)redhat.com>
>> JBoss DataGrid QA
--
Radim Vansa <rvansa(a)redhat.com>
JBoss DataGrid QA
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev