Re: [infinispan-dev] About size()

Tuesday, 14 October 2014

On 10/13/2014 05:55 PM, Mircea Markus wrote:
...
 On Oct 13, 2014, at 14:06, Dan Berindei
<dan.berindei(a)gmail.com&gt; wrote:

>
> On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus <mmarkus(a)redhat.com&gt; wrote:
>
> On Oct 10, 2014, at 17:30, William Burns <mudokonman(a)gmail.com&gt; wrote:
>
>>>>>> Also we didn't really talk about the fact that these methods
would
>>>>>> ignore ongoing transactions and if that is a concern or not.
>>>>>>
>>>>> It might be a concern for the Hibernate 2LC impl, it was their TCK
that
>>>>> prompted the last round of discussions about clear().
>>>> Although I wonder how much these methods are even used since they only
>>>> work for Local, Replication or Invalidation caches in their current
>>>> state (and didn't even use loaders until 6.0).
>>>
>>> There is some more information about the test in the mailing list discussion
>>> [1]
>>> There's also a JIRA for clear() [2]
>>>
>>> I think 2LC almost never uses distribution, so size() being local-only
>>> didn't matter, but making it non-tx could cause problems - at least for
that
>>> particular test.
>> I had toyed around with the following idea before, but I never thought
>> of it in the scope of the size method solely, but I have a solution
>> that would work mostly for transactional caches.  Essentially the size
>> method would always operate in a READ_COMMITTED like state, using
>> REPEATABLE_READ doesn't seem feasible since we can't keep all the
>> contents in memory.  Essentially the iterator would be ran and for
>> each key that is found it checks the context to see if it is there.
>> If the context entry is marked as removed it doesn't count the key, if
>> the key is there it marks the key as found and counts it, and if it is
>> not found it counts it.  Then after iteration it finds all the keys in
>> the context that were not found and also adds them to the count.  This
>> way it doesn't need to store additional memory (besides iteration
>> costs) as all the context information is in memory.
> sounds good to me.
>
> Mircea, you have to decide whether you want the precise estimation using the entry
iterator or the loose estimation using dataContainer.size() :)
>
> I guess we can't make size() read everything into the invocation context, so
READ_COMMITTED is all we can provide if we want to keep size() transactional. Maybe we
don't really need it though... Will, could you investigate the failing test that
started the clear() thread [1] to see if it really needs size() to be transactional?
 I'm okay with both approaches TBH, both are much better than what we currently have.
The accurate one is more costly but seems to be the solution of choice so let's go for
it.

>
>> My original thought was to also make the EntryIterator transactional
>> in the same way which also means the keySet, entrySet and values
>> methods could do the same things.  The main reason stumbling block I
>> had was the fact that the iterator and various collections returned
>> could be used outside of the ongoing transaction which didn't seem to
>> make much sense to me.  But maybe these should be changed to be more
>> like backing maps which HashMap, ConcurrentHashMap etc use for their
>> methods, where instead it would pick up the transaction if there is
>> one in the current thread and if there is no transaction just start an
>> implicit one.
> or if they are outside of a transaction to deny progress
>
> I don't think it's fair to require an explicit transaction for every
entrySet(). It should be possible to start an iteration without a transaction, and only to
invalidate an iteration started from an explicit transaction the moment the transaction is
committed/rolled back (although it would complicate rules a bit).
>
> And what happens if the user writes to the cache while it's iterating through the
cache-backed collection? Should the user see the new entry in the iteration, or not? I
don't think you can figure out at the end of the iteration which keys were included
without keeping all the keys on the originator.
 If the modification is done outside the iterator one might expect an
ConcurrentModificationException, as it is the case with some JDK iterators. 
-1 We're aiming at high performance cache with a lot of changes while 
the operation is executed. This way, the iteration would never complete, 
unless you explicitly switch the cache to read only mode (either through 
Infinispan operation or in application).

I think that adding isCacheModified() or isTopologyChanged() to the 
iterator would make sense, if that's not too complicated to implement. 
Though, if we want non-disturbed iteration, snapshot isolation is the 
only answer.

Radim

...

>   
>
>>   This however was a big change from how these
>> collections work currently in that they are in memory copies only.
>>
>> What do you guys think?
> I think that keeping track of the context entries is a better way of iterating so +1.
As you mentioned, we should also make it clear that RC semantic applies.
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
 Cheers, 

-- 
Radim Vansa <rvansa(a)redhat.com&gt;
JBoss DataGrid QA

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [infinispan-dev] About size()