The various bulk operations for the Map interface are now implemented
to utilize the entire cluster in the new Infinispan 7.0 release [1].
Please find more information about the bulk operations at [2].
[1]
On Tue, Oct 14, 2014 at 8:11 AM, William Burns <mudokonman(a)gmail.com> wrote:
On Tue, Oct 14, 2014 at 3:33 AM, Dan Berindei
<dan.berindei(a)gmail.com> wrote:
>
>
> On Tue, Oct 14, 2014 at 9:55 AM, Radim Vansa <rvansa(a)redhat.com> wrote:
>>
>> On 10/13/2014 05:55 PM, Mircea Markus wrote:
>> > On Oct 13, 2014, at 14:06, Dan Berindei <dan.berindei(a)gmail.com> wrote:
>> >
>> >>
>> >> On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
>> >>
>> >> On Oct 10, 2014, at 17:30, William Burns <mudokonman(a)gmail.com> wrote:
>> >>
>> >>>>>>> Also we didn't really talk about the fact that these methods
>> >>>>>>> would ignore ongoing transactions and if that is a concern or not.
>> >>>>>>>
>> >>>>>> It might be a concern for the Hibernate 2LC impl, it was their
>> >>>>>> TCK that prompted the last round of discussions about clear().
>> >>>>> Although I wonder how much these methods are even used since they
>> >>>>> only work for Local, Replication or Invalidation caches in their
>> >>>>> current state (and didn't even use loaders until 6.0).
>> >>>>
>> >>>> There is some more information about the test in the mailing list
>> >>>> discussion [1]. There's also a JIRA for clear() [2].
>> >>>>
>> >>>> I think 2LC almost never uses distribution, so size() being
>> >>>> local-only didn't matter, but making it non-tx could cause problems -
>> >>>> at least for that particular test.
>> >>> I had toyed around with the following idea before, though never in
>> >>> the scope of the size method alone, but I have a solution that would
>> >>> work for most transactional caches. Essentially the size method
>> >>> would always operate in a READ_COMMITTED-like state; using
>> >>> REPEATABLE_READ doesn't seem feasible since we can't keep all the
>> >>> contents in memory. The iterator would be run and, for each key that
>> >>> is found, it would check the context to see if that key is there. If
>> >>> the context entry is marked as removed it doesn't count the key, if
>> >>> the key is in the context it marks the key as found and counts it,
>> >>> and if it is not in the context it counts it. Then after iteration
>> >>> it finds all the keys in the context that were not seen and adds
>> >>> them to the count as well. This way it doesn't need additional
>> >>> memory (besides iteration costs), as all the context information is
>> >>> already in memory.
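[Editor's note: the counting scheme Will describes above could be sketched roughly as follows. Plain java.util maps stand in for the data container and the transaction's invocation context, and the REMOVED sentinel marks an in-transaction removal; none of these names are actual Infinispan API.]

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of a READ_COMMITTED-style transactional size():
// iterate the committed entries, reconcile each key against the
// transaction context, then add context-only (newly created) keys.
public class TxSizeSketch {
    // Sentinel standing in for "entry removed in this transaction".
    static final Object REMOVED = new Object();

    public static int size(Map<String, Object> committed,
                           Map<String, Object> contextEntries) {
        Set<String> seenInContext = new HashSet<>();
        int count = 0;
        for (String key : committed.keySet()) {
            Object ctxValue = contextEntries.get(key);
            if (ctxValue == REMOVED) {
                // Removed in this transaction: don't count it.
                seenInContext.add(key);
            } else if (ctxValue != null) {
                // Updated in this transaction: count it, mark as found.
                seenInContext.add(key);
                count++;
            } else {
                // Untouched by the transaction: count the committed entry.
                count++;
            }
        }
        // Keys created by this transaction that the iterator never produced.
        for (Map.Entry<String, Object> e : contextEntries.entrySet()) {
            if (e.getValue() != REMOVED && !seenInContext.contains(e.getKey())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Object> committed = new HashMap<>();
        committed.put("a", 1);
        committed.put("b", 2);
        Map<String, Object> ctx = new HashMap<>();
        ctx.put("b", REMOVED); // removed in-tx
        ctx.put("c", 3);       // created in-tx
        System.out.println(size(committed, ctx)); // prints 2 ("a" and "c")
    }
}
```

Only the context (already in memory) and the per-segment iteration state are needed, matching the "no additional memory" claim above.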
>> >> sounds good to me.
>> >>
>> >> Mircea, you have to decide whether you want the precise estimation
>> >> using the entry iterator or the loose estimation using
>> >> dataContainer.size() :)
>> >>
>> >> I guess we can't make size() read everything into the invocation
>> >> context, so READ_COMMITTED is all we can provide if we want to keep
>> >> size() transactional. Maybe we don't really need it though... Will,
>> >> could you investigate the failing test that started the clear()
>> >> thread [1] to see if it really needs size() to be transactional?
>> > I'm okay with both approaches TBH, both are much better than what we
>> > currently have. The accurate one is more costly but seems to be the
>> > solution of choice, so let's go for it.
>> >
>> >>
>> >>> My original thought was to also make the EntryIterator transactional
>> >>> in the same way, which also means the keySet, entrySet and values
>> >>> methods could do the same thing. The main stumbling block I had was
>> >>> the fact that the iterator and the various collections returned
>> >>> could be used outside of the ongoing transaction, which didn't seem
>> >>> to make much sense to me. But maybe these should be changed to be
>> >>> more like the backing maps which HashMap, ConcurrentHashMap etc. use
>> >>> for their methods, where instead each operation would pick up the
>> >>> transaction if there is one in the current thread, and if there is
>> >>> no transaction just start an implicit one.
>> >> or, if they are outside of a transaction, deny progress
>> >>
>> >> I don't think it's fair to require an explicit transaction for every
>> >> entrySet(). It should be possible to start an iteration without a
>> >> transaction, and only to invalidate an iteration started from an
>> >> explicit transaction the moment the transaction is committed/rolled
>> >> back (although it would complicate the rules a bit).
>> >>
>> >> And what happens if the user writes to the cache while it's iterating
>> >> through the cache-backed collection? Should the user see the new
>> >> entry in the iteration, or not? I don't think you can figure out at
>> >> the end of the iteration which keys were included without keeping all
>> >> the keys on the originator.
>> > If the modification is done outside the iterator one might expect a
>> > ConcurrentModificationException, as is the case with some JDK iterators.
>>
>> -1 We're aiming at a high-performance cache with a lot of changes while
>> the operation is executed. This way, the iteration would never complete
>> unless you explicitly switched the cache to read-only mode (either
>> through an Infinispan operation or in the application).
>
>
> I was referring only to changes made in the same transaction, not changes
> made by other transactions. But you make a good point: we can't throw a
> ConcurrentModificationException for writes in the same transaction while
> ignoring writes from other transactions.
>
>>
>>
>> I think that adding isCacheModified() or isTopologyChanged() to the
>> iterator would make sense, if that's not too complicated to implement.
>> Though, if we want non-disturbed iteration, snapshot isolation is the
>> only answer.
>
>
> isCacheModified() is probably too costly to implement.
> isTopologyChanged() could be done, but I'm not sure what the use case is,
> as the entry iterator abstracts topology changes away from the user.
>
> I don't think we want undisturbed iteration, at least not at this point.
> Personally, I just want to have a good story on why the iteration behaves in
> a certain way. By my standards, explaining that changes made by other
> transactions may completely/partially/not at all be visible in the iteration
> is fine, explaining that changes made by the same transaction may or may not
> be visible is not.
Sorry I didn't respond earlier. These commands would check the
transaction context before returning a value to the user. Since this
happens on each user interaction, we can guarantee users will always
see their own updated value if they have one in the transaction (even
if the update is made in the middle of the iteration). The big question
is whether another transaction's update is seen when we don't have an
update for that key (that depends on whether the segment is completed
before or after the update).
There should be no need to tell whether the cache was modified or the
topology changed (detecting the former would have a very high
performance impact with a DIST cache).
To be honest, the wrapper classes would just delegate to the Cache for
the vast majority of operations (get, remove, contains etc.). Only when
someone specifically uses the iterator on the various collections would
the distributed iterator even come into play. This way the various
collections would act as backing collections, like the ones HashMap and
ConcurrentHashMap have, except they would have to check the transaction
as well. The values collection would be extremely limited in its
supported methods though, pretty much just iteration and size.
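[Editor's note: the delegating "backing collection" idea above could look roughly like this sketch. A plain ConcurrentHashMap stands in for the cache, and the class and method names are purely illustrative, not Infinispan API; a real implementation would also consult the current transaction and use the distributed iterator.]

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;

// Backing key-set sketch: point lookups and writes delegate straight to
// the cache; only iteration would go through the (distributed) iterator.
public class BackingKeySetSketch<K, V> extends AbstractSet<K> {
    private final ConcurrentHashMap<K, V> cache;

    public BackingKeySetSketch(ConcurrentHashMap<K, V> cache) {
        this.cache = cache;
    }

    @Override
    public boolean contains(Object key) {
        // Delegates to the cache; no iteration needed.
        return cache.containsKey(key);
    }

    @Override
    public boolean remove(Object key) {
        // Writes through to the cache, so the set "backs" it.
        return cache.remove(key) != null;
    }

    @Override
    public int size() {
        // Delegates; a real impl would use the cluster-wide size().
        return cache.size();
    }

    @Override
    public Iterator<K> iterator() {
        // Only here would the distributed iterator (plus the
        // transaction-context reconciliation) actually be engaged.
        return cache.keySet().iterator();
    }
}
```

The point of the sketch is that, like the collections HashMap and ConcurrentHashMap return, mutations flow both ways: removing from the set removes from the cache, and cache changes are visible through the set.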
>
>
> Cheers
> Dan
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev