[infinispan-dev] About size()

William Burns mudokonman at gmail.com
Tue Nov 4 09:23:03 EST 2014


The various bulk operations for the Map interface are now implemented
to utilize the entire cluster in the new Infinispan 7.0 Release [1].

Please find more information about the bulk operations at [2]

[1] http://blog.infinispan.org/2014/11/infinispan-700final-is-out.html
[2] http://blog.infinispan.org/2014/11/why-doesnt-mapsize-return-size-of.html

On Tue, Oct 14, 2014 at 8:11 AM, William Burns <mudokonman at gmail.com> wrote:
> On Tue, Oct 14, 2014 at 3:33 AM, Dan Berindei <dan.berindei at gmail.com> wrote:
>>
>>
>> On Tue, Oct 14, 2014 at 9:55 AM, Radim Vansa <rvansa at redhat.com> wrote:
>>>
>>> On 10/13/2014 05:55 PM, Mircea Markus wrote:
>>> > On Oct 13, 2014, at 14:06, Dan Berindei <dan.berindei at gmail.com> wrote:
>>> >
>>> >>
>>> >> On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus <mmarkus at redhat.com>
>>> >> wrote:
>>> >>
>>> >> On Oct 10, 2014, at 17:30, William Burns <mudokonman at gmail.com> wrote:
>>> >>
>>> >>>>>>> Also we didn't really talk about the fact that these methods would
>>> >>>>>>> ignore ongoing transactions and if that is a concern or not.
>>> >>>>>>>
>>> >>>>>> It might be a concern for the Hibernate 2LC impl, it was their TCK
>>> >>>>>> that
>>> >>>>>> prompted the last round of discussions about clear().
>>> >>>>> Although I wonder how much these methods are even used since they
>>> >>>>> only
>>> >>>>> work for Local, Replication or Invalidation caches in their current
>>> >>>>> state (and didn't even use loaders until 6.0).
>>> >>>>
>>> >>>> There is some more information about the test in the mailing list
>>> >>>> discussion
>>> >>>> [1]
>>> >>>> There's also a JIRA for clear() [2]
>>> >>>>
>>> >>>> I think 2LC almost never uses distribution, so size() being
>>> >>>> local-only
>>> >>>> didn't matter, but making it non-tx could cause problems - at least
>>> >>>> for that
>>> >>>> particular test.
>>> >>> I had toyed around with the following idea before, but I never thought
>>> >>> of it in the scope of the size method solely, but I have a solution
>>> >>> that would work mostly for transactional caches.  Essentially the size
>>> >>> method would always operate in a READ_COMMITTED like state, using
>>> >>> REPEATABLE_READ doesn't seem feasible since we can't keep all the
>>> >>> contents in memory.  Essentially the iterator would be ran and for
>>> >>> each key that is found it checks the context to see if it is there.
>>> >>> If the context entry is marked as removed it doesn't count the key, if
>>> >>> the key is there it marks the key as found and counts it, and if it is
>>> >>> not found it counts it.  Then after iteration it finds all the keys in
>>> >>> the context that were not found and also adds them to the count.  This
>>> >>> way it doesn't need to store additional memory (besides iteration
>>> >>> costs) as all the context information is in memory.
>>> >> sounds good to me.
>>> >>
>>> >> Mircea, you have to decide whether you want the precise estimation
>>> >> using the entry iterator or the loose estimation using dataContainer.size()
>>> >> :)
>>> >>
>>> >> I guess we can't make size() read everything into the invocation
>>> >> context, so READ_COMMITTED is all we can provide if we want to keep size()
>>> >> transactional. Maybe we don't really need it though... Will, could you
>>> >> investigate the failing test that started the clear() thread [1] to see if
>>> >> it really needs size() to be transactional?
>>> > I'm okay with both approaches TBH, both are much better than what we
>>> > currently have. The accurate one is more costly but seems to be the solution
>>> > of choice so let's go for it.
>>> >
>>> >>
>>> >>> My original thought was to also make the EntryIterator transactional
>>> >>> in the same way which also means the keySet, entrySet and values
>>> >>> methods could do the same things.  The main reason stumbling block I
>>> >>> had was the fact that the iterator and various collections returned
>>> >>> could be used outside of the ongoing transaction which didn't seem to
>>> >>> make much sense to me.  But maybe these should be changed to be more
>>> >>> like backing maps which HashMap, ConcurrentHashMap etc use for their
>>> >>> methods, where instead it would pick up the transaction if there is
>>> >>> one in the current thread and if there is no transaction just start an
>>> >>> implicit one.
>>> >> or if they are outside of a transaction to deny progress
>>> >>
>>> >> I don't think it's fair to require an explicit transaction for every
>>> >> entrySet(). It should be possible to start an iteration without a
>>> >> transaction, and only to invalidate an iteration started from an explicit
>>> >> transaction the moment the transaction is committed/rolled back (although it
>>> >> would complicate rules a bit).
>>> >>
>>> >> And what happens if the user writes to the cache while it's iterating
>>> >> through the cache-backed collection? Should the user see the new entry in
>>> >> the iteration, or not? I don't think you can figure out at the end of the
>>> >> iteration which keys were included without keeping all the keys on the
>>> >> originator.
>>> > If the modification is done outside the iterator one might expect an
>>> > ConcurrentModificationException, as it is the case with some JDK iterators.
>>>
>>> -1 We're aiming at high performance cache with a lot of changes while
>>> the operation is executed. This way, the iteration would never complete,
>>> unless you explicitly switch the cache to read only mode (either through
>>> Infinispan operation or in application).
>>
>>
>> I was referring only to changes made in the same transaction, not changes
>> made by other transactions. But you make a good point, we can't throw a
>> ConcurrentModificationException if the user for writes in the same
>> transaction and ignore other transactions.
>>
>>>
>>>
>>> I think that adding isCacheModified() or isTopologyChanged() to the
>>> iterator would make sense, if that's not too complicated to implement.
>>> Though, if we want non-disturbed iteration, snapshot isolation is the
>>> only answer.
>>
>>
>> isCacheModified() is probably too costly to implement.
>> isTopologyChanged() could be done, but I'm not sure what's the use case, as
>> the entry iterator abstracts topology changes from the user.
>>
>> I don't think we want undisturbed iteration, at least not at this point.
>> Personally, I just want to have a good story on why the iteration behaves in
>> a certain way. By my standards, explaining that changes made by other
>> transactions may completely/partially/not at all be visible in the iteration
>> is fine, explaining that changes made by the same transaction may or may not
>> be visible is not.
>
> Sorry I didn't respond earlier.  But these commands would check the
> transaction context before returning the value to the user.  This
> requires a user interaction for this to occur, so we can guarantee
> they will always see their updated value if they have one in the
> transaction (even if one is ran in between the iteration).  The big
> thing is whether or not another transaction's update is seen when we
> don't have an update for that key (that will be occur if the segment
> is completed before the update or not).
>
> There should be no need to tell if the cache modified or the topology
> changed (the former would require a very high impact for performance
> to tell with a DIST cache).
>
> To be honest the wrapper classes would just be delegating to the Cache
> for the vast majority of operations (get, remove, contains etc.).  It
> would only be when someone specifically uses the iterator on the
> various collections that the distributed iterator would even be used.
> This way the various collections would be backing maps, like HashMap
> and ConcurrentHashMap have, just they have to check the transaction as
> well.  The values collection would be extremely limited in its
> supported methods though, pretty much only to iteration and size.
>
>>
>>
>> Cheers
>> Dan
>>
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev


More information about the infinispan-dev mailing list