[infinispan-dev] About size()

Dan Berindei dan.berindei at gmail.com
Tue Oct 14 03:33:14 EDT 2014


On Tue, Oct 14, 2014 at 9:55 AM, Radim Vansa <rvansa at redhat.com> wrote:

> On 10/13/2014 05:55 PM, Mircea Markus wrote:
> > On Oct 13, 2014, at 14:06, Dan Berindei <dan.berindei at gmail.com> wrote:
> >
> >>
> >> On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus <mmarkus at redhat.com>
> wrote:
> >>
> >> On Oct 10, 2014, at 17:30, William Burns <mudokonman at gmail.com> wrote:
> >>
> >>>>>>> Also we didn't really talk about the fact that these methods would
> >>>>>>> ignore ongoing transactions and if that is a concern or not.
> >>>>>>>
> >>>>>> It might be a concern for the Hibernate 2LC impl, it was their TCK
> that
> >>>>>> prompted the last round of discussions about clear().
> >>>>> Although I wonder how much these methods are even used since they
> only
> >>>>> work for Local, Replication or Invalidation caches in their current
> >>>>> state (and didn't even use loaders until 6.0).
> >>>>
> >>>> There is some more information about the test in the mailing list
> discussion
> >>>> [1]
> >>>> There's also a JIRA for clear() [2]
> >>>>
> >>>> I think 2LC almost never uses distribution, so size() being local-only
> >>>> didn't matter, but making it non-tx could cause problems - at least
> for that
> >>>> particular test.
> >>> I had toyed around with the following idea before, but I never thought
> >>> of it in the scope of the size method solely, but I have a solution
> >>> that would work mostly for transactional caches.  Essentially the size
> >>> method would always operate in a READ_COMMITTED like state, using
> >>> REPEATABLE_READ doesn't seem feasible since we can't keep all the
> >>> contents in memory.  Essentially the iterator would be ran and for
> >>> each key that is found it checks the context to see if it is there.
> >>> If the context entry is marked as removed it doesn't count the key, if
> >>> the key is there it marks the key as found and counts it, and if it is
> >>> not found it counts it.  Then after iteration it finds all the keys in
> >>> the context that were not found and also adds them to the count.  This
> >>> way it doesn't need to store additional memory (besides iteration
> >>> costs) as all the context information is in memory.
> >> sounds good to me.
> >>
> >> Mircea, you have to decide whether you want the precise estimation
> using the entry iterator or the loose estimation using dataContainer.size()
> :)
> >>
> >> I guess we can't make size() read everything into the invocation
> context, so READ_COMMITTED is all we can provide if we want to keep size()
> transactional. Maybe we don't really need it though... Will, could you
> investigate the failing test that started the clear() thread [1] to see if
> it really needs size() to be transactional?
> > I'm okay with both approaches TBH, both are much better than what we
> currently have. The accurate one is more costly but seems to be the
> solution of choice so let's go for it.
> >
> >>
> >>> My original thought was to also make the EntryIterator transactional
> >>> in the same way which also means the keySet, entrySet and values
> >>> methods could do the same things.  The main reason stumbling block I
> >>> had was the fact that the iterator and various collections returned
> >>> could be used outside of the ongoing transaction which didn't seem to
> >>> make much sense to me.  But maybe these should be changed to be more
> >>> like backing maps which HashMap, ConcurrentHashMap etc use for their
> >>> methods, where instead it would pick up the transaction if there is
> >>> one in the current thread and if there is no transaction just start an
> >>> implicit one.
> >> or if they are outside of a transaction to deny progress
> >>
> >> I don't think it's fair to require an explicit transaction for every
> entrySet(). It should be possible to start an iteration without a
> transaction, and only to invalidate an iteration started from an explicit
> transaction the moment the transaction is committed/rolled back (although
> it would complicate rules a bit).
> >>
> >> And what happens if the user writes to the cache while it's iterating
> through the cache-backed collection? Should the user see the new entry in
> the iteration, or not? I don't think you can figure out at the end of the
> iteration which keys were included without keeping all the keys on the
> originator.
> > If the modification is done outside the iterator one might expect an
> ConcurrentModificationException, as it is the case with some JDK iterators.
>
> -1 We're aiming at high performance cache with a lot of changes while
> the operation is executed. This way, the iteration would never complete,
> unless you explicitly switch the cache to read only mode (either through
> Infinispan operation or in application).
>

I was referring only to changes made in the same transaction, not changes
made by other transactions. But you make a good point, we can't throw a
ConcurrentModificationException if the user for writes in the same
transaction and ignore other transactions.


>
> I think that adding isCacheModified() or isTopologyChanged() to the
> iterator would make sense, if that's not too complicated to implement.
> Though, if we want non-disturbed iteration, snapshot isolation is the
> only answer.
>

isCacheModified() is probably too costly to implement.
isTopologyChanged() could be done, but I'm not sure what's the use case, as
the entry iterator abstracts topology changes from the user.

I don't think we want undisturbed iteration, at least not at this point.
Personally, I just want to have a good story on why the iteration behaves
in a certain way. By my standards, explaining that changes made by other
transactions may completely/partially/not at all be visible in the
iteration is fine, explaining that changes made by the same transaction may
or may not be visible is not.


Cheers
Dan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141014/f69e85b5/attachment-0001.html 


More information about the infinispan-dev mailing list