[infinispan-dev] ISPN-3557: interactions between a clear() operation and a Transaction

Dan Berindei dan.berindei at gmail.com
Thu Oct 3 05:29:41 EDT 2013


On Wed, Oct 2, 2013 at 12:48 PM, Pedro Ruivo <pedro at infinispan.org> wrote:

>
>
> On 10/02/2013 12:03 AM, Sanne Grinovero wrote:
> > I'd love to brainstorm about the clear() operation and what it means
> > on Infinispan.
> >
> > I'm not sure to what extent, but it seems that clear() is designed to
> > work in a TX, or even create an implicit transaction if needed, but
> > I'm not understanding how that can work.
>

I think it's pretty straightforward to define the semantics of operations
after clear(): they should assume that every key in the cache was removed.

True, like Pedro mentioned, the current implementation allows subsequent
operations in the same tx to see keys inserted by other txs. But I don't
think fixing that would be harder than ensuring that if a tx inserts key k
owned by nodes A and B, clear() from outside the tx either deletes k on A
and B or doesn't delete k on either node.
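
As a rough illustration of "operations after clear() assume every key was
removed", here is a minimal single-JVM sketch. It is not Infinispan code:
TxView, its fields and its commit() are hypothetical names made up for this
example, and the commit is deliberately not atomic with respect to other
writers (that is the hard part discussed in this thread).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Infinispan code: a tx-local view that remembers clear().
class TxView<K, V> {
    private final Map<K, V> shared;                    // stands in for the committed cache data
    private final Map<K, V> writes = new HashMap<>();  // writes buffered by this tx; null value = removed
    private boolean cleared;                           // true once this tx has called clear()

    TxView(Map<K, V> shared) {
        this.shared = shared;
    }

    void clear() {
        writes.clear();   // earlier writes in this tx are wiped as well
        cleared = true;
    }

    void put(K key, V value) {
        writes.put(key, value);
    }

    void remove(K key) {
        writes.put(key, null);
    }

    V get(K key) {
        if (writes.containsKey(key)) {
            return writes.get(key);   // own write (or own removal) wins
        }
        // After clear(), never fall through to the shared data, so keys
        // inserted by other txs in the meantime stay invisible to this tx.
        return cleared ? null : shared.get(key);
    }

    // Assumption: a real commit would need to be atomic across all owners.
    void commit() {
        if (cleared) {
            shared.clear();
        }
        writes.forEach((k, v) -> {
            if (v == null) {
                shared.remove(k);
            } else {
                shared.put(k, v);
            }
        });
    }
}
```

With this shape, a get() issued after clear() in the same tx returns null
for a key that another tx committed in between, and only keys written after
the clear() survive the commit.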


> >
> > Obviously a clear() operation isn't listing all keys explicitly.
>
> (offtopic) like entrySet(), keySet(), values() and iterator() when
> executed in a transactional context. However, I think that these
> operations are more difficult to handle, because they have to return
> consistent data...
>

I agree, defining how entrySet() and the others should work in a tx seems
harder than defining clear().

Will, I'm guessing your fix for ISPN-3560 still allows entrySet() after
clear() to see entries inserted by another tx in the meantime?


> > Which
> > implies that it's undefined on which keys it's going to operate when
> > it's fired.. that seems terribly wrong in a distributed key/value
> > store as we can't possibly freeze the global state and somehow define
> > a set of keys which are going to be affected, while an explicit
> > enumeration is needed to acquire the needed locks.
>
> you may not need to explicitly acquire a lock for each affected key. You
> can use a global lock that write transactions without a clear operation
> acquire in shared mode and transactions with a clear operation acquire
> in exclusive mode (exactly what I use for Total Order). Also, the clear
> transactions (i.e. the transactions with a clear operation) can
> acquire only the global lock.
>

+1 for the global lock. It may slow things down a bit in the general case,
because every tx will have to acquire it in shared mode, but it will
guarantee consistency for clear().

It still won't help entrySet() and its brethren...
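
To make the shared/exclusive idea concrete, here is a minimal single-JVM
sketch built on java.util.concurrent's ReentrantReadWriteLock. It is only an
analogy, not Infinispan's locking code: GlobalLockCache, writeTx() and
clearTx() are hypothetical names, and a distributed version would also have
to agree on the lock order across nodes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical sketch, not Infinispan code: one global lock guarding clear().
class GlobalLockCache<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final ReentrantReadWriteLock globalLock = new ReentrantReadWriteLock();

    // Regular write tx: takes the global lock in shared mode, so many of
    // them can run concurrently (per-key locking is omitted in this sketch).
    void writeTx(Consumer<Map<K, V>> body) {
        globalLock.readLock().lock();
        try {
            body.accept(data);
        } finally {
            globalLock.readLock().unlock();
        }
    }

    // Clear tx: takes the global lock in exclusive mode, so it waits for
    // running write txs and blocks new ones until the clear is done.
    void clearTx() {
        globalLock.writeLock().lock();
        try {
            data.clear();
        } finally {
            globalLock.writeLock().unlock();
        }
    }

    int size() {
        return data.size();
    }
}
```

The cost mentioned above shows up in writeTx(): every regular write tx pays
for a shared acquisition even when nobody ever calls clear().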


> >
> > It might give a nice safe feeling that, when invoking a clear()
> > operation in a transaction, I can still abort the transaction to make
> > it cancel the operation; that's the only good part I can think of: we
> > can cancel it.
>
> yep
>

Wouldn't it be harder to integrate it with Total Order if it didn't have a
tx?

But most of all, I think we have to keep it because it's a part of the Map
interface and a part of JCache's Cache interface.


>
> >
> > I don't think it has anything to do with consistency though? To make
> > sure you're effectively involving all replicas of all entries in a
> > consistent way, a lock would need to be acquired on each affected key,
> > which again implies a need to enumerate all keys, including the
> > unknown keys which might be hiding in a CacheStore: it's not enough to
> > broadcast the clear() operation to all nodes and have them simply wipe
> > their local state as that's never going to deal correctly
> > (consistently) with in-flight transactions working on different nodes
> > at different times (I guess enabling Total Order could help but you'd
> > need to make it mandatory).
> >
>
> see the comment above about the locks... may not be the most efficient,
> but handles the problem.
>

Note that we need to iterate the keys anyway, in order to fire the
CacheEntryRemoved listeners. Even if we optimized it to skip the iteration
when there are no registered listeners, it would still have to handle the
general case.
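
A minimal sketch of that optimization, assuming a plain in-memory map and a
simplified listener type. NotifyingStore and its BiConsumer listeners are
hypothetical stand-ins and do not reflect Infinispan's listener API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical sketch, not Infinispan code: clear() with and without listeners.
class NotifyingStore<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final List<BiConsumer<K, V>> removedListeners = new CopyOnWriteArrayList<>();

    void put(K key, V value) {
        data.put(key, value);
    }

    void addRemovedListener(BiConsumer<K, V> listener) {
        removedListeners.add(listener);
    }

    void clear() {
        if (removedListeners.isEmpty()) {
            data.clear();   // fast path: nobody to notify, no iteration needed
            return;
        }
        // General case: every entry has to be visited so that each removal
        // can be reported together with the value that was removed.
        for (K key : data.keySet()) {
            V old = data.remove(key);
            if (old != null) {
                for (BiConsumer<K, V> listener : removedListeners) {
                    listener.accept(key, old);
                }
            }
        }
    }

    int size() {
        return data.size();
    }
}
```

The fast path only helps caches without listeners; the slow path still has
to exist, so the iteration cannot be designed away.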


> > So let's step back a second and consider what is the use case for
> > clear() ? I suspect it's primarily a method needed during testing, or
> > maybe cleanup before a backup is restored (operations), maybe a
> > manually activated JMX operation to clear the cache in exceptional
> > cases.
>
> I'm not aware of the use cases for a clear() operation, but using it as
> a JMX operation can be a problem in transactional caches (depending on
> the semantics you have in mind).
>
> If it is going to clear all the caches (i.e. from all replicas/nodes),
> then we have to find a way to order it with respect to the current
> transactions, or we will have some nodes delivering a transaction (say
> tx1) before the clear() and other nodes delivering tx1 after the clear(),
> leading to an inconsistent state.
>
> (offtopic) I'm wondering if we have that problem in the current
> implementation if tx1 only inserts new keys... in that case, we have
> no lock to serialize an order between the clear() and tx1...
>

I'm pretty sure we do have this problem. We definitely need to add some
tests for clear concurrent with other write operations.

In my view, as long as we're going to expose a clear() operation in some
way, it's going to be easiest to include it in a tx in transactional
caches. Exposing it in some other way is not going to make it any more
consistent, although prohibiting other operations in the same tx may make
the code a bit simpler.


>
> >
> > I don't think there would ever be a need for a clear() operation to
> > interact with other transactions, so I'd rather make it illegal to
> > invoke a clear() inside a transaction, or simply ignore the
> > transactional scope and have an immediate and distributed effect.
>

We can't control whether other transactions are invoked concurrently with
clear(), so we can't make that illegal... If you mean interaction with
other operations in the same transaction, I agree that it's not absolutely
necessary to allow them, but I think most of that work has already been
done.


> I think the second solution can lead to inconsistent data ("or simply
> ignore the transactional scope and have an immediate and distributed
> effect")
>

Maybe that's the idea? Sanne, do you mean that if clear() were exposed via
a different API, it wouldn't have to provide the same consistency
guarantees as regular operations?


>
> >
> > I'm likely missing something. What terrible consequences would this have?
> >
> > Cheers,
> > Sanne
> > _______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >