[infinispan-dev] L1 Consistency with Sync Caches

William Burns mudokonman at gmail.com
Fri Jun 28 08:41:22 EDT 2013


On Fri, Jun 28, 2013 at 5:53 AM, Dan Berindei <dan.berindei at gmail.com> wrote:
>
>
> On Fri, Jun 28, 2013 at 12:51 AM, William Burns <mudokonman at gmail.com>
> wrote:
>>
>> On Thu, Jun 27, 2013 at 4:40 PM, Dan Berindei <dan.berindei at gmail.com>
>> wrote:
>> >
>> >
>> > On Thu, Jun 27, 2013 at 4:40 PM, William Burns <mudokonman at gmail.com>
>> > wrote:
>> >>
>> >> Comments that were outstanding on PR:
>> >>
>> >> @danberindei:
>> >>
>> >>  > +1 to move the discussion to the mailing list, could you summarize
>> >> your changes (preferably for both non-tx and tx cases) and send an
>> >> email to the list?
>> >>  > And now to add some more to this already unwieldy discussion :)
>> >>
>> >>  >  1. Many interceptors check ownership, I don't think that would be
>> >>  > a problem. Besides, I think creating a new L1ReadSynchronizer for
>> >>  > every read is just as bad for performance as locking the key on
>> >>  > every read, so you'd need that check either way.
>> >> >
>> >> We can't use a try lock approach for L1 invalidation when gets are
>> >> locked without possibly dropping an L1 update.
>> >> We can't use a timed lock approach for L1 invalidation when writes
>> >> lock the key, as we could get into a timed deadlock situation when
>> >> another node concurrently writes to a value/stripe.
>> >>
>> >> I still don't see how we can get away with locking on a get.  What are
>> >> you proposing?
>> >
>> >
>> > I'm proposing something like this:
>> >
>> > 1. For get commands: acquire local lock + remote get + store in L1 +
>> > release local lock
>> > 2. For invalidate commands: acquire local lock + remove entry from L1 +
>> > release local lock
>> > 3. For write commands: invoke on primary owner, and the primary owner
>> > sends invalidation back to originator; alternatively, skip the
>> > invalidation command to the originator and do invoke on primary owner +
>> > acquire local lock + remove entry from L1 + release local lock
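
Just to make sure we are talking about the same thing, here is roughly how
I read that lock-based approach.  This is only a sketch with made-up class
and method names - the real thing would go through the interceptors and the
LockManager rather than a standalone class:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: a per-key local lock serializes the remote get + L1 store
// against L1 invalidations, as proposed above.  All names are illustrative.
abstract class L1LockingSketch {

   private final ConcurrentMap<Object, ReentrantLock> locks =
         new ConcurrentHashMap<Object, ReentrantLock>();
   private final ConcurrentMap<Object, Object> l1 =
         new ConcurrentHashMap<Object, Object>();
   private final long lockTimeoutMillis = 10000;

   protected abstract boolean isOwner(Object key);
   protected abstract Object readLocally(Object key);  // data container read
   protected abstract Object remoteGet(Object key);    // remote fetch placeholder

   // 1. get: acquire local lock + remote get + store in L1 + release local lock
   public Object get(Object key) throws InterruptedException {
      if (isOwner(key)) {
         return readLocally(key);      // owners never touch the L1
      }
      Object cached = l1.get(key);
      if (cached != null) {
         return cached;                // already in L1: no lock, no remote get
      }
      ReentrantLock lock = lockFor(key);
      if (!lock.tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS)) {
         throw new IllegalStateException("Timed out acquiring L1 lock for " + key);
      }
      try {
         // lock is held only for the remote get + L1 update, nothing else
         Object value = remoteGet(key);
         if (value != null) {
            l1.put(key, value);        // ordered against invalidations by the same lock
         }
         return value;
      } finally {
         lock.unlock();
      }
   }

   // 2. invalidate: acquire local lock + remove entry from L1 + release local lock
   public void invalidate(Object key) throws InterruptedException {
      ReentrantLock lock = lockFor(key);
      if (!lock.tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS)) {
         throw new IllegalStateException("Timed out acquiring L1 lock for " + key);
      }
      try {
         l1.remove(key);
      } finally {
         lock.unlock();
      }
   }

   private ReentrantLock lockFor(Object key) {
      ReentrantLock lock = locks.get(key);
      if (lock == null) {
         ReentrantLock created = new ReentrantLock();
         lock = locks.putIfAbsent(key, created);
         if (lock == null) {
            lock = created;
         }
      }
      return lock;
   }
}

The write path would then look like the invalidate path above: invoke on the
primary owner first, and only take the local lock afterwards to remove the
L1 entry.
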
>>
>> I do like this and think it would be simpler for non-tx.  However, I
>> don't think this will be quite as performant (just a gut feeling).
>
>
> It would indeed be less performant in some cases, e.g. if we have two get
> commands for a remote key at the same time and the key doesn't exist, my
> proposal would request the key twice.
>
> The writes to L1 *should* be rare enough that the extra lock contention
> there would be minimal. There is a chance that an invalidation will be stuck
> waiting for a remote get, so it will again be slower than your approach.
>
>>  However I am more than certain it will be slower when collisions
>> occur, which is more likely with lock striping.
>>
>
> I'm not that worried about lock striping - the user can always increase the
> number of stripes to get less contention. And I hope they know better than
> to use lock striping and transactions with multiple keys, or they'll get
> deadlocks with or without L1 enabled.

I didn't think of the deadlock problem with tx and lock striping.  I
am wondering if we should log a warning message so that users get an
explicit warning?
>
>>
>> I don't think this however is feasible for tx caches.
>>
>> Optimistic - we don't want to have gets being blocked while the tx is
>> being committed - that could be quite a performance hit, especially if
>> it has to do a 2 phase commit.  Updating the L1 on a remoteGetBeforeWrites
>> could help remedy this (since gets wouldn't need to lock), however then
>> you would be almost implementing what I have and still have the locking
>> overhead.  Maybe this would occur so infrequently it wouldn't matter
>> though?
>>
>
> Yes, I would propose removing the L1 commit in EntryWrappingInterceptor as
> well, and moving it to the L1 interceptor - immediately after the value was
> retrieved from the remote owner. The local lock would also be released
> immediately.
>
> The part I wanted to avoid from your proposal was
> L1ReadSync/L1ReadSynchronizer, because I found it kind of hard to
> understand. I guess I'd find it more approachable if all the
> synchronization (the map of L1ReadSynchronizers and the interaction with
> it) were in a separate class, with its own tests, and the L1 interceptor
> would only know about the high-level stuff. Like how the LockManager also
> hides a map of locks; not like the LockManager unit tests, because they're
> kind of lacking, but LockManager has the advantage that almost all the
> tests touch it indirectly.
>

True, I agree I like it as a separate class better - much better
separation.  Maybe I can move that over and add tests and see if that
makes it more clear?  If not we can still always use the locking
approach.
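
For the sake of discussion, the separate class could be shaped something
like this.  Again, just a sketch with made-up names, not the current
L1ReadSynchronizer code - a real implementation also has to handle
exceptions from the remote get, topology changes, and invalidations that
arrive outside the window covered here:

import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the "corralling" pulled out of the interceptor: one in-flight
// remote get per key, which concurrent readers join and which
// invalidations/writes can cancel or wait for.  Illustrative names only.
class L1ReadRegistry {

   private final ConcurrentMap<Object, PendingGet> pending =
         new ConcurrentHashMap<Object, PendingGet>();
   private final ConcurrentMap<Object, Object> l1 =
         new ConcurrentHashMap<Object, Object>();

   // Corral concurrent readers of the same key behind a single remote get.
   public Object get(Object key, Callable<Object> remoteGet) throws Exception {
      PendingGet ours = new PendingGet();
      PendingGet inFlight = pending.putIfAbsent(key, ours);
      if (inFlight != null) {
         return inFlight.awaitValue();         // join the remote get already in flight
      }
      boolean completed = false;
      try {
         Object value = remoteGet.call();
         if (value != null && ours.tryClaimL1Write()) {
            l1.put(key, value);                // any racing invalidation now waits for us
         }
         ours.complete(value);
         completed = true;
         return value;
      } finally {
         if (!completed) {
            ours.complete(null);               // don't leave joiners hanging on failure
         }
         pending.remove(key, ours);
      }
   }

   // Called by invalidations and local writes before removing the key from L1:
   // best effort cancel of the pending L1 write, otherwise wait for it.
   public void beforeInvalidate(Object key) throws InterruptedException {
      PendingGet inFlight = pending.get(key);
      if (inFlight != null && !inFlight.tryCancelL1Write()) {
         inFlight.awaitCompletion();           // too late to cancel: small window, wait it out
      }
      l1.remove(key);                          // runs after any pending L1 update
   }

   // One instance per in-flight remote get; whoever flips the flag first
   // decides whether the L1 write happens.
   private static final class PendingGet {
      private final CountDownLatch done = new CountDownLatch(1);
      private final AtomicBoolean l1WriteAllowed = new AtomicBoolean(true);
      private volatile Object value;

      boolean tryClaimL1Write()  { return l1WriteAllowed.compareAndSet(true, false); }
      boolean tryCancelL1Write() { return l1WriteAllowed.compareAndSet(true, false); }

      void complete(Object v) { value = v; done.countDown(); }

      Object awaitValue() throws InterruptedException { done.await(); return value; }
      void awaitCompletion() throws InterruptedException { done.await(); }
   }
}

Something with this shape would be easy to unit test on its own (concurrent
gets for the same key, get vs. invalidation races) without pulling in the
whole interceptor chain.
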
>>
>> Pessimistic - I haven't looked into pessimistic closely yet, but in
>> that case I don't think it ever acquires the local locks, so it would
>> have to acquire a remote lock on a get, which would be pretty
>> disastrous (gets would indirectly have a weak FORCE_WRITE_LOCK
>> enabled).  Making this acquire local locks might work as well, but
>> then it would still have the same issue as optimistic when committing -
>> I really am not familiar enough with pessimistic to say for sure.
>>
>
> Nope, acquiring a local lock should be enough. We don't want to prevent
> other txs from modifying the key at all, we just want to delay the
> invalidation commands triggered by those modifications on the local node.
> And the lock would only be held for the duration of the remote get + L1
> update, not all the way to the commit.
>
Okay, cool.  I didn't know for sure.
>>
>> >
>> > All the lock acquisitions would use a regular timeout, not a try lock
>> > approach.
>> In this case they could, yes.
>> >
>> >>
>> >> >   2. By default L1 invalidations are sent as multicasts, so I'm not
>> >> > sure ISPN-3273 really matters here. BTW, I wonder if we have a check
>> >> > to only send L1 invalidations from one node if the threshold is 0...
>> >> I agree that is the default, but we should support the operation,
>> >> although it doesn't matter for this discussion.  Also I am curious as
>> >> to why the L1 invalidation threshold isn't set to, say, 2 by default?
>> >> It seems wasteful to send a multicast that all members have to process
>> >> when only 1 would do anything about it.  Do you know why it is like
>> >> that?
>> >
>> >
>> > I suppose it's because we don't have good perf numbers for different L1
>> > invalidation threshold numbers...
>> >
>> > The problem is, we don't have a way to count all the requestors of a
>> > key in the cluster, so it's reasonably likely that with a threshold of
>> > 2 you'd get 1 unicast invalidation from one owner + 1 multicast
>> > invalidation from the other owner, making it less efficient than a
>> > single multicast invalidation.
>>
>> It is a nitpick anyway, and really shouldn't make that big of a
>> difference.
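
For anyone following along, the knob in question is the L1 invalidation
threshold.  Programmatically it is set roughly like this (from memory, so
please double check the exact builder method names and semantics against
the L1ConfigurationBuilder javadoc):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class L1ThresholdConfigExample {
   public static Configuration distCacheWithL1() {
      // As I understand it: 0 (the default) = always multicast L1
      // invalidations; a positive value means unicast to the known
      // requestors while they are at or below the threshold, multicast
      // otherwise.  (From memory - verify against the javadoc.)
      return new ConfigurationBuilder()
            .clustering().cacheMode(CacheMode.DIST_SYNC)
            .l1().enable().invalidationThreshold(2)
            .build();
   }
}
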
>>
>>
>> >
>> >> >
>> >> >  3a. Right, for put commands we can't hold the local lock while
>> >> > executing the remote put, or we'll have a deadlock. But I think a
>> >> > shorter lock, held only after the remote put completed (or after the
>> >> > lock on the primary owner was acquired, with txs) should work.
>> >> Same point under 1
>> >
>> >
>> > I don't see how we could get a deadlock if we don't hold the local lock
>> > during the remote write invocation.
>>
>> Agree.
>>
>> >
>> >>
>> >> >
>> >> >  3b. We'd also have an ownership check before, so we'd only
>> >> > serialize the get commands that need to go remotely for the same key.
>> >> > I think it would be almost the same as your solution (although it
>> >> > does have one disadvantage - if the key doesn't exist in the cache,
>> >> > all the get commands will go remotely). The number of L1 writes
>> >> > should be very small compared to the number of L1 reads anyway,
>> >> > otherwise it would be more efficient to get the key from the owner
>> >> > every time.
>> >> You are saying an optimization for owner nodes so they don't do the
>> >> "corralling" for keys they own?  I like that.  Also I don't think it
>> >> has that disadvantage, it only goes remote if it isn't an owner.
>> >
>> >
>> > I meant your corralling strategy means if you have 2 concurrent get
>> > commands and one of them retrieves a null from the entry owners, the
>> > other command will return null directly. With regular locking, the
>> > other command wouldn't find anything in L1 and it would do another
>> > remote get.
>> >
>> > I don't think there's any disadvantage in skipping the corralling for
>> > key owners, in fact I think we need to skip it if the key already
>> > exists in L1, too.
>> +1
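
In code, the check we are agreeing on here would be something along these
lines (sketch only, all names are made up):

// Only corral gets that actually need to go remote, i.e. the local node is
// not an owner and the key is not already in L1.
abstract class L1GetGuard {

   protected abstract boolean isLocalOwner(Object key);
   protected abstract boolean isInL1(Object key);
   protected abstract Object readLocally(Object key);         // data container / L1 read
   protected abstract Object corralledRemoteGet(Object key);  // e.g. the registry sketched above

   public Object get(Object key) {
      if (isLocalOwner(key) || isInL1(key)) {
         return readLocally(key);       // owners and L1 hits skip the corralling entirely
      }
      return corralledRemoteGet(key);   // only reads that must go remote are corralled
   }
}
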
>> >
>> >>
>> >> >
>> >> > It would be nice to agree on what guarantees we want to provide for
>> >> > L1 invalidation in non-tx caches; I'm not sure if we can do anything
>> >> > to prevent this scenario:
>> >> Actually this scenario doesn't occur with non-tx since writes don't
>> >> update the L1 with their value, they just invalidate.  Tx caches are
>> >> fine with this because they acquire the primary owner lock for the
>> >> duration of the write, including the L1 update, so you can't have this
>> >> ordering.
>> >
>> >
>> > Sounds good.
>> >
>> >>
>> >> >
>> >> > A initiates a put(k, v1) to the primary owner B
>> >> > B performs the put(k, v1), invalidates every non-owner and returns
>> >> > B performs another put(k, v2), invalidating every non-owner
>> >> > A receives the result from B and puts k=v1 in its L1
>> >>
>> >> @pruivo:
>> >>
>> >> > The invalidation does not need to wait for the remote get. When you
>> >> > receive an invalidation, you can mark the current remote get invalid.
>> >> > The invalidation command can return immediately and the remote get can
>> >> > be repeated. Also, it removes the key from the data container (if it
>> >> > exists)
>> >> Dan hit it right on the head.  Unfortunately there is no guarantee the
>> >> cancellation can work properly, so it is a best effort, and if we
>> >> cannot cancel we wait until we know we will invalidate properly.
>> >> > The writes can update the L1 through your L1Synchronized by adding a
>> >> > simple method like updateL1(newValue). The blocking threads will
>> >> > return the new value immediately and they don't need to wait for the
>> >> > reply.
>> >> Non tx cache write operations aren't safe to update the L1 with the
>> >> value since they don't acquire the owning lock while updating the L1,
>> >> which means you could have interleaved writes.  That is the primary
>> >> reason I rejected ISPN-3214.  For tx caches we can't do this since the
>> >> update has to be part of the tx, whereas the get would be updating the
>> >> L1 outside of a transaction.
>> >> > I see... However, I think that all the events should synchronize at
>> >> > some point (update by remote get, update by local put and
>> >> > invalidation).
>> >> I was hoping that would cover this.  Other than the outstanding issue
>> >> in ISPN-2965.
>> >>
>> >> On Thu, Jun 27, 2013 at 9:18 AM, William Burns <mudokonman at gmail.com>
>> >> wrote:
>> >> > First off I apologize for the length.
>> >> >
>> >> > There have been a few Jiras recently that have identified L1
>> >> > consistency issues with both TX and non TX sync caches.  Async caches
>> >> > with L1 have their own issues as well, but I only wanted to talk about
>> >> > sync caches.
>> >> >
>> >> > https://issues.jboss.org/browse/ISPN-3197
>> >> > https://issues.jboss.org/browse/ISPN-2965
>> >> > https://issues.jboss.org/browse/ISPN-2990
>> >> >
>> >> > I have proposed a solution in
>> >> > https://github.com/infinispan/infinispan/pull/1922 which should start
>> >> > L1 consistency down the right track.  There are quite a few comments
>> >> > on it if you want to look into it more, but because of that I am
>> >> > moving this to the dev mailing list.
>> >> >
>> >> > The key changes in the PR are the following (non-tx):
>> >> >
>> >> > 1. Concurrent reads for a key that can retrieve a remote value are
>> >> > "corralled" into a single thread of execution for that given key.
>> >> > This would reduce network traffic with concurrent gets for the same
>> >> > key.  Note the "corralling" only happens on a per key basis.
>> >> > 2. The single thread that is doing the remote get would update the L1
>> >> > if able (without locking) and make the value available to all the
>> >> > requests waiting on the get.
>> >> > 3. Invalidations that are received would first check to see if there
>> >> > is a current remote get occurring for its keys.  If there is, it will
>> >> > attempt to cancel the L1 write(s) before it occurs.  If it cannot
>> >> > cancel the L1 write, then it must also wait on the current remote get
>> >> > to complete and subsequently run the invalidation.  Note the
>> >> > cancellation would fail when the remote get was done and it is in the
>> >> > middle of updating the L1, so this would be a very small window.
>> >> > 4. Local writes will also do the same thing as the invalidation with
>> >> > cancelling or waiting.  Note that non tx local writes only do L1
>> >> > invalidations and don't write the value to the data container.  The
>> >> > reasons why can be found at https://issues.jboss.org/browse/ISPN-3214
>> >> > 5. Writes that require the previous value and don't have it in the L1
>> >> > would also do their get operations using the same "corralling" method.
>> >> >
>> >> > 4/5 are not currently implemented in the PR.
>> >> >
>> >> > This approach would use no locking for non tx caches for all L1
>> >> > operations.  The synchronization point would be done through the
>> >> > "corralling" method and invalidations/writes communicating with it.
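
To make that interaction a bit more concrete, here is a tiny driver against
the hypothetical L1ReadRegistry sketched earlier in this mail (not the PR
code): two concurrent gets are corralled behind one remote fetch, and an
invalidation either cancels the pending L1 write or waits for it before
removing the entry.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Driver for the hypothetical L1ReadRegistry sketched earlier in this mail.
public class L1CorralDemo {
   public static void main(String[] args) throws Exception {
      final L1ReadRegistry registry = new L1ReadRegistry();
      final Callable<Object> remoteGet = new Callable<Object>() {
         @Override
         public Object call() throws Exception {
            Thread.sleep(100);                       // pretend network latency
            System.out.println("remote get executed");
            return "v1";
         }
      };
      ExecutorService pool = Executors.newFixedThreadPool(2);
      // Two concurrent reads of the same key: normally corralled into a
      // single remote fetch, so "remote get executed" typically prints once.
      Future<Object> first = pool.submit(new Callable<Object>() {
         @Override
         public Object call() throws Exception { return registry.get("k", remoteGet); }
      });
      Future<Object> second = pool.submit(new Callable<Object>() {
         @Override
         public Object call() throws Exception { return registry.get("k", remoteGet); }
      });
      // An invalidation arriving meanwhile either cancels the pending L1
      // write or waits for it to finish before removing the entry from L1.
      registry.beforeInvalidate("k");
      System.out.println("readers saw: " + first.get() + " / " + second.get());
      pool.shutdown();
   }
}
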
>> >> >
>> >> > Transactional caches would do almost the same thing as non-tx.  Note
>> >> > these changes are not done in any way yet.
>> >> >
>> >> > 1. Gets would now update the L1 immediately after retrieving the value
>> >> > without locking, but still using the "corralling" technique that
>> >> > non-tx does.  Previously the L1 update from a get was transactional.
>> >> > This actually would remedy issue [1]
>> >> > 2. Writes currently acquire the remote lock when committing, which is
>> >> > why tx caches are able to update the L1 with the value.  Writes would
>> >> > do the same cancellation/wait method as non-tx.
>> >> > 3. Writes that require the previous value and don't have it in the L1
>> >> > would also do their get operations using the same method.
>> >> > 4. For tx caches, [2] would also have to be done.
>> >> >
>> >> > [1] -
>> >> >
>> >> >
>> >> > https://issues.jboss.org/browse/ISPN-2965?focusedCommentId=12779780&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12779780
>> >> > [2] - https://issues.jboss.org/browse/ISPN-1540
>> >> >
>> >> > Also rehashing is another issue, but we should be able to acquire the
>> >> > state transfer lock before updating the L1 on a get, just like when an
>> >> > entry is committed to the data container.
>> >> >
>> >> > Any comments/concerns would be appreciated.
>> >> >
>> >> > Thanks,
>> >> >
>> >> >  - Will
>> >>