[infinispan-dev] L1 Consistency with Sync Caches

William Burns mudokonman at gmail.com
Thu Jun 27 17:51:57 EDT 2013


On Thu, Jun 27, 2013 at 4:40 PM, Dan Berindei <dan.berindei at gmail.com> wrote:
>
>
> On Thu, Jun 27, 2013 at 4:40 PM, William Burns <mudokonman at gmail.com> wrote:
>>
>> Comments that were outstanding on PR:
>>
>> @danberindei:
>>
>>  > +1 to move the discussion to the mailing list, could you summarize
>> your changes (preferably for both non-tx and tx cases) and send an
>> email to the list?
>>  > And now to add some more to this already unwieldy discussion :)
>>
>>  >  1. Many interceptors check ownership, I don't think that would be
>> a problem. Besides, I think creating a new L1ReadSynchronizer for
>> every read is just as bad for performance as locking the key on every
>> read, so you'd need that check either way.
>> >
>> We can't use a try lock approach for L1 invalidation when gets are
>> locked without possibly dropping an L1 update
>> We can't use a timed lock approach for L1 invalidation when writes
>> lock the key as we could get into a timed deadlock situation when
>> another node concurrently writes to a value/stripe
>>
>> I still don't see how we can get away with locking on a get.  What are
>> you proposing?
>
>
> I'm proposing something like this:
>
> 1. For get commands: acquire local lock + remote get + store in L1 + release
> local lock
> 2. For invalidate commands: acquire local lock + remove entry from L1 +
> release local lock
> 3. For write commands: invoke on primary owner, and the primary owner sends an
> invalidation back to the originator; alternatively, skip the invalidation
> command to the originator and instead do: invoke on primary owner + acquire
> local lock + remove entry from L1 + release local lock

I do like this and think it would be simpler for non-tx.  However my gut
feeling is that it still would not be as performant, and I am quite
certain it will be slower when collisions occur, which is more likely
with lock striping.
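
Just so we are picturing the same thing, here is a rough sketch of how I
read the non-tx get/invalidation path in that proposal (every class and
helper name below is made up for illustration, none of it is an actual
Infinispan API):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only - per-key locks for simplicity, striping would work the same way.
class L1LockingSketch {

   private final ConcurrentMap<Object, Lock> keyLocks = new ConcurrentHashMap<>();
   private final ConcurrentMap<Object, Object> l1 = new ConcurrentHashMap<>();
   private final long lockTimeoutMillis = 10000;

   Object get(Object key) throws InterruptedException, TimeoutException {
      Lock lock = lockFor(key);
      if (!lock.tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS)) {
         throw new TimeoutException("Could not acquire local L1 lock for " + key);
      }
      try {
         Object value = l1.get(key);
         if (value == null) {
            value = remoteGet(key);   // go to the owners
            if (value != null) {
               l1.put(key, value);    // store in L1 while still holding the local lock
            }
         }
         return value;
      } finally {
         lock.unlock();
      }
   }

   // An incoming invalidation takes the same local lock, so it is strictly
   // ordered with any concurrent remote get + L1 store for that key.
   void invalidate(Object key) throws InterruptedException, TimeoutException {
      Lock lock = lockFor(key);
      if (!lock.tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS)) {
         throw new TimeoutException("Could not acquire local L1 lock for " + key);
      }
      try {
         l1.remove(key);
      } finally {
         lock.unlock();
      }
   }

   private Lock lockFor(Object key) {
      return keyLocks.computeIfAbsent(key, k -> new ReentrantLock());
   }

   private Object remoteGet(Object key) {
      return null;   // placeholder for the actual remote lookup
   }
}

The write path would then be the remote invoke on the primary owner
followed by the same invalidate()-style lock/remove/unlock locally.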

I don't think this is feasible for tx caches, however.

Optimistic - we don't want gets to be blocked while the tx is being
committed - that could be quite a performance hit, especially if it has
to do a 2 phase commit.  Updating the L1 on a remoteGetBeforeWrites
could help remedy this (since the get wouldn't need to lock), however
then you would be almost implementing what I have and still have the
locking overhead.  Maybe this would occur so infrequently it wouldn't
matter though?

Pessimistic - I haven't looked into pessimistic closely yet, but in
that case I don't think it ever acquires the local locks, so it would
have to acquire a remote lock on a get, which would be pretty
disastrous (gets would indirectly have a weak FORCE_WRITE_LOCK
enabled).  Making it acquire local locks might work as well, but then
it would still have the same issue as optimistic when committing - I
am really not familiar enough with pessimistic to say for sure.

>
> All the lock acquisitions would use a regular timeout, not a try lock
> approach.
In this case they could, yes.
>
>>
>> >   2. By default L1 invalidations are sent as multicasts, so I'm not sure
>> > ISPN-3273 really matters here. BTW, I wonder if we have a check to only send
>> > L1 invalidations from one node if the threshold is 0...
>> I agree that is the default, but we should support the operation,
>> although it doesn't matter for this discussion.  Also I am curious as
>> to why the L1 invalidation threshold isn't set to, say, 2 by default?
>> It seems wasteful to send a multicast that all members have to process
>> when only one would do anything about it.  Do you know why it is like that?
>
>
> I suppose it's because we don't have good perf numbers for different L1
> invalidation threshold numbers...
>
> The problem is, we don't have a way to count all the requestors of a key in
> the cluster, so it's reasonably likely that with a threshold of 2 you'd get
> 1 unicast invalidation from one owner + 1 multicast invalidation from the
> other owner, making it less efficient than a single multicast invalidation.

It is a nitpick anyway, and really shouldn't make that big of a difference.

>
>> >
>> >  3a. Right, for put commands we can't hold the local lock while
>> > executing the remote put, or we'll have a deadlock. But I think a shorter
>> > lock, held only after the remote put completed (or after the lock on the
>> > primary owner was acquired, with txs) should work.
>> Same point under 1
>
>
> I don't see how we could get a deadlock if we don't hold the local lock
> during the remote write invocation.

Agree.

>
>>
>> >
>> >  3b. We'd also have an ownership check before, so we'd only serialize
>> > the get commands that need to go remotely for the same key. I think it would
>> > be almost the same as your solution (although it does have one
>> > disadvantage - if the key doesn't exist in the cache, all the get commands
>> > will go remotely). The number of L1 writes should be very small compared to
>> > the number of L1 reads anyway, otherwise it would be more efficient to get
>> > the key from the owner every time.
>> You are saying an optimization for owner nodes so they don't do the
>> "corralling" for keys they own?  I like that.  Also I don't think it
>> has that disadvantage; it only goes remote if the node isn't an owner.
>
>
> I meant your corralling strategy means if you have 2 concurrent get commands
> and one of them retrieves a null from the entry owners, the other command
> will return null directly. With regular locking, the other command wouldn't
> find anything in L1 and it would do another remote get.
>
> I don't think there's any disadvantage in skipping the corralling for key
> owners, in fact I think we need to skip it if the key already exists in L1,
> too.
+1
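
Concretely I am picturing a check along these lines before we ever create
the synchronizer (again, all helper names are just placeholders):

// Sketch: skip the corralling entirely when we can answer locally.
// isOwner / l1Get / corralledRemoteGet are placeholders, not real APIs.
class L1ReadPathSketch {

   Object get(Object key) {
      if (isOwner(key)) {
         return readFromDataContainer(key);   // we own it: normal read path
      }
      Object cached = l1Get(key);
      if (cached != null) {
         return cached;                       // already in L1: no need to corral
      }
      return corralledRemoteGet(key);         // only non-owners with an L1 miss corral
   }

   // placeholders
   boolean isOwner(Object key) { return false; }
   Object readFromDataContainer(Object key) { return null; }
   Object l1Get(Object key) { return null; }
   Object corralledRemoteGet(Object key) { return null; }
}
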
>
>>
>> >
>> > It would be nice to agree on what guarantees we want to provide for L1
>> > invalidation in non-tx caches; I'm not sure if we can do anything to prevent
>> > this scenario:
>> Actually this scenario doesn't occur with non-tx since writes don't
>> update the L1 with their value, they just invalidate.  Tx caches are
>> fine with this because they acquire the primary owner lock for the
>> duration of the write, including the L1 update, so you can't have this
>> ordering.
>
>
> Sounds good.
>
>>
>> >
>> > A initiates a put(k, v1) to the primary owner B
>> > B performs the put(k, v1), invalidates every non-owner and returns
>> > B performs another put(k, v2), invalidating every non-owner
>> > A receives the result from B and puts k=v1 in its L1
>>
>> @pruivo:
>>
>> > The invalidation does not need to wait for the remote get. When you
>> > receive an invalidation, you can mark the current remote get invalid. The
>> > invalidation command can return immediately and the remote get can be
>> > repeated. Also, it removes the key from the data container (if it exists)
>> Dan hit it right on the head.  Unfortunately there is no guarantee the
>> cancellation can work properly, so it is a best effort; if it fails we
>> wait until we know the invalidation will be applied properly.
>> > The writes can update the L1 through your L1Synchronized by adding a
>> > simple method like updateL1(newValue). The blocking threads will
>> > immediately return the new value and they don't need to wait for the reply.
>> Non tx cache write operations aren't safe to update the L1 with the value
>> since they don't acquire the owning lock while updating the L1, which
>> means you could have interleaved writes.  That is the primary reason
>> I rejected ISPN-3214.  For tx caches we can't do this since the
>> update has to be part of the tx, whereas the get would be updating the
>> L1 outside of a transaction.
>> > I see... However, I think that all the events should synchronize at some
>> > point (update by remote get, update by local put and invalidation).
>> I was hoping that would cover this.  Other than the outstanding issue
>> in ISPN-2965.
>>
>> On Thu, Jun 27, 2013 at 9:18 AM, William Burns <mudokonman at gmail.com>
>> wrote:
>> > First off I apologize for the length.
>> >
>> > There have been a few Jiras recently that have identified L1 consistency
>> > issues with both TX and non TX sync caches.  Async caches with L1 have
>> > their
>> > own issues as well, but I only wanted to talk about sync caches.
>> >
>> > https://issues.jboss.org/browse/ISPN-3197
>> > https://issues.jboss.org/browse/ISPN-2965
>> > https://issues.jboss.org/browse/ISPN-2990
>> >
>> > I have proposed a solution in
>> > https://github.com/infinispan/infinispan/pull/1922 which should start L1
>> > consistency down the right track.  There are quite a few comments on it
>> > if
>> > you want to look into it more, but because of that I am moving this to
>> > the
>> > dev mailing list.
>> >
>> > The key changes in the PR are the following (non-tx):
>> >
>> > 1. Concurrent reads for a key that can retrieve a remote value are
>> > "corralled" into a single thread of execution for that given key.  This
>> > would reduce network traffic with concurrent gets for the same key.
>> > Note
>> > the "corralling" only happens on a per key basis.
>> > 2. The single thread that is doing the remote get would update the L1 if
>> > able (without locking) and make the value available to all the requests
>> > waiting on the get.
>> > 3. Invalidations that are received would first check to see if there is a
>> > current remote get occurring for its keys.  If there is, it will attempt to
>> > cancel the L1 write(s) before it occurs.  If it cannot cancel the L1 write,
>> > then it must also wait on the current remote get completion and subsequently
>> > run the invalidation.  Note the cancellation would fail when the remote get
>> > is done and is in the middle of updating the L1, so this would be a very
>> > small window.
>> > 4. Local writes will also do the same thing as the invalidation with
>> > cancelling or waiting.  Note that non tx local writes only do L1
>> > invalidations and don't write the value to the data container.  The
>> > reasons why can be found at https://issues.jboss.org/browse/ISPN-3214
>> > 5. Writes that require the previous value and don't have it in the L1 would
>> > also do their get operations using the same "corralling" method.
>> >
>> > 4/5 are not currently implemented in PR.
>> >
>> > This approach would use no locking in non tx caches for any of the L1
>> > operations.  Synchronization would instead happen through the "corralling"
>> > method, with invalidations/writes communicating with it.
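
To make the above a bit more concrete, here is a bare-bones sketch of what
the corralling could look like (the names and details are illustrative
only, not the actual classes in the PR, and it glosses over rehashing,
exception propagation, etc.):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only - class/field names are made up, not the ones in the PR.
class L1CorrallingSketch {

   private static final class PendingGet {
      final CompletableFuture<Object> value = new CompletableFuture<>();
      final AtomicBoolean l1WriteAllowed = new AtomicBoolean(true);
   }

   private final ConcurrentMap<Object, PendingGet> pending = new ConcurrentHashMap<>();
   private final ConcurrentMap<Object, Object> l1 = new ConcurrentHashMap<>();

   Object get(Object key) throws Exception {
      PendingGet mine = new PendingGet();
      PendingGet existing = pending.putIfAbsent(key, mine);
      if (existing != null) {
         // Another thread is already doing the remote get - just wait on it.
         return existing.value.join();
      }
      try {
         Object value = remoteGet(key);   // single remote get for all corralled callers
         // Only write to L1 if no invalidation managed to cancel us first.
         if (value != null && mine.l1WriteAllowed.compareAndSet(true, false)) {
            l1.put(key, value);
         }
         mine.value.complete(value);
         return value;
      } catch (Exception e) {
         mine.value.completeExceptionally(e);
         throw e;
      } finally {
         pending.remove(key, mine);
      }
   }

   void invalidate(Object key) {
      PendingGet inFlight = pending.get(key);
      if (inFlight != null && !inFlight.l1WriteAllowed.compareAndSet(true, false)) {
         // Too late to cancel - the remote get is already committing its value
         // to L1, so wait for it to finish and invalidate afterwards.
         try {
            inFlight.value.join();
         } catch (CompletionException ignored) {
            // a failed remote get will not have written to L1 anyway
         }
      }
      l1.remove(key);
   }

   private Object remoteGet(Object key) throws Exception {
      return null;   // placeholder for the actual remote lookup
   }
}

A local write that only needs to invalidate would go through the same
invalidate() path, which is how 4 above would tie in.
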
>> >
>> > Transactional caches would do almost the same thing as non-tx.  Note
>> > these
>> > changes are not done in any way yet.
>> >
>> > 1. Gets would now update the L1 immediately after retrieving the value
>> > without locking, but still using the "corralling" technique that non-tx
>> > does.  Previously the L1 update from a get was transactional.  This
>> > actually
>> > would remedy issue [1]
>> > 2. Writes currently acquire the remote lock when committing, which is
>> > why tx
>> > caches are able to update the L1 with the value.  Writes would do the
>> > same
>> > cancellation/wait method as non-tx.
>> > 3. Writes that require the previous value and don't have it in the L1 would
>> > also do their get operations using the same method.
>> > 4. For tx caches, [2] would also have to be done.
>> >
>> > [1] -
>> >
>> > https://issues.jboss.org/browse/ISPN-2965?focusedCommentId=12779780&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12779780
>> > [2] - https://issues.jboss.org/browse/ISPN-1540
>> >
>> > Also rehashing is another issue, but we should be able to acquire the
>> > state
>> > transfer lock before updating the L1 on a get, just like when an entry
>> > is
>> > committed to the data container.
>> >
>> > Any comments/concerns would be appreciated.
>> >
>> > Thanks,
>> >
>> >  - Will
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

