<br><br><div class="gmail_quote">On Fri, Jun 28, 2013 at 3:41 PM, William Burns <span dir="ltr"><<a href="mailto:mudokonman@gmail.com" target="_blank">mudokonman@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5">On Fri, Jun 28, 2013 at 5:53 AM, Dan Berindei <<a href="mailto:dan.berindei@gmail.com">dan.berindei@gmail.com</a>> wrote:<br>
>
>
> On Fri, Jun 28, 2013 at 12:51 AM, William Burns <mudokonman@gmail.com> wrote:
>>
>> On Thu, Jun 27, 2013 at 4:40 PM, Dan Berindei <dan.berindei@gmail.com> wrote:
>> >
>> >
>> > On Thu, Jun 27, 2013 at 4:40 PM, William Burns <mudokonman@gmail.com> wrote:
>> >>
>> >> Comments that were outstanding on the PR:
>> >>
>> >> @danberindei:
>> >>
>> >> > +1 to move the discussion to the mailing list, could you summarize your changes (preferably for both non-tx and tx cases) and send an email to the list?
>> >> > And now to add some more to this already unwieldy discussion :)
>> >>
>> >> > 1. Many interceptors check ownership, I don't think that would be a problem. Besides, I think creating a new L1ReadSynchronizer for every read is just as bad for performance as locking the key on every read, so you'd need that check either way.
>> >> >
>> >> We can't use a try lock approach for L1 invalidation when gets are locked without possibly dropping an L1 update.
>> >> We can't use a timed lock approach for L1 invalidation when writes lock the key, as we could get into a timed deadlock situation when another node concurrently writes to a value/stripe.
>> >>
>> >> I still don't see how we can get away with locking on a get. What are you proposing?
>> >
>> > I'm proposing something like this:
>> >
>> > 1. For get commands: acquire local lock + remote get + store in L1 + release local lock
>> > 2. For invalidate commands: acquire local lock + remove entry from L1 + release local lock
>> > 3. For write commands: invoke on primary owner, and the primary owner sends the invalidation back to the originator; alternatively, skip the invalidation command to the originator and instead do: invoke on primary owner + acquire local lock + remove entry from L1 + release local lock
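A minimal sketch of the per-key local-lock scheme in points 1-3 above, assuming a plain map of per-key locks; remoteGet() and invokeOnPrimaryOwner() are hypothetical stubs for the RPC layer, not the actual Infinispan interceptor API:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: per-key local locking for L1, with stubbed-out RPCs.
// (Lock cleanup is omitted for brevity.)
class L1LockingSketch {
   private final ConcurrentHashMap<Object, Object> l1 = new ConcurrentHashMap<Object, Object>();
   private final ConcurrentHashMap<Object, ReentrantLock> locks = new ConcurrentHashMap<Object, ReentrantLock>();
   private final long lockTimeoutMs = 10000;

   // 1. get: acquire local lock + remote get + store in L1 + release local lock
   Object get(Object key) throws InterruptedException, TimeoutException {
      ReentrantLock lock = lockFor(key);
      acquire(lock);                       // regular timeout, not a tryLock
      try {
         Object value = remoteGet(key);    // fetch from the owners
         if (value != null)
            l1.put(key, value);            // store in L1 while holding the lock
         return value;
      } finally {
         lock.unlock();
      }
   }

   // 2. invalidate: acquire local lock + remove entry from L1 + release local lock
   void invalidate(Object key) throws InterruptedException, TimeoutException {
      ReentrantLock lock = lockFor(key);
      acquire(lock);
      try {
         l1.remove(key);                   // drop the (possibly stale) L1 entry
      } finally {
         lock.unlock();
      }
   }

   // 3. put: remote invocation first, local lock only afterwards (avoids the
   //    deadlock of holding the local lock across the remote write)
   Object put(Object key, Object value) throws InterruptedException, TimeoutException {
      Object previous = invokeOnPrimaryOwner(key, value);
      ReentrantLock lock = lockFor(key);
      acquire(lock);
      try {
         l1.remove(key);                   // invalidate our own L1 copy
      } finally {
         lock.unlock();
      }
      return previous;
   }

   private ReentrantLock lockFor(Object key) {
      ReentrantLock lock = locks.get(key);
      if (lock == null) {
         ReentrantLock newLock = new ReentrantLock();
         lock = locks.putIfAbsent(key, newLock);
         if (lock == null)
            lock = newLock;
      }
      return lock;
   }

   private void acquire(ReentrantLock lock) throws InterruptedException, TimeoutException {
      if (!lock.tryLock(lockTimeoutMs, TimeUnit.MILLISECONDS))
         throw new TimeoutException("Could not acquire local key lock");
   }

   // Stubs standing in for the real RPC layer.
   private Object remoteGet(Object key) { return null; }
   private Object invokeOnPrimaryOwner(Object key, Object value) { return null; }
}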
>>
>> I do like this and think it would be simpler for non-tx. However, I suspect this will still not be as performant (just a gut feeling).
>
> It would indeed be less performant in some cases, e.g. if we have two get commands for a remote key at the same time and the key doesn't exist, my proposal would request the key twice.
>
> The writes to L1 *should* be rare enough that the extra lock contention there would be minimal. There is a chance that an invalidation will be stuck waiting for a remote get, so it will again be slower than your approach.
>
>> However, I am more than certain it will be slower when collisions occur, which is more likely with lock striping.
>>
>
> I'm not that worried about lock striping - the user can always increase the number of stripes to get less contention. And I hope they know better than to use lock striping and transactions with multiple keys, or they'll get deadlocks with or without L1 enabled.

I didn't think of the deadlock problem with tx and lock striping. I am wondering whether we should log a warning message so that users get an explicit warning?

Yeah, I think a warning would be nice - especially in the documentation, it looks way too innocuous.
<div class="im">><br>
>><br>
>> I don't think this however is feasible for tx caches.<br>
>><br>
>> Optimistic - we don't want to have gets being blocked when the tx is<br>
>> being committed - that could be quite a bit of a performance hit<br>
>> especially if it has to do a 2 phase commit. Updating the L1 on a<br>
>> remoteGetBeforeWrites could help remedy this (since get wouldn't need<br>
>> to lock), however then you would be almost implementing what I have<br>
>> and still have locking overhead. Maybe this would occur so<br>
>> infrequently it wouldn't matter though?<br>
>><br>
><br>
> Yes, I would propose removing the L1 commit in EntryWrappingInterceptor as<br>
> well, and moving it to the L1 interceptor - immediately after the value was<br>
> retrieved from the remote owner. The local lock would also be released<br>
> immediately.<br>
><br>
> The part I wanted to avoid from your proposal was<br>
> L1ReadSync/L1ReadSynchronizer, because I found it kind of hard to<br>
> understand. I guess I'd find it more approachable all the synchronization<br>
> (the map of L1ReadSynchronizers and the interaction with it) would be in a<br>
> separate class, with its own tests, and the L1 interceptor would only know<br>
> about the high-level stuff. Like the LockManager also hides a map of locks;<br>
> not like the LockManager unit tests, because they're kind of lacking, but<br>
> LockManager has the advantage that almost all the tests touch it indirectly.<br>
><br>
<br>
</div>True, I agree I like it as a separate class better - much better<br>
separation. Maybe I can move that over and add tests and see if that<br>
makes it more clear? If not we can still always use the locking<br>
approach.<br></blockquote><div><br>+1<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
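Purely as an illustration of the shape such a standalone class might take (class and method names here are invented for this sketch, not the ones used in the PR), assuming the map is a plain ConcurrentHashMap keyed by cache key:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical "corralling" component pulled out of the interceptor,
// analogous to the way LockManager hides its map of locks.
class L1ReadRegistry {
   private final ConcurrentHashMap<Object, PendingRead> pending =
         new ConcurrentHashMap<Object, PendingRead>();

   /**
    * Returns an already in-flight read for the key, or null if the caller
    * registered its own and is now responsible for performing the remote get.
    */
   PendingRead register(Object key, PendingRead candidate) {
      return pending.putIfAbsent(key, candidate);
   }

   /** Called by the thread that performed the remote get, after any L1 update. */
   void complete(Object key, PendingRead read, Object value) {
      read.complete(value);
      pending.remove(key, read);
   }

   /** Best-effort cancel of a pending L1 write; true if no stale write can land. */
   boolean cancelL1Write(Object key) {
      PendingRead read = pending.get(key);
      return read == null || read.tryCancelL1Write();
   }

   /** Wait for any in-flight read of the key to finish (used after a failed cancel). */
   void awaitCompletion(Object key) throws InterruptedException {
      PendingRead read = pending.get(key);
      if (read != null)
         read.await();
   }

   static class PendingRead {
      private static final int PENDING = 0, L1_WRITE = 1, CANCELLED = 2;
      private final AtomicInteger state = new AtomicInteger(PENDING);
      private final CountDownLatch done = new CountDownLatch(1);
      private volatile Object value;

      /** The reading thread calls this right before storing the value in L1. */
      boolean tryStartL1Write() { return state.compareAndSet(PENDING, L1_WRITE); }

      /** An invalidation calls this; it fails once the L1 write has started. */
      boolean tryCancelL1Write() { return state.compareAndSet(PENDING, CANCELLED); }

      void complete(Object v) { value = v; done.countDown(); }

      /** Waiting readers block here and reuse the single remote get's result. */
      Object await() throws InterruptedException { done.await(); return value; }
   }
}

Something along these lines could be unit-tested for the register/cancel/complete races in isolation, which seems to be the point of the LockManager analogy.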
<div class="im">>><br>
>> Pessimistic - I haven't looked into pessimistic closely yet, but in<br>
>> that case I don't think it ever acquires the local locks so it would<br>
>> have to acquire a remote lock on a get, which would be pretty<br>
>> disastrous (gets would indirectly have a weak FORCE_WRITE_LOCK<br>
>> enabled). Making this to acquire local locks might work as well, but<br>
>> then would still have the same issue as optimistic when committing - I<br>
>> really am not as familiar with Pessimistic to say for sure.<br>
>><br>
><br>
> Nope, acquiring a local lock should be enough. We don't want to prevent<br>
> other txs from modifying the key at all, we just want to delay the<br>
> invalidation commands triggered by those modifications on the local node.<br>
> And the lock would only be held for the duration of the remote get + L1<br>
> update, not all the way to the commit.<br>
><br>
</div>Okay, cool. I didn't know for sure.<br>
<div class="HOEnZb"><div class="h5">>><br>
>> ><br>
>> > All the lock acquisitions would use a regular timeout, not a try lock<br>
>> > approach.<br>
>> In this case they could, yes.<br>
>> ><br>
>> >><br>
>> >> > 2. By default L1 invalidations are sent as multicasts, so I'm not<br>
>> >> > sure<br>
>> >> > ISPN-3273 really matters here. BTW, I wonder if we have a check to<br>
>> >> > only send<br>
>> >> > L1 invalidations from one node if the threshold is 0...<br>
>> >> I agree that is the default, but we should support the operation,<br>
>> >> although it doesn't matter for this discussion. Also I am curious as<br>
>> >> to why multicast for L1 isn't set to say 2 by default? It seems<br>
>> >> wasteful to send a multicast to all members that they process when<br>
>> >> only 1 would do anything about it. Do you know why this is like that?<br>
>> ><br>
>> ><br>
>> > I suppose it's because we don't have good perf numbers for different L1<br>
>> > invalidation threshold numbers...<br>
>> ><br>
>> > The problem is, we don't have a way to count all the requestors of a key<br>
>> > in<br>
>> > the cluster, so it's reasonably likely that with a threshold of 2 you'd<br>
>> > get<br>
>> > 1 unicast invalidation from one owner + 1 multicast invalidation from<br>
>> > the<br>
>> > other owner, making it less efficient than a single multicast<br>
>> > invalidation.<br>
>><br>
>> It is a nitpick anyways, and really shouldn't make that big of a<br>
>> difference.<br>
>><br>
>><br>
>> >
>> >> >
>> >> > 3a. Right, for put commands we can't hold the local lock while executing the remote put, or we'll have a deadlock. But I think a shorter lock, held only after the remote put completed (or after the lock on the primary owner was acquired, with txs), should work.
>> >> Same point as under 1.
>> >
>> > I don't see how we could get a deadlock if we don't hold the local lock during the remote write invocation.
>>
>> Agree.
>>
>> >
>> >>
>> >> >
>> >> > 3b. We'd also have an ownership check before, so we'd only serialize the get commands that need to go remotely for the same key. I think it would be almost the same as your solution (although it does have one disadvantage - if the key doesn't exist in the cache, all the get commands will go remotely). The number of L1 writes should be very small compared to the number of L1 reads anyway, otherwise it would be more efficient to get the key from the owner every time.
>> >> You are suggesting an optimization for owner nodes so they don't do the "corralling" for keys they own? I like that. Also, I don't think it has the disadvantage, since it only goes remote if it isn't an owner.
>> >
>> > I meant that with your corralling strategy, if you have 2 concurrent get commands and one of them retrieves a null from the entry owners, the other command will return null directly. With regular locking, the other command wouldn't find anything in L1 and it would do another remote get.
>> >
>> > I don't think there's any disadvantage in skipping the corralling for key owners; in fact, I think we need to skip it if the key already exists in L1, too.
>> +1
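As a rough illustration of the gate being agreed on here (all helper names are hypothetical stand-ins, not the actual interceptor plumbing):

import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: the ownership/L1 check that lets owners and L1 hits
// bypass the corralling entirely.
class CorrallingGateSketch {
   private final ConcurrentHashMap<Object, Object> l1 = new ConcurrentHashMap<Object, Object>();

   Object get(Object key) throws InterruptedException {
      if (isLocalOwner(key))
         return readLocally(key);       // owner: plain local read, no corralling
      Object cached = l1.get(key);
      if (cached != null)
         return cached;                 // already in L1: no need to go remote
      return corralledRemoteGet(key);   // only non-owners with an L1 miss corral
   }

   // Stubs for the consistent-hash ownership check, the local read and the
   // corralled remote read described elsewhere in this thread.
   private boolean isLocalOwner(Object key) { return false; }
   private Object readLocally(Object key) { return null; }
   private Object corralledRemoteGet(Object key) throws InterruptedException { return null; }
}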
>> >
>> >>
>> >> >
>> >> > It would be nice to agree on what guarantees we want to provide for L1 invalidation in non-tx caches; I'm not sure if we can do anything to prevent this scenario:
>> >> Actually this scenario doesn't occur with non-tx, since writes don't update the L1 with their value, they just invalidate. Tx caches are fine with this because they acquire the primary owner lock for the duration of the write, including the L1 update, so you can't have this ordering.
>> >
>> > Sounds good.
>> >
>> >>
>> >> >
>> >> > A initiates a put(k, v1) to the primary owner B
>> >> > B performs the put(k, v1), invalidates every non-owner and returns
>> >> > B performs another put(k, v2), invalidating every non-owner
>> >> > A receives the result from B and puts k=v1 in its L1
>> >>
>> >> @pruivo:
>> >>
>> >> > The invalidation does not need to wait for the remote get. When you receive an invalidation, you can mark the current remote get invalid. The invalidation command can return immediately and the remote get can be repeated. Also, it removes the key from the data container (if it exists).
>> >> Dan hit it right on the head. Unfortunately there is no guarantee the cancellation can work properly, so it is best effort, and if it fails we wait until we know we will invalidate properly.
>> >> > The writes can update the L1 through your L1Synchronized by adding a simple method like updateL1(newValue). The blocking threads will immediately return the new value and they don't need to wait for the reply.
>> >> Non-tx cache write operations aren't safe to update the L1 with the value, since they don't acquire the owning lock while updating the L1, which means you could have interleaved writes. That is the primary reason I rejected ISPN-3214. For tx caches we can't do this since the update has to be part of the tx, whereas the get would be updating the L1 outside of a transaction.
>> >> > I see... However, I think that all the events should synchronize at some point (update by remote get, update by local put and invalidation).
>> >> I was hoping that would cover this, other than the outstanding issue in ISPN-2965.
>> >>
>> >> On Thu, Jun 27, 2013 at 9:18 AM, William Burns <mudokonman@gmail.com> wrote:
>> >> > First off, I apologize for the length.
>> >> >
>> >> > There have been a few Jiras recently that have identified L1 consistency issues with both tx and non-tx sync caches. Async caches with L1 have their own issues as well, but I only wanted to talk about sync caches.
>> >> >
>> >> > https://issues.jboss.org/browse/ISPN-3197
>> >> > https://issues.jboss.org/browse/ISPN-2965
>> >> > https://issues.jboss.org/browse/ISPN-2990
>> >> >
>> >> > I have proposed a solution in https://github.com/infinispan/infinispan/pull/1922 which should start L1 consistency down the right track. There are quite a few comments on it if you want to look into it more, but because of that I am moving this to the dev mailing list.
>> >> >
>> >> > The key changes in the PR are the following (non-tx):
>> >> >
>> >> > 1. Concurrent reads for a key that can retrieve a remote value are "corralled" into a single thread of execution for that given key. This would reduce network traffic with concurrent gets for the same key. Note the "corralling" only happens on a per-key basis.
>> >> > 2. The single thread that is doing the remote get would update the L1 if able (without locking) and make the value available to all the requests waiting on the get.
>> >> > 3. Invalidations that are received would first check to see if there is a current remote get occurring for their keys. If there is, they will attempt to cancel the L1 write(s) before they occur. If an invalidation cannot cancel the L1 write, then it must also wait on the current remote get's completion and subsequently run the invalidation. Note the cancellation would only fail when the remote get is done and is in the middle of updating the L1, so this would be a very small window.
>> >> > 4. Local writes will also do the same thing as the invalidation, with cancelling or waiting. Note that non-tx local writes only do L1 invalidations and don't write the value to the data container. The reasons why can be found at https://issues.jboss.org/browse/ISPN-3214
>> >> > 5. Writes that require the previous value and don't have it in the L1 would also do their get operations using the same "corralling" method.
>> >> >
>> >> > 4/5 are not currently implemented in the PR.
>> >> >
>> >> > This approach would use no locking for non-tx caches for all L1 operations. The synchronization point would be the "corralling" method, with invalidations/writes communicating with it.
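As a rough illustration of how the corralled read and the invalidation path might communicate, building on the hypothetical L1ReadRegistry sketched earlier in this thread (again, these names are invented for the sketch and are not the PR's actual classes):

import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: a non-tx read and an invalidation coordinating through
// the hypothetical registry; remoteGet() is a stub for the RPC to the owners.
class CorrallingFlowSketch {
   private final L1ReadRegistry registry = new L1ReadRegistry();
   private final ConcurrentHashMap<Object, Object> l1 = new ConcurrentHashMap<Object, Object>();

   Object corralledGet(Object key) throws InterruptedException {
      L1ReadRegistry.PendingRead mine = new L1ReadRegistry.PendingRead();
      L1ReadRegistry.PendingRead existing = registry.register(key, mine);
      if (existing != null)
         return existing.await();              // piggyback on the in-flight remote get
      Object value = remoteGet(key);           // only this thread goes remote
      if (value != null && mine.tryStartL1Write())
         l1.put(key, value);                   // no lock: the state machine guards the write
      registry.complete(key, mine, value);     // wake up the waiting readers
      return value;
   }

   void onInvalidation(Object key) throws InterruptedException {
      l1.remove(key);
      if (!registry.cancelL1Write(key)) {      // couldn't cancel: the L1 write already started
         registry.awaitCompletion(key);        // wait for the remote get to finish...
         l1.remove(key);                       // ...then invalidate again
      }
   }

   private Object remoteGet(Object key) { return null; }
}

The explicit PENDING/L1_WRITE/CANCELLED state is what confines a failed cancellation to the small window where the L1 write has already begun, matching point 3 above.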
>> >> >
>> >> > Transactional caches would do almost the same thing as non-tx. Note these changes have not been implemented yet.
>> >> >
>> >> > 1. Gets would now update the L1 immediately after retrieving the value, without locking, but still using the "corralling" technique that non-tx does. Previously the L1 update from a get was transactional. This would actually remedy issue [1].
>> >> > 2. Writes currently acquire the remote lock when committing, which is why tx caches are able to update the L1 with the value. Writes would use the same cancellation/wait method as non-tx.
>> >> > 3. Writes that require the previous value and don't have it in the L1 would also do their get operations using the same method.
>> >> > 4. For tx caches, [2] would also have to be done.
>> >> >
>> >> > [1] - https://issues.jboss.org/browse/ISPN-2965?focusedCommentId=12779780&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12779780
>> >> > [2] - https://issues.jboss.org/browse/ISPN-1540
>> >> >
>> >> > Rehashing is another issue, but we should be able to acquire the state transfer lock before updating the L1 on a get, just like when an entry is committed to the data container.
>> >> >
>> >> > Any comments/concerns would be appreciated.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > - Will
>> >>
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev