[infinispan-dev] PutForExternalRead consistency

Galder Zamarreño galder at redhat.com
Wed Nov 27 10:30:44 EST 2013


On Nov 22, 2013, at 4:52 PM, William Burns <mudokonman at gmail.com> wrote:

> You are correct that in the current way PFER works, a lock is obtained
> for only a brief moment while it sends the replication async, and the
> call returns right away.  That is what my changes would alter.
> 
> Thinking about this more, we could even remove all of the extra PFER
> logic in our interceptors and commands.
> 
> Instead PFER would do the following:
> 
> 1. Suspend the current tx if applicable (already does this)
> 2. Send the putIfAbsent async using the async executor, just like the
> putIfAbsentAsync method does (this would be different from
> FORCE_ASYNCHRONOUS, as that only affects remote invocations).  Note
> this means the put itself would be done synchronously and would obtain
> all locks as normal, which would guarantee consistency between primary
> and backups.
> 3. Only pass along the following flags: IGNORE_PREVIOUS_VALUE,
> FAIL_SILENTLY and ZERO_LOCK_ACQUISITION_TIMEOUT.  Note we wouldn't
> need a PUT_FOR_EXTERNAL_READ flag (not quite sure whether we can get
> away with this in tx or not - would have to test it out).
> 4. Optionally return a Future object that the user could block on to
> confirm the PFER completed (rough sketch below).  Note there is no way
> to confirm whether it wrote a value or not, though, due to FAIL_SILENTLY.
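> 
> Roughly, as a sketch only (the field and executor names below are made
> up, and IGNORE_PREVIOUS_VALUE stands in for whatever "skip the previous
> value" flag we end up with, so it's not shown), the whole thing could
> boil down to:
> 
>    import java.util.concurrent.Callable;
>    import java.util.concurrent.ExecutorService;
>    import java.util.concurrent.Future;
>    import javax.transaction.TransactionManager;
>    import org.infinispan.Cache;
>    import org.infinispan.context.Flag;
> 
>    class PferSketch<K, V> {
>       Cache<K, V> cache;             // the user-facing cache
>       TransactionManager tm;         // null for non-tx caches
>       ExecutorService asyncExecutor; // the existing async executor
> 
>       Future<Void> putForExternalRead(final K key, final V value) throws Exception {
>          if (tm != null && tm.getTransaction() != null)
>             tm.suspend();                                   // step 1 (resume omitted here)
>          return asyncExecutor.submit(new Callable<Void>() { // step 2
>             @Override public Void call() {
>                // step 3: plain sync putIfAbsent, so locks are held for the whole
>                // replication and primary/backup owners stay consistent
>                cache.getAdvancedCache()
>                     .withFlags(Flag.FAIL_SILENTLY, Flag.ZERO_LOCK_ACQUISITION_TIMEOUT)
>                     .putIfAbsent(key, value);
>                return null;  // step 4: callers may block on the returned future
>             }
>          });
>       }
>    }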
> 
> As Dan pointed out to me, this could slow down concurrency in REPL
> and DIST, since the replication is now synchronous and holds onto the
> lock longer.  This seems fair to me, since you can at least guarantee
> your cache is still consistent.
> 
> Also, with this approach an end user can still do a LOCAL-only PFER by
> setting the CACHE_MODE_LOCAL flag if needed.
> 
> I hope this cleared it up a bit more.  WDYT or is there something I am
> overlooking?

From a 2LC use case perspective, to keep performance high, I'd continue with the same strategy we have right now: fire and forget the PFER. So, if there's an error in the PFER, the entry is still assumed to be "cached" from a stats perspective, though the actual cache contents tell the truth for the application.

With the new API suggested above, I'd essentially forget about waiting for the future, since that would have a massive impact on performance, particularly for non-local modes. Since these PFERs could compete with normal put operations (IOW, entity updates), I'd go for LOCAL only.
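
For illustration, the putFromLoad path on the 2LC side would then look roughly like this (just a sketch, not the actual region access code; the flag combination is the interesting bit):

   import org.infinispan.AdvancedCache;
   import org.infinispan.context.Flag;

   public final class PutFromLoadSketch {
      private PutFromLoadSketch() {}

      // hypothetical helper; 'cache' would be the 2LC region's cache
      public static <K, V> void putFromLoadLocalOnly(AdvancedCache<K, V> cache,
                                                     K key, V value) {
         cache.withFlags(Flag.CACHE_MODE_LOCAL, Flag.FAIL_SILENTLY,
                         Flag.ZERO_LOCK_ACQUISITION_TIMEOUT)
              .putForExternalRead(key, value);
         // deliberately fire and forget: stats may claim "cached" even if
         // nothing was stored, the database remains the source of truth
      }
   }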

The performance might go down in situations where there are a lot of PFER and put calls competing, but that would at least mean that the cache stays consistent. That's probably the price to pay here.

@William, if you implement this, I can run it through a stress test I have for Hibernate putFromLoad [2], which gives us some indication of performance impact.

Related to this, a while back I considered implementing [1]. In some cases it might make sense to do something with the return value. From a 2LC perspective, I'd not wait to see what the boolean returns, and I've got a workaround in the code for this, so I don't need it immediately. But if we're gonna return a NotifiableFuture from now on, we'd better return a NotifiableFuture<Boolean> for those that want to know whether the PFER actually put anything or not.
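
To be concrete, the signature I have in mind is something along these lines (just a sketch; I'm writing NotifyingFuture, the interface the *Async methods already return, for what I called NotifiableFuture above):

   import org.infinispan.util.concurrent.NotifyingFuture;

   // hypothetical slice of the Cache interface, showing only the changed method
   public interface ExternalReadCache<K, V> {
      // true = the PFER actually stored the value; false = it was dropped
      // (value already present, FAIL_SILENTLY swallowing an error, etc.).
      // 2LC-style callers are free to ignore both the future and the boolean.
      NotifyingFuture<Boolean> putForExternalRead(K key, V value);
   }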

Cheers,

[1] https://issues.jboss.org/browse/ISPN-1986
[2] https://github.com/hibernate/hibernate-orm/blob/master/hibernate-infinispan/src/test/java/org/hibernate/test/cache/infinispan/stress/PutFromLoadStressTestCase.java

> 
> - Will
> 
> On Fri, Nov 22, 2013 at 10:39 AM, Dan Berindei <dan.berindei at gmail.com> wrote:
>> I think I need to clarify my earlier email a bit: the problem I'm worried
>> about is that we could have a thread execute a putForExternal*Read*(k, v1),
>> then a put(k, v2), then a remove(k), and in the end be left with k = v1 in
>> the cache. (Without the remove(k) we'd be ok, because PFER uses putIfAbsent()
>> under the hood.)
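>> 
>> In code form, the interleaving I mean is (contrived, single thread, same key):
>> 
>>    import org.infinispan.Cache;
>> 
>>    class ReorderingExample {
>>       static void run(Cache<String, String> cache, String k, String v1, String v2) {
>>          cache.putForExternalRead(k, v1); // replicated async, may be delivered last
>>          cache.put(k, v2);                // sync put
>>          cache.remove(k);                 // sync remove
>>          // a backup owner that only now delivers the PFER ends up with k = v1,
>>          // even though the caller's last operation removed the key
>>       }
>>    }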
> 
> Fixed the little typo.  I personally think that if you are using PFER,
> you are accepting that these kinds of things could happen.  With the
> Future returned, they could also block on it and know for sure :)
> 
>> 
>> This is quite different from the problem that Pedro raised, that of
>> different owners ending up with different values for the same key. Will's
>> suggestion to implement PFER as a regular put from a background thread does
>> fix that problem.
>> 
>> Writing the value only locally like Galder suggested would also address my
>> concern, but at the expense of extra misses from the cache - especially in
>> DIST mode. Hence my proposal to not support PFER in DIST mode at all.
> 
> You could still get into weird issues with REPL with this, though.
> Since you have a cluster with possibly different values on each node,
> you would never know which node has which value.  A new node could
> join, and some nodes would have the newer value, some the older, etc.
> 
> I would assume (I am probably wrong, since people use things however
> they want) that when end users use PFER they do so in a way that they
> only ever call PFER on that cache and then use size-based eviction to
> remove elements and free memory.  In that case you don't have these
> consistency problems that Pedro mentioned :)
> 
>> 
>> 
>> 
>> On Fri, Nov 22, 2013 at 3:45 PM, Dan Berindei <dan.berindei at gmail.com>
>> wrote:
>>> 
>>> That doesn't sound right, we don't keep any lock for the duration of the
>>> replication. In non-tx mode, we have to do an RPC to the primary owner before
>>> we acquire any key lock. So there's nothing stopping the PFER from writing its
>>> value after a regular (sync) put, even when the put was initiated after the PFER.
>>> 
>>> 
>>> On Fri, Nov 22, 2013 at 2:49 PM, William Burns <mudokonman at gmail.com>
>>> wrote:
>>>> 
>>>> I wonder if we are over-analyzing this.  It seems the main issue is
>>>> that the replication is done asynchronously.  Infinispan has many ways
>>>> to make something asynchronous, and in my opinion we just chose the
>>>> wrong one.  Wouldn't it be easier to change PFER so that, instead of
>>>> passing along the FORCE_ASYNCHRONOUS flag, the operation is simply
>>>> performed asynchronously using putIfAbsentAsync?  This way the lock is
>>>> held for the duration of the replication, so the operation should be
>>>> consistent with other operations.  Also, the user regains control
>>>> faster, as the calling thread doesn't even have to process the local
>>>> interceptor chain.  We could also change the putForExternalRead method
>>>> declaration to return a NotifiableFuture<Void> or something, so callers
>>>> know when the operation has completed (if they want).
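>>>> 
>>>> From the caller's side it would look something like this (a sketch against
>>>> the proposed signature, which doesn't exist yet; NotifyingFuture is the
>>>> interface the other *Async methods already return):
>>>> 
>>>>    import org.infinispan.Cache;
>>>>    import org.infinispan.util.concurrent.NotifyingFuture;
>>>> 
>>>>    class CallerSketch {
>>>>       static void load(Cache<String, String> cache, String k, String v) throws Exception {
>>>>          // hypothetical: putForExternalRead changed to return a future
>>>>          NotifyingFuture<Void> f = cache.putForExternalRead(k, v);
>>>>          // ... carry on with the real work ...
>>>>          f.get(); // optional: block only if you care that the PFER completed
>>>>       }
>>>>    }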
>>>> 
>>>> - Will
>>>> 
>>>> On Thu, Nov 21, 2013 at 9:54 AM, Dan Berindei <dan.berindei at gmail.com>
>>>> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Nov 21, 2013 at 12:35 PM, Galder Zamarreño <galder at redhat.com>
>>>>> wrote:
>>>>>> 
>>>>>> 
>>>>>> On Nov 18, 2013, at 12:42 PM, Dan Berindei <dan.berindei at gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Nov 18, 2013 at 9:43 AM, Galder Zamarreño
>>>>>>> <galder at redhat.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> On Nov 14, 2013, at 1:20 PM, Pedro Ruivo <pedro at infinispan.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Simple question: shouldn't PFER ensure some consistency?
>>>>>>>> 
>>>>>>>> I know that PFER is asynchronous but (IMO) it can create
>>>>>>>> inconsistencies in the data. If the primary owner replicates a PFER
>>>>>>>> followed by a PUT for the same key (the PFER is sent async so the
>>>>>>>> lock is released immediately), we have no way to be sure whether the
>>>>>>>> PFER is delivered before or after the PUT on all the backup owners.
>>>>>>>> 
>>>>>>>> comments?
>>>>>>> 
>>>>>>> Assuming that PFER and PUT happen in the same thread, we're normally
>>>>>>> relying on the JGroups sequence of events to send the first, not wait
>>>>>>> for a response, and then send the second put. That should guarantee
>>>>>>> the order in which the puts are received on the other nodes, but after
>>>>>>> that yeah, there's a risk that it could happen. PFER and PUT for a
>>>>>>> given key normally happen in the same thread in cache-heavy use cases
>>>>>>> such as Hibernate 2LC, but there's no guarantee.
>>>>>>> 
>>>>>>> I don't think that's correct. If the cache is synchronous, the PUT
>>>>>>> will
>>>>>>> be sent as an OOB message, and as such it can be delivered on the
>>>>>>> target
>>>>>>> before the previous PFER command. That's regardless of whether the
>>>>>>> PFER
>>>>>>> command was sent as a regular or as an OOB message.
>>>>>> 
>>>>>> ^ Hmmmm, that's definitely risky. I think we should make PFER local
>>>>>> only.
>>>>>> 
>>>>>> The fact that PFER is asynchronous is nice to have. IOW, if you read a
>>>>>> value from a database and you want to store it in the cache for later
>>>>>> reads,
>>>>>> the fact that it's replicated asynchronously is just so that other
>>>>>> nodes can
>>>>>> take advantage of the value being in the cache. Since it's
>>>>>> asynchronous some
>>>>>> nodes could fail to apply, but that's fine since you can go to the
>>>>>> database
>>>>>> and re-retrieve it from there. So, making PFER local only would be the
>>>>>> degenerate case, where all nodes fail to apply except the local node,
>>>>>> which
>>>>>> is fine. This is better than having the reordering above.
>>>>>> 
>>>>>> In a chat I had with Dan, he pointed out that having PFER local only
>>>>>> would be problematic for DIST mode w/ L1 enabled, since the local
>>>>>> write would not invalidate other nodes, but this is fine because PFER
>>>>>> only really makes sense for situations where Infinispan is used as a
>>>>>> cache. So, if the data is in the DB, you might as well go there (1
>>>>>> network trip), as opposed to asking the other nodes for data and then
>>>>>> the database in the worst case (2 network trips).
>>>>>> 
>>>>>> PFER is really designed for replication or invalidation use cases,
>>>>>> which
>>>>>> are precisely the ones configured for Hibernate 2LC.
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>> 
>>>>> +1 to make PFER local-only in replicated caches, but I now think we
>>>>> should
>>>>> go all the way and disallow PFER completely in dist caches.
>>>>> 
>>>>> I still think having L1 enabled would be a problem, because a regular
>>>>> put()
>>>>> won't invalidate the entry on all the nodes that did a PFER for that
>>>>> key
>>>>> (there are no requestors, and even if we assume that we do a remote get
>>>>> before the PFER we'd still have race conditions).
>>>>> 
>>>>> With L1 disabled, we have the problem that you mentioned: we're trying
>>>>> to
>>>>> read the value from the proper owners, but we never write it to the
>>>>> proper
>>>>> owners, so the hit ratio will be pretty bad. Using the
>>>>> SKIP_REMOTE_LOOKUP
>>>>> flag on reads, we'll avoid the extra RPC in Infinispan, but that will
>>>>> make
>>>>> the hit ratio even worse. E.g. in a 4-node cluster with numOwners=2,
>>>>> the
>>>>> hit ratio will never go above 50%.
>>>>> 
>>>>> I don't think anyone would use a cache knowing that its hit ratio can
>>>>> never
>>>>> get above 50%, so we should just save ourselves some effort and stop
>>>>> supporting PFER in DIST mode.
>>>>> 
>>>>> Cheers
>>>>> Dan
>>>>> 
>>>>> 


--
Galder Zamarreño
galder at redhat.com
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org



