[infinispan-dev] Asynchronous cache's "void put()" call expectations changed from 6.0.0 to 6.0.1/7.0

Dan Berindei dan.berindei at gmail.com
Mon Mar 2 08:16:14 EST 2015


On Mon, Mar 2, 2015 at 2:11 PM, Radim Vansa <rvansa at redhat.com> wrote:
> On 03/02/2015 12:08 PM, Dan Berindei wrote:
>> The AsyncCache interface already has an xxxAsync method for every
>> write operation in the Cache interface. Those operations behave
>> just like the sync operations, except that they return a
>> NotifyingFuture - basically what you'd expect from an async API.
>
> But the implementation is sub-optimal now, as it only delegates the sync
> operation to a thread pool, requiring a context switch. But this will go
> away with the reactive-Infinispan design as discussed in Berlin. And
> NotifyingFuture will be replaced with the standard CompletableFuture. I
> thought this was already agreed upon.
>

Sure, it's not an optimal implementation; I was just trying to point
out that we already have a proper async API. We can improve it a bit
once we move to Java 8, but not significantly.
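
For reference, a minimal sketch of that API (assuming a running
Cache<String, String> named cache; in 6.x/7.x NotifyingFuture lives in
infinispan-commons):

    import org.infinispan.Cache;
    import org.infinispan.commons.util.concurrent.NotifyingFuture;

    // Same semantics as put(), but returns immediately with a future:
    NotifyingFuture<String> f = cache.putAsync("k", "v");
    // ... do other work ...
    String previous = f.get(); // blocks until the write is confirmed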

>>
>> The FORCE_ASYNCHRONOUS flag is different, because you have no way to
>> find out if the operation was successful. In fact, it behaves as if
>> the cache was REPL_ASYNC. This can indeed cause some confusion, as the
>> async operation can break consistency guarantees for concurrent sync
>> operations. I'm all for removing REPL_ASYNC and the FORCE_ASYNCHRONOUS
>> flag, but I think users will complain about their removal because
>> putAsync is slower and they don't care/know about potential
>> consistency problems.
>
> Slower? You mean that you can achieve lower throughput, right? When you
> declare that you don't need any ack (you should get a Future that would
> throw an exception on get() etc.) you don't have to send the
> confirmation from primary to originator. Do you also omit confirmations
> from backup to primary in *_ASYNC mode? I have a vague idea that this
> can work by relying on JGroups' primary -> backup ordering of
> messages, but during a topology change this could be messed up (leading
> to an inconsistent cache until the entry is overwritten). If the users
> insist on a fast and dirty cache, the configuration option shouldn't be
> <sync .../> || <async ... /> but rather <inconsistency
> allowed="true|false"/> - this would make the behaviour more explicit.

Yes, REPL_ASYNC/DIST_ASYNC currently rely on JGroups delivering
messages in order. And it's not just that they don't require any ack
messages; they also only need to hold the lock on the primary for a
much shorter time (just until the command is sent to the backups,
vs. until the command is acked by all backups).
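
In pseudo-code, the primary owner's write path looks roughly like this
(all helper names are made up; this is not the actual interceptor code):

    void writeOnPrimary(Object key, Object value, boolean syncBackups) {
        lock(key);
        try {
            updateLocalDataContainer(key, value);
            sendWriteCommandToBackups(key, value);
            if (syncBackups) {
                waitForBackupAcks(key); // sync: lock held until all acks
            }                           // async: released right after send
        } finally {
            unlock(key);
        }
    }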

Of course, as you pointed out, this breaks when the primary owner
changes. That's why I suggested that if we keep the async replication
option, we stop using lock delegation for it and we instead send the
command directly from the originator to all the owners (basically the
model we had before 5.2).

However, DIST_SYNC also allows inconsistencies in some cases,
especially when it comes to the return value of atomic operations. So
<inconsistency allowed="false"/> wouldn't be quite right. <unsafe
asyncReplication="false|true"/> looks a little better to me.


>
> Radim
>
>>
>> There's also the question of putForExternalRead, which is currently
>> defined to always replicate the value asynchronously. If REPL_ASYNC is
>> removed, it doesn't make sense to keep the async replication code only
>> for putForExternalRead, so it would have to behave like putAsync
>> (maybe with an extra CACHE_MODE_LOCAL put).
>>
>> Cheers
>> Dan
>>
>>
>> On Mon, Mar 2, 2015 at 12:20 PM, Radim Vansa <rvansa at redhat.com> wrote:
>>> I second Galder's opinion that sync/async should rather be a matter of
>>> API than of configuration, because you have to deal with it in code.
>>> Therefore, when planning the behaviour of any operation, I would suggest
>>> that it should be possible to support both sync and async operations
>>> (currently forceable through a flag) with the proper semantics -
>>> different message paths would complicate this a lot (e.g. the guarantee
>>> that eventually all owners share the same value).
>>>
>>> The most natural semantics would be IMO that async operations return
>>> without any guarantee of *when* the operation will be applied (and even
>>> *if*, since it can silently fail with an error), with an optional way to
>>> find out that it has been applied (as soon as we add the Java 8 API).
>>> And the effect should be the same as for the sync version. It's a
>>> subject for discussion whether a retry should be done upon topology
>>> change (but it needs to be transparent to the user anyway).
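>>>
>>> A rough sketch of the semantics I mean, assuming a future
>>> java.util.concurrent.CompletableFuture-based putAsync (this signature
>>> doesn't exist yet):
>>>
>>>     CompletableFuture<Void> f = cache.putAsync(k, v); // returns at once
>>>     f.whenComplete((ignored, failure) -> {
>>>         if (failure != null) {
>>>             // the write failed somewhere and was not applied
>>>         } else {
>>>             // the write has been applied, same effect as a sync put
>>>         }
>>>     });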
>>>
>>> My 2c
>>>
>>> Radim
>>>
>>> On 03/02/2015 09:55 AM, Dan Berindei wrote:
>>>> Reviving the thread...
>>>>
>>>> Paul, this is a little more complicated than just the return value of
>>>> the put. In Infinispan prior to 6.0.1, this would have always worked
>>>> with a replicated asynchronous cache:
>>>>
>>>> cache.put(k, v); // return value ignored
>>>> v1 = cache.get(k);
>>>> assertEquals(v, v1);
>>>>
>>>> The behaviour of Infinispan prior to 6.0.1 was to write the value to
>>>> the local cache synchronously, and only replicate the value to the
>>>> other owners asynchronously.
>>>> Of course, if the cache was distributed and the originator was not an
>>>> owner, the assertion could have failed anyway, because the remote get
>>>> could be executed before the remote put.
>>>> The get(k) is not guaranteed to return the value inserted by the
>>>> put(k, v), but it did work most of the time.
>>>>
>>>> Since 6.0.1, the originator doesn't update the local value
>>>> synchronously any more, unless it is the primary owner. If it is a
>>>> backup owner, it will send the update asynchronously to the primary
>>>> owner, which will then update the backup owners (including the
>>>> originator) synchronously, while holding the lock. This was supposed
>>>> to change in 5.3.0, as replicated mode was already using lock
>>>> delegation in 5.3.0, but EntryWrappingInterceptor had an outdated
>>>> check that meant the originator did both the synchronous update and
>>>> the asynchronous one. The fix in 6.0.1 was just to remove the
>>>> synchronous update.
>>>>
>>>> Personally I think the pre-6.0.1 behaviour was a bit dangerous,
>>>> because the get would return the updated value *almost* all of the
>>>> time. So there was a chance that tests would work, and the app would
>>>> only fail in production. putIfAbsent will also behave more like an
>>>> atomic operation with 6.0.1+, at least as long as the primary owner of
>>>> the key doesn't change.
>>>>
>>>> That's not to say the current behaviour is perfect. I'm open to
>>>> changing REPL_ASYNC back to updating the local data container first
>>>> (if owner) and then sending the write command asynchronously to all
>>>> the members in parallel, like we did in 5.2.0 and before.
>>>>
>>>> We could also change DIST_ASYNC to behave the same, because the
>>>> current behaviour is a little too close to what putAsync does in
>>>> DIST_SYNC mode, and people using async replication should be ok with
>>>> nodes applying modifications in a different order (it happens on
>>>> primary owner change anyway).
>>>>
>>>> However, I'm not sure what we should do for atomic operations like
>>>> replace(key, oldValue, newValue). I would suggest throwing an
>>>> UnsupportedOperationException, because the operation is very unlikely
>>>> to be atomic. It's true that PFER is also specified as behaving the
>>>> same as putIfAbsent, but the PFER check is going to be on the
>>>> originator only, so we can just remove that bit from the Javadoc.
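>>>>
>>>> To illustrate why atomicity matters, the usual optimistic-update loop
>>>> is only correct if replace() is a real compare-and-swap
>>>> (computeNewValue is a made-up helper):
>>>>
>>>>     V oldValue, newValue;
>>>>     do {
>>>>         oldValue = cache.get(key);
>>>>         newValue = computeNewValue(oldValue);
>>>>     } while (!cache.replace(key, oldValue, newValue));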
>>>>
>>>> Another option would be to change PFER only to behave like in 5.3.0
>>>> and 6.0.0. We could say PFER performs two put operations, one that's
>>>> local-only and one that's replicated but asynchronous (updating the
>>>> value on the originator a second time). Or we could say PFER always
>>>> updates the value on the local node only...
>>>>
>>>> Cheers
>>>> Dan
>>>>
>>>> On Tue, Sep 2, 2014 at 3:19 PM, Paul Ferraro
>>>> <paul.ferraro at redhat.com> wrote:
>>>>
>>>>      ----- Original Message -----
>>>>      > From: "Galder Zamarreño" <galder at redhat.com>
>>>>      > To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>,
>>>>      > "Paul Ferraro" <paul.ferraro at redhat.com>
>>>>      > Sent: Monday, September 1, 2014 11:08:45 AM
>>>>      > Subject: Asynchronous cache's "void put()" call expectations
>>>>      > changed from 6.0.0 to 6.0.1/7.0
>>>>      >
>>>>      > Hi all,
>>>>      >
>>>>      > @Paul, this might be important for WF if using async repl
>>>>      > caches (I think the same applies to distributed async caches
>>>>      > too)
>>>>
>>>>      Luckily, Dan warned me of this behavioral change well in advance.
>>>>      Any time we need a reliable return value from a Cache.put(...), we
>>>>      use Flag.FORCE_SYNCHRONOUS so that the same code works for
>>>>      sync and async caches alike.
>>>>
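>>>>      Something like this, as a minimal sketch (AdvancedCache,
>>>>      withFlags and Flag.FORCE_SYNCHRONOUS are the real API; cache, key
>>>>      and value are made up):
>>>>
>>>>          import org.infinispan.AdvancedCache;
>>>>          import org.infinispan.context.Flag;
>>>>
>>>>          AdvancedCache<String, String> ac = cache.getAdvancedCache();
>>>>          // Forces a synchronous put even on an async cache, so the
>>>>          // returned previous value is reliable:
>>>>          String previous =
>>>>              ac.withFlags(Flag.FORCE_SYNCHRONOUS).put("k", "v");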
>>>>
>>>>      > Today I’ve been trying to upgrade the Infinispan version in
>>>>      > Hibernate master from 6.0.0.Final to 7.0.0.Beta1. Overall, it’s
>>>>      > all worked fine, but there’s one test that has started failing.
>>>>      >
>>>>      > Essentially, this is a clustered test for a repl async cache
>>>>      > (w/ cluster cache loader) where a non-owner cache node does a
>>>>      > put() and immediately, on the same cache, calls a get(). The
>>>>      > test is failing because the get() does not see the effects of
>>>>      > the put(), even though both operations are called on the same
>>>>      > cache instance.
>>>>      >
>>>>      > According to Dan, this should have been happening since [1] was
>>>>      > implemented, but it really started happening with [2], when lock
>>>>      > delegation was enabled for replicated caches
>>>>      > (EntryWrappingInterceptor.isUsingLockDelegation is now true,
>>>>      > whereas in 6.0.0 it was false).
>>>>      >
>>>>      > Not sure we ever set expectations in this regard, but clearly
>>>>      > it’s a big change in terms of expectations on when “void put()”
>>>>      > completes for async repl caches. I’m not sure how we should
>>>>      > handle this, but it definitely needs some discussion, and the
>>>>      > documentation/javadoc adjusted if needed. Can we do something
>>>>      > differently?
>>>>      >
>>>>      > Independent of how we resolve this, this is once again the
>>>>      > result of trying to shoehorn async behaviour into sync APIs. Any
>>>>      > async caches (DIST, INV, REPL) should really be accessed
>>>>      > exclusively via the AsyncCache API, where you can return
>>>>      > quickly, use the future, and attach a listener to it (a bit à la
>>>>      > Java 8’s CompletableFuture.thenApply lambda calls) as a way to
>>>>      > signal that the operation has completed; then you have an API
>>>>      > and cache mode that make sense and are consistent with how async
>>>>      > APIs work.
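>>>>      >
>>>>      > For illustration, a rough sketch of that style against today’s
>>>>      > API (NotifyingFuture/FutureListener as in the 6.x/7.x commons
>>>>      > module; the listener body is made up):
>>>>      >
>>>>      >     import java.util.concurrent.Future;
>>>>      >     import org.infinispan.commons.util.concurrent.FutureListener;
>>>>      >     import org.infinispan.commons.util.concurrent.NotifyingFuture;
>>>>      >
>>>>      >     NotifyingFuture<String> f = cache.putAsync("k", "v");
>>>>      >     f.attachListener(new FutureListener<String>() {
>>>>      >         @Override
>>>>      >         public void futureDone(Future<String> future) {
>>>>      >             // runs once the put has completed (or failed)
>>>>      >         }
>>>>      >     });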
>>>>      >
>>>>      > Right now, a repl async cache’s “void put()” call is not very
>>>>      > well defined. Does it return when the message has been put on
>>>>      > the network? What impact does it have on the local cache
>>>>      > contents?
>>>>      >
>>>>      > Also, a very big problem with the change of behaviour is that,
>>>>      > if left like that, you are forcing users to code differently
>>>>      > against the same “void put()” API depending on the configuration
>>>>      > (whether async or sync). As clearly shown by the issue above,
>>>>      > this is very confusing. It’s a lot more logical IMO, and I’ve
>>>>      > already sent an email on this very same topic [3] back in
>>>>      > January, that whether a cache is sync or async should be based
>>>>      > purely on the API used, forgetting about the static
>>>>      > configuration flag on whether the cache is async or sync.
>>>>
>>>>      I would agree with this last statement. Consistent semantics are
>>>>      a good thing.
>>>>      If you do change this, however, just let me know well in advance.
>>>>
>>>>      > Cheers,
>>>>      >
>>>>      > [1] https://issues.jboss.org/browse/ISPN-2772
>>>>      > [2] https://issues.jboss.org/browse/ISPN-3354
>>>>      > [3]
>>>>      http://lists.jboss.org/pipermail/infinispan-dev/2014-January/014448.html
>>>>      > --
>>>>      > Galder Zamarreño
>>>>      > galder at redhat.com
>>>>      > twitter.com/galderz
>>>>      >
>>>>      >
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Radim Vansa <rvansa at redhat.com>
>>> JBoss Performance Team
>>>
>
>
> --
> Radim Vansa <rvansa at redhat.com>
> JBoss Performance Team
>


