Hi Galder,
I think that this was changed in Infinispan version 5.3 or so :) The
reason for this is that updates, even in an async cache, are applied in the
same order on all owners. If you updated the local node A to X first, and
then asynchronously updated the other node B, there could be a concurrent
update to Y on node B, and then the cluster would likely end
up with A having Y and B having X, without anything eventually resolving
this. Some locking has to be involved, too, and the algorithm in 5.3
actually did not allow the values to diverge, but caused a deadlock.
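To make the divergence concrete, here is a minimal sketch in plain Java. It is a toy model under my own assumptions, not Infinispan code: two in-memory maps stand in for nodes A and B, and the class and method names (DivergenceSketch, localFirst, ownerOrdered) are hypothetical. The async deliveries are written out as ordinary statements in the order they would arrive.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: two maps stand in for nodes A and B.
// Not Infinispan code; all names here are hypothetical.
class DivergenceSketch {
    static final String KEY = "k";

    // Local-first: each writer applies to its own node immediately,
    // and its async replication reaches the other node later.
    static String[] localFirst() {
        Map<String, String> a = new HashMap<>();
        Map<String, String> b = new HashMap<>();
        a.put(KEY, "X"); // writer on A applies X locally
        b.put(KEY, "Y"); // concurrent writer on B applies Y locally
        b.put(KEY, "X"); // A's async update arrives at B
        a.put(KEY, "Y"); // B's async update arrives at A
        return new String[] { a.get(KEY), b.get(KEY) };
    }

    // Owner-ordered: both writes go to a single owner (A here), which
    // fixes the order and forwards it unchanged to the other node.
    static String[] ownerOrdered() {
        Map<String, String> a = new HashMap<>();
        Map<String, String> b = new HashMap<>();
        String[] orderChosenByOwner = { "X", "Y" };
        for (String value : orderChosenByOwner) {
            a.put(KEY, value); // owner applies
            b.put(KEY, value); // other node applies in the same order
        }
        return new String[] { a.get(KEY), b.get(KEY) };
    }

    public static void main(String[] args) {
        String[] lf = localFirst();
        String[] oo = ownerOrdered();
        System.out.println("local-first:   A=" + lf[0] + " B=" + lf[1]);
        System.out.println("owner-ordered: A=" + oo[0] + " B=" + oo[1]);
    }
}
```

In the local-first variant the two nodes end up holding different values and nothing ever reconciles them; in the owner-ordered variant both nodes see the same sequence and agree.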
In 2LC, this can be eliminated in some cases, though - e.g. if we do
putIfAbsents with the same value, it's safe to apply the value locally
and send the update asynchronously to the other node. For removals, it's
safe, too. Therefore, I have recently replaced the distribution & locking
interceptors with an 'optimized' version [1][2].
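A small sketch of why these two cases tolerate local-first application, again as a toy model in plain Java rather than the actual interceptor code: the maps standing in for nodes and the name CommutingOpsSketch are my assumptions. It covers only the two cases mentioned above, where both writers perform the same kind of operation.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: maps stand in for nodes A and B.
// Not the interceptor implementation; names are hypothetical.
class CommutingOpsSketch {
    // Two writers putIfAbsent the same value, each applying locally first.
    static boolean samePutIfAbsentConverges() {
        Map<String, String> a = new HashMap<>();
        Map<String, String> b = new HashMap<>();
        a.putIfAbsent("k", "V"); // writer on A, applied locally
        b.putIfAbsent("k", "V"); // writer on B, applied locally
        b.putIfAbsent("k", "V"); // A's async update arrives at B: no-op
        a.putIfAbsent("k", "V"); // B's async update arrives at A: no-op
        return "V".equals(a.get("k")) && a.equals(b);
    }

    // Two writers remove the same key, each applying locally first.
    static boolean removalsConverge() {
        Map<String, String> a = new HashMap<>();
        Map<String, String> b = new HashMap<>();
        a.put("k", "V");
        b.put("k", "V");
        a.remove("k"); // writer on A, applied locally
        b.remove("k"); // writer on B, applied locally
        b.remove("k"); // A's async removal arrives at B: no-op
        a.remove("k"); // B's async removal arrives at A: no-op
        return a.isEmpty() && b.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("putIfAbsent converges: " + samePutIfAbsentConverges());
        System.out.println("removals converge:     " + removalsConverge());
    }
}
```

Both operations are idempotent and give the same result in either arrival order, so the late async message cannot undo anything.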
While I am a strong adversary of the *_ASYNC modes in general, I think
that the consistent order of updates should be preserved there. And if
you do an async put to a dist cache, you can't be sure that a following read
will return the value either (and repl is just a read-optimized +
failure-resilient case of dist).
Radim
[1]
https://github.com/hibernate/hibernate-orm/blob/master/hibernate-infinisp...
[2]
https://github.com/hibernate/hibernate-orm/blob/master/hibernate-infinisp...
On 01/26/2017 01:24 PM, Galder Zamarreño wrote:
Hi all,
Forgive me if we've discussed this before (I vaguely remember...), but the current
async semantics always throw me off a bit, let me explain:
I've been working on and off on a Hibernate 2LC tutorial that demonstrates how to run 2LC
on embedded, Wildfly and Spring setups, and for each of them, explains how it all works
in local vs clustered mode.
One of the sections involves working with queries, updating an entity that's part of
the query, and seeing how that query gets re-executed from the db. When an entity is
updated, that entity's update timestamp gets updated in a cache, which in a clustered
environment is configured with repl async.
If you have two nodes A and B, it was expected that if you updated the entity in node A,
you'd want to wait a tiny bit to run the query in node B so that the timestamp update
would propagate to node B.
However, recent async semantics work in such a way that if you updated the entity in node A
and wanted to execute the query in node A, you still might want to add a little delay...
The reason for that is that the logic changes based on whether the owner of the entity
type's key in the update timestamp cache is node A or node B. If the owner is node A, the
cache is updated directly by the main thread. So you can execute a query on node A
immediately after the update and it'll be fine.
However, if the owner is node B, even if the update was done in node A, node A will only
be updated asynchronously. So, if after calling an update on node A, you do a query on
node A, in this scenario you'd get outdated results for a small period of time. [1]
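The two cases can be sketched with a toy model in plain Java. This is an illustration under assumptions, not Infinispan's implementation: the class AsyncOwnerSketch, the queue standing in for the async transport, and the "Person" key are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model only: a queue of pending messages stands in for async
// replication. Not Infinispan code; all names are hypothetical.
class AsyncOwnerSketch {
    private final Map<String, Map<String, Long>> nodes = new HashMap<>();
    private final Queue<Runnable> network = new ArrayDeque<>();
    private final String owner;

    AsyncOwnerSketch(String owner) {
        this.owner = owner;
        nodes.put("A", new HashMap<>());
        nodes.put("B", new HashMap<>());
    }

    // The owner applies the write, then queues it for every other node.
    private void applyOnOwner(String key, long timestamp) {
        nodes.get(owner).put(key, timestamp);
        for (String node : nodes.keySet()) {
            if (!node.equals(owner)) {
                network.add(() -> nodes.get(node).put(key, timestamp));
            }
        }
    }

    // A write from the owner is applied in the caller's thread; a write
    // from any other node is only queued for delivery to the owner.
    void put(String origin, String key, long timestamp) {
        if (origin.equals(owner)) {
            applyOnOwner(key, timestamp);
        } else {
            network.add(() -> applyOnOwner(key, timestamp));
        }
    }

    Long read(String node, String key) {
        return nodes.get(node).get(key);
    }

    // Delivers everything in flight, including messages queued meanwhile.
    void deliverAll() {
        while (!network.isEmpty()) {
            network.poll().run();
        }
    }

    public static void main(String[] args) {
        AsyncOwnerSketch ownerIsA = new AsyncOwnerSketch("A");
        ownerIsA.put("A", "Person", 1L);
        System.out.println("owner A, read on A: " + ownerIsA.read("A", "Person"));

        AsyncOwnerSketch ownerIsB = new AsyncOwnerSketch("B");
        ownerIsB.put("A", "Person", 1L);
        System.out.println("owner B, read on A: " + ownerIsB.read("A", "Person"));
        ownerIsB.deliverAll();
        System.out.println("after delivery:     " + ownerIsB.read("A", "Person"));
    }
}
```

With A as owner, the read on A sees the new timestamp right away. With B as owner, the same read on A returns nothing until the queued messages have made the round trip through B.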
So, my question here is: can we do anything to make this more predictable from a user's
perspective? Or is it just not worth doing? Or is it just a side effect that we must be
aware of?
Cheers,
[1]
https://gist.github.com/galderz/676f689884969658b01a7695f08dd7a2
--
Galder Zamarreño
Infinispan, Red Hat
--
Radim Vansa <rvansa(a)redhat.com>
JBoss Performance Team