On 23 Oct 2009, at 17:42, Mircea Markus wrote:
Hi Manik,
While the fix I made for EntityWrapper fixed the cache in local mode,
the replicated mode[1] (pretty sure dist is the same) is far from
working.
The issue is the following:
node1.put(k,v1) :: WL(k1) -> releaseLock(k1) -> replicate(k1) ->
incoming_thread_performs put (k, *v2*)
node2.put(k,v2) :: WL(k2) -> releaseLock(k2) -> replicate(k2) ->
incoming_thread_performs put (k, *v1*)
Now, in this scenario, both operations succeed! This is caused by the
fact that lock interceptor is *after* repl interceptor in the chain,
so when repl is triggered the key is no longer locked.
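A minimal sketch of the interleaving above. It assumes a toy model in which each node applies a put under its own lock, releases that lock, and only afterwards ships the value to the other node. The ReplRace and Node classes are hypothetical and are not Infinispan code. Running it leaves node1 holding v2 and node2 holding v1, so both puts "succeed" and the replicas diverge.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of two replicated nodes; not Infinispan's API.
public class ReplRace {

    static final class Node {
        final Map<String, String> data = new HashMap<>();
        // Stands in for the per-node write lock WL(k) in the scenario.
        final ReentrantLock lock = new ReentrantLock();

        // Local phase of a put: lock, write, unlock. Replication happens
        // only after this returns, i.e. after the lock is already released.
        void localPut(String key, String value) {
            lock.lock();
            try {
                data.put(key, value);
            } finally {
                lock.unlock();
            }
        }

        // An incoming replicated put is applied unconditionally,
        // like the "incoming thread" in the scenario.
        void applyRemote(String key, String value) {
            localPut(key, value);
        }
    }

    // Interleaving from the mail: both nodes write locally, then replicate.
    static List<String> runBrokenScenario() {
        Node node1 = new Node();
        Node node2 = new Node();

        node1.localPut("k", "v1"); // WL(k1), write v1, releaseLock(k1)
        node2.localPut("k", "v2"); // WL(k2), write v2, releaseLock(k2)

        // Replication starts only after both locks were released.
        node2.applyRemote("k", "v1"); // node1's update arrives at node2
        node1.applyRemote("k", "v2"); // node2's update arrives at node1

        return List.of(node1.data.get("k"), node2.data.get("k"));
    }

    public static void main(String[] args) {
        List<String> finalValues = runBrokenScenario();
        System.out.println("node1=" + finalValues.get(0) + ", node2=" + finalValues.get(1));
    }
}
```

Nothing in this sequence ever blocks, because no lock is held while the other node's update is being applied.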
Uh oh, that is very much a bug. The Repl interceptor *must* be after
the lock interceptor.
In other words, if the puts happen at the same time, on the same key
then both might succeed, leaving the cluster in an inconsistent state.
IMO, the correct approach would be to keep the lock while doing the
replication. This would reduce concurrency, though, as locks would
be held for longer. But data consistency would improve.
This is the correct approach though. Could you please swap the
interceptor order and check your tests? There shouldn't be any other
regressions.
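A minimal sketch of why the order matters, assuming a heavily simplified chain in which each interceptor wraps the rest of the chain. The ChainOrder class, the Interceptor interface and the LOCKING/REPLICATION instances are illustrative only and are not Infinispan's actual classes or signatures. With replication ahead of locking, the trace is [lock, write, unlock, replicate]; after swapping, it becomes [lock, write, replicate, unlock], so replication runs while the lock is still held.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified interceptor chain; names are illustrative
// and do not correspond to Infinispan's real API.
public class ChainOrder {

    interface Interceptor {
        void invoke(List<String> trace, Runnable next);
    }

    // Acquires the key lock, lets the rest of the chain run, then releases it.
    static final Interceptor LOCKING = (trace, next) -> {
        trace.add("lock");
        try {
            next.run();
        } finally {
            trace.add("unlock");
        }
    };

    // Lets the rest of the chain apply the write, then replicates the result.
    static final Interceptor REPLICATION = (trace, next) -> {
        next.run();
        trace.add("replicate");
    };

    // Runs the interceptors in list order; the innermost step is the write.
    static List<String> run(List<Interceptor> chain) {
        List<String> trace = new ArrayList<>();
        invokeAt(chain, 0, trace);
        return trace;
    }

    private static void invokeAt(List<Interceptor> chain, int i, List<String> trace) {
        if (i == chain.size()) {
            trace.add("write");
            return;
        }
        chain.get(i).invoke(trace, () -> invokeAt(chain, i + 1, trace));
    }

    public static void main(String[] args) {
        // Order described in the mail: the lock is gone by the time we replicate.
        System.out.println(run(List.of(REPLICATION, LOCKING)));
        // Swapped order: replication happens while the lock is still held.
        System.out.println(run(List.of(LOCKING, REPLICATION)));
    }
}
```

The price is the one mentioned above: the lock stays held for the whole replication round trip.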
(this would be an easy fix for Repl, but dist relies on the
interceptor order).
How does DIST rely on this order? AFAICT, DIST too should work with
the DistributionInterceptor appearing after the LockingInterceptor.
I've also tried doing the call within a tx, but there's a
deadlock; I'll investigate this on Mon.
I expect you will see the same problem even with transactions. Either
way, the correct solution is to properly order the interceptors.
Thanks!
Manik
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org