On 29/07/14 23:14, Dan Berindei wrote:
On Tue, Jul 29, 2014 at 9:06 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
This is a nasty problem and I also feel passionately we need to get
rid of it ASAP.
I did have the same problems many times, and we discussed this also in
Farnborough; AFAIR Dan and Pedro had some excellent ideas to fix this.
You don't need TO, and you don't need to lock at all as long as you
guarantee the backup owners are getting each update with a monotonic
sequence number attached to it; all that backup owners then need to do
is ignore incoming commands which are outdated.
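To make the idea concrete, here is a minimal sketch of a backup owner that drops outdated commands. It assumes the primary owner stamps every write with a per-key sequence number; the class and method names (BackupOwnerSketch, apply) are hypothetical and are not Infinispan APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Infinispan code: a backup owner that keeps only
// the newest write per key, based on a sequence number assumed to be
// assigned by the primary owner.
final class BackupOwnerSketch {
    // A value together with the sequence number of the write that produced it.
    static final class Versioned {
        final long seq;
        final String value;
        Versioned(long seq, String value) { this.seq = seq; this.value = value; }
    }

    private final Map<String, Versioned> store = new ConcurrentHashMap<>();

    /** Applies the write unless an equal or newer one is already stored. */
    boolean apply(String key, long seq, String value) {
        Versioned incoming = new Versioned(seq, value);
        // ConcurrentHashMap.merge() is atomic per key, so no explicit lock is needed.
        Versioned result = store.merge(key, incoming,
                (current, update) -> update.seq > current.seq ? update : current);
        return result == incoming; // true only if the incoming write won
    }

    String get(String key) {
        Versioned v = store.get(key);
        return v == null ? null : v.value;
    }

    public static void main(String[] args) {
        BackupOwnerSketch backup = new BackupOwnerSketch();
        backup.apply("k", 2, "second"); // arrives first
        backup.apply("k", 1, "first");  // arrives late, ignored as outdated
        System.out.println(backup.get("k")); // prints "second"
    }
}
```

The sketch only covers the backup side; how the sequence numbers are generated and what happens on a primary owner change are left out.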
This is more or less what TOA does - assign a monotonic sequence number
to txs, and only apply them after all the previous txs in the sequence
have been applied. The problem is getting that monotonic sequence when
there are multiple originators and multiple primary owners also requires
some extra RPCs.
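The "apply only after all previous txs" part can be sketched as below. This is not the TOA implementation, only an illustration of in-order application under the assumption that a single gap-free sequence starting at 1 already exists; obtaining that sequence across multiple originators and primary owners (the part that costs extra RPCs) is not shown. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: buffers transactions that arrive early and applies
// them strictly in sequence order.
final class InOrderApplierSketch {
    private final Map<Long, String> pending = new HashMap<>();
    private final List<String> applied = new ArrayList<>();
    private long nextSeq = 1; // assumption: sequence numbers start at 1 with no gaps

    synchronized void deliver(long seq, String tx) {
        if (seq < nextSeq) {
            return; // already applied, treat as a duplicate
        }
        pending.put(seq, tx);
        // Drain every transaction that has become applicable.
        String next;
        while ((next = pending.remove(nextSeq)) != null) {
            applied.add(next); // stand-in for actually applying the tx
            nextSeq++;
        }
    }

    synchronized List<String> applied() {
        return new ArrayList<>(applied);
    }

    public static void main(String[] args) {
        InOrderApplierSketch applier = new InOrderApplierSketch();
        applier.deliver(2, "tx2"); // buffered, tx1 not seen yet
        applier.deliver(1, "tx1"); // applies tx1, then the buffered tx2
        System.out.println(applier.applied()); // prints "[tx1, tx2]"
    }
}
```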
Yes - for a TX involving multiple keys, TOA's probably the way to go.
However, for non-TX caches and accessing a single key (or only a few
keys), TOA's probably overkill.
As long as we move the sync update RPC out of the lock scope, I'm fine
with whatever solution you guys come up with.
Another aspect is that the "user thread" on the primary owner needs to
wait (at least until we improve further) and only proceed after ACK
from backup nodes, but this is better modelled through a state
machine. (Also discussed in Farnborough).
To be clear, I don't think keeping the user thread on the originator
blocked until we have the write confirmations from all the backups is a
problem - a sync operation has to block, and it also serves to
rate-limit user operations.
I agree; sync mode implies user threads are blocking until an operation
has completed.
The problem appears when the originator is not the primary owner, and
the thread blocking for backup ACKs is from the remote-executor pool (or
OOB, when the remote-executor pool is exhausted).
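One way to avoid parking a remote-executor thread while the backup ACKs are outstanding is to track them in a small object and complete a future when the last ACK arrives. The following is only a sketch of that idea using java.util.concurrent; AckCollectorSketch and its methods are hypothetical names, and it assumes each backup ACKs exactly once.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: counts backup ACKs without blocking any thread.
final class AckCollectorSketch {
    private final AtomicInteger remaining;
    private final CompletableFuture<Void> done = new CompletableFuture<>();

    AckCollectorSketch(int backupCount) {
        this.remaining = new AtomicInteger(backupCount);
        if (backupCount == 0) {
            done.complete(null); // nothing to wait for
        }
    }

    /** Called from whichever thread receives an ACK; assumes one ACK per backup. */
    void onAck() {
        if (remaining.decrementAndGet() == 0) {
            done.complete(null);
        }
    }

    /** The caller attaches a continuation here instead of blocking. */
    CompletableFuture<Void> whenAllAcked() {
        return done;
    }

    public static void main(String[] args) {
        AckCollectorSketch acks = new AckCollectorSketch(2);
        acks.whenAllAcked().thenRun(() -> System.out.println("reply to originator"));
        acks.onAck();
        acks.onAck(); // second ACK completes the future and runs the continuation
    }
}
```

Only the sync user thread on the originator would still block in this model; the primary owner would send its reply from the continuation.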
It's also conceptually linked to:
- https://issues.jboss.org/browse/ISPN-1599
As you need to separate the locks of entries from the effective
user-facing lock, at least to implement transactions on top of this model.
I think we fixed ISPN-1599 when we changed passivation to use
DataContainer.compute(). WDYT Pedro, is there anything else you'd like
to do in the scope of ISPN-1599?
I expect this to improve performance in a very significant way, but
it's getting embarrassing that it's still not done; at the next face
to face meeting we should also reserve some time for retrospective
sessions.
Implementing the state machine-based interceptor stack may give us a
performance boost, but I'm much more certain that it's a very complex,
high-risk task... and we don't have a stable test suite yet :)
Yes - this is something major, let's add it to the agenda.
--
Bela Ban, JGroups lead (http://www.jgroups.org)