Re: [infinispan-dev] DIST-SYNC, put(), a problem and a solution

Wednesday, 30 July 2014

On 29/07/14 23:14, Dan Berindei wrote:
...

 On Tue, Jul 29, 2014 at 9:06 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:

     This is a nasty problem and I also feel passionately we need to get
     rid of it ASAP.
     I did have the same problems many times, and we discussed this also in
     Farnborough; AFAIR Dan and Pedro had some excellent ideas to fix this.

     You don't need TO, and you don't need to lock at all as long as you
     guarantee the backup owners are getting the number with some
     monotonic sequence attached to it;
     all that backup owners need to do is ignore incoming commands which
     are outdated.
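
A minimal sketch of that idea, assuming the primary owner stamps every write with a per-key sequence number. `BackupStore`, `Versioned` and `applyBackupWrite` are hypothetical names used for illustration, not Infinispan classes:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical backup-owner store; not an Infinispan class. */
final class BackupStore<K, V> {

    /** A value paired with the sequence number assigned by the primary owner. */
    static final class Versioned<V> {
        final long seq;
        final V value;

        Versioned(long seq, V value) {
            this.seq = seq;
            this.value = value;
        }
    }

    private final ConcurrentMap<K, Versioned<V>> data = new ConcurrentHashMap<>();

    /**
     * Applies a replicated write unless a newer one was already applied.
     * Returns true if the write was applied, false if it was ignored as outdated.
     */
    boolean applyBackupWrite(K key, long seq, V value) {
        Versioned<V> incoming = new Versioned<>(seq, value);
        // merge() is atomic per key, so the backup needs no explicit lock here.
        Versioned<V> winner = data.merge(key, incoming,
                (current, candidate) -> candidate.seq > current.seq ? candidate : current);
        return winner == incoming;
    }

    V get(K key) {
        Versioned<V> v = data.get(key);
        return v == null ? null : v.value;
    }
}
```

With this shape, two updates to the same key may reach the backup in either order and the entry still ends up holding the value with the highest sequence number.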

 This is more or less what TOA does - assign a monotonic sequence number
 to txs, and only apply them after all the previous txs in the sequence
 have been applied. The problem is getting that monotonic sequence when
 there are multiple originators and multiple primary owners also requires
 some extra RPCs.

Yes - for a TX involving multiple keys, TOA's probably the way to go.
However, for non-TX caches and accessing a single key (or only a few),
TOA's probably overkill.

As long as we move the sync update RPC out of the lock scope, I'm fine 
with whatever solution you guys come up with.
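
As a rough illustration of what "out of the lock scope" could look like on the primary owner (a sketch only: `BackupRpc`, `PrimaryOwner` and the single coarse lock are assumptions made for brevity, not the actual Infinispan code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical transport to the backup owners; stands in for the real RPC layer. */
interface BackupRpc {
    CompletableFuture<Void> replicate(String key, long seq, String value);
}

/** Hypothetical primary owner; one coarse lock instead of per-key locks, for brevity. */
final class PrimaryOwner {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, String> data = new HashMap<>();
    private final BackupRpc rpc;
    private long nextSeq = 0;

    PrimaryOwner(BackupRpc rpc) {
        this.rpc = rpc;
    }

    void put(String key, String value) {
        long seq;
        lock.lock();
        try {
            // Only the local update and the sequence assignment happen under the lock.
            data.put(key, value);
            seq = ++nextSeq;
        } finally {
            lock.unlock();
        }
        // The lock is released before the sync RPC, so waiting for backup ACKs
        // blocks this caller but not other writers.
        rpc.replicate(key, seq, value).join();
    }

    boolean isLocked() {
        return lock.isLocked();
    }
}
```

Once the RPC leaves the lock, updates can reach a backup out of order, which is why this only works together with sequence numbers that let the backup drop outdated commands.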

...
     Another aspect is that the "user thread" on the primary owner needs to
     wait (at least until we improve further) and only proceed after ACK
     from backup nodes, but this is better modelled through a state
     machine. (Also discussed in Farnborough).

 To be clear, I don't think keeping the user thread on the originator
 blocked until we have the write confirmations from all the backups is a
 problem - a sync operation has to block, and it also serves to
 rate-limit user operations.

I agree; sync mode implies user threads are blocking until an operation
has completed.

...
 The problem appears when the originator is not the primary owner, and
 the thread blocking for backup ACKs is from the remote-executor pool (or
 OOB, when the remote-executor pool is exhausted).

     It's also conceptually linked to:
       - https://issues.jboss.org/browse/ISPN-1599
     As you need to separate the locks of entries from the effective user
     facing lock, at least to implement transactions on top of this model.

 I think we fixed ISPN-1599 when we changed passivation to use
 DataContainer.compute(). WDYT Pedro, is there anything else you'd like
 to do in the scope of ISPN-1599?

     I expect this to improve performance in a very significant way, but
     it's getting embarrassing that it's still not done; at the next face
     to face meeting we should also reserve some time for retrospective
     sessions.

 Implementing the state machine-based interceptor stack may give us a
 performance boost, but I'm much more certain that it's a very complex,
 high risk task... and we don't have a stable test suite yet :)

Yes - this is something major, let's add it to the agenda.

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)
