[infinispan-dev] DIST-SYNC, put(), a problem and a solution

Wed Jul 30 06:34:28 EDT 2014

On 29/07/14 23:35, Sanne Grinovero wrote:

> The strategy I've proposed is only to be applied for the communication
> from the primary owner to its backups:
> the value to be written is well known as it's the primary owner which
> defines it unilaterally (for example if there is an atomic replacement
> to be computed)
> and there is no need for extra RPCs as the sequence is not related to
> a group of changes but to the specific entry only.

How would this work with TXs involving multiple keys on different
primary owners? Each owner replicates with seqnos to the backup owners,
so changes to a single key are received in order, but do we (need to)
guarantee that TX consistency is preserved? In other words, do we
preserve isolation, i.e. are all changes of a TX observed at the same
logical time across multiple backup owners?
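
To make the question concrete, here is a minimal sketch of how I picture
the backup side of per-entry sequencing. BackupStore, receive() and get()
are names made up for illustration, not Infinispan code, and I'm assuming
each primary owner numbers the updates of a key starting at 1:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical backup-side store: applies updates per key in seqno order. */
class BackupStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> lastApplied = new HashMap<>();
    private final Map<String, TreeMap<Long, String>> pending = new HashMap<>();

    /** Called when an update for 'key' arrives from that key's primary owner. */
    synchronized void receive(String key, long seqno, String value) {
        long last = lastApplied.getOrDefault(key, 0L);
        if (seqno <= last) {
            return; // duplicate or stale update, already applied
        }
        TreeMap<Long, String> buffer =
                pending.computeIfAbsent(key, k -> new TreeMap<>());
        buffer.put(seqno, value);
        // Apply every buffered update that is next in sequence for this key.
        while (!buffer.isEmpty() && buffer.firstKey() == last + 1) {
            values.put(key, buffer.pollFirstEntry().getValue());
            last++;
        }
        lastApplied.put(key, last);
        if (buffer.isEmpty()) {
            pending.remove(key);
        }
    }

    synchronized String get(String key) {
        return values.get(key);
    }
}
```

Ordering is per key only. If a TX writes k1 (primary owner A) and k2
(primary owner B), a backup can have applied receive("k1", 1, ...) while
the update for k2 is still in flight, so a reader on that backup sees
half of the TX. Nothing in the per-entry seqnos ties the two updates
together.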

> There is no such thing as a need for consensus across owners, nor need
> for a central source for sequences.
>
> Also I don't see it as an alternative to TOA, I rather expect it to
> work nicely together: when TOA is enabled you could trust the
> originating sequence source rather than generate a per-entry sequence,
> and in neither case do you need to actually use a Lock.
> I haven't thought about how the sequences would need to interact (if
> they need to), but they seem complementary to resolve different aspects, and
> also both benefit from the same cleanup and basic structure.
>
>>> Another aspect is that the "user thread" on the primary owner needs to
>>> wait (at least until we improve further) and only proceed after ACK
>>> from backup nodes, but this is better modelled through a state
>>> machine. (Also discussed in Farnborough).
>>
>>
>> To be clear, I don't think keeping the user thread on the originator blocked
>> until we have the write confirmations from all the backups is a problem - a
>> sync operation has to block, and it also serves to rate-limit user
>> operations.
>
>
> There are better ways to rate-limit than to make all operations slow;
> we don't need to block a thread, we need to react to the reply from
> the backup owners.

Agreed. I think Dan mentioned it as a side effect.
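
A rough sketch of what "react to the reply" could look like, using Java 8's
CompletableFuture. AckCollector and the String backup addresses are
hypothetical; this is not how the interceptor stack is written today, just
an illustration of completing the operation from the ACK path instead of
parking a thread:

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical collector: completes once every backup has ACKed a write. */
class AckCollector {
    private final Set<String> missing = ConcurrentHashMap.newKeySet();
    private final CompletableFuture<Void> done = new CompletableFuture<>();

    AckCollector(Collection<String> backups) {
        missing.addAll(backups);
        if (missing.isEmpty()) {
            done.complete(null); // no backups, nothing to wait for
        }
    }

    /** Invoked by whichever thread delivers the ACK; never blocks. */
    void ack(String backup) {
        // Duplicate or unknown ACKs return false from remove() and are ignored.
        if (missing.remove(backup) && missing.isEmpty()) {
            done.complete(null); // completing twice is harmless
        }
    }

    /** Callers attach a continuation instead of waiting on a thread. */
    CompletableFuture<Void> future() {
        return done;
    }
}
```

The primary owner would create one collector per write, send the update
to the backups, and register something like
collector.future().thenRun(() -> sendReplyToOriginator()) (again a made-up
name). No remote-executor or OOB thread is held while the ACKs are
outstanding.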

> You still have an inherent rate-limit in the outgoing packet queues:
> if these fill up, then and only then it's nice to introduce some back
> pressure.
>
>
>> The problem appears when the originator is not the primary owner, and the
>> thread blocking for backup ACKs is from the remote-executor pool (or OOB,
>> when the remote-executor pool is exhausted).
>
> Not following. I guess this is out of scope now that I clarified the
> proposed solution is only to be applied between primary and backups?
>
>
>>>
>>> It's also conceptually linked to:
>>>   - https://issues.jboss.org/browse/ISPN-1599
>>> As you need to separate the locks of entries from the effective user
>>> facing lock, at least to implement transactions on top of this model.
>>
>>
>> I think we fixed ISPN-1599 when we changed passivation to use
>> DataContainer.compute(). WDYT Pedro, is there anything else you'd like to do
>> in the scope of ISPN-1599?
>>
>>>
>>> I expect this to improve performance in a very significant way, but
>>> it's getting embarrassing that it's still not done; at the next face
>>> to face meeting we should also reserve some time for retrospective
>>> sessions.
>>
>>
>> Implementing the state machine-based interceptor stack may give us a
>> performance boost, but I'm much more certain that it's a very complex, high
>> risk task... and we don't have a stable test suite yet :)
>
> Cleaning up and removing some complexity such as
> TooManyExecutorsException might help to get it stable, and keep it
> there :)
> BTW it was quite stable for me until you changed the JGroups UDP
> default configuration.
>
> Sanne

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)