[infinispan-dev] The "Triangle" pattern for reducing Put latency

Radim Vansa rvansa at redhat.com
Wed Nov 25 10:15:14 EST 2015


On 11/25/2015 03:24 PM, Pedro Ruivo wrote:
>
> On 11/25/2015 01:20 PM, Radim Vansa wrote:
>> On 11/25/2015 12:07 PM, Sanne Grinovero wrote:
>>> On 25 November 2015 at 10:48, Pedro Ruivo <pedro at infinispan.org> wrote:
>>>>> An alternative is to wait for all ACKs, but I think this could still
>>>>> be optimised in "triangle shape" too by having the Originator only
>>>>> wait for the ACKs from the non-primary replicas?
>>>>> So backup owners have to send a confirmation message to the
>>>>> Originator, while the Primary owner isn't expecting to do so.
>>>> IMO, we should wait for all ACKs to keep our read design.
>> What exactly is our 'read design'?
> If we don't wait for all the ACKs, then we have to go to the primary
> owner for reads, even if the originator is a Backup owner.

I don't think so, but we probably have some miscommunication. If O = B,
we still wait for the reply from B (which is local), and that reply is
triggered by receiving the update from P (after P has applied the change
locally). So it goes:

OB(application thread) [cache.put()] -(unordered)-> P(worker thread) 
[applies update] -(ordered)-> OB(worker thread) [applies update] 
-(in-VM)-> OB(application thread) [continues]
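
To make the O = B flow above concrete, here is a minimal in-process sketch.
It is not Infinispan code: the class name, the two single-threaded executors
standing in for the worker threads on P and on OB, and the trace strings are
all assumptions made for illustration. It only shows the ordering of the
steps: the application thread blocks until the local backup worker has
applied the update forwarded by P.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation of the O = B case; no real RPC is involved.
final class OriginatorIsBackupFlow {

    static List<String> run() {
        List<String> trace = Collections.synchronizedList(new ArrayList<>());
        // One thread each, standing in for the worker threads on P and on OB.
        ExecutorService primaryWorker = Executors.newSingleThreadExecutor();
        ExecutorService backupWorker = Executors.newSingleThreadExecutor();
        try {
            trace.add("OB(app): cache.put()");
            CompletableFuture<Void> localAck = CompletableFuture
                // O -> P: the primary applies the update first.
                .runAsync(() -> trace.add("P(worker): applies update"), primaryWorker)
                // P -> B: the backup (same node as O) applies it afterwards.
                .thenRunAsync(() -> trace.add("OB(worker): applies update"), backupWorker);
            // In-VM hand-off: the application thread waits only for the local backup.
            localAck.join();
            trace.add("OB(app): continues");
            return new ArrayList<>(trace);
        } finally {
            primaryWorker.shutdown();
            backupWorker.shutdown();
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Note that nothing in the sketch sends a message from P back to the
application thread; the local apply on OB is the only completion signal.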

>
>> I think that the source of optimization is that once primary decides to
>> backup the operation, he can forget about it and unlock the entry. So,
>> we don't need any ACK from primary unless it's an exception/noop
>> notification (as with conditional ops). If primary waited for ACK from
>> backup, we wouldn't save anything.
> About the iteration between P -> B, you're right. We don't need to wait
> for the ACKs if the messages are sent in FIFO (and JGroups guarantee that)
>
> About the O -> P, IMO, the Originator should wait for the reply from
> Backup.

I never claimed otherwise: O always needs to wait for the ACKs from the
Bs, and only then can it report that the value has been written on all
owners. What does this have to do with O -> P?

> At least, the Primary would be the only one who needs to return
> the previous value (if needed) and it can return if the operation
> succeed or not.

* Simple success: no P -> O, B -> O (success)
* Simple failure/non-modifying operation (as with putIfAbsent/functional
  call): P -> O (failure/custom value), no B -> O
* Previous/custom value (as with replace() or functional call): P -> O
  (previous/custom value), B -> O (success); the alternative is P -> B
  (previous/custom value, new value) and B -> O (previous/custom value)
* Exception on either P or B: send the exception to O
* Lost/timed out P -> B: O times out waiting for the ACK from B and
  throws an exception
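
The first three cases above can be sketched as a small lookup of which
nodes send a response to O. This is only an illustration: the enum, the
method name and the node labels are hypothetical and not part of any
Infinispan API, and the "alternative" routing (P -> B carrying the
previous value) is deliberately left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: which nodes reply to the originator for each outcome.
final class TriangleAcks {

    enum Outcome {
        SIMPLE_SUCCESS,  // write applied, no return value needed
        NOT_APPLIED,     // failed condition / non-modifying operation
        RETURNS_VALUE    // write applied and previous/custom value needed
    }

    static Set<String> expectedResponders(Outcome outcome, String primary, List<String> backups) {
        Set<String> responders = new LinkedHashSet<>();
        switch (outcome) {
            case SIMPLE_SUCCESS:
                // no P -> O, B -> O (success)
                responders.addAll(backups);
                break;
            case NOT_APPLIED:
                // P -> O (failure/custom value), no B -> O
                responders.add(primary);
                break;
            case RETURNS_VALUE:
                // P -> O (previous/custom value), B -> O (success)
                responders.add(primary);
                responders.addAll(backups);
                break;
        }
        return responders;
    }

    public static void main(String[] args) {
        List<String> backups = List.of("B1", "B2");
        for (Outcome outcome : Outcome.values()) {
            System.out.println(outcome + " -> " + expectedResponders(outcome, "P", backups));
        }
    }
}
```

The exception and timeout cases are not modelled here; in those, O either
receives the exception from P or B, or gives up after its own timeout.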


> This way, it would avoid forking the code for each type
> of command without any benefit (I'm thinking sending the reply to
> originator in parallel with the update message to the backups).

Which forking of code for each command type do you mean? I see only two
branches, depending on whether the command is going to be replicated to
B or not.

Radim

>
>> The gains are:
>> * less hops (3 instead of 4 if O != P && O != B)
>> * less messages (primary ACK is transitive based on ack from B)
>> * shorter lock times (not locking during P -> B RPC)
>>
>>>> However, the
>>>> Originator needs to wait for the ACK from Primary because of conditional
>>>> operations and functional API.
>>> If the operation is successful, Primary will have to let the
>>> secondaries know so these can reply to the Originator directly: still
>>> saves an hop.
> As I said above: "I'm thinking sending the reply to originator in
> parallel with the update message to the backups"
>
>>>> In this first case, if the conditional operation fail, the Backups are
>>>> not bothered. The latter case, we may need the return value from the
>>>> function.
>>> Right, for a failed or rejected operation the secondaries won't even
>>> know about it,
>>> so the Primary is in charge of letting the Originator know.
>>> Essentially you're highlighting that the Originator needs to wait for
>>> either the response from secondaries (all of them?)
>>> or from the Primary.
>>>
>>>>> I suspect the tricky part is what happens when the Primary owner rules
>>>>> +1 to apply the change, but then the backup owners (all or some of
>>>>> them) somehow fail before letting the Originator know. The Originator
>>>>> in this case should seek confirmation about its operation state
>>>>> (success?) with the Primary owner; this implies that the Primary owner
>>>>> needs to keep track of what it's applied and track failures too, and
>>>>> this log needs to be pruned.
>> Currently, in case of lost (timed out) ACK from B to P, we just report
>> exception and don't care about synchronizing P and B - B can already
>> store updated value.
>> So we don't have to care about rollback on P if replication to B fails
>> either - we just report that it's broken, sorry.
>> Better consolidation API would be nice, though, something like
>> cache.getAllVersions().
>>
>> Radim
>>
>>
>>>>> Sounds pretty nice, or am I missing other difficulties?
>>>>>
>>>>> Thanks,
>>>>> Sanne
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>
>>


-- 
Radim Vansa <rvansa at redhat.com>
JBoss Performance Team
