On 11/27/2015 11:48 AM, Bela Ban wrote:
You're talking about the case where P applies the PUT and sends an ACK
back to O, but the async updates to the Bs are received by only a subset
(or none) of the Bs, and then P crashes.
As I was referring to the non-transactional case, wouldn't this be
fine? Or do we want the *non-transactional* case to be an atomic update
of P and all Bs? IMO, the latter should be done as part of a TX, not for
the non-transactional case.
You're not talking about non-transactional mode but mongo mode :)
Non-transactional mode still guarantees that the data will be reliably
stored, but it does not allow any consistency between two keys.
Transactional mode allows you to change all keys or none of them.
The atomicity is rather debatable. All writes are atomic with respect
to writes, but reads just come and read something, and there's no way to
make sure that two transactional reads read the same value.
Due to the two-armies problem, if an error is encountered the cluster
can end up in an inconsistent state - in non-tx mode this means a B has
applied the update while P has not. In the transactional case, if the
second phase (CommitCommand) gets executed on a subset of nodes and the
others don't reply, the rollback sent afterwards cannot undo the
transactions that have already committed. In that case, Infinispan is
obliged to throw an exception to the user (tx mode requires
useSynchronizations=false to do this) but it cannot prevent/resolve it.
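To make the transactional failure concrete, here is a toy model of it in
Java. None of these names (TxNode, IncompleteCommitException,
TwoPhaseSketch) exist in Infinispan; they are made up for illustration,
and the "node that does not reply" is just a boolean flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy node: hypothetical, not an Infinispan class. */
class TxNode {
    final String name;
    final boolean repliesToCommit; // false models a node that never answers the commit
    String prepared;               // value staged in phase 1
    String committed;              // value visible after phase 2

    TxNode(String name, boolean repliesToCommit) {
        this.name = name;
        this.repliesToCommit = repliesToCommit;
    }

    void prepare(String value) { prepared = value; }

    boolean commit() {
        if (!repliesToCommit) return false;
        committed = prepared;
        prepared = null;
        return true;
    }

    // Rollback can only drop staged data; a committed value stays committed.
    void rollback() { prepared = null; }
}

/** Hypothetical exception carrying the nodes that did commit. */
class IncompleteCommitException extends RuntimeException {
    final List<String> committedOn;

    IncompleteCommitException(List<String> committedOn) {
        super("commit applied only on " + committedOn);
        this.committedOn = committedOn;
    }
}

class TwoPhaseSketch {
    static void write(List<TxNode> nodes, String value) {
        for (TxNode n : nodes) n.prepare(value);          // phase 1
        List<String> committedOn = new ArrayList<>();
        boolean allReplied = true;
        for (TxNode n : nodes) {                          // phase 2
            if (n.commit()) committedOn.add(n.name);
            else allReplied = false;
        }
        if (!allReplied) {
            for (TxNode n : nodes) n.rollback();          // too late for committed nodes
            throw new IncompleteCommitException(committedOn);
        }
    }
}
```

With one node that replies and one that does not, write() throws, yet the
first node keeps the committed value: the caller is told about the
problem but nothing has been repaired.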
So I think we need to come up with a concise definition of what the
transactional versus non-transactional semantics are.
But even if we go with a design where O waits for ACKs from *all* Bs, we
can still end up with inconsistencies; e.g. when not all Bs received the
updates. O will fail the PUT, but the question is what do we do in such
a case? Re-submit the PUT?
Throw an exception and provide an API to assess the situation.
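A rough sketch of what that could look like on O, in Java. The names
(AckCollector, PartialWriteException) are invented for this example and
are not existing Infinispan API; the ACKs are modelled as plain
CompletableFutures keyed by owner name.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical exception that tells the caller which owners ACKed. */
class PartialWriteException extends RuntimeException {
    final Set<String> acked;
    final Set<String> missing;

    PartialWriteException(Set<String> acked, Set<String> missing) {
        super("acked=" + acked + ", missing=" + missing);
        this.acked = acked;
        this.missing = missing;
    }
}

class AckCollector {
    /** Waits up to timeoutMillis overall for every owner's ACK; throws if any is missing. */
    static void awaitAll(Map<String, CompletableFuture<Void>> acks, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        Set<String> acked = new TreeSet<>();
        Set<String> missing = new TreeSet<>();
        for (Map.Entry<String, CompletableFuture<Void>> e : acks.entrySet()) {
            long remaining = Math.max(0, deadline - System.nanoTime());
            try {
                e.getValue().get(remaining, TimeUnit.NANOSECONDS);
                acked.add(e.getKey());
            } catch (TimeoutException | ExecutionException ex) {
                missing.add(e.getKey());
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                missing.add(e.getKey());
            }
        }
        if (!missing.isEmpty()) {
            throw new PartialWriteException(acked, missing);
        }
    }
}
```

The exception only reports which owners are known to have applied the
update; whether to retry, read back, or give up is left to the caller.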
Radim
On 27/11/15 11:12, Radim Vansa wrote:
> The update needs to be applied to *all* owners before the call returns
> on O. With your strategy, P could apply update, send ACK but the async
> backup updates would not be delivered on Bs; so an ACKed update would
> get completely lost.
> I don't say that these async Bs are not possible, but not in the basic
> case - for default configuration, we need to keep the guarantees.
>
> Radim
>
> On 11/27/2015 10:34 AM, Bela Ban wrote:
>> Adding to what Radim wrote (below), would the following make sense
>> (conditions: non-TX, P != O && O != B)?
>>
>> The lock we acquire on P is actually used to establish an ordering for
>> updates to the Bs. So this is very similar to SEQUENCER, except that we
>> have a sequencer (P) *per key*.
>>
>> Phase 1
>> -------
>> - O sends a PUT(x) message to P
>>
>> Phase 2
>> -------
>> - P adds PUT(x) to a queue and returns (freeing the up-thread)
>> - A thread dequeues PUT(x) and sends an (async) UPDATE message to all Bs
>> (possible optimization: send updates to the same key sets as batches)
>> - PUT(x) is applied locally and an ACK is sent back to O
>>
>> O times out and throws an exception if it doesn't receive the ack from P.
>>
>> This would reduce the current 4 phases (for the above conditions) to 2,
>> plus the added latency of processing PUT(x) in the queue. However, we'd
>> get rid of the put-while-holding-the-lock issue.
>>
>> P's updates to the Bs are FIFO ordered, therefore all we need to do is
>> send the update down into UNICAST3 (or NAKACK2, if we use multicasts)
>> which guarantees ordering. Subsequent updates are ordered according to
>> send order. The updates are guaranteed to be retransmitted as long as P
>> is alive.
>>
>> If P crashes before returning the ack to O, or while updating the Bs,
>> then O will time out and throw an exception. And, yes, there can be
>> inconsistencies, but we're talking about the non-TX case. Perhaps O
>> could resubmit PUT(x) to the new P.
>>
>> I don't know how this behaves wrt rebalancing: are we flushing pending
>> updates before installing the new CH?
>>
>> Thoughts?
>>
>>
>>> I think that the source of optimization is that once primary decides to
>>> back up the operation, he can forget about it and unlock the entry. So,
>>> we don't need any ACK from primary unless it's an exception/noop
>>> notification (as with conditional ops). If primary waited for ACK from
>>> backup, we wouldn't save anything.
>
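For reference, the queue-based Phase 2 quoted above can be sketched in a
few lines of Java. This is only an illustration under simplifying
assumptions: PrimarySketch is a made-up class, a single worker thread
orders updates for all keys (stricter than the per-key ordering in the
proposal), and the async UPDATE to each B is modelled as an append to an
in-memory log rather than a JGroups message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical primary owner (P); not Infinispan code. */
class PrimarySketch {
    // One worker thread means queued PUTs are processed in FIFO order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final List<List<String>> backupLogs = new ArrayList<>();

    PrimarySketch(int backups) {
        for (int i = 0; i < backups; i++) {
            backupLogs.add(Collections.synchronizedList(new ArrayList<String>()));
        }
    }

    /** Enqueues the PUT and returns at once; the future stands in for the ACK to O. */
    CompletableFuture<Void> put(String key, String value) {
        CompletableFuture<Void> ack = new CompletableFuture<>();
        worker.execute(() -> {
            for (List<String> log : backupLogs) {
                log.add(key + "=" + value);   // stands in for the async UPDATE to a B
            }
            local.put(key, value);            // apply locally
            ack.complete(null);               // ACK back to O
        });
        return ack;
    }

    String get(String key) { return local.get(key); }

    List<String> backupLog(int i) { return new ArrayList<>(backupLogs.get(i)); }

    void shutdown() { worker.shutdown(); }
}
```

Note that the ACK here says nothing about whether the Bs received the
update, which is exactly the gap discussed in this thread.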
--
Radim Vansa <rvansa(a)redhat.com>
JBoss Performance Team