On Fri, Nov 27, 2015 at 1:45 PM, Radim Vansa <rvansa(a)redhat.com> wrote:
On 11/27/2015 11:48 AM, Bela Ban wrote:
> You're talking about the case where P applies the PUT, and sends an ACK
> back to O, but the async updates to the Bs are received by only a subset
> (or none) of the Bs, and then P crashes.
>
> As I was referring about the non-transactional case, wouldn't this be
> fine? Or do we want the *non-transactional* case to be an atomic update
> of P and all Bs? IMO, the latter should be done as part of a TX, not for
> the non-transactional case.
You're not talking about non-transactional mode but mongo mode :)
Indeed! There's a reason we don't recommend using DIST_ASYNC, which is
quite similar. In fact, I wouldn't be opposed to changing DIST_ASYNC
so that the O -> P communication is synchronous, and only the P -> B
communication is asynchronous.
Non-transactional mode still guarantees that the data will be reliably
stored, but it does not allow any consistency between two keys.
Transactional mode allows you to change all keys or none of them.
The atomicity is rather discutable. All writes are atomic with respect
to writes, but reads just come and read something, and there's no way to
make sure that two transactional reads read the same value.
Due to the two-armies problem, in case that an error is encountered,
it's possible that the cluster will end up in inconsistent state - in
non-tx mode this is the updated B and P not applying the update. In
transactional case, if the second phase (CommitCommand) gets executed on
a subset of nodes and the others don't reply, the rollback sent cannot
undo the already committed transactions. In that case, Infinispan is
obliged to throw an exception to the user (tx mode requires
useSynchronizations=false to do this) but it cannot prevent/resolve it.
Right, the biggest source of inconsistencies right now is replication
timeouts, because those are not retried.
As Radim says, transactional mode is not necessarily safer when
timeouts are involved. I've argued before that we should remove the
replication timeout and rely on FD exclusively, because suspected
nodes are handled much better.
There are also problems when the originator crashes after sending the
commit command to some owners, but before all of them got it [1]. With
pessimistic locking, the same can happen if the originator crashes
after only some of the owners received the prepare command. We've made
some progress handling crashed originators when partition handling is
enabled, but we aren't covering all the cases yet.
Still, there's a difference between having an inconsistency after
reporting an error to the application, and having an inconsistency
after reporting that everything's A-OK.
[1]
https://issues.jboss.org/browse/ISPN-3421
>
> So I think we need to come up with a concise definition of what the
> transactional versus non-transaction semantics are.
>
> But even if we go with a design where O waits for ACKs from *all* Bs, we
> can still end up with inconsistencies; e.g. when not all Bs received the
> updates. O will fail the PUT, but the question is what do we do in such
> a case? Re-submit the PUT?
Throw exception and provide API to asses the situation.
The "API to assess the situation" part is kind of lacking at the
moment, but yeah, that's the idea. I doubt we'll ever get to a point
where we can say the cache will *never* become inconsistent, but the
application should at least be notified when something goes wrong.
Cheers
Dan