On 6 Apr 2009, at 15:12, Mircea Markus wrote:
Manik Surtani wrote:
>
> On 4 Apr 2009, at 16:16, Mircea Markus wrote:
>
>> Hi,
>>
>> Current implementation of tx in JBC/infinispan might result in
>> heuristic transactions: e.g. if the coordinator cannot send a
>> commit message (2nd phase of 2PC) within a given timeout to some
>> of the participants, this might result in data being committed on
>> some nodes and rolled back on others.
>
> ? If the coord (and I assume you mean the transaction coordinator,
> not the JGroups channel coordinator) doesn't broadcast a commit,
> none of the other nodes would have committed this state. I don't
> see how you have a situation where it is committed on some and
> rolled back on others.
>
> Perhaps you mean if the tx coordinator has broadcast a commit, some
> receive the commit and before all receive the commit the tx
> coordinator dies.
yes, this is the scenario I had in mind.
> And you are not using multicast (if you are they all receive the
> commit message at the same time). But we recommend you use
> multicast anyway so I'm not so sure if this is such a problem.
Generally speaking not all messages are received *at the same time*.
JGroups only guarantees that they will be received.
Let's say that we have 3 nodes, A B and C. A starts tx, does a put
("k","v") then commits tx. During commit the following happens:
1) prepare is broadcast
B prepares and holds locks
C prepares and holds locks
2) A sees B and C voted okay, so triggers a commit:
- B receives the commit msg and applies changes (for good!)
- A does not manage to send the message to C *in the given timeout*.
At this point, the RPC call returns and A rolls back; C will also
roll back after a while (tx timeout). But B will have the changes
applied, so atomicity is violated.
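The A/B/C scenario above can be replayed with a small simulation. This is
only a sketch: Node, run_two_phase_commit and the reachable_for_commit flag
are hypothetical names made up for illustration, not Infinispan or JGroups
APIs, and the "timeout" is modelled as a commit call that simply reports
failure.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ROLLED_BACK = "rolled_back"


class Node:
    """Hypothetical participant; not an Infinispan or JGroups class."""

    def __init__(self, name, reachable_for_commit=True):
        self.name = name
        # Assumption: False models "commit msg not delivered within timeout".
        self.reachable_for_commit = reachable_for_commit
        self.state = State.ACTIVE

    def prepare(self):
        # Phase 1: acquire locks and vote yes.
        self.state = State.PREPARED
        return True

    def commit(self):
        # Phase 2: apply changes, unless the message never arrives in time.
        if not self.reachable_for_commit:
            return False
        self.state = State.COMMITTED
        return True

    def tx_timeout(self):
        # A prepared node that never hears the outcome eventually rolls back.
        if self.state is State.PREPARED:
            self.state = State.ROLLED_BACK


def run_two_phase_commit(coordinator, participants):
    """Replay the scenario from the mail and return each node's final state."""
    votes = [p.prepare() for p in participants]
    if not all(votes):
        for node in [coordinator, *participants]:
            node.state = State.ROLLED_BACK
        return {n.name: n.state for n in [coordinator, *participants]}

    coordinator.state = State.PREPARED
    acks = [p.commit() for p in participants]
    # As described above: if any commit RPC times out, the coordinator
    # rolls back locally even though some participants already committed.
    coordinator.state = State.COMMITTED if all(acks) else State.ROLLED_BACK
    for p in participants:
        p.tx_timeout()
    return {n.name: n.state for n in [coordinator, *participants]}
```

Running it with C unreachable during the second phase leaves B committed
while A and C are rolled back, which is the heuristic outcome in question.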
Yes, but this is allowed in 2PC. This leaves the tx in a state of
STATUS_UNKNOWN, and it is up to the transaction manager to initiate a
recovery *if* the resources are XA compliant and support recovery.
>> Even worse, there is no way to take action and recover from the
>> failure. Would it make sense to have a tx failure recovery
>> mechanism in infinispan?
>
> Well, it depends. If it is used as a cache for a db, then
> "recovery" is to just empty the cache. Otherwise, if you want to
> treat it as a distributed in-memory db, "recovery" here would mean
> emptying the cache instance in question, and doing a state transfer
> from a neighbour (REPL) or re-hashing keys (DIST).
>
Yes. But right now, if a situation like the one I described happens
no admin will be notified, and inconsistent resources will be
exposed to users. I'm thinking about a recovery mechanism in which
(continuing the previous example):
- C to keep locks on resources and not allow users to see them until
it can take a decision
- when communication between A and C is established, A to inform C
that it should roll back the tx
(Of course this is a simplistic solution, the problem is more
complex, e.g. A might die in between).
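A minimal sketch of that idea, from C's point of view, might look like the
following. Participant, InDoubtError and resolve are hypothetical names used
only for illustration; this ignores the harder cases mentioned above (e.g. A
dying before it can report the outcome).

```python
class InDoubtError(Exception):
    """Raised when a key is locked by a tx whose outcome is still unknown."""


class Participant:
    """Hypothetical node that keeps locks until the coordinator decides."""

    def __init__(self):
        self.data = {}
        self.pending = {}  # tx_id -> {key: new value}, prepared but undecided
        self.locked = {}   # key -> tx_id currently holding the lock

    def prepare(self, tx_id, writes):
        # Refuse to prepare if any key is already locked by another tx.
        if any(key in self.locked for key in writes):
            return False
        self.pending[tx_id] = dict(writes)
        for key in writes:
            self.locked[key] = tx_id
        return True

    def get(self, key):
        # Do not expose a value while its tx is in doubt.
        if key in self.locked:
            raise InDoubtError(f"{key!r} is locked by {self.locked[key]}")
        return self.data.get(key)

    def resolve(self, tx_id, commit):
        # Called once the coordinator is reachable again and reports
        # the outcome: apply or discard the writes, then release locks.
        writes = self.pending.pop(tx_id, {})
        if commit:
            self.data.update(writes)
        for key in writes:
            self.locked.pop(key, None)
```

For example, after c.prepare("tx1", {"k": "v"}) a call to c.get("k") raises
InDoubtError until c.resolve("tx1", commit=False) is invoked, at which point
the old value becomes visible again.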
>> I'm referring here to something similar to the way DBs work, i.e.
>> based on persistent tx logs, external notifications etc? Even
>> though I didn't see any such request on forums, I guess such a
>> feature is mandatory for certain systems, e.g. a financial
>> application. Wdyt?
>
> Persistent tx logs can be just as error-prone, unless you
> checkpoint open files to disk via OS system calls to ensure all
> kernel and hardware caches are flushed. But this is *very* slow.
Agreed. But it assures correctness for those that need it.
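To make the cost being discussed concrete, here is a sketch of a log append
that forces each record to disk before returning. append_durably and
read_log are hypothetical helpers, and the JSON-lines format is an
assumption for illustration. Note that os.fsync only asks the OS to flush
its buffers for the file; whether hardware write caches are also flushed
depends on the platform and the device.

```python
import json
import os


def append_durably(path, record):
    """Append one JSON line and force it to stable storage before returning."""
    line = json.dumps(record) + "\n"
    with open(path, "a", encoding="utf-8") as log:
        log.write(line)
        log.flush()             # push Python's user-space buffer to the OS
        os.fsync(log.fileno())  # ask the OS to write its cached pages out


def read_log(path):
    """Read back all records, e.g. when deciding how to resolve a tx."""
    with open(path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Syncing on every record is what makes this slow; batching several records
per sync trades some of that durability window back for throughput.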
>
> AFAIK the way DBs do this - including Oracle - is to checkpoint at
> intervals, but this still allows for windows where your persistent
> tx log could be out of date or corrupt.
Not sure about that - the logs can be used, in the case of heuristic
tx, for moving the system to a consistent state.
Provided there were no system failures between the collecting and the
storing of these logs. :-)
--
Manik Surtani
manik(a)jboss.org
Lead, JBoss Cache
http://www.jbosscache.org