[infinispan-dev] heuristic transactions & failure recovery
Mircea Markus
mircea.markus at jboss.com
Mon Apr 6 10:12:18 EDT 2009
Manik Surtani wrote:
>
> On 4 Apr 2009, at 16:16, Mircea Markus wrote:
>
>> Hi,
>>
>> The current implementation of tx in JBC/infinispan might result in
>> heuristic transactions: e.g. if the coordinator cannot send a commit
>> message (2nd phase of 2PC) within a given timeout to some of the
>> participants, this might result in data being committed on some
>> nodes and rolled back on others.
>
> ? If the coord (and I assume you mean the transaction coordinator,
> not the JGroups channel coordinator) doesn't broadcast a commit, none
> of the other nodes would have committed this state. I don't see how
> you have a situation where it is committed on some and rolled back on
> others.
>
> Perhaps you mean if the tx coordinator has broadcast a commit, some
> receive the commit and before all receive the commit the tx
> coordinator dies.
yes, this is the scenario I had in mind.
> And you are not using multicast (if you are they all receive the
> commit message at the same time). But we recommend you use multicast
> anyway so I'm not so sure if this is such a problem.
Generally speaking not all messages are received *at the same time*.
JGroups only guarantees that they will be received.
Let's say that we have 3 nodes, A, B and C. A starts a tx, does a put
("k","v"), then commits the tx. During commit the following happens:
1) prepare is broadcast
B prepares and holds locks
C prepares and holds locks
2) A sees that B and C voted okay, so it triggers a commit:
- B receives the commit msg and applies the changes (for good!)
- A does not manage to send the message to C *in the given timeout*.
At this point, the RPC call returns and A rolls back; C will also
roll back after a while (tx timeout). But B will have the changes
applied, so atomicity is violated.
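
To make the A/B/C scenario concrete, here is a minimal in-memory
simulation. It is only a sketch: Node, run_tx and the unreachable
parameter are made-up names for illustration, not JBC/infinispan or
JGroups code, and the commit timeout is modelled simply by skipping
delivery to the listed nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cache node; not an Infinispan class."""
    name: str
    data: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)  # staged writes, locks held
    outcome: str = "none"

    def prepare(self, writes):
        self.pending = dict(writes)  # stage the writes and hold the locks
        self.outcome = "prepared"
        return True  # vote "okay"

    def commit(self):
        self.data.update(self.pending)  # apply the changes for good
        self.pending = {}
        self.outcome = "committed"

    def rollback(self):
        self.pending = {}  # drop the staged writes and release the locks
        self.outcome = "rolled back"


def run_tx(coordinator, participants, writes, unreachable):
    """Run 2PC; commit messages to nodes named in `unreachable` time out."""
    nodes = [coordinator, *participants]
    # Phase 1: everybody prepares and votes.
    votes = [n.prepare(writes) for n in nodes]
    if not all(votes):
        for n in nodes:
            n.rollback()
        return {n.name: n.outcome for n in nodes}
    # Phase 2: the coordinator sends commit to each participant in turn.
    delivered_to_all = True
    for p in participants:
        if p.name in unreachable:
            delivered_to_all = False  # the commit message timed out
        else:
            p.commit()
    if delivered_to_all:
        coordinator.commit()
    else:
        coordinator.rollback()  # the RPC call returned with a failure
        for p in participants:
            if p.outcome == "prepared":
                p.rollback()  # models the participant's own tx timeout
    return {n.name: n.outcome for n in nodes}
```

Calling run_tx(Node("A"), [Node("B"), Node("C")], {"k": "v"},
unreachable={"C"}) leaves B committed while A and C are rolled back,
which is the mixed outcome described above.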
>
>> Even worse, there is no way to take action and recover from the
>> failure. Would it make sense to have tx failure recovery mechanism
>> in infinispan?
>
> Well, it depends. If it is used as a cache for a db, then "recovery"
> is to just empty the cache. Otherwise, if you want to treat it as a
> distributed in-memory db, "recovery" here would mean emptying the
> cache instance in question, and doing a state transfer from a
> neighbour (REPL) or re-hashing keys (DIST).
>
Yes. But right now, if a situation like the one I described happens, no
admin will be notified, and inconsistent resources will be exposed to
users. I'm thinking about a recovery mechanism in which (continuing the
previous example):
- C keeps locks on resources and does not allow users to see them until
it can take a decision
- when communication between A and C is re-established, A informs C
that it should roll back the tx
(Of course this is a simplistic solution; the problem is more complex,
e.g. A might die in between.)
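
A rough sketch of what this could look like on C's side, under the
same simplifying assumptions (A stays alive and eventually
reconnects). InDoubtParticipant, KeyLockedError and resolve are
hypothetical names used for illustration only; nothing like this
exists in the code base today.

```python
class KeyLockedError(Exception):
    """Raised when a key belongs to a tx whose outcome is still unknown."""


class InDoubtParticipant:
    """Hypothetical participant that hides in-doubt keys from readers."""

    def __init__(self):
        self.data = {}
        self.in_doubt = {}  # tx id -> staged writes whose keys stay locked

    def prepare(self, tx_id, writes):
        self.in_doubt[tx_id] = dict(writes)

    def get(self, key):
        # Refuse to expose a key while any unresolved tx has written to it.
        for tx_id, writes in self.in_doubt.items():
            if key in writes:
                raise KeyLockedError(f"{key!r} is locked by in-doubt {tx_id}")
        return self.data.get(key)

    def resolve(self, tx_id, decision):
        """Apply the coordinator's decision once communication is back."""
        if decision not in ("commit", "rollback"):
            raise ValueError(f"unknown decision: {decision!r}")
        writes = self.in_doubt.pop(tx_id)
        if decision == "commit":
            self.data.update(writes)
        # On rollback the staged writes are simply dropped.
```

The point of the sketch is that C never times out on its own: the
locks are only released in resolve(), i.e. when A has told C what the
outcome of the tx was.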
>> I'm referring here to something similar to the way DBs work, i.e.
>> based on persistent tx logs, external notifications etc.? Even
>> though I didn't see any such request on forums, I guess such a
>> feature is mandatory for certain systems, e.g. a financial
>> application. Wdyt?
>
> Persistent tx logs can be just as error-prone, unless you checkpoint
> open files to disk via OS system calls to ensure all kernel and
> hardware caches are flushed. But this is *very* slow.
Agreed. But it assures correctness for those who need it.
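
For reference, the kind of forced write being discussed looks roughly
like the sketch below. append_durably is a made-up helper; os.fsync is
the standard call that asks the OS to write its cached pages for the
file to the device. Whether the device's own write cache is flushed as
well depends on the platform and hardware, so treat this as an
illustration of the cost, not as a durability guarantee.

```python
import os


def append_durably(path, record):
    """Append one record to a tx log and force it out of the OS cache."""
    with open(path, "ab") as log:
        log.write(record.encode("utf-8") + b"\n")
        log.flush()             # push Python's userspace buffer to the OS
        os.fsync(log.fileno())  # ask the OS to write its cached pages out
```

Every call pays for a synchronous write, which is where the slowness
comes from.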
>
> AFAIK the way DBs do this - including Oracle - is to checkpoint at
> intervals, but this still allows for windows where your persistent tx
> log could be out of date or corrupt.
Not sure about that - the logs can be used, in the case of a heuristic
tx, for moving the system to a consistent state.
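
As a sketch of the first step of such a recovery: given each node's
log as a list of (tx id, event) pairs, one can at least detect which
transactions ended with mixed outcomes. The log format and the names
final_outcomes and find_heuristic_txs are assumptions made up for
this example.

```python
def final_outcomes(log):
    """Map each tx id to the last event recorded for it in one node's log."""
    outcomes = {}
    for tx_id, event in log:
        outcomes[tx_id] = event  # later entries overwrite earlier ones
    return outcomes


def find_heuristic_txs(logs_by_node):
    """Return txs that were committed on some nodes and rolled back on others."""
    per_tx = {}
    for node, log in logs_by_node.items():
        for tx_id, event in final_outcomes(log).items():
            per_tx.setdefault(tx_id, {})[node] = event
    return {
        tx_id: nodes
        for tx_id, nodes in per_tx.items()
        if "commit" in nodes.values() and "rollback" in nodes.values()
    }
```

What to do with a tx found this way (notify an admin, redo or undo
the changes on some nodes) is a separate decision; the sketch only
covers detection.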
> Cheers
> --
> Manik Surtani
> Lead, JBoss Cache
> http://www.jbosscache.org
> manik at jboss.org
>
>
>
>