[infinispan-dev] heuristic transactions & failure recovery

Mon Apr 6 10:12:18 EDT 2009

Manik Surtani wrote:
>
> On 4 Apr 2009, at 16:16, Mircea Markus wrote:
>
>> Hi,
>>
>> The current implementation of tx in JBC/infinispan might result in heuristic 
>> transactions: e.g. if the coordinator cannot send a commit message 
>> (2nd phase of 2PC) within a given timeout to some of the 
>> participants, this might result in data being committed on some 
>> nodes and rolled back on others.
>
> ?  If the coord (and I assume you mean the transaction coordinator, 
> not the JGroups channel coordinator) doesn't broadcast a commit, none 
> of the other nodes would have committed this state.  I don't see how 
> you have a situation where it is committed on some and rolled back on 
> others.
>
> Perhaps you mean if the tx coordinator has broadcast a commit, some 
> receive the commit and before all receive the commit the tx 
> coordinator dies.
yes, this is the scenario I had in mind.
>  And you are not using multicast (if you are they all receive the 
> commit message at the same time).  But we recommend you use multicast 
> anyway so I'm not so sure if this is such a problem.
Generally speaking, not all messages are received *at the same time*. 
JGroups only guarantees that they will be received.
Let's say that we have 3 nodes, A, B and C. A starts a tx, does a put 
("k","v"), then commits the tx. During commit the following happens:
1) prepare is broadcast
    B prepares and holds locks
    C prepares and holds locks
2) A sees B and C voted okay, so it triggers a commit:
  - B receives the commit msg and applies changes (for good!)
  - A does not manage to send the message to C *in the given timeout*. 
At this point, the RPC call returns and A rolls back; C will also 
roll back after a while (tx timeout). But B will have the changes 
applied, and this results in atomicity being violated.
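
The sequence above can be replayed in a few lines of plain Java. This is only a toy in-memory model, not Infinispan or JGroups code: the class `HeuristicOutcomeDemo`, the `Node` type and the `commitReachesC` flag are names made up for illustration, and the timeout is modelled as a simple boolean.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class HeuristicOutcomeDemo {
    public enum State { ACTIVE, PREPARED, COMMITTED, ROLLED_BACK }

    // Hypothetical stand-in for a cache node taking part in the tx.
    static final class Node {
        final String name;
        State state = State.ACTIVE;
        String pending;   // value staged by prepare (locks held)
        String committed; // value visible to readers

        Node(String name) { this.name = name; }

        void prepare(String v) { pending = v; state = State.PREPARED; }
        void commit() { committed = pending; pending = null; state = State.COMMITTED; }
        void rollback() { pending = null; state = State.ROLLED_BACK; }
    }

    /** Runs the A/B/C scenario and returns the final tx state of each node. */
    public static Map<String, State> run(boolean commitReachesC) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");

        // Phase 1: prepare is broadcast, everybody votes okay.
        for (Node n : Arrays.asList(a, b, c)) n.prepare("v");

        // Phase 2: B always gets the commit message.
        b.commit();
        if (commitReachesC) {
            c.commit();
            a.commit();
        } else {
            a.rollback(); // the RPC to C timed out, so A gives up
            c.rollback(); // C gives up later, on its own tx timeout
        }

        Map<String, State> outcome = new LinkedHashMap<>();
        for (Node n : Arrays.asList(a, b, c)) outcome.put(n.name, n.state);
        return outcome;
    }

    /** True if every node ended the tx in the same state. */
    public static boolean isAtomic(Map<String, State> outcome) {
        return outcome.values().stream().distinct().count() == 1;
    }

    public static void main(String[] args) {
        System.out.println(run(true) + " atomic=" + isAtomic(run(true)));
        System.out.println(run(false) + " atomic=" + isAtomic(run(false)));
    }
}
```

With `commitReachesC` set to `false`, B ends up committed while A and C are rolled back, which is the heuristic outcome described above.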
>
>> Even worse, there is no way to take action and recover from the 
>> failure. Would it make sense to have a tx failure recovery mechanism 
>> in infinispan?
>
> Well, it depends.  If it is used as a cache for a db, then "recovery" 
> is to just empty the cache.  Otherwise, if you want to treat it as a 
> distributed in-memory db, "recovery" here would mean emptying the 
> cache instance in question, and doing a state transfer from a 
> neighbour (REPL) or re-hashing keys (DIST).
>
Yes. But right now, if a situation like the one I described happens, no 
admin will be notified, and inconsistent resources will be exposed to 
users. I'm thinking about a recovery mechanism in which (continuing 
the previous example):
- C keeps locks on resources and does not allow users to see them until 
it can take a decision
- when communication between A and C is re-established, A informs C that 
it should roll back the tx
(Of course this is a simplistic solution; the problem is more complex, 
e.g. A might die in between.)
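
A rough sketch of what C's side of this could look like, assuming an in-doubt tx simply keeps its keys locked until the coordinator's decision arrives. Everything here (`InDoubtParticipant`, `Decision`, `resolve`) is a hypothetical name for illustration; none of it is existing Infinispan API, and it ignores the harder cases such as A dying before it can deliver the decision.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class InDoubtParticipant {
    public enum Decision { COMMIT, ROLLBACK }

    private final Map<String, String> store = new HashMap<>();
    // txId -> writes staged by prepare; their keys stay locked while in doubt
    private final Map<String, Map<String, String>> inDoubt = new HashMap<>();

    /** Phase 1: stage the writes and lock their keys. */
    public void prepare(String txId, Map<String, String> writes) {
        inDoubt.put(txId, new HashMap<>(writes));
    }

    /** True if some unresolved tx still holds a lock on the key. */
    public boolean isLocked(String key) {
        for (Map<String, String> writes : inDoubt.values()) {
            if (writes.containsKey(key)) return true;
        }
        return false;
    }

    /** Readers are refused rather than shown possibly inconsistent data. */
    public Optional<String> read(String key) {
        if (isLocked(key)) {
            throw new IllegalStateException("key '" + key + "' is held by an in-doubt tx");
        }
        return Optional.ofNullable(store.get(key));
    }

    /** Called once the coordinator is reachable again and reports its decision. */
    public void resolve(String txId, Decision decision) {
        Map<String, String> writes = inDoubt.remove(txId);
        if (writes != null && decision == Decision.COMMIT) {
            store.putAll(writes);
        }
        // On ROLLBACK the staged writes are dropped; either way the locks are released.
    }
}
```

The trade-off is availability: the locked keys are unreadable on C for as long as the tx stays unresolved.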

>>  I'm referring here to something similar to the way DBs work, i.e. 
>> based on persistent tx logs, external notifications etc? Even 
>> though I didn't see any such request on the forums, I guess such a 
>> feature is mandatory for certain systems, e.g. a financial 
>> application. Wdyt?
>
> Persistent tx logs can be just as error-prone, unless you checkpoint 
> open files to disk via OS system calls to ensure all kernel and 
> hardware caches are flushed.  But this is *very* slow. 
Agreed. But it assures correctness for those who need it.
>
> AFAIK the way DBs do this - including Oracle - is to checkpoint at 
> intervals, but this still allows for windows where your persistent tx 
> log could be out of date or corrupt.
Not sure about that - the logs can be used, in the case of a heuristic tx, 
to move the system to a consistent state.
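
To make the idea concrete, here is a minimal sketch of a coordinator-side decision log, assuming the decision is written and flushed to disk before any commit message goes out. `TxDecisionLog` and its record format are invented for this example; the only real API used is the JDK's `FileChannel`, whose `force(true)` asks the OS to flush the file's content and metadata to the storage device (the slow step mentioned above).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

public class TxDecisionLog {
    private final Path file;

    public TxDecisionLog(Path file) { this.file = file; }

    /** Appends "txId COMMIT" or "txId ROLLBACK" (txId must not contain spaces) and flushes. */
    public void record(String txId, boolean commit) throws IOException {
        String line = txId + " " + (commit ? "COMMIT" : "ROLLBACK") + "\n";
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) ch.write(buf);
            ch.force(true); // do not return until the record has been pushed to storage
        }
    }

    /** Re-reads the log; the last record for a tx wins. true means COMMIT. */
    public Map<String, Boolean> replay() throws IOException {
        Map<String, Boolean> decisions = new LinkedHashMap<>();
        if (!Files.exists(file)) return decisions;
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split(" ");
            if (parts.length == 2) decisions.put(parts[0], parts[1].equals("COMMIT"));
        }
        return decisions;
    }
}
```

After a failure, the coordinator (or whoever takes over) could call `replay()` and re-send the logged decision to any participant that is still holding the tx in doubt.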
> Cheers
> --
> Manik Surtani
> Lead, JBoss Cache
> http://www.jbosscache.org
> manik at jboss.org <mailto:manik at jboss.org>