[infinispan-dev] XAResource.isSameRM

Thu Jan 6 14:01:29 EST 2011

On 01/06/2011 05:43 PM, Mircea Markus wrote:
>
> On 6 Jan 2011, at 14:45, Jonathan Halliday wrote:
>
>> On 01/06/2011 02:29 PM, Mircea Markus wrote:
>>
>>> At the moment the *only way to transactionally* access a
>>> node is by collocating the client and the server in the same
>>> VM.
>>
>> So the scope of the transaction is limited to data
>> residing in that local node? What if I want a single
>> transaction to span the local node and data in a remote node?
> that's possible. It's just that you have to always interact
> with the local node that will acquire remote locks remotely
> on behalf of your transaction.

ok, so the cluster intelligence is in the local node rather 
than the client, not that there is any significant 
distinction for now as they are co-located.

> a) node goes down before TM issued prepare
> - when TM resurrects and calls XAResource.recover it
> receives the given XID, realises that there's a heuristic
> decision (because it didn't call prepare) and takes some
> action (rolls back other participants, notifies sys admin?).

That's not a heuristic decision. An RM is perfectly entitled 
to throw away any tx state up until prepare. Under the 
presumed abort doctrine it simply throws an error from 
prepare and the tx aborts cleanly. Recovery is not involved 
- it applies only to transactions that have reached prepare.
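
As a rough illustration of presumed abort, here is a minimal 
sketch. It is not Infinispan code: the class and method names 
are hypothetical, plain strings stand in for Xid, and a map 
stands in for durable storage. Only the XAException and 
XAResource constants come from the standard javax.transaction.xa 
package. XAER_NOTA is one plausible error code; an RM could 
equally use one of the XA_RB* codes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

// Hypothetical sketch: string ids stand in for Xid, maps for real storage.
final class PresumedAbortSketch {

    static final class ResourceManager {
        // Volatile work of transactions that have not prepared; lost on crash.
        private final Map<String, String> active = new HashMap<>();
        // Stand-in for a durable log; survives simulateCrash().
        private final Map<String, String> prepared = new HashMap<>();

        void write(String txId, String update) {
            active.put(txId, update);
        }

        int prepare(String txId) throws XAException {
            String update = active.remove(txId);
            if (update == null) {
                // No record of this branch: refuse to prepare, so the TM aborts.
                throw new XAException(XAException.XAER_NOTA);
            }
            // In a real RM this must be durable before voting yes.
            prepared.put(txId, update);
            return XAResource.XA_OK;
        }

        // Only branches that reached prepare are reported to recovery.
        Set<String> recover() {
            return new TreeSet<>(prepared.keySet());
        }

        void simulateCrash() {
            active.clear();
        }
    }

    // Case (a): the node crashes before the TM calls prepare.
    static int crashBeforePrepare() {
        ResourceManager rm = new ResourceManager();
        rm.write("tx1", "k=v");
        rm.simulateCrash();
        try {
            return rm.prepare("tx1");
        } catch (XAException e) {
            return e.errorCode;
        }
    }

    // Returns the recovery list seen after a crash.
    static Set<String> inDoubtAfterCrash(boolean preparedFirst) {
        ResourceManager rm = new ResourceManager();
        rm.write("tx1", "k=v");
        if (preparedFirst) {
            try {
                rm.prepare("tx1");
            } catch (XAException e) {
                throw new IllegalStateException(e);
            }
        }
        rm.simulateCrash();
        return rm.recover();
    }

    public static void main(String[] args) {
        System.out.println("prepare after crash: " + crashBeforePrepare());
        System.out.println("recover, never prepared: " + inDoubtAfterCrash(false));
        System.out.println("recover, prepared: " + inDoubtAfterCrash(true));
    }
}
```

The point of the sketch is that the branch which never 
prepared does not appear in the recovery list at all; the TM 
learns of the failure from the prepare call itself.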

> b) node goes down after TM issues prepare
> - when TM issues a commit it receives an XAException
> (perhaps XA_HEURRB) and again it is aware of the heuristic
> outcome

Returning cleanly from a prepare is a promise by the RM to 
successfully apply any subsequent commit. You're not in a 
position to make such a promise unless your state is fault 
tolerant, as a node crash would otherwise leave you with 
inconsistent state.

It's not as simple as saying you'd rollback - what if you 
prepare, get told to commit, apply remote changes (step 
4.2.1), then crash before applying local changes (4.2.2)? 
You can't report that as a rollback - you applied some of 
the updates. You have to include the tx in the recovery list 
as heuristic hazard (unless NodeA will transparently 
repopulate with the committed data, in which case you can 
mask the failure or report heuristic commit), but how to 
even detect the heuristic at recovery time? NodeA has no 
persistent record of the tx and NodeB thinks it completed 
cleanly and has cleaned up its tx record to avoid leaking. 
Where is the data that tells you you've got a problem?
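
One possible answer, sketched below under loud assumptions: 
keep a commit progress record somewhere that survives a NodeA 
crash, and clear it only once every step has been applied. 
Everything here is hypothetical (the class, the step names, 
the in-memory map standing in for that surviving store); it 
only shows the kind of data recovery would need in order to 
report a heuristic hazard (XAException.XA_HEURHAZ in XA terms).

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a commit progress record consulted at recovery.
final class CommitProgressSketch {

    enum Step { REMOTE_APPLIED, LOCAL_APPLIED }

    enum Status { NO_RECORD, PREPARED, HEURISTIC_HAZARD, FULLY_APPLIED }

    // Stand-in for storage that outlives a NodeA crash (an assumption).
    private final Map<String, EnumSet<Step>> records = new HashMap<>();

    // Written before prepare returns.
    void prepared(String txId) {
        records.put(txId, EnumSet.noneOf(Step.class));
    }

    // Written after each commit step completes.
    void mark(String txId, Step step) {
        records.get(txId).add(step);
    }

    // Removed only once the whole commit is done.
    void forget(String txId) {
        records.remove(txId);
    }

    // What recovery can conclude from the surviving record.
    Status classify(String txId) {
        EnumSet<Step> steps = records.get(txId);
        if (steps == null) {
            return Status.NO_RECORD;        // completed, or never prepared
        }
        if (steps.isEmpty()) {
            return Status.PREPARED;         // in doubt, nothing applied yet
        }
        if (steps.containsAll(EnumSet.allOf(Step.class))) {
            return Status.FULLY_APPLIED;    // applied, just not cleaned up
        }
        return Status.HEURISTIC_HAZARD;     // some updates applied, not all
    }

    // Crash after remote changes (4.2.1) but before local changes (4.2.2).
    static Status crashBetweenRemoteAndLocal() {
        CommitProgressSketch progress = new CommitProgressSketch();
        progress.prepared("tx1");
        progress.mark("tx1", Step.REMOTE_APPLIED);
        // NodeA crashes here; LOCAL_APPLIED is never recorded.
        return progress.classify("tx1");
    }

    static Status cleanCommit() {
        CommitProgressSketch progress = new CommitProgressSketch();
        progress.prepared("tx1");
        progress.mark("tx1", Step.REMOTE_APPLIED);
        progress.mark("tx1", Step.LOCAL_APPLIED);
        progress.forget("tx1");
        return progress.classify("tx1");
    }

    public static void main(String[] args) {
        System.out.println("crash between steps: " + crashBetweenRemoteAndLocal());
        System.out.println("clean commit: " + cleanCommit());
    }
}
```

Note the sketch still has a window between applying a step 
and marking it, so a real design would have to treat an 
unmarked step as possibly applied.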

Or have a more sophisticated scenario where there is an 
additional NodeC, thus requiring multiple 'apply remote 
changes' calls. Are those atomic across the cluster? If 
there is a possibility that NodeB will apply the update but 
NodeC won't, or NodeA will crash after issuing a call to B 
but before C, you can wind up with inconsistent state in the 
surviving B and C. Alternatively, what if A survives but C 
crashes whilst applying changes that B has already 
successfully applied? That's not necessarily a recovery 
situation as far as the TM is concerned, but it may be from 
your perspective as you'll need to detect and (ideally) fix 
or (as a last resort) report the inconsistent data.

A lot of your behaviour is going to depend on what it means 
for a node to recover after a crash. If it simply comes up 
empty and expects to be repopulated from an external source, 
as with a normal cache, then your relation to the XAResource 
of that external source is critical. On the other hand, if 
your cluster node is itself fault tolerant through 
replication, then you need to think carefully about how the 
RM functionality ties into that replication. Basically the 
tx state information is not local to the node where the 
XAResource resides, but must be replicated in the same 
manner as the other data in that node. That replication 
must be synchronous at certain state transitions in the tx 
lifecycle: it's logging recovery information through RPC 
rather than disk write. Really interesting things are going 
to happen if a single transaction spans a mix of data that 
is a cached copy of data stored persistently in an XA 
database and data for which Infinispan is the definitive, 
fault tolerant repository.
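
To make "logging through RPC rather than disk write" a bit 
more concrete, here is a minimal sketch. The TxStateReplica 
interface and the class around it are invented for 
illustration and are not an Infinispan API; the assumption is 
that record() blocks until the peer has stored the state and 
returns false on failure. Prepare votes yes only after every 
replica has acknowledged the PREPARED state.

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

// Hypothetical sketch: tx state is "logged" by synchronous replication.
final class ReplicatedTxLogSketch {

    // Invented RPC target: true once the peer has stored the state.
    interface TxStateReplica {
        boolean record(String txId, String state);
    }

    private final List<TxStateReplica> replicas;

    ReplicatedTxLogSketch(List<TxStateReplica> replicas) {
        this.replicas = replicas;
    }

    int prepare(String txId) throws XAException {
        for (TxStateReplica replica : replicas) {
            // Synchronous at this state transition: wait for each ack.
            if (!replica.record(txId, "PREPARED")) {
                // State is not fault tolerant, so no promise can be made.
                throw new XAException(XAException.XA_RBCOMMFAIL);
            }
        }
        return XAResource.XA_OK;
    }

    // Runs prepare against fake replicas that ack (true) or fail (false).
    static int prepareWith(boolean... acks) {
        List<TxStateReplica> fakes = new ArrayList<>();
        for (boolean ack : acks) {
            fakes.add((txId, state) -> ack);
        }
        try {
            return new ReplicatedTxLogSketch(fakes).prepare("tx1");
        } catch (XAException e) {
            return e.errorCode;
        }
    }

    public static void main(String[] args) {
        System.out.println("all replicas ack: " + prepareWith(true, true));
        System.out.println("one replica fails: " + prepareWith(true, false));
    }
}
```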

To make a cluster appear to the outside world as a single 
logical entity for transaction purposes, you're pretty much 
going to wind up doing interposition. That means you're 
implementing not only an RM but substantial chunks of what 
amounts to a TM too. Have fun.
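
For what interposition looks like in miniature, a sketch 
follows. All names are hypothetical, strings stand in for Xid, 
and it deliberately omits the hard parts discussed above 
(durable decision logging, recovery, partial failure during 
commit). It only shows the shape: the TM sees one prepare and 
one commit, and the coordinator fans them out to the nodes.

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

// Hypothetical sketch of a subordinate coordinator in front of a cluster.
final class InterpositionSketch {

    interface NodeParticipant {
        boolean prepare(String txId); // true = votes to commit
        void commit(String txId);
        void rollback(String txId);
    }

    static final class Coordinator {
        private final List<? extends NodeParticipant> nodes;

        Coordinator(List<? extends NodeParticipant> nodes) {
            this.nodes = nodes;
        }

        // The single prepare the TM sees; internally a prepare round.
        int prepare(String txId) throws XAException {
            for (NodeParticipant node : nodes) {
                if (!node.prepare(txId)) {
                    // One no vote aborts the branch on every other node.
                    for (NodeParticipant other : nodes) {
                        if (other != node) {
                            other.rollback(txId);
                        }
                    }
                    throw new XAException(XAException.XA_RBROLLBACK);
                }
            }
            return XAResource.XA_OK;
        }

        // A real version must survive a crash part way through this loop.
        void commit(String txId) {
            for (NodeParticipant node : nodes) {
                node.commit(txId);
            }
        }
    }

    // Test double that records the calls it receives.
    static final class RecordingNode implements NodeParticipant {
        private final String name;
        private final boolean vote;
        private final List<String> events;

        RecordingNode(String name, boolean vote, List<String> events) {
            this.name = name;
            this.vote = vote;
            this.events = events;
        }

        public boolean prepare(String txId) {
            events.add(name + ".prepare");
            return vote;
        }

        public void commit(String txId) {
            events.add(name + ".commit");
        }

        public void rollback(String txId) {
            events.add(name + ".rollback");
        }
    }

    // Drives one transaction over nodes B and C with the given votes.
    static List<String> run(boolean voteB, boolean voteC) {
        List<String> events = new ArrayList<>();
        Coordinator coordinator = new Coordinator(List.of(
                new RecordingNode("B", voteB, events),
                new RecordingNode("C", voteC, events)));
        try {
            coordinator.prepare("tx1");
            coordinator.commit("tx1");
        } catch (XAException e) {
            events.add("XAException:" + e.errorCode);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(true, true));
        System.out.println(run(true, false));
    }
}
```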

Jonathan.
