[infinispan-dev] XAResource.isSameRM
Jonathan Halliday
jonathan.halliday at redhat.com
Thu Jan 6 14:01:29 EST 2011
On 01/06/2011 05:43 PM, Mircea Markus wrote:
>
> On 6 Jan 2011, at 14:45, Jonathan Halliday wrote:
>
>> On 01/06/2011 02:29 PM, Mircea Markus wrote:
>>
>>> At the moment the *only way to transactionally* access a
>>> node is by collocating the client and the server in the same
>>> VM.
>>
>> So the scope of the transaction is limited to data
>> residing in that local node? What if I want a single
>> transaction to span the local node and data in a remote node?
> that's possible. It's just that you always have to interact
> with the local node, which will acquire remote locks
> on behalf of your transaction.
ok, so the cluster intelligence is in the local node rather
than the client, not that there is any significant
distinction for now as they are co-located.
> a) node goes down before TM issued prepare
> - when TM resurrects and calls XAResource.recover it
> receives the given XID, realises that there's a heuristic
> decision (because it didn't call prepare) and takes some
> action (rolls back other participants, notifies sys admin?).
That's not a heuristic decision. A RM is perfectly entitled
to throw away any tx state up until prepare. Under the
presumed abort doctrine it simply throws an error from
prepare and the tx aborts cleanly. Recovery is not involved
- it applies only to tx that have reached prepare.
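A minimal sketch of that behaviour, assuming a hypothetical in-memory RM
(PresumedAbortRm, its String tx ids and its crash() hook are all made up
for illustration and are not Infinispan code). Only the XAException and
XAResource constants come from javax.transaction.xa:

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical RM used only to illustrate presumed abort.
class PresumedAbortRm {
    // Work done before prepare: volatile, may be thrown away at any time.
    private final Map<String, String> active = new HashMap<>();
    // Stands in for a durable log; only prepared tx ever get here.
    private final Map<String, String> preparedLog = new HashMap<>();

    void write(String txId, String value) {
        active.put(txId, value);
    }

    // Simulated crash: pre-prepare state is lost, the prepared log survives.
    void crash() {
        active.clear();
    }

    int prepare(String txId) throws XAException {
        String value = active.remove(txId);
        if (value == null) {
            // State is gone, so refuse the prepare; the TM then aborts the tx.
            throw new XAException(XAException.XAER_NOTA);
        }
        preparedLog.put(txId, value);
        return XAResource.XA_OK;
    }

    // Recovery only ever reports tx that reached prepare.
    Set<String> recover() {
        return new HashSet<>(preparedLog.keySet());
    }
}
```

A tx that was lost before prepare never shows up in recover(); the TM
learns about it only through the failed prepare.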
> b) node goes down after TM issues prepare
> - when TM issues a commit it receives an XAException
> (perhaps XA_HEURRB) and again it is aware of the heuristic
> outcome
Returning cleanly from a prepare is a promise by the RM to
successfully apply any subsequent commit. You're not in a
position to make such a promise unless your state is fault
tolerant, as a node crash would otherwise leave you with
inconsistent state.
It's not as simple as saying you'd roll back - what if you
prepare, get told to commit, apply remote changes (step
4.2.1), then crash before applying local changes (4.2.2)?
You can't report that as a rollback - you applied some of
the updates. You have to include the tx in the recovery list
as heuristic hazard (unless NodeA will transparently
repopulate with the committed data, in which case you can
mask the failure or report heuristic commit), but how to
even detect the heuristic at recovery time? NodeA has no
persistent record of the tx and NodeB thinks it completed
cleanly and has cleaned up its tx record to avoid leaking.
Where is the data that tells you you've got a problem?
Or have a more sophisticated scenario where there is an
additional NodeC, thus requiring multiple 'apply remote
changes' calls. Are those atomic across the cluster? If
there is a possibility that NodeB will apply the update but
NodeC won't, or NodeA will crash after issuing a call to B
but before C, you can wind up with inconsistent state in the
surviving B and C. Alternatively, what if A survives but C
crashes whilst applying changes that B has already
successfully applied? That's not necessarily a recovery
situation as far as the TM is concerned, but it may be from
your perspective as you'll need to detect and (ideally) fix
or (as a last resort) report the inconsistent data.
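To make the classification concrete, here is a rough sketch of how the
outcome could be mapped to XA codes once you know what each node did.
NodeState and CommitOutcome are hypothetical names, and the sketch simply
assumes a per-node record of the commit exists somewhere - which is
exactly the data the questions above say you may not have:

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import java.util.List;

// Hypothetical per-node view of a commit that was already in flight.
enum NodeState { APPLIED, NOT_APPLIED, UNKNOWN }

class CommitOutcome {
    // Maps what is known about each node to the code reported to the TM.
    static int classify(List<NodeState> states) {
        boolean applied = false;
        boolean missed = false;
        for (NodeState s : states) {
            if (s == NodeState.UNKNOWN) {
                // Cannot tell what happened on this node: heuristic hazard.
                return XAException.XA_HEURHAZ;
            }
            if (s == NodeState.APPLIED) {
                applied = true;
            } else {
                missed = true;
            }
        }
        if (applied && missed) {
            return XAException.XA_HEURMIX;   // some updates in, some not
        }
        if (missed) {
            return XAException.XA_HEURRB;    // nothing was applied anywhere
        }
        return XAResource.XA_OK;             // every node applied the commit
    }
}
```

A crashed node with no persistent record counts as UNKNOWN here, so the
A-crashes-after-B-applied case comes out as hazard rather than rollback.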
A lot of your behaviour is going to depend on what it means
for a node to recover after a crash. If it simply comes up
empty and expects to be repopulated from an external source,
as with a normal cache, then your relation to the XAResource
of that external source is critical. On the other hand, if
your cluster node is itself fault tolerant through
replication, then you need to think carefully about how the
RM functionality ties into that replication. Basically, the
tx state information is not local to the node where the
XAResource resides, but must be replicated in the same
manner as the other data in that node, and that replication
must be synchronous at certain state transitions in the tx
lifecycle. In effect, it's logging recovery information
through RPC rather than disk write. Really interesting things
are going to happen if a single transaction spans a mix of
data that is a cached copy of data stored persistently in an
XA database and data for which Infinispan is the definitive,
fault tolerant repository.
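As a sketch of 'logging through RPC', assuming a hypothetical Backup
interface whose log() call blocks until the remote copy has acknowledged
the record (neither Backup nor ReplicatedTxLog is a real Infinispan API):

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import java.util.List;

// Hypothetical synchronous RPC to a node holding a replica of the tx log.
interface Backup {
    boolean log(String txId, String state);
}

class ReplicatedTxLog {
    private final List<Backup> backups;

    ReplicatedTxLog(List<Backup> backups) {
        this.backups = backups;
    }

    // Prepare returns only after every backup has acknowledged the record,
    // the same way a disk-based RM forces its log before voting.
    int prepare(String txId) throws XAException {
        for (Backup b : backups) {
            if (!b.log(txId, "PREPARED")) {
                // No ack means no promise: vote rollback instead.
                throw new XAException(XAException.XA_RBCOMMFAIL);
            }
        }
        return XAResource.XA_OK;
    }
}
```

The point is the ordering: the replicated record has to be safe before
prepare returns, otherwise the promise to commit is not backed by anything.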
To make a cluster appear to the outside world as a single
logical entity for transaction purposes, you're pretty much
going to wind up doing interposition. That means you're
implementing not only an RM but substantial chunks of what
amounts to a TM too. Have fun.
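Roughly, and only as a sketch: the interposed coordinator looks like one
XAResource to the parent TM and runs its own prepare round over the
cluster underneath. ClusterNode and InterposedCoordinator are hypothetical
names, and logging, commit and recovery are left out entirely:

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical cluster member taking part in the subordinate 2PC round.
interface ClusterNode {
    boolean prepare(String txId);
    void rollback(String txId);
}

class InterposedCoordinator {
    private final List<ClusterNode> nodes;

    InterposedCoordinator(List<ClusterNode> nodes) {
        this.nodes = nodes;
    }

    // One vote goes back to the parent TM on behalf of the whole cluster.
    int prepare(String txId) throws XAException {
        List<ClusterNode> prepared = new ArrayList<>();
        for (ClusterNode n : nodes) {
            if (n.prepare(txId)) {
                prepared.add(n);
                continue;
            }
            // One node refused: undo the nodes that already prepared.
            for (ClusterNode p : prepared) {
                p.rollback(txId);
            }
            throw new XAException(XAException.XA_RBROLLBACK);
        }
        return XAResource.XA_OK;
    }
}
```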
Jonathan.
--
------------------------------------------------------------
Registered Address: Red Hat UK Ltd, Amberley Place, 107-111
Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom.
Registered in UK and Wales under Company Registration No.
3798903 Directors: Michael Cunningham (USA), Charlie Peters
(USA), Matt Parsons (USA) and Brendan Lane (Ireland)