[infinispan-dev] XAResource.isSameRM
Jonathan Halliday
jonathan.halliday at redhat.com
Thu Jan 6 06:48:08 EST 2011
On 01/06/2011 10:45 AM, Mircea Markus wrote:
> When a node crashes all the transactions that node owns (i.e. tx which were originated on that node and XAResource instance residing on that node) automatically rollback, so that no resources (locks mainly) are held. The only thing we need to make sure though is that the given transaction ids (the ones that were heuristically rolled back) are returned by the XAResource.recover method - doable in the same way we handle prepares. I imagine that we'll have to keep these XIDs until XAResource.forget(XID) is called, am I right?
I was under the impression a node does not own a tx. It may
own a *branch* of that tx. Take the case where node
JBossAS-1 starts a tx, propagates it to node JBossAS-2 and
both JBossAS-1 and JBossAS-2 then simultaneously contact
different nodes of the infinispan cluster in the scope of
that tx. Each node would see a different branch of the same
global tx. You presumably don't want to have to sync the
ownership for each new tx across the cluster? The only way
you could tie an entire tx to a single infinispan node is
e.g. consistent hashing to force the decision of which node
the driver connects to based on the tx context.
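
To illustrate the consistent hashing idea, here is a minimal sketch of a driver-side router that picks a node from the global transaction id, so that every branch of the same global tx lands on the same node. The class name TxNodeRouter, the node names and the virtual node count are all hypothetical; nothing here is an existing infinispan or JBossTS API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical driver-side router: maps a global transaction id to one node. */
final class TxNodeRouter {
    // Illustrative value: more virtual nodes give a smoother key distribution.
    private static final int VIRTUAL_NODES = 64;

    // Hash ring: position on the ring -> name of the node owning that position.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    TxNodeRouter(List<String> nodeNames) {
        for (String node : nodeNames) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash((node + "#" + i).getBytes(StandardCharsets.UTF_8)), node);
            }
        }
    }

    /**
     * Returns the node for a global transaction id, e.g. the bytes returned by
     * Xid.getGlobalTransactionId(). The branch qualifier is deliberately ignored,
     * so all branches of one global tx are routed to the same node.
     */
    String nodeFor(byte[] globalTransactionId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes available");
        }
        Integer position = ring.ceilingKey(hash(globalTransactionId));
        // Wrap around to the first ring position when we run off the end.
        return ring.get(position != null ? position : ring.firstKey());
    }

    // Mixes the bits of Arrays.hashCode so ring positions are reasonably spread.
    private static int hash(byte[] data) {
        int h = Arrays.hashCode(data);
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return h;
    }
}
```

With this scheme JBossAS-1 and JBossAS-2 would both compute the same target node for the same global tx without any cluster-wide sync. The cost is that the driver no longer chooses freely between nodes, and the mapping for a tx changes if its target node leaves the ring.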
An XAResource does not reside in an infinispan node
(although there may be something equivalent holding tx state
on the server side) - it's a client/driver side construct.
Given that the driver does transparent failover / load
balancing and such, the XAResource can't be said to belong
to a specific infinispan node unless you throw away some of
the clustering availability advantages. It's really a
question of where you are going to put the clustering
intelligence - in a smart client side driver or in a server
side component that acts as a kind of routing proxy.
With your 'rollback tx branch on node crash' model you are
failing to provide ACID semantics for the cluster as a
whole. You can abort them before the prepare stage, but post
prepare it's not an option as you'll piss off clients who
expect the cluster to behave correctly as long as a majority
of its nodes survive. I'm not saying it's flat out wrong,
just that it needs to be very clearly documented in order to
avoid getting whined at by disgruntled users. My
understanding is you're pitching infinispan not as a
volatile cache, but an in-memory data grid. In that model
the node does not own the tx, the cluster owns the tx and is
responsible for masking node failures from the client.
Rollback of prepared tx on node failure is therefore not an
option - some part of the tx state may already have been
committed by surviving nodes and you'll get inconsistencies.
You need to replicate enough information to avoid that,
otherwise the client app is going to have to explicitly
provide logic to do the reconciliation, which sucks.
> Is it common/possible for people to use TM _without_ recovery? If so, this "held heuristic completed TX" functionality should be configurable (enabled/disabled) in order to avoid memory leaks (no recovery means .forget never gets called)
It is not common. That said, JBossTS has for similar reasons
got a 'give up after N hours' config option which will
eventually abandon tx that have not recovered. It's off
(i.e. never give up) by default but a small number of users
find it handy. Most just use the admin tooling to manually
clean up the small number of unrecoverable situations - it's
safer in most cases.
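
A 'give up after N hours' option of this kind could look roughly like the sketch below. It is not the JBossTS implementation; HeldTxRegistry and its method names are hypothetical, and the key type K stands in for whatever identifies a held tx (an Xid in practice). Passing null as the limit models the 'never give up' default.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry of tx that are held until forget() is called. */
final class HeldTxRegistry<K> {
    private final Map<K, Instant> heldSince = new ConcurrentHashMap<>();
    private final Duration giveUpAfter; // null means never give up (the default)

    HeldTxRegistry(Duration giveUpAfter) {
        this.giveUpAfter = giveUpAfter;
    }

    /** Starts holding a tx, e.g. after a heuristic outcome. */
    void hold(K txId, Instant now) {
        heldSince.putIfAbsent(txId, now);
    }

    /** Normal path: the TM has called forget for this tx. */
    boolean forget(K txId) {
        return heldSince.remove(txId) != null;
    }

    /** The tx currently held, i.e. candidates for the recovery list. */
    List<K> inDoubt() {
        return new ArrayList<>(heldSince.keySet());
    }

    /**
     * Abandons tx held for at least giveUpAfter and returns them so the
     * caller can log them for an administrator. Does nothing when disabled.
     */
    List<K> abandonExpired(Instant now) {
        List<K> abandoned = new ArrayList<>();
        if (giveUpAfter == null) {
            return abandoned;
        }
        Iterator<Map.Entry<K, Instant>> it = heldSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Instant> entry = it.next();
            if (Duration.between(entry.getValue(), now).compareTo(giveUpAfter) >= 0) {
                abandoned.add(entry.getKey());
                it.remove();
            }
        }
        return abandoned;
    }
}
```

Returning the abandoned ids rather than silently dropping them keeps the manual clean-up route open, which matters because an abandoned tx may still be inconsistent.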
>> Another interesting issue is what constitutes an 'in-doubt' tx. Pretty much all RMs will include heuristically completed tx in the recovery list. Some will include tx branches that have prepared but not yet committed or rolled back. Some will include such only if they have been in the prepared state for greater than some threshold length of time (a few seconds i.e. a couple of orders of magnitude longer than a tx would normally be expected to hold that state). There is also the question of when a tx should be removed from the list. The wording of the spec
>>
>> 'Two consecutive invocation of [recover] that starts from the beginning of the list must return the same list
>> of transaction branches unless one of the following takes place:
>> - the transaction manager invokes the commit, forget, prepare, or rollback method for that resource
>> manager, between the two consecutive invocation of the recovery scan
>> ...'
>>
>> seems to imply a single transaction manager.
> Doesn't this also imply that the prepare-threshold isn't the spec's way? I.e. even though the TM doesn't call any method on the RM, the RM returns a new XID in the result of XAResource.recover when the threshold is reached.
The definition of what constitutes an 'in-doubt' tx for
purposes of inclusion in the recovery list is not well
defined by the spec.
In a world where there is only one TM driving the RM and
that TM performs recovery before starting up and running new
tx, no new tx will be added to the recovery list. In the
real world new ones are being added continuously as the
system is always under load.
The concern is more around the stability of the list with
respect to not *removing* things except in response to TM
activity, i.e. if a node goes away, should you return a
cached snapshot of its tx for list stability, or exclude
them? Both options carry risks.
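
The add/remove asymmetry can be sketched as follows: entries may appear in the list on their own (a prepared branch crossing the threshold), but they only disappear in response to TM activity. This is one possible reading, not what the spec mandates; RecoveryList, its method names and the use of String branch ids in place of Xid are all assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical RM-side bookkeeping behind XAResource.recover. */
final class RecoveryList {
    private final Duration preparedThreshold;
    // Heuristically completed branches: always reported.
    private final Set<String> heuristic = new LinkedHashSet<>();
    // Prepared branches and when they prepared: reported once past the threshold.
    private final Map<String, Instant> preparedAt = new LinkedHashMap<>();

    RecoveryList(Duration preparedThreshold) {
        this.preparedThreshold = preparedThreshold;
    }

    synchronized void prepared(String branchId, Instant now) {
        if (!heuristic.contains(branchId)) {
            preparedAt.putIfAbsent(branchId, now);
        }
    }

    synchronized void heuristicallyCompleted(String branchId) {
        preparedAt.remove(branchId);
        heuristic.add(branchId);
    }

    /** The only removal path: the TM called commit, rollback or forget. */
    synchronized void completedByTm(String branchId) {
        preparedAt.remove(branchId);
        heuristic.remove(branchId);
    }

    /** Heuristic branches first, then prepared branches older than the threshold. */
    synchronized List<String> recover(Instant now) {
        List<String> inDoubt = new ArrayList<>(heuristic);
        for (Map.Entry<String, Instant> entry : preparedAt.entrySet()) {
            if (!now.isBefore(entry.getValue().plus(preparedThreshold))) {
                inDoubt.add(entry.getKey());
            }
        }
        return inDoubt;
    }
}
```

Note that nothing in this sketch removes a branch because its node went away. Whether to keep reporting such branches from a cached snapshot or to drop them is exactly the open question, and the sketch simply takes the snapshot side.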
Jonathan.