On 6 Jan 2011, at 11:48, Jonathan Halliday wrote:
On 01/06/2011 10:45 AM, Mircea Markus wrote:
> When a node crashes, all the transactions that node owns (i.e. tx which
> originated on that node, with the XAResource instance residing on that node) are
> automatically rolled back, so that no resources (locks mainly) are held. The only
> thing we need to make sure of, though, is that the given transaction ids (the ones
> that were heuristically rolled back) are returned by the XAResource.recover method -
> doable in the same way we handle prepares. I imagine that we'll have to keep these
> XIDs until XAResource.forget(XID) is called, am I right?
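
A minimal sketch of the bookkeeping described above, assuming a hypothetical in-memory registry (HeuristicXidRegistry and SimpleXid are illustrative names, not Infinispan classes). It only shows the lifecycle: a heuristically rolled-back XID is kept, reported by recover, and dropped once forget is called. javax.transaction.xa.Xid is the standard interface; because it does not define equals/hashCode, the sketch supplies a value-type implementation so XIDs can be used as set members.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.transaction.xa.Xid;

/** Hypothetical value-type Xid so that instances can be compared and hashed. */
final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] gtrid;
    private final byte[] bqual;

    SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
        this.formatId = formatId;
        this.gtrid = gtrid.clone();
        this.bqual = bqual.clone();
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return bqual.clone(); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SimpleXid)) return false;
        SimpleXid other = (SimpleXid) o;
        return formatId == other.formatId
                && Arrays.equals(gtrid, other.gtrid)
                && Arrays.equals(bqual, other.bqual);
    }

    @Override public int hashCode() {
        return 31 * (31 * formatId + Arrays.hashCode(gtrid)) + Arrays.hashCode(bqual);
    }
}

/** Hypothetical registry of branches that were rolled back heuristically after a crash. */
final class HeuristicXidRegistry {
    private final Set<Xid> heuristicallyRolledBack = ConcurrentHashMap.newKeySet();

    /** Called when a crash forces a branch to roll back without the TM asking for it. */
    void recordHeuristicRollback(Xid xid) {
        heuristicallyRolledBack.add(xid);
    }

    /** What an XAResource.recover implementation could contribute to its result. */
    Xid[] recover() {
        return heuristicallyRolledBack.toArray(new Xid[0]);
    }

    /** Mirrors XAResource.forget(Xid): only at this point is the entry dropped. */
    void forget(Xid xid) {
        heuristicallyRolledBack.remove(xid);
    }
}
```

If forget is never called (no recovery manager configured), entries in such a registry would accumulate indefinitely, which is the memory-leak concern raised later in this thread.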
I was under the impression a node does not own a tx. It may own a *branch* of that tx.
Take the case where node JBossAS-1 starts a tx, propagates it to node JBossAS-2 and both
JBossAS-1 and JBossAS-2 then simultaneously contact different nodes of the infinispan
cluster in the scope of that tx. Each node would see a different branch of the same global
tx. You presumably don't want to have to sync the ownership for each new tx across the
cluster? The only way you could tie an entire tx to a single infinispan node is e.g.
consistent hashing to force the decision of which node the driver connects to based on the
tx context.
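
To illustrate the consistent-hashing idea, here is a rough sketch of how a driver might map a global transaction id to a node, so that every branch of one global tx lands on the same node. TxNodeSelector, the node names and the choice of CRC32 are assumptions made for the example; this is not how any existing Infinispan driver picks a node.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical client-side selector: same global tx id maps to the same node. */
final class TxNodeSelector {
    // Hash ring: position on the ring -> node name.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    TxNodeSelector(List<String> nodes, int virtualNodesPerNode) {
        for (String node : nodes) {
            // Several ring positions per node smooth out the distribution.
            for (int i = 0; i < virtualNodesPerNode; i++) {
                ring.put(hash((node + "#" + i).getBytes(StandardCharsets.UTF_8)), node);
            }
        }
    }

    /** Picks the first node at or after the hash of the global transaction id. */
    String nodeFor(byte[] globalTransactionId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes available");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(globalTransactionId));
        // Wrap around to the start of the ring when we fall off the end.
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    private static long hash(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }
}
```

With such a scheme, JBossAS-1 and JBossAS-2 would both compute the same target node from the shared global tx id, at the cost of giving up free choice of node for load balancing.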
You are absolutely right, I don't want to sync ownership across the
cluster.
An XAResource does not reside in an infinispan node (although there may be something
equivalent holding tx state on the server side) - it's a client/driver side construct.
I see.
At the moment the *only way to transactionally* access a node is by collocating the client
and the server in the same VM. Hence the XAResource resides on the ISPN node's side.
ISPN 5.0 will take things further by allowing client/server access through
Hotrod: http://community.jboss.org/wiki/TransactionsOverHotRod
Given that the driver does transparent failover / load balancing and
such, the XAResource can't be said to belong to a specific infinispan node unless you
throw away some of the clustering availability advantages. It's really a question of
where you are going to put the clustering intelligence - in a smart client side driver or
in a server side component that acts as a kind of routing proxy.
At the moment, if a
node that is touched by a transaction/branch crashes before the transaction commits, then
all the acquired resources are released and the transaction is marked for rollback.
With your 'rollback tx branch on node crash' model you are failing to provide
ACID semantics for the cluster as a whole. You can abort them before the prepare stage,
but post prepare it's not an option as you'll piss off clients who expect the
cluster to behave correctly as long as a majority of its nodes survive. I'm not saying
it's flat out wrong, just that it needs to be very clearly documented in order to
avoid getting whined at by disgruntled users. My understanding is you're pitching
infinispan not as a volatile cache, but an in-memory data grid. In that model the node
does not own the tx, the cluster owns the tx and is responsible for masking node failures
from the client. Rollback of prepared tx on node failure is therefore not an option - some
part of the tx state may already have been committed by surviving nodes and you'll get
inconsistencies. You need to replicate enough information to avoid that, otherwise the
client app is going to have to explicitly provide logic to do the reconciliation, which
sucks.
Good point. At the moment (i.e. client and the node in the same VM) this is not
an issue because if the node crashes you can safely assume that the client has crashed as
well. Something to be considered though, with remote TX over hotrod in mind.
> Is it common/possible for people to use TM _without_ recovery? If so, this
> "held heuristic completed TX" functionality should be configurable
> (enabled/disabled) in order to avoid memory leaks (no recovery means .forget never
> gets called)
It is not common. That said, JBossTS has for similar reasons got a 'give up after N
hours' config option which will eventually abandon tx that have not recovered.
It's off (i.e. never give up) by default but a small number of users find it handy.
Most just use the admin tooling to manually clean up the small number of unrecoverable
situations - it's safer in most cases.
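
A small sketch of what a 'give up after N hours' option could look like for retained heuristic outcomes. ExpiringHeuristicLog and its methods are hypothetical and do not reflect the actual JBossTS option or its configuration name; a null limit stands for the default 'never give up' behaviour described above. Time is passed in explicitly so the behaviour is easy to reason about.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical log of unresolved tx with an optional 'give up after' limit. */
final class ExpiringHeuristicLog<K> {
    private final Map<K, Long> recordedAtMillis = new ConcurrentHashMap<>();
    private final Duration giveUpAfter; // null means never give up (the default)

    ExpiringHeuristicLog(Duration giveUpAfter) {
        this.giveUpAfter = giveUpAfter;
    }

    /** Remembers when a tx was first recorded; later calls keep the first timestamp. */
    void record(K txId, long nowMillis) {
        recordedAtMillis.putIfAbsent(txId, nowMillis);
    }

    /** Normal path: recovery resolved the tx and asked us to forget it. */
    void forget(K txId) {
        recordedAtMillis.remove(txId);
    }

    /** Drops entries at least as old as the limit and returns them, e.g. for logging. */
    List<K> abandonExpired(long nowMillis) {
        List<K> abandoned = new ArrayList<>();
        if (giveUpAfter == null) {
            return abandoned; // feature switched off
        }
        long limitMillis = giveUpAfter.toMillis();
        for (Map.Entry<K, Long> e : recordedAtMillis.entrySet()) {
            boolean expired = nowMillis - e.getValue() >= limitMillis;
            if (expired && recordedAtMillis.remove(e.getKey(), e.getValue())) {
                abandoned.add(e.getKey());
            }
        }
        return abandoned;
    }

    int size() {
        return recordedAtMillis.size();
    }
}
```

Returning the abandoned ids rather than silently discarding them leaves room for an administrator to review what was given up, in line with the point that manual clean-up is usually the safer route.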
Right, JMX might come in handy for that.
>> Another interesting issue is what constitutes an 'in-doubt' tx. Pretty
>> much all RMs will include heuristically completed tx in the recovery list. Some will
>> include tx branches that have prepared but not yet committed or rolled back. Some will
>> include such only if they have been in the prepared state for greater than some
>> threshold length of time (a few seconds, i.e. a couple of orders of magnitude longer
>> than a tx would normally be expected to hold that state). There is also the question
>> of when a tx should be removed from the list. The wording of the spec
>>
>> 'Two consecutive invocation of [recover] that starts from the beginning of
>> the list must return the same list of transaction branches unless one of the
>> following takes place:
>> - the transaction manager invokes the commit, forget, prepare, or rollback
>>   method for that resource manager, between the two consecutive invocation
>>   of the recovery scan
>> ...'
>>
>> seems to imply a single transaction manager.
> Doesn't this also imply that the prepare-threshold isn't the spec's way?
> I.e. even though the TM doesn't call any method on the RM, the RM returns a new XID
> in the result of XAResource.recover when the threshold is reached.
The definition of what constitutes an 'in-doubt' tx for purposes of inclusion in
the recovery list is not well defined by the spec.
In a world where there is only one TM driving the RM and that TM performs recovery before
starting up and running new tx, no new tx will be added to the recovery list. In the real
world new ones are being added continuously as the system is always under load.
The concern is more around the stability of the list with respect to not *removing*
things except in response to TM activity, i.e. if a node goes away, should you return a
cached snapshot of its tx for list stability, or exclude them? Both options carry risks.
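
The inclusion rules discussed above (heuristically completed tx always listed, prepared tx listed only after sitting in that state past a threshold) can be sketched as a simple filter. BranchState, BranchInfo and InDoubtPolicy are hypothetical names for illustration, and since the spec does not pin this policy down, the sketch shows only one of the behaviours an RM might choose.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical branch states; names are illustrative only. */
enum BranchState { ACTIVE, PREPARED, HEURISTICALLY_COMPLETED }

/** Hypothetical snapshot of one tx branch held by the resource manager. */
final class BranchInfo {
    final String branchId;
    final BranchState state;
    final long stateEnteredAtMillis;

    BranchInfo(String branchId, BranchState state, long stateEnteredAtMillis) {
        this.branchId = branchId;
        this.state = state;
        this.stateEnteredAtMillis = stateEnteredAtMillis;
    }
}

/** One possible 'in-doubt' policy: heuristics always, prepared only past a threshold. */
final class InDoubtPolicy {
    private final long preparedThresholdMillis;

    InDoubtPolicy(long preparedThresholdMillis) {
        this.preparedThresholdMillis = preparedThresholdMillis;
    }

    boolean isInDoubt(BranchInfo branch, long nowMillis) {
        switch (branch.state) {
            case HEURISTICALLY_COMPLETED:
                return true;
            case PREPARED:
                // Only report branches that stayed prepared unusually long.
                return nowMillis - branch.stateEnteredAtMillis >= preparedThresholdMillis;
            default:
                return false;
        }
    }

    /** Ids that a recovery scan would report at the given point in time. */
    List<String> recoveryList(List<BranchInfo> branches, long nowMillis) {
        List<String> result = new ArrayList<>();
        for (BranchInfo branch : branches) {
            if (isInDoubt(branch, nowMillis)) {
                result.add(branch.branchId);
            }
        }
        return result;
    }
}
```

Note that with a threshold, a branch can appear in the list purely because time has passed, without any TM call in between, which is exactly the tension with the spec wording raised in the question above.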
Thanks for the clarification.
Jonathan.