[infinispan-dev] XAResource.isSameRM
Mircea Markus
mircea.markus at jboss.com
Thu Jan 6 09:29:44 EST 2011
On 6 Jan 2011, at 11:48, Jonathan Halliday wrote:
> On 01/06/2011 10:45 AM, Mircea Markus wrote:
>
>> When a node crashes, all the transactions that node owns (i.e. tx that originated on that node, with the XAResource instance residing on that node) automatically roll back, so that no resources (mainly locks) are held. The only thing we need to make sure of, though, is that the given transaction ids (the ones that were heuristically rolled back) are returned by the XAResource.recover method - doable in the same way we handle prepares. I imagine that we'll have to keep these XIDs until XAResource.forget(XID) is called, am I right?
>
> I was under the impression a node does not own a tx. It may own a *branch* of that tx.
> Take the case where node JBossAS-1 starts a tx, propagates it to node JBossAS-2 and both JBossAS-1 and JBossAS-2 then simultaneously contact different nodes of the infinispan cluster in the scope of that tx. Each node would see a different branch of the same global tx. You presumably don't want to have to sync the ownership for each new tx across the cluster? The only way you could tie an entire tx to a single infinispan node is e.g. consistent hashing to force the decision of which node the driver connects to based on the tx context.
You are absolutely right, I don't want to sync ownership across the cluster.
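
To make the tx/branch distinction concrete, here is a minimal sketch of two branches of one global tx, as in the JBossAS-1 / JBossAS-2 case above. BranchId is a hypothetical stand-in for javax.transaction.xa.Xid, reduced to the global transaction id and the branch qualifier; it is not Infinispan code.

```java
import java.util.Arrays;

// Hypothetical stand-in for javax.transaction.xa.Xid (assumption, not Infinispan code).
final class BranchId {
    private final byte[] globalTxId;      // shared by every branch of the global tx
    private final byte[] branchQualifier; // differs per branch

    BranchId(byte[] globalTxId, byte[] branchQualifier) {
        this.globalTxId = globalTxId.clone();
        this.branchQualifier = branchQualifier.clone();
    }

    // True when both ids are branches of the same global transaction.
    boolean sameGlobalTx(BranchId other) {
        return Arrays.equals(globalTxId, other.globalTxId);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BranchId)) return false;
        BranchId b = (BranchId) o;
        return Arrays.equals(globalTxId, b.globalTxId)
            && Arrays.equals(branchQualifier, b.branchQualifier);
    }

    @Override
    public int hashCode() {
        return 31 * Arrays.hashCode(globalTxId) + Arrays.hashCode(branchQualifier);
    }
}

class BranchDemo {
    public static void main(String[] args) {
        byte[] gtrid = {1, 2, 3};
        // One global tx reaching two different cluster nodes through two app servers.
        BranchId viaAs1 = new BranchId(gtrid, new byte[] {10});
        BranchId viaAs2 = new BranchId(gtrid, new byte[] {20});
        System.out.println("same global tx: " + viaAs1.sameGlobalTx(viaAs2));
        System.out.println("same branch: " + viaAs1.equals(viaAs2));
    }
}
```

Each node only ever sees its own BranchId, which is why tying the whole tx to one node would need extra coordination.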
>
> An XAResource does not reside in an infinispan node (although there may be something equivalent holding tx state on the server side) - it's a client/driver side construct.
I see.
At the moment the *only way to transactionally* access a node is by collocating the client and the server in the same VM. Hence the XAResource resides on the ISPN node's side. ISPN 5.0 will take things further by allowing client/server access through Hot Rod: http://community.jboss.org/wiki/TransactionsOverHotRod
> Given that the driver does transparent failover / load balancing and such, the XAResource can't be said to belong to a specific infinispan node unless you throw away some of the clustering availability advantages. It's really a question of where you are going to put the clustering intelligence - in a smart client side driver or in a server side component that acts as a kind of routing proxy.
At the moment, if a node that is touched by a transaction/branch crashes before the transaction commits, then all the acquired resources are released and the transaction is marked for rollback.
>
> With your 'rollback tx branch on node crash' model you are failing to provide ACID semantics for the cluster as a whole. You can abort them before the prepare stage, but post prepare it's not an option as you'll piss off clients who expect the cluster to behave correctly as long as a majority of its nodes survive. I'm not saying it's flat out wrong, just that it needs to be very clearly documented in order to avoid getting whined at by disgruntled users. My understanding is you're pitching infinispan not as a volatile cache, but an in-memory data grid. In that model the node does not own the tx, the cluster owns the tx and is responsible for masking node failures from the client. Rollback of prepared tx on node failure is therefore not an option - some part of the tx state may already have been committed by surviving nodes and you'll get inconsistencies. You need to replicate enough information to avoid that, otherwise the client app is going to have to explicitly provide logic to do the reconciliation, which sucks.
Good point. At the moment (i.e. client and node in the same VM) this is not an issue, because if the node crashes you can safely assume that the client has crashed as well. It is something to be considered though, with remote TX over Hot Rod in mind.
>
>> Is it common/possible for people to use a TM _without_ recovery? If so, this "held heuristically completed TX" functionality should be configurable (enabled/disabled) in order to avoid memory leaks (no recovery means .forget never gets called).
>
> It is not common. That said, JBossTS has for similar reasons got a 'give up after N hours' config option which will eventually abandon tx that have not recovered. It's off (i.e. never give up) by default but a small number of users find it handy. Most just use the admin tooling to manually clean up the small number of unrecoverable situations - it's safer in most cases.
Right, JMX might come in handy for that.
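
A minimal sketch of the idea discussed above: keep heuristically rolled back XIDs until forget is called, with an optional 'give up after N' setting that is off by default. HeuristicTxRegistry and its method names are hypothetical, XIDs are plain strings rather than javax.transaction.xa.Xid, and the clock is passed in explicitly to keep the example deterministic. This is neither Infinispan nor JBossTS code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry of heuristically completed tx (assumption, for illustration only).
final class HeuristicTxRegistry {
    // xid -> time (millis) at which the heuristic rollback was recorded
    private final Map<String, Long> completedAt = new LinkedHashMap<>();
    private final long giveUpAfterMillis; // <= 0 means never give up (the default discussed above)

    HeuristicTxRegistry(long giveUpAfterMillis) {
        this.giveUpAfterMillis = giveUpAfterMillis;
    }

    // Called when a branch is rolled back without the TM asking for it.
    synchronized void recordHeuristicRollback(String xid, long nowMillis) {
        completedAt.putIfAbsent(xid, nowMillis);
    }

    // Mirrors XAResource.forget(xid): the TM has acknowledged the outcome.
    synchronized void forget(String xid) {
        completedAt.remove(xid);
    }

    // Mirrors the heuristic part of XAResource.recover: drop expired entries, return the rest.
    synchronized List<String> recover(long nowMillis) {
        if (giveUpAfterMillis > 0) {
            completedAt.values().removeIf(t -> nowMillis - t >= giveUpAfterMillis);
        }
        return new ArrayList<>(completedAt.keySet());
    }
}

class HeuristicTxRegistryDemo {
    public static void main(String[] args) {
        HeuristicTxRegistry registry = new HeuristicTxRegistry(0); // never give up
        registry.recordHeuristicRollback("xid-1", 0L);
        System.out.println(registry.recover(1_000L)); // still listed until forget
        registry.forget("xid-1");
        System.out.println(registry.recover(2_000L)); // empty after forget
    }
}
```

With the timeout disabled, entries leave only through forget, which is where a JMX operation for manual cleanup would fit.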
>
>>> Another interesting issue is what constitutes an 'in-doubt' tx. Pretty much all RMs will include heuristically completed tx in the recovery list. Some will include tx branches that have prepared but not yet committed or rolled back. Some will include such only if they have been in the prepared state for greater than some threshold length of time (a few seconds, i.e. a couple of orders of magnitude longer than a tx would normally be expected to hold that state). There is also the question of when a tx should be removed from the list. The wording of the spec
>>>
>>> 'Two consecutive invocation of [recover] that starts from the beginning of the list must return the same list
>>> of transaction branches unless one of the following takes place:
>>> - the transaction manager invokes the commit, forget, prepare, or rollback method for that resource
>>> manager, between the two consecutive invocation of the recovery scan
>>> ...'
>>>
>>> seems to imply a single transaction manager.
>
>> Doesn't this also imply that the prepare-threshold isn't the spec's way? I.e. even though the TM doesn't call any method on the RM, the RM returns a new XID in the result of XAResource.recover when the threshold is reached.
>
> The definition of what constitutes an 'in-doubt' tx for purposes of inclusion in the recovery list is not well defined by the spec.
>
> In a world where there is only one TM driving the RM and that TM performs recovery before starting up and running new tx, no new tx will be added to the recovery list. In the real world new ones are being added continuously as the system is always under load.
>
> The concern is more around the stability of the list with respect to not *removing* things except in response to TM activity. i.e. if a node goes away, should you return a cached snapshot of its tx for list stability, or exclude them? Both options carry risks.
Thanks for the clarification.
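
For reference, a minimal sketch of the prepare-threshold behaviour discussed above, showing how the recovery list can grow between two scans without the TM calling anything in between. PreparedBranchTracker and its methods are hypothetical names, XIDs are plain strings, and time is passed in explicitly; it is an illustration under those assumptions, not how any particular RM implements recover.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker of prepared branches (assumption, for illustration only).
final class PreparedBranchTracker {
    // xid -> time (millis) at which the branch entered the prepared state
    private final Map<String, Long> preparedAt = new LinkedHashMap<>();
    private final long thresholdMillis;

    PreparedBranchTracker(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    synchronized void prepared(String xid, long nowMillis) {
        preparedAt.put(xid, nowMillis);
    }

    // Commit or rollback driven by the TM removes the branch from the list.
    synchronized void completed(String xid) {
        preparedAt.remove(xid);
    }

    // Branches count as in-doubt only once they have stayed prepared for the threshold.
    synchronized List<String> inDoubt(long nowMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : preparedAt.entrySet()) {
            if (nowMillis - e.getValue() >= thresholdMillis) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}

class PreparedBranchTrackerDemo {
    public static void main(String[] args) {
        PreparedBranchTracker tracker = new PreparedBranchTracker(5_000L);
        tracker.prepared("xid-a", 0L);
        System.out.println(tracker.inDoubt(1_000L)); // first scan: below threshold
        System.out.println(tracker.inDoubt(5_000L)); // second scan: listed, no TM call in between
        tracker.completed("xid-a");
        System.out.println(tracker.inDoubt(6_000L)); // removed only after TM activity
    }
}
```

Note that entries are added purely by the passage of time but removed only by completed, which matches the concern about not removing things except in response to TM activity.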
>
>
> Jonathan.