[infinispan-dev] XAResource.isSameRM

Mark Little mlittle at redhat.com
Fri Jan 7 07:29:50 EST 2011


On 6 Jan 2011, at 10:45, Mircea Markus wrote:

> 
> On 5 Jan 2011, at 17:19, Jonathan Halliday wrote:
> 
>> On 01/05/2011 03:42 PM, Mircea Markus wrote:
>>> 
>>> On 5 Jan 2011, at 14:51, Mircea Markus wrote:
>>> 
>>>> FYI, a discussion I had with Jonathan about recovery support from the TM
>>>> 
>>>> On 5 Jan 2011, at 14:43, Jonathan Halliday wrote:
>>>>> On 01/05/2011 02:18 PM, Mircea Markus wrote:
>>>>> 
>>>>>> I don't know how the TM recovery process picks up the XAResource instance on which to call XAResource.recover, but I imagine it expects this method to return all the prepared (or heuristically completed) transactions from the _whole transaction branch_, i.e. from the entire cluster.
>>>>> 
>>>>> all from the logical RM, which you happen to implement as a cluster, yes.
>>>>> 
>>>>>> I'm asking this because right now there's no way for a node to know all the prepared transactions in the entire cluster. This is doable but would involve a broadcast to query the cluster, which might be costly (time and bandwidth).
>>>>> 
>>>>> right. not to mention it should, strictly speaking, block or fail if any node is unreachable, which kinda sucks from an availability perspective.
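A note on the subject line: Jonathan's answer that recover() covers the whole logical RM, which here happens to be a cluster, also suggests how XAResource.isSameRM might behave. The sketch below assumes each node hands out its own XAResource tagged with a cluster identifier (clusterName and nodeName are assumptions, not Infinispan fields) and treats two resources as the same RM whenever they front the same cluster, whichever node they are connected to. Only isSameRM is filled in; the other methods are placeholders.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical per-node XAResource for a clustered RM. clusterName is an
// assumed stable identifier shared by every node of the same cluster.
class ClusterXAResource implements XAResource {
    private final String clusterName;
    private final String nodeName; // which node this instance talks to; ignored by isSameRM

    ClusterXAResource(String clusterName, String nodeName) {
        this.clusterName = clusterName;
        this.nodeName = nodeName;
    }

    // Same logical RM when both resources front the same cluster,
    // even if they are connected to different nodes of it.
    @Override
    public boolean isSameRM(XAResource other) {
        return other instanceof ClusterXAResource
                && ((ClusterXAResource) other).clusterName.equals(clusterName);
    }

    // Placeholders: the rest of the XAResource contract is out of scope here.
    @Override public void start(Xid xid, int flags) throws XAException { throw unsupported(); }
    @Override public void end(Xid xid, int flags) throws XAException { throw unsupported(); }
    @Override public int prepare(Xid xid) throws XAException { throw unsupported(); }
    @Override public void commit(Xid xid, boolean onePhase) throws XAException { throw unsupported(); }
    @Override public void rollback(Xid xid) throws XAException { throw unsupported(); }
    @Override public void forget(Xid xid) throws XAException { throw unsupported(); }
    @Override public Xid[] recover(int flag) throws XAException { throw unsupported(); }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }

    private static XAException unsupported() {
        return new XAException(XAException.XAER_RMERR);
    }
}
```

Whether a given TM then joins branches or runs recovery through any one node's resource is TM-specific, so this is only one plausible reading of "logical RM".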
>>> So if a node does not respond to the broadcast, it is incorrect to return the prepared transactions received from the other nodes? (is this because the TM expects to receive some tx that it knows for sure to be prepared?) Or would a "best effort" be "good enough"? (e.g. I broadcast the query and return all the results received in 1 sec)
>> 
>> hmm, interesting question.
>> 
>> Keep in mind that the XA spec dates from a time when a typical large clustered RM was 2-3 Oracle nodes on the same LAN segment. It simply isn't geared to a world where the number of nodes is so large and widely distributed that the probability of *all* of them being available simultaneously is pretty slim. Likewise the number of transaction managers connected to a resource was assumed to be small, often 1, rather than the large N we see on modern clusters / clouds. As a result, the spec either fails to give guidance on some issues because they weren't significant at the time it was written, or implies/mandates behaviour that is counterproductive in modern environments.
>> 
>> Thus IMO some compromises are necessary to make XA usable in the real world, especially at scale. To further complicate matters, these are split across RM and TM, with different vendors having different views on the subject. My advice is geared to the way JBossTS drives XA recovery - other TMs may behave differently and make greater or lesser assumptions about compliance with the letter of the spec. As a result you may find that making your RM work with multiple vendors' TMs requires a) configuration options and b) a lot of painful testing. Likewise JBossTS contains code paths and config options geared to dealing with bugs or non-compliant behaviour in various vendors' RMs.
>> 
>> Now, on to the specific question: The list returned should, strictly speaking, be complete. There are two problems with that. First, you have to be able to reach all your cluster nodes to build a complete list which, as previously mentioned, is pretty unlikely in a sufficiently large cluster. Your practical strategies are thus as you say: either a) throw an XAException(XAER_RMFAIL) if any node is unreachable within a reasonable timeout and accept that this may mean an unnecessary delay in recovering the subset of tx that are known or b) return a partial list on a best effort basis. The latter approach allows the transaction manager to deal with at least some of the in-doubt tx, which may in turn mean releasing resources/locks in the RM. In general I'd favour that option as having higher practical value in terms of allowing the best possible level of service to be maintained in the face of ongoing failures.

+1

In fact on some mainframe implementations of CICS, for example, it behaves exactly like this.
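The two strategies Jonathan describes, a) failing with XAER_RMFAIL if any node is unreachable within a timeout and b) returning a best-effort partial list, can be sketched as one recover() helper with a switch between them. NodeRecoveryView, ClusterRecoveryScanner and SimpleXid are hypothetical names for illustration, not Infinispan or JTA types; only Xid, XAException and the XAER_RMFAIL code come from javax.transaction.xa. Nodes are queried in parallel and the replies are merged into a set, so an Xid reported by more than one node appears once.

```java
import java.util.*;
import java.util.concurrent.*;
import javax.transaction.xa.XAException;
import javax.transaction.xa.Xid;

// Minimal value-type Xid so replies from different nodes can be de-duplicated.
final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] gtrid;
    private final byte[] bqual;

    SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
        this.formatId = formatId;
        this.gtrid = gtrid.clone();
        this.bqual = bqual.clone();
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return bqual.clone(); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SimpleXid)) return false;
        SimpleXid x = (SimpleXid) o;
        return formatId == x.formatId && Arrays.equals(gtrid, x.gtrid) && Arrays.equals(bqual, x.bqual);
    }

    @Override public int hashCode() {
        return 31 * (31 * formatId + Arrays.hashCode(gtrid)) + Arrays.hashCode(bqual);
    }
}

// Hypothetical per-node query: Xids prepared or heuristically completed on that node.
interface NodeRecoveryView {
    List<Xid> inDoubtXids() throws Exception;
}

// Builds the answer for XAResource.recover() by querying every node.
class ClusterRecoveryScanner {
    private final List<NodeRecoveryView> nodes;
    private final long timeoutMillis;
    private final boolean bestEffort; // true = option b), false = option a)

    ClusterRecoveryScanner(List<NodeRecoveryView> nodes, long timeoutMillis, boolean bestEffort) {
        this.nodes = nodes;
        this.timeoutMillis = timeoutMillis;
        this.bestEffort = bestEffort;
    }

    Xid[] recover() throws XAException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            List<Future<List<Xid>>> replies = new ArrayList<>();
            for (NodeRecoveryView node : nodes) {
                Callable<List<Xid>> query = node::inDoubtXids;
                replies.add(pool.submit(query));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            Set<Xid> merged = new LinkedHashSet<>(); // several nodes may report the same Xid
            for (Future<List<Xid>> reply : replies) {
                long remaining = Math.max(0, deadline - System.nanoTime());
                try {
                    merged.addAll(reply.get(remaining, TimeUnit.NANOSECONDS));
                } catch (TimeoutException | ExecutionException unreachable) {
                    if (!bestEffort) {
                        // Option a): the list would be incomplete, so report the RM as unavailable.
                        XAException failure = new XAException(XAException.XAER_RMFAIL);
                        failure.initCause(unreachable);
                        throw failure;
                    }
                    // Option b): skip this node; its Xids can surface on a later rescan.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new XAException(XAException.XAER_RMFAIL);
                }
            }
            return merged.toArray(new Xid[0]);
        } finally {
            pool.shutdownNow(); // abandon queries still outstanding after the deadline
        }
    }
}
```

With bestEffort set to true, a partitioned node simply contributes nothing to this scan; with false, the TM sees XAER_RMFAIL and has to retry the whole scan later.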

>> 
>> JBossTS will rescan every N minutes (2 by default) and thus you can simply include any newly discovered in-doubt tx as they become known due to e.g. partitioned nodes rejoining the cluster, and the TM will deal with them when they are first seen. Note however that some TMs assume that if they scan an RM and that RM does not subsequently crash, no new in-doubt transactions will occur except from heuristics. Let's gloss over how they can even detect a crash/recovery of the RM if the driver masks it with failover or the event happens during a period when the TM makes no call on the driver. Such a TM will perform a recovery scan once at TM startup and not repeat. In such a case you may have in-doubt tx from nodes unavailable at that crucial time subsequently sitting around for a prolonged period, tying up precious resources and potentially blocking subsequent updates. Most RM vendors provide some kind of management capability for admins to view and manually force completion of in-doubt tx: command line tool, JMX, web GUI, whatever, just so long as it exists.
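The rescanning behaviour can be illustrated from the TM side. This is a hypothetical sketch, much simpler than what JBossTS actually does: each pass asks the RM for its current in-doubt list (a Supplier standing in for XAResource.recover) and hands every entry to a resolver that decides commit or rollback. Because nothing is cached between passes, a transaction that is only reported after a partitioned node rejoins gets picked up on the next pass, whereas a TM that scans once at startup would never see it.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical TM-side periodic recovery; T stands in for javax.transaction.xa.Xid.
class PeriodicRecoveryScanner<T> {
    private final Supplier<Collection<T>> rmScan; // stands in for XAResource.recover(...)
    private final Consumer<T> resolver;            // decides commit or rollback, e.g. from the TM log

    PeriodicRecoveryScanner(Supplier<Collection<T>> rmScan, Consumer<T> resolver) {
        this.rmScan = rmScan;
        this.resolver = resolver;
    }

    // One recovery pass; returns the ids handed to the resolver in this pass.
    synchronized List<T> scanOnce() {
        List<T> handled = new ArrayList<>();
        for (T id : rmScan.get()) {
            resolver.accept(id);
            handled.add(id);
        }
        return handled;
    }

    // Repeat the pass every periodMinutes (the thread says JBossTS defaults to 2).
    ScheduledFuture<?> start(ScheduledExecutorService scheduler, long periodMinutes) {
        Runnable pass = () -> {
            try {
                scanOnce();
            } catch (RuntimeException e) {
                // An exception escaping a scheduled task cancels all later runs,
                // so swallow it here (a real TM would log it) and retry next period.
            }
        };
        return scheduler.scheduleAtFixedRate(pass, 0, periodMinutes, TimeUnit.MINUTES);
    }
}
```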
> When a node crashes, all the transactions that node owns (i.e. tx which originated on that node and whose XAResource instance resides on that node) automatically roll back, so that no resources (mainly locks) are held. The only thing we need to make sure of, though, is that the ids of those transactions (the ones that were heuristically rolled back) are returned by the XAResource.recover method; this is doable in the same way we handle prepares. I imagine that we'll have to keep these XIDs until XAResource.forget(Xid) is called, am I right? Is it common/possible for people to use a TM _without_ recovery? If so, this "held heuristic completed TX" functionality should be configurable (enabled/disabled) in order to avoid memory leaks (no recovery means .forget never gets called).
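The bookkeeping Mircea proposes might look roughly like this on each node: record the Xids of branches rolled back heuristically after a crash, report them from recover() next to the prepared ones, and drop them when the TM calls forget(). HeuristicOutcomeLog is a hypothetical class, generic over the Xid type so it can be exercised with plain strings, and its enabled flag mirrors the proposed on/off switch, which Mark's reply below argues against.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-node bookkeeping (not an Infinispan class); T stands in for Xid.
class HeuristicOutcomeLog<T> {
    private final boolean enabled; // the proposed switch for deployments without recovery
    private final Set<T> heuristicRollbacks = ConcurrentHashMap.newKeySet();

    HeuristicOutcomeLog(boolean enabled) {
        this.enabled = enabled;
    }

    // Called when a crash forces this branch to roll back without the TM's decision.
    void recordHeuristicRollback(T xid) {
        if (enabled) {
            heuristicRollbacks.add(xid);
        }
    }

    // Snapshot to merge into the XAResource.recover() result alongside prepared Xids.
    Set<T> forRecover() {
        return Set.copyOf(heuristicRollbacks);
    }

    // Backs XAResource.forget(xid): once the TM has acknowledged the outcome the
    // entry is dropped; if forget() is never called, the set only ever grows.
    void forget(T xid) {
        heuristicRollbacks.remove(xid);
    }
}
```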

If you're using a transaction manager then use it all. Don't futz about and just use this bit or that bit and still say you're using transactions ;-)

Mark.

---
Mark Little
mlittle at redhat.com

JBoss, by Red Hat
Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom.
Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Charlie Peters (USA), Matt Parsons (USA) and Brendan Lane (Ireland).