[infinispan-dev] XAResource.isSameRM

Wed Jan 5 10:42:33 EST 2011

On 5 Jan 2011, at 14:51, Mircea Markus wrote:

> FYI, a discussion I had with Jonathan about recovery support from the TM
> 
> On 5 Jan 2011, at 14:43, Jonathan Halliday wrote:
>> On 01/05/2011 02:18 PM, Mircea Markus wrote:
>> 
>>> I don't know how the TM recovery process picks the XAResource instance on which to call XAResource.recover, but I imagine it expects this method to return all the prepared (or heuristically completed) transactions from the _whole transaction branch_, i.e. from the entire cluster.
>> 
>> all from the logical RM, which you happen to implement as a cluster, yes.
>> 
>>> I'm asking this because right now there's no way for a node to know all the prepared transactions in the entire cluster. This is doable but would involve a broadcast to query the cluster, which might be costly (time and bandwidth).
>> 
>> right. not to mention it should, strictly speaking, block or fail if any node is unreachable, which kinda sucks from an availability perspective.
So if a node does not respond to the broadcast, is it incorrect to return the prepared transactions received from the other nodes? (Is this because the TM expects to receive some tx that it knows for sure to be prepared?) Or would a "best effort" be "good enough"? (E.g. I broadcast the query and return all the results received within 1 sec.)
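
To make the "best effort" variant concrete, here is a minimal sketch of a bounded-wait gather: every node is queried in parallel and whatever has arrived by the deadline is returned. It is only an illustration under assumptions: the class name, the use of plain strings in place of Xids, and the per-node `Supplier` standing in for a remote call are all hypothetical, not Infinispan API. As noted above, silently skipping an unreachable node is not strictly correct.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: strings stand in for Xids, Suppliers for remote node queries.
final class BestEffortRecovery {

    /** Queries all nodes in parallel and returns the in-doubt ids received before the deadline. */
    static Set<String> collectInDoubt(Map<String, Supplier<Set<String>>> nodes, long timeoutMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            Map<String, Future<Set<String>>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Supplier<Set<String>>> e : nodes.entrySet()) {
                Callable<Set<String>> task = e.getValue()::get;
                pending.put(e.getKey(), pool.submit(task));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            Set<String> result = new TreeSet<>();
            for (Future<Set<String>> reply : pending.values()) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try {
                    result.addAll(reply.get(remaining, TimeUnit.NANOSECONDS));
                } catch (TimeoutException | ExecutionException ex) {
                    // Best effort: a slow or failed node is skipped, so the result may be incomplete.
                    reply.cancel(true);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return result;
        } finally {
            pool.shutdownNow(); // interrupt any node query still running
        }
    }
}
```

A strict variant would throw from the timeout branch instead of skipping the node, which is the "block or fail" behaviour described above.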
>> 
>> Keep in mind the global list of known in-doubt tx is a point in time snapshot anyhow - in an active system it's out of date as soon as another running tx is prepared. So, you can serve one that's slightly stale without too much risk. Not that periodic broadcast of the in-doubt list between nodes is necessarily better than doing it on-demand in response to a recovery call, but at least it's O(1) rather than O(number of clients calling recover).  
Interesting. I would expect a single (periodic) XAResource.recover call per cluster (assuming the cluster is the RM) / transaction manager. Am I wrong?
>> The (mild) problem we've seen in the past is that when a large cluster of app server nodes, i.e. tx managers, is started more or less simultaneously, the RMs get a storm of recovery requests every two minutes. If the impl of that is expensive and not cached it can cause performance spikes in the RM.
Good to know this, definitely an approach to consider.
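
A rough sketch of the caching idea, so that a storm of recover calls costs at most one cluster query per interval. Everything here is an assumption for illustration: the class name, strings in place of Xids, the `Supplier` standing in for the cluster broadcast, and the injected clock (used only to keep the example deterministic).

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache in front of an expensive cluster-wide query.
final class CachedInDoubtList {
    private final Supplier<Set<String>> clusterQuery; // stands in for the broadcast
    private final long maxAgeMillis;
    private final LongSupplier clock;                 // e.g. System::currentTimeMillis
    private Set<String> snapshot;
    private long fetchedAt;

    CachedInDoubtList(Supplier<Set<String>> clusterQuery, long maxAgeMillis, LongSupplier clock) {
        this.clusterQuery = clusterQuery;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /** Returns a possibly slightly stale snapshot; refreshes only when it is older than maxAgeMillis. */
    synchronized Set<String> recover() {
        long now = clock.getAsLong();
        if (snapshot == null || now - fetchedAt >= maxAgeMillis) {
            snapshot = Collections.unmodifiableSet(new HashSet<>(clusterQuery.get()));
            fetchedAt = now;
        }
        return snapshot;
    }
}
```

With this in place the number of cluster queries depends on elapsed time rather than on the number of callers, at the price of serving a snapshot that may be up to `maxAgeMillis` old.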
>> 
>> That said, keep in mind a recovery pass is (in the default config) only every two minutes and run on a background thread. It's not something you want to worry about performance optimization of in the initial implementation. Do it right rather than fast. Optimize it only when users scream.
+1. AS is one of the use cases for ISPN, though; but e.g. if a TM uses Hotrod to interact with the cluster (once HR supports tx), then there will be only one recovery call per cluster.
>> 
>>> On the other hand I imagine this call is performed asynchronously and doesn't impact TM performance in managing ongoing transactions, so it might not be that bad after all.
>> 
>> correct
>> 
>>> Another approach would be to consider each node as a transaction branch.
>> 
>> you mean a logically separate resource manager, yes. You are basically talking about not doing interposition in the driver/server but rather relying on the transaction manager to handle multiple resources. It may make your implementation simpler but probably less performant on the critical path (transaction commit) vs. recovery.
I don't know exactly how many RPCs happen in this approach, with the TM handling multiple resources. I imagine the TM would call XAResource.prepare for each of the nodes involved. In this XAResource.prepare call I would have to implement the logic of going remotely to each involved node. Then the same for XAResource.commit. Is that so? (If so, then this is pretty much what we already do in ISPN when it comes to commit/rollback.)
One of the advantages of letting the TM handle each individual node is that we can benefit from some nice TM features like the read-only optimisation or 1PC for single participants (these are to be implemented in ISPN anyway).
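
To illustrate the call pattern I have in mind, here is a simplified sketch of a coordinator driving one participant per node, with the read-only optimisation and 1PC for a single participant. It is not how any particular TM is implemented: `MiniCoordinator`, `Participant` and `Vote` are hypothetical names that only loosely mirror XAResource.prepare (which returns XA_OK or XA_RDONLY) and XAResource.commit(xid, onePhase). The return value counts prepare and commit calls so the RPC cost is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified coordinator; one Participant per cluster node.
final class MiniCoordinator {
    enum Vote { OK, READ_ONLY }

    interface Participant {
        Vote prepare();                 // loosely mirrors XAResource.prepare
        void commit(boolean onePhase);  // loosely mirrors XAResource.commit
        void rollback();
    }

    /** Completes the transaction and returns the number of prepare/commit calls made. */
    static int complete(List<Participant> nodes) {
        if (nodes.size() == 1) {
            nodes.get(0).commit(true);  // 1PC: a single participant needs no prepare
            return 1;
        }
        int calls = 0;
        List<Participant> prepared = new ArrayList<>();
        try {
            for (Participant node : nodes) {
                calls++;
                // Read-only participants are done after prepare and get no commit call.
                if (node.prepare() == Vote.OK) {
                    prepared.add(node);
                }
            }
        } catch (RuntimeException e) {
            // Simplification: roll back only the participants that already voted OK.
            for (Participant node : prepared) {
                node.rollback();
            }
            throw e;
        }
        for (Participant node : prepared) {
            node.commit(false);
            calls++;
        }
        return calls;
    }
}
```

With three nodes of which one is read-only, this makes three prepare calls and two commit calls, each of which would be a remote call when every node is a separate resource.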
>> 
>> 
>>> The advantage here is that recovery can be easily implemented, as the TM recovery would ask all the nodes that were registered for prepared transactions
>> 
>> 'registered'?
wrong word - I meant enlisted
>> you plan on having the config on each transaction manager contain a list of all cluster nodes? That's not admin friendly.
>> 
>>> , and no cluster broadcast would be required when XAResource.recover is called.
>> 
>> the broadcast is automatic. Maintaining the list of known nodes in a config file is not. No contest.
>> 
>>> Considering this approach, do you see any drawbacks compared with the other one?
>>> E.g. each node being a branch might involve multiple RPCs between the remote TM and the XAResource on each node (vs. one in the previous example).
>> 
>> yeah, as mentioned above it's a non-interposed model rather than one where your driver/server is doing interposition. Favour the one that makes the commit path fast, even though it makes recovery an utter pain.
+1
>> 
>> Jonathan.
>> 