FYI, a discussion I had with Jonathan about recovery support from the TM:
On 5 Jan 2011, at 14:43, Jonathan Halliday wrote:
On 01/05/2011 02:18 PM, Mircea Markus wrote:
> I don't know how the TM recovery process picks up the XAResource instance on
> which to call XAResource.recover, but I imagine it expects this method to return all the
> prepared (or heuristically completed) transactions from the _whole transaction branch_, i.e.
> from the entire cluster.
all from the logical RM, which you happen to implement as a cluster, yes.
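To make that concrete, here is a minimal sketch (names and structure are assumptions for illustration, not the actual implementation): whichever node receives the recover call has to return the union of in-doubt Xids across every node, because the TM sees the cluster as one logical RM.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: recover() served by any single node must return
// the in-doubt transactions of the WHOLE logical RM, i.e. the union of every
// node's local in-doubt list.
class ClusterRecoverySketch {
    // The map stands in for the cluster broadcast; each entry would really be
    // the reply from a remote node's local in-doubt transaction log.
    static Set<String> recoverWholeRm(Map<String, Set<String>> perNodeInDoubt) {
        Set<String> all = new TreeSet<>();
        for (Set<String> nodeXids : perNodeInDoubt.values()) {
            all.addAll(nodeXids); // union, deduplicating Xids seen on several nodes
        }
        return all;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cluster = new HashMap<>();
        cluster.put("nodeA", Set.of("xid-1", "xid-2"));
        cluster.put("nodeB", Set.of("xid-2", "xid-3"));
        System.out.println(recoverWholeRm(cluster)); // prints [xid-1, xid-2, xid-3]
    }
}
```

This is also where the availability problem below bites: a real implementation has to decide what to do when one of those per-node queries cannot be answered.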
> I'm asking this because right now there's no way for a node to know all the
> prepared transactions in the entire cluster. This is doable but would involve a broadcast
> to query the cluster, which might be costly (time and bandwidth).
right. not to mention it should, strictly speaking, block or fail if any node is
unreachable, which kinda sucks from an availability perspective.
Keep in mind the global list of known in-doubt tx is a point in time snapshot anyhow - in
an active system it's out of date as soon as another running tx is prepared. So, you
can serve one that's slightly stale without too much risk. Not that periodic broadcast
of the in-doubt list between nodes is necessarily better than doing it on-demand in
response to a recovery call, but at least it's O(1) rather than O(number of clients
calling recover). The (mild) problem we've seen in the past is where a large cluster
of app server nodes (i.e. tx managers) is started more or less simultaneously, the RMs get
a storm of recovery requests every two minutes. If the impl of that is expensive and not
cached it can cause performance spikes in the RM.
That said, keep in mind a recovery pass is (in the default config) only every two minutes
and run on a background thread. It's not something whose performance you need to worry
about optimizing in the initial implementation. Do it right rather than fast. Optimize it
only when users scream.
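If that storm ever does need blunting, one cheap shape for the fix is a short-TTL cache: serve recover() callers a slightly stale point-in-time snapshot, refreshing the expensive cluster broadcast at most once per window. A hypothetical sketch (nothing here is existing TM or RM code):

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch: memoise the cluster-wide in-doubt snapshot so the
// broadcast cost is O(1) per TTL window rather than O(number of transaction
// managers calling recover). A slightly stale snapshot is acceptable because
// the list is a point-in-time view anyway.
class CachedInDoubtSnapshot {
    private final Supplier<Set<String>> clusterBroadcast; // the costly query
    private final long ttlMillis;
    private Set<String> snapshot;
    private long fetchedAt;

    CachedInDoubtSnapshot(Supplier<Set<String>> clusterBroadcast, long ttlMillis) {
        this.clusterBroadcast = clusterBroadcast;
        this.ttlMillis = ttlMillis;
    }

    // 'now' is passed in for testability; production code would read the clock.
    synchronized Set<String> inDoubt(long now) {
        if (snapshot == null || now - fetchedAt > ttlMillis) {
            snapshot = clusterBroadcast.get(); // one broadcast per TTL window,
            fetchedAt = now;                   // shared by all recover() callers
        }
        return snapshot;
    }
}
```

The trade-off is exactly the staleness argument above: a transaction prepared inside the TTL window is simply picked up on the next recovery pass.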
> On the other hand I imagine this call is performed asynchronously and doesn't
> impact TM performance in managing ongoing transactions, so it might not be that bad after
> all.
correct
> Another approach would be to consider each node as a transaction branch.
you mean a logically separate resource manager, yes. You are basically talking about not
doing interposition in the driver/server but rather relying on the transaction manager to
handle multiple resources. It may make your implementation simpler but probably less
performant on the critical path (transaction commit) vs. recovery.
> The advantage here is that recovery can be easily implemented, as the TM recovery
> would ask all the nodes that were registered for prepared transactions
'registered'? you plan on having the config on each transaction manager contain a
list of all cluster nodes? That's not admin friendly.
> , and no cluster broadcast would be required when XAResource.recover is called.
the broadcast is automatic. Maintaining the list of known nodes in a config file is not.
No contest.
> Considering this approach, do you see any drawbacks compared with the other one?
> E.g. each node being a branch might involve multiple RPCs between the remote TM and the
> XAResource on each node (vs. one in the previous example).
yeah, as mentioned above it's a non-interposed model rather than one where your
driver/server is doing interposition. Favour the one that makes the commit path fast, even
though it makes recovery an utter pain.
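To put rough numbers on that trade-off (a back-of-the-envelope sketch, not measurements): with interposition the TM drives two-phase commit against one logical RM, whereas with one RM per node it sends prepare and commit to every enlisted node, so commit-path messages grow linearly with cluster size.

```java
// Back-of-the-envelope sketch of TM-to-RM messages on the commit path.
// Assumes plain 2PC: one prepare plus one commit per enlisted resource,
// ignoring the read-only and one-phase-commit optimisations.
class CommitPathCost {
    static int twoPhaseMessages(int enlistedResources) {
        return 2 * enlistedResources; // prepare + commit for each resource
    }

    public static void main(String[] args) {
        // Interposed: the whole cluster looks like a single RM to the TM.
        System.out.println(twoPhaseMessages(1)); // prints 2
        // Non-interposed: each of, say, 5 nodes is enlisted separately.
        System.out.println(twoPhaseMessages(5)); // prints 10
    }
}
```

The interposed model keeps the per-commit cost constant and pushes the linear cost into the (rare, background) recovery pass instead, which is the point of Jonathan's advice.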
Jonathan.
-
------------------------------------------------------------
Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor,
Berkshire, SL4 1TE, United Kingdom.
Registered in England and Wales under Company Registration No. 3798903. Directors: Michael
Cunningham (USA), Charlie Peters (USA), Matt Parsons (USA) and Brendan Lane (Ireland)