On 5 Jan 2011, at 14:51, Mircea Markus wrote:
FYI, a discussion I had with Jonathan around recovery support from the TM.
On 5 Jan 2011, at 14:43, Jonathan Halliday wrote:
> On 01/05/2011 02:18 PM, Mircea Markus wrote:
>
>> I don't know how the TM recovery process picks up the XAResource instance on which to call XAResource.recover, but I imagine it expects this method to return all the prepared (or heuristically completed) transactions from the _whole transaction branch_, i.e. from the entire cluster.
>
> all from the logical RM, which you happen to implement as a cluster, yes.
>
>> I'm asking this because right now there's no way for a node to know all the prepared transactions in the entire cluster. This is doable but would involve a broadcast to query the cluster, which might be costly (time and bandwidth).
>
> right. Not to mention it should, strictly speaking, block or fail if any node is unreachable, which kinda sucks from an availability perspective.
So if a node does not respond to the broadcast, is it incorrect to return the prepared transactions received from the other nodes? (Is this because the TM expects to receive some tx that it knows for sure to be prepared?) Or would a "best effort" be "good enough"? (E.g. I broadcast the query and return all the results received within 1 sec.)
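To make the "best effort" idea concrete, here is a minimal sketch of such a recovery pass: broadcast the query to all nodes, wait up to a deadline, and return whatever arrived, skipping unreachable/slow nodes. This is purely illustrative (class and method names are hypothetical, not the ISPN or JGroups API), with each node modelled as a `Callable` returning its locally prepared tx ids:

```java
import java.util.*;
import java.util.concurrent.*;

public class BestEffortRecover {
    // Hypothetical sketch: each "node" is a Callable yielding its prepared tx ids.
    public static Set<String> recover(List<Callable<Set<String>>> nodes, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        List<Future<Set<String>>> futures = new ArrayList<>();
        for (Callable<Set<String>> n : nodes) futures.add(pool.submit(n));

        Set<String> inDoubt = new TreeSet<>();
        long deadline = System.currentTimeMillis() + timeoutMs;
        for (Future<Set<String>> f : futures) {
            long left = Math.max(0, deadline - System.currentTimeMillis());
            try {
                inDoubt.addAll(f.get(left, TimeUnit.MILLISECONDS));
            } catch (TimeoutException te) {
                // unreachable/slow node: skip it rather than failing the whole pass
            }
        }
        pool.shutdownNow();
        return inDoubt;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Set<String>>> nodes = List.of(
            () -> Set.of("tx1", "tx2"),
            () -> { Thread.sleep(3000); return Set.of("tx3"); } // a slow/dead node
        );
        // prints [tx1, tx2]: the slow node missed the deadline, its tx are omitted
        System.out.println(recover(nodes, 500));
    }
}
```

The trade-off is exactly the one discussed above: the returned list may be missing in-doubt tx from the node that did not answer in time.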
>
> Keep in mind the global list of known in-doubt tx is a point in time snapshot anyhow - in an active system it's out of date as soon as another running tx is prepared. So, you can serve one that's slightly stale without too much risk. Not that periodic broadcast of the in-doubt list between nodes is necessarily better than doing it on-demand in response to a recovery call, but at least it's O(1) rather than O(number of clients calling recover).
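Since a slightly stale snapshot is acceptable, the on-demand broadcast could also be cached on the answering node, so a storm of recover() calls costs one broadcast rather than one per caller. A minimal sketch (hypothetical names, not an actual ISPN class):

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: serve the in-doubt list from a snapshot refreshed at
// most every maxAgeMs, instead of broadcasting on every recover() call.
public class CachedInDoubt {
    private final Supplier<Set<String>> broadcast; // the expensive cluster query
    private final long maxAgeMs;
    private Set<String> snapshot;
    private long fetchedAt = Long.MIN_VALUE;

    public CachedInDoubt(Supplier<Set<String>> broadcast, long maxAgeMs) {
        this.broadcast = broadcast;
        this.maxAgeMs = maxAgeMs;
    }

    public synchronized Set<String> inDoubt() {
        long now = System.currentTimeMillis();
        if (snapshot == null || now - fetchedAt > maxAgeMs) {
            snapshot = broadcast.get(); // refresh at most once per maxAgeMs
            fetchedAt = now;
        }
        return snapshot; // possibly slightly stale, which recovery tolerates
    }

    public static void main(String[] args) {
        AtomicInteger broadcasts = new AtomicInteger();
        CachedInDoubt cache = new CachedInDoubt(
            () -> { broadcasts.incrementAndGet(); return Set.of("tx1"); }, 60_000);
        cache.inDoubt();
        cache.inDoubt(); // second recover() call is served from the cache
        System.out.println(broadcasts.get()); // prints 1
    }
}
```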
Interesting. I would expect a single (periodic) XAResource.recover call per cluster (assuming the cluster is the RM) / transaction manager. Am I wrong?
> The (mild) problem we've seen in the past is where a large cluster of app server nodes, i.e. tx managers, is started more or less simultaneously: the RMs get a storm of recovery requests every two minutes. If the impl of that is expensive and not cached it can cause performance spikes in the RM.
Good to know this,
definitely an approach to consider.
>
> That said, keep in mind a recovery pass is (in the default config) run only every two minutes and on a background thread. It's not something whose performance you want to worry about optimizing in the initial implementation. Do it right rather than fast. Optimize it only when users scream.
+1. AS is one of the use cases for ISPN, though; e.g. if a TM uses Hotrod to interact with the cluster (when HR supports tx) then there will only be one recovery call per cluster.
>
>> On the other hand I imagine this call is performed asynchronously and doesn't impact TM performance in managing ongoing transactions, so it might not be that bad after all.
>
> correct
>
>> Another approach would be to consider each node as a transaction branch.
>
> you mean a logically separate resource manager, yes. You are basically talking about not doing interposition in the driver/server but rather relying on the transaction manager to handle multiple resources. It may make your implementation simpler but probably less performant on the critical path (transaction commit) vs. recovery.
I don't know exactly how many RPCs happen in this approach, with the TM handling multiple resources. I imagine the TM would do an XAResource.prepare for each of the nodes involved. In this XAResource.prepare call I would have to implement the logic of going remotely to each involved node. Then the same for XAResource.commit. Is that so? (If so then this is pretty much what we already do in ISPN when it comes to commit/rollback.)
One of the advantages of allowing the TM to handle each individual node is that we can benefit from some nice TM features like the read-only optimisation or 1PC for single participants (these are to be implemented anyway in ISPN).
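For the record, the TM-side behaviour those optimisations refer to looks roughly like this non-interposed coordinator loop: one prepare per enlisted node, read-only voters (XA_RDONLY) dropping out of phase 2, and one-phase commit when a single node is enlisted. This is an illustrative sketch with made-up names, not the actual JBossTS or ISPN code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the non-interposed model: the TM drives each node
// as a separate resource, with read-only and 1PC optimisations.
public class MiniCoordinator {
    public interface NodeResource {
        boolean prepare();            // true = has writes; false ~ XA_RDONLY vote
        void commit(boolean onePhase);
    }

    public static void commitAll(List<NodeResource> enlisted) {
        if (enlisted.size() == 1) {        // 1PC: single participant, skip prepare
            enlisted.get(0).commit(true);
            return;
        }
        List<NodeResource> writers = new ArrayList<>();
        for (NodeResource r : enlisted)          // phase 1: one prepare RPC per node
            if (r.prepare()) writers.add(r);     // read-only voters drop out here
        for (NodeResource r : writers)           // phase 2: commit only the writers
            r.commit(false);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        NodeResource writer = new NodeResource() {
            public boolean prepare() { log.add("prepare"); return true; }
            public void commit(boolean onePhase) { log.add("commit"); }
        };
        NodeResource readOnly = new NodeResource() {
            public boolean prepare() { log.add("prepare-ro"); return false; }
            public void commit(boolean onePhase) { log.add("commit-ro"); }
        };
        commitAll(List.of(writer, readOnly));
        System.out.println(log); // the read-only node is never asked to commit
    }
}
```

The point relevant to the thread: in this model the prepare/commit RPC count scales with the number of nodes the TM sees, which is what makes the commit path slower than the interposed variant.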
>
>
>> The advantage here is that recovery can be easily implemented, as the TM recovery would ask all the nodes that were registered for prepared transactions
>
> 'registered'?
wrong word - I meant enlisted
> you plan on having the config on each transaction manager contain a list of all cluster nodes? That's not admin friendly.
>
>> , and no cluster broadcast would be required when XAResource.recover is called.
>
> the broadcast is automatic. Maintaining the list of known nodes in a config file is
not. No contest.
>
>> Considering this approach, do you see any drawbacks compared with the other one?
>> E.g. each node being a branch might involve multiple RPCs between the remote TM and the XAResource on each node (vs. one in the previous example).
>
> yeah, as mentioned above it's a non-interposed model rather than one where your driver/server is doing interposition. Favour the one that makes the commit path fast, even though it makes recovery an utter pain.
+1
>
> Jonathan.
>
> -
> ------------------------------------------------------------
> Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor,
Berkshire, SL4 1TE, United Kingdom.
> Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael
Cunningham (USA), Charlie Peters (USA), Matt Parsons (USA) and Brendan Lane (Ireland)
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev