JBoss Community

Re: Remoting Transport Transaction Inflow Design Discussion

created by David Lloyd in JBoss Transactions Development

Jonathan Halliday wrote:

 

>> [remote UserTransaction] ... behaves in an intuitive fashion only for a very limited, albeit common, set of use cases. For more complex scenarios its inherent limitations manifest in ways that can be confusing to users.

 

> or "some complex scenarios"

 

yup, I can give you some of them too:

 

1) The 'client' is actually another AS instance, either of the same or earlier vintage, doing JNDI lookup of UserTransaction against a remote AS7.

2) The client wants to talk to two remote AS instances in the same tx.

3) The client is an environment that has its own UserTransaction implementation. This is actually just a more general version of case 1), but one in which you can't use tricks like patching the client-side lookup to return your actual UserTransaction instead of the remote proxy.

4) You want to support load balancing or failover for the client-server connection.

Okay, so these basically correspond to the same scenarios which have already been outlined.  As far as I know there's no need (in terms of existing functionality or explicit requirement) to support #4 mid-transaction, though.
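To make scenario 1 a bit more concrete, the client code in question would look roughly like the sketch below. The context factory class, provider URL and lookup name are illustrative assumptions, not a final remoting configuration:

    import java.util.Properties;

    import javax.naming.Context;
    import javax.naming.InitialContext;
    import javax.transaction.UserTransaction;

    // Sketch of scenario 1 from the client's point of view. The factory class,
    // URL and lookup name are placeholders for whatever the remoting transport
    // ends up exposing; they are not a definitive configuration.
    public class RemoteUserTransactionClient {
        public static void main(String[] args) throws Exception {
            Properties env = new Properties();
            env.put(Context.INITIAL_CONTEXT_FACTORY,
                    "org.jboss.naming.remote.client.InitialContextFactory"); // assumed
            env.put(Context.PROVIDER_URL, "remote://as7-host:4447");          // assumed

            Context ctx = new InitialContext(env);
            UserTransaction ut = (UserTransaction) ctx.lookup("UserTransaction"); // assumed name

            ut.begin();
            try {
                // ... invoke remote components on one (or, in scenario 2, two) AS instances ...
                ut.commit();
            } catch (Exception e) {
                ut.rollback();
                throw e;
            }
        }
    }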

 

 

Jonathan Halliday wrote:

 

>> JCA inflow was either designed for propagation to leaf nodes only, or incredibly badly thought out.

 

> or "badly thought out"

 

yup, although it's really pretty obvious: The JCA inflow API uses an Xid as a poor man's transaction propagation context. Xids were designed only for control flow between a transaction manager and a resource manager, not for use in multi-level trees. The JCA has no provision for allowing subordinates to create new branches in the global transaction. For that it would have to pass in a mask of free bits in the bqual array, as well as the Xid, to the subordinate. Indeed the JCA expressly prohibits the container handling the inflow from altering the Xid. It has to remain immutable because, without any knowledge of which bits can safely be mutated, the container can't guarantee to generate unique Xids, a property which is required by the spec.

I didn't find this in the JCA spec (there was a bit about RMs not altering an XID's data bits in transit, but this is not the same thing), but I see your point about XID generation in a hierarchical system (it'd be fine as long as there are no cycles and you could just tack data onto the end of the branch qualifier, but that's not technically very robust, and could violate the XID "format" if there is one).  I'm curious to know how other vendors solve this problem with EIS transaction inflow.  I could see a workaround in which additional XAResources are enlisted with the root coordinator by propagating them back *up* the chain, but this is back into custom SPI territory which I'd just as soon stay out of.
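For what it's worth, here's a rough sketch of what carving out a sub-branch by appending to the branch qualifier would look like. The SubordinateXid class is purely hypothetical, and this is exactly the kind of Xid mutation the container has no sanctioned way to do:

    import javax.transaction.xa.Xid;

    // Purely hypothetical helper: derive a child branch by appending a suffix to
    // the parent's branch qualifier. Without a mask of "free" bqual bits handed
    // down by the parent, there is no way to know the result doesn't collide with
    // a branch the parent (or a sibling subordinate) has already generated, which
    // is why the spec would need to pass that mask along with the Xid.
    final class SubordinateXid implements Xid {
        private final int formatId;
        private final byte[] gtrid;
        private final byte[] bqual;

        SubordinateXid(Xid parent, byte[] suffix) {
            byte[] parentBqual = parent.getBranchQualifier();
            if (parentBqual.length + suffix.length > Xid.MAXBQUALSIZE) {
                throw new IllegalArgumentException("branch qualifier would exceed " + Xid.MAXBQUALSIZE + " bytes");
            }
            formatId = parent.getFormatId();
            gtrid = parent.getGlobalTransactionId().clone();
            bqual = new byte[parentBqual.length + suffix.length];
            System.arraycopy(parentBqual, 0, bqual, 0, parentBqual.length);
            System.arraycopy(suffix, 0, bqual, parentBqual.length, suffix.length);
        }

        public int getFormatId()               { return formatId; }
        public byte[] getGlobalTransactionId() { return gtrid.clone(); }
        public byte[] getBranchQualifier()     { return bqual.clone(); }
    }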

 

Alternatively, the subordinate TM could simply generate a new global transaction ID for its subordinate resources.  It'd technically be a lie, but it'd cleanly solve this problem, at least as far as transaction completion goes; the recovery semantics might be hard to work out, though.

 

 

Jonathan Halliday wrote:

 

> or "not capable enough"

 

The XA spec expects that each resource manager (give or take its XAResource's isSameRM implementation) gets its own branch, i.e. a unique Xid. With inflowed Xids you can't generate new Xids to meet that expectation; you have to use the inflowed one verbatim. That causes problems with the state machine for the XA protocol lifecycle, as it's tied to the Xid. For example, if the inflowed tx is used to connect to two resource managers, you can't recover from crashes cleanly, as the recovery mechanism is tracking state on the assumption that the Xid belongs to at most one RM, and once it has cleaned that one up it's done.  Actually, on further thought, even an upper limit of one is optimistic - the Xid contains the node id of the originating parent, and that parent may connect to the same resource manager, in which case it's going to incorrectly manage the lifecycle because it can't distinguish the XAResource representing the subordinate tx from the one representing the RM, as they have the same Xid. That last case is an artifact of our implementation rather than the spec though.

Again, I can't find this in the spec.  It clearly says that an XID is used to identify the incoming transaction, but nothing says that the subordinate cannot in turn generate different XIDs for its own resources.
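Concretely, I'm picturing the subordinate keeping the inflowed Xid purely as a key and enlisting its local resource managers under freshly generated Xids, along the lines of this sketch. SubordinateCoordinator and the local Xid format are made up for illustration; this is not an existing SPI, and error/rollback paths are elided:

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.transaction.xa.XAException;
    import javax.transaction.xa.XAResource;
    import javax.transaction.xa.Xid;

    // Sketch: the inflowed Xid is used only as a lookup key; each locally
    // enlisted resource manager gets its own locally generated branch Xid.
    // (Assumes the inflowed Xid implements equals/hashCode; a real coordinator
    // would key on the formatId/gtrid/bqual bytes instead.)
    final class SubordinateCoordinator {

        private static final AtomicLong SEQ = new AtomicLong();
        private static final int FORMAT_ID = 0x20205;   // arbitrary illustrative format id

        private final Map<Xid, List<Enlistment>> imported =
                new ConcurrentHashMap<Xid, List<Enlistment>>();

        void enlist(Xid inflowed, XAResource rm) throws XAException {
            Xid local = newLocalXid();                  // never the inflowed Xid itself
            rm.start(local, XAResource.TMNOFLAGS);
            imported.computeIfAbsent(inflowed, k -> new ArrayList<Enlistment>())
                    .add(new Enlistment(rm, local));
        }

        // Driven by the parent's completion of the imported transaction,
        // e.g. from an XATerminator implementation keyed on the inflowed Xid.
        void commit(Xid inflowed) throws XAException {
            List<Enlistment> branches = imported.remove(inflowed);
            for (Enlistment e : branches) {
                e.rm.end(e.xid, XAResource.TMSUCCESS);
                e.vote = e.rm.prepare(e.xid);
            }
            for (Enlistment e : branches) {
                if (e.vote == XAResource.XA_OK) {
                    e.rm.commit(e.xid, false);          // XA_RDONLY branches need no commit
                }
            }
        }

        private static Xid newLocalXid() {
            final byte[] gtrid = ByteBuffer.allocate(8).putLong(SEQ.incrementAndGet()).array();
            final byte[] bqual = new byte[] { 1 };
            return new Xid() {
                public int getFormatId()               { return FORMAT_ID; }
                public byte[] getGlobalTransactionId() { return gtrid.clone(); }
                public byte[] getBranchQualifier()     { return bqual.clone(); }
            };
        }

        private static final class Enlistment {
            final XAResource rm;
            final Xid xid;
            int vote;
            Enlistment(XAResource rm, Xid xid) { this.rm = rm; this.xid = xid; }
        }
    }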

 

As for your latter point, though: recall that we're dealing with a strictly hierarchical relationship here. Even if the same transaction recursively flows into a node into which it had already flowed, it doesn't really have to treat it as another branch of the same transaction, even if it were possible to do so.  It's a departure from CORBA-style distribution in that every inflow can be a new level in the transaction hierarchy, even if it passes through the same node (which you would not normally do in a hierarchical relationship, by definition, because resources could then be accessed from two wholly different XIDs even if they are logically part of the same transaction).  If true distribution is desired, there's always JTS, after all.  That's the trade-off here: you give up the functionality you don't want anyway when you're in a client/server environment, and in return you get much simpler semantics (and, in turn, less overhead) and the benefits of the optimized transport.  Choices are good.

 

 

Jonathan Halliday wrote:

 

> or "unintuitive behavior"

 

yup, I can give you one for that too - the afterCompletions run relative to the commit in the local node where they are registered, which may actually be before the commit in another node and so not correctly reflect heuristic outcomes, nor be suitable for triggering subsequent steps in a process that depend on running after the commits in the other nodes. Likewise, beforeCompletions run relative to the prepare in the local node, and thus may run after a prepare in another node. In the best case that's merely inefficient; in the worst case, where resource managers are shared, it causes a flush of cached data to occur after a prepare, which will fail.  If that's not complicated enough for you, take the inflowed transaction context and make a transactional call back to the originating parent server. fun, fun.

 

I wouldn't be worried about the Synchronization stuff in a multi-tier environment - especially if we disallow resource sharing (i.e. treat each node's access to a resource as separate), which seems prudent given my above thoughts about unorthodox XID handling.  In my experience, the use cases for the kind of boss/subordinate cascading we are talking about would generally not rely on that ability (resource sharing) anyway.  And if you're not sharing resources, then if you look at the synchronization issues you'll see that their semantics probably only matter relative to what the local node can see anyway.  I think this lack of capability is fair if it saves us implementation effort.
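To spell out what "relative to what the local node can see" means: a Synchronization registered on a given node only brackets that node's own prepare and commit. A minimal sketch using the standard JTA API (the class name is just for illustration):

    import javax.transaction.Status;
    import javax.transaction.Synchronization;
    import javax.transaction.Transaction;
    import javax.transaction.TransactionManager;

    // Sketch: these callbacks fire relative to the *local* prepare and commit
    // only. Another node in the same global transaction may not have prepared
    // yet when beforeCompletion runs here, or committed yet when afterCompletion
    // runs here.
    public class LocalOnlySync implements Synchronization {

        public void beforeCompletion() {
            // e.g. flush caches to the local resource managers before the local prepare
        }

        public void afterCompletion(int status) {
            if (status == Status.STATUS_COMMITTED) {
                // triggered by the local commit; says nothing about remote nodes
            }
        }

        static void register(TransactionManager tm) throws Exception {
            Transaction tx = tm.getTransaction();
            tx.registerSynchronization(new LocalOnlySync());
        }
    }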

 

That isn't to say that we couldn't invent some great new SPI which does all of this much better.  Given unlimited (or less limited) resources, that would be fine by me.  Furthermore, since all of this XATerminator/XAResource stuff is an implementation detail, we could do it one way now and change to a different, more feature-rich solution later on.  Maybe at the same time we can tackle the XID deficiency in the JCA spec somehow.

 

 

Jonathan Halliday wrote:

 

>  You're basically saying that an MDB can never access more than one resource.  That's a major problem in and of itself.

 

Not at all. MDBs don't normally run in inflowed transactions. The server hosting the MDB container starts a top-level transaction, enlists the JMS provider as a resource manager, and additionally enlists any resource managers the MDB calls, e.g. a database. It's a flat structure, not a hierarchical one.

The purpose is to execute Work in the context of a transaction controlled by an outside party, and delivering messages as part of an imported transaction is allowed and described in the spec as one of the three models (with respect to transactions) in which messages may be delivered.

 

In any case, if that API was intended for flat execution then yeah it's an utter failure of an SPI.  If it's intended for hierarchical execution then it's only a moderate failure (due to the XID problem), one that's actually workable in practice (in my opinion).  Without resources to control, after all, there's not a lot of point to transactional inflow.
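For reference, the inflow contract in question looks roughly like this from the resource adapter's side: the adapter submits Work under an ExecutionContext carrying the imported Xid, and completion is later driven through the container's XATerminator. A sketch only; endpoint setup and error handling are elided, and a real adapter would drive completion from the EIS's coordinator rather than inline:

    import javax.resource.spi.BootstrapContext;
    import javax.resource.spi.XATerminator;
    import javax.resource.spi.work.ExecutionContext;
    import javax.resource.spi.work.Work;
    import javax.resource.spi.work.WorkManager;
    import javax.transaction.xa.XAResource;
    import javax.transaction.xa.Xid;

    // Sketch of JCA transaction inflow: the EIS-supplied Xid is imported into
    // the container via an ExecutionContext, and completion is later driven
    // through the container's XATerminator using that same Xid.
    public class InflowExample {

        public void deliver(BootstrapContext ctx, Xid importedXid, Work delivery)
                throws Exception {
            WorkManager wm = ctx.getWorkManager();

            ExecutionContext ec = new ExecutionContext();
            ec.setXid(importedXid);                 // the inflowed transaction context
            ec.setTransactionTimeout(300);

            // The endpoint (e.g. an MDB) invoked by 'delivery' now runs in the
            // imported transaction rather than a container-started one.
            wm.doWork(delivery, WorkManager.INDEFINITE, ec, null);

            // Later, the EIS-side coordinator completes the transaction:
            XATerminator xt = ctx.getXATerminator();
            if (xt.prepare(importedXid) == XAResource.XA_OK) {
                xt.commit(importedXid, false);
            }
        }
    }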

 

 

Jonathan Halliday wrote:

 

> Finally "unacceptable to ship a solution that may require manual transaction cleanup" - you should know that any two-phase transaction system may require manual transaction cleanup; that's the nature of two-phase transactions.

 

sure, but those are the small number of outcomes that result from one or more of the players not behaving in accordance with the spec, e.g. resource managers making autonomous outcome decisions. We don't automatically do anything about those because we simply can't - that's the point at which the spec basically says 'give up, throw a heuristic and let a human deal with the mess'.  You're talking about the much more numerous expected failure cases that can be handled automatically under the spec - indeed, exactly the kinds of run-of-the-mill system failures a distributed transaction protocol is designed to protect a user against.  Intentionally shipping a non-spec-compliant XAResource implementation that will result in a support case for many of those common failures is borderline business suicide; see above.

The whole idea is predicated on complying with the XAResource contract; we would not intentionally ship a non-spec-compliant XAResource implementation.

 

 

Jonathan Halliday wrote:

 

> I'm pretty sure that if someone unplugs the ethernet cable of the transaction coordinator after prepare but before commit, there's going to have to be some manual cleanup.

 

Really? Got a test case for that? Other than the one a certain competitor wrote and we soundly refuted as FUD? Because I've got an extensive test suite that shows no such outcomes. Well, except for MS SQLServer and mysql, neither of which is fully XA compliant at present. Ensuring clean transaction completion in crash situations is exactly what the transaction manager is for, after all.

Okay, great.  What I was trying to get across with those requirement items is that we're only going to implement the contracts, and we're not implementing any special recovery semantics beyond what the contracts specify and what the TM does for us.  If the TM can handle every crash scenario ever, all the better.
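Just to pin down what "what the contracts specify" amounts to on the recovery side, this is roughly all the TM needs from a compliant XAResource after a crash. A sketch only; the class name is illustrative and the log lookup is a placeholder:

    import javax.transaction.xa.XAException;
    import javax.transaction.xa.XAResource;
    import javax.transaction.xa.Xid;

    // Sketch: what the transaction manager does with any spec-compliant
    // XAResource after a crash - scan for in-doubt branches and complete them.
    public class RecoveryScan {

        public static void recover(XAResource res) throws XAException {
            Xid[] inDoubt = res.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
            for (Xid xid : inDoubt) {
                // The TM consults its log: commit the branches it decided to
                // commit, roll back the rest. No manual cleanup is needed unless
                // a resource has already made a heuristic decision of its own.
                boolean committedInLog = false;   // placeholder for the real log lookup
                if (committedInLog) {
                    res.commit(xid, false);
                } else {
                    res.rollback(xid);
                }
            }
        }
    }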
