JBoss Community

Re: Remoting Transport Transaction Inflow Design Discussion

created by Jonathan Halliday in JBoss Transactions Development

>> I'm pretty sure that e.g. support will tell you it is unacceptable to ship a solution that may require manual transaction cleanup. We've had a small number of corner cases in JTA that suffered that limitation and eliminating them and the support load they generate has been a high priority for the transaction development work. Intentionally introducing new ones is definitely in the category of Bad Ideas.

> can you be more specific than "Bad Idea"

Sure, explaining the chain of reasoning behind that one is easy:

Red Hat ships products and offers support on them. That support is fixed price based on SLA, not proportional to the number of tickets filed. On the other hand, support costs scale as a function of the number of issues reported. Thus the less support work we have to do, the lower our cost and the higher our profit. Intentionally building and shipping something we know is going to increase support load is contra-survival and therefore a Bad Idea.

>> [remote UserTransaction] ... behaves in an intuitive fashion only for a very limited, albeit common, set of use cases. For more complex scenarios its inherent limitations manifest in ways that can be confusing to users.

> or "some complex scenarios"

yup, I can give you some of them too:

1) The 'client' is actually another AS instance, either of the same or earlier vintage, doing JNDI lookup of UserTransaction against a remote AS7.

2) The client wants to talk to two remote AS instances in the same tx.

3) The client is an environment that has its own UserTransaction implementation. This is actually just a more general version of case 1), but one in which you can't use tricks like patching the client side lookup to return your actual UserTransaction instead of the remote proxy.

4) You want to support load balancing or failover for the client-server connection.

>> JCA inflow was either designed for propagation to leaf nodes only, or incredibly badly thought out.

> or "badly thought out"

yup, although it's really pretty obvious: The JCA inflow API uses an Xid as a poor man's transaction propagation context. Xids were designed only for control flow between a transaction manager and a resource manager, not for use in multi-level trees. The JCA has no provision for allowing subordinates to create new branches in the global transaction. For that it would have to pass in a mask of free bits in the bqual array as well as the Xid to the subordinate. Indeed the JCA expressly prohibits the container handling the inflow from altering the Xid. It has to remain immutable because without any knowledge of which bits can safely be mutated, the container can't guarantee to generate unique Xids, a property which is required by the spec.
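To make the branch problem concrete, here is a minimal sketch using the standard `javax.transaction.xa.Xid` interface. The `SimpleXid` class, the 4-byte branch counter encoding and both helper methods are hypothetical and exist only for illustration; they are not how any real transaction manager lays out its Xids. The point is just that a coordinator owning the bqual space can mint a distinct branch per resource, while a subordinate that must use the inflowed Xid verbatim ends up with the same Xid for every resource.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import javax.transaction.xa.Xid;

final class XidBranchDemo {

    // Hypothetical minimal Xid value type, for illustration only.
    static final class SimpleXid implements Xid {
        private final int formatId;
        private final byte[] gtrid;
        private final byte[] bqual;

        SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
            this.formatId = formatId;
            this.gtrid = gtrid.clone();
            this.bqual = bqual.clone();
        }

        @Override public int getFormatId() { return formatId; }
        @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
        @Override public byte[] getBranchQualifier() { return bqual.clone(); }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Xid)) return false;
            Xid other = (Xid) o;
            return formatId == other.getFormatId()
                && Arrays.equals(gtrid, other.getGlobalTransactionId())
                && Arrays.equals(bqual, other.getBranchQualifier());
        }

        @Override public int hashCode() {
            return 31 * (31 * formatId + Arrays.hashCode(gtrid)) + Arrays.hashCode(bqual);
        }
    }

    // A top-level coordinator owns the bqual space, so it can give each
    // resource manager its own branch. The counter encoding is an assumption.
    static Xid newBranch(Xid root, int branchNumber) {
        byte[] bqual = ByteBuffer.allocate(4).putInt(branchNumber).array();
        return new SimpleXid(root.getFormatId(), root.getGlobalTransactionId(), bqual);
    }

    // A container handling inflow may not alter the Xid, so every resource
    // it enlists sees the inflowed Xid unchanged.
    static Xid xidForSubordinateResource(Xid inflowed, int resourceIndex) {
        return inflowed;
    }

    public static void main(String[] args) {
        Xid root = new SimpleXid(1, new byte[] {1, 2, 3}, new byte[] {0});
        System.out.println("coordinator branches equal: "
            + newBranch(root, 1).equals(newBranch(root, 2)));
        System.out.println("subordinate branches equal: "
            + xidForSubordinateResource(root, 0).equals(xidForSubordinateResource(root, 1)));
    }
}
```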

> or "not capable enough"

The XA spec expects that each resource manager (give or take its XAResource's isSameRM implementation) gets its own branch i.e. unique Xid. With inflowed Xids you can't generate new Xids to meet that expectation, you have to use the inflowed one verbatim. That causes problems with the state machine for the XA protocol lifecycle, as it's tied to the Xid. For example, if the inflowed tx is used to connect to two resource managers, you can't recover from crashes cleanly as the recovery mechanism is tracking state on the assumption that the Xid belongs to at most one RM and once it has cleaned that one up it's done. Actually on further thought even an upper limit of one is optimistic - the Xid contains the node Id of the originating parent and that parent may connect to the same resource manager, in which case it's going to incorrectly manage the lifecycle because it can't distinguish the XAResource representing the subordinate tx from the one representing the RM as they have the same Xid. That last case is an artifact of our implementation rather than the spec though.
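The recovery problem can be shown with a deliberately naive toy model. `RecoveryLogDemo` below is hypothetical and is not the actual recovery log of any transaction manager; Xids are stood in for by plain strings. It only encodes the stated assumption that an Xid belongs to at most one RM, so a second RM prepared under the same inflowed Xid is silently lost to recovery.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RecoveryLogDemo {

    // Toy recovery state: exactly one resource manager per Xid (an assumption
    // baked into this sketch, mirroring the "at most one RM" expectation).
    private final Map<String, String> preparedRmByXid = new LinkedHashMap<>();

    // Record that the named RM has prepared under the given Xid.
    // A second RM using the same Xid overwrites the first entry.
    void recordPrepared(String xid, String rmName) {
        preparedRmByXid.put(xid, rmName);
    }

    // "Recover" every logged branch and return the RMs that were cleaned up.
    // Once the single RM recorded for an Xid is handled, the entry is gone.
    List<String> recover() {
        List<String> cleanedUp = new ArrayList<>(preparedRmByXid.values());
        preparedRmByXid.clear();
        return cleanedUp;
    }

    public static void main(String[] args) {
        RecoveryLogDemo inflowed = new RecoveryLogDemo();
        inflowed.recordPrepared("inflowed-xid", "jms");
        inflowed.recordPrepared("inflowed-xid", "db");
        System.out.println("same Xid, recovered: " + inflowed.recover());

        RecoveryLogDemo branched = new RecoveryLogDemo();
        branched.recordPrepared("branch-1", "jms");
        branched.recordPrepared("branch-2", "db");
        System.out.println("unique Xids, recovered: " + branched.recover());
    }
}
```

In the shared-Xid case one RM is never revisited by recovery, which is the kind of in-doubt branch that would otherwise need manual cleanup.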

> or "unintuitive behavior"

yup, I can give you one for that too - the afterCompletions run relative to the commit in the local node where they are registered, which may actually be before the commit in another node and not correctly reflect heuristic outcomes or be suitable for triggering subsequent steps in a process that depend on running after commits in the other nodes. Likewise beforeCompletions run relative to the prepare in the local node, thus may run after a prepare in another node. In the best case that's merely inefficient, in the worst case, where resource managers are shared, it causes a flush of cached data to occur after a prepare, which will fail. If that's not complicated enough for you, take the inflowed transaction context and make a transactional call back to the originating parent server. fun, fun.
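The ordering issue is easy to see in a small simulation. This sketch does not use the JTA `Synchronization` API; `CompletionOrderDemo`, `Node` and the event names are hypothetical, and the parent is assumed to drive its two subordinate nodes sequentially. Each node runs its beforeCompletion just before its own prepare and its afterCompletion just after its own commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CompletionOrderDemo {

    // A subordinate node with one local resource manager and one
    // registered synchronization, reduced to log entries.
    static final class Node {
        final String name;

        Node(String name) { this.name = name; }

        // beforeCompletion runs relative to the local prepare only.
        void prepare(List<String> log) {
            log.add(name + ".beforeCompletion");
            log.add(name + ".rm.prepare");
        }

        // afterCompletion runs relative to the local commit only.
        void commit(List<String> log) {
            log.add(name + ".rm.commit");
            log.add(name + ".afterCompletion");
        }
    }

    // The parent coordinator drives phase one, then phase two, node by node.
    static List<String> runTwoPhaseCommit(List<Node> subordinates) {
        List<String> log = new ArrayList<>();
        for (Node node : subordinates) node.prepare(log);
        for (Node node : subordinates) node.commit(log);
        return log;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(new Node("nodeA"), new Node("nodeB"));
        for (String event : runTwoPhaseCommit(nodes)) {
            System.out.println(event);
        }
    }
}
```

In the resulting event order, nodeB's beforeCompletion comes after nodeA's prepare, and nodeA's afterCompletion comes before nodeB's commit. If both nodes share a resource manager, the first of those is the late flush described above.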

> Also - only one resource for inflowed transactions? How is that not a serious deficiency in our implementation?

It's a deficiency in the JCA spec, see above. The spec assumes the inflowed container is a leaf node i.e. RM, not a subordinate coordinator. There are some hacky things we can potentially do to work around that limitation in the spec without outright breaking compliance. They were on my list of things to do in the transactions upstream, but I seem to be a bit busy with AS integration issues instead :-)

> You're basically saying that an MDB can never access more than one resource. That's a major problem in and of itself.

Not at all. MDBs don't normally run in inflowed transactions. The server hosting the MDB container starts a top level transaction, enlists the JMS as a resource manager and additionally enlists any resource managers the MDB calls e.g. a database. It's a flat structure, not a hierarchic one.

> Finally "unacceptable to ship a solution that may require manual transaction cleanup" - you should know that any two-phase transaction system may require manual transaction cleanup; that's the nature of two-phase transactions.

sure, but those are the small number of outcomes that result from one or more of the players not behaving in accordance with the spec e.g. resource managers making autonomous outcome decisions. We don't automatically do anything about those because we simply can't - that's the point at which the spec basically says 'give up, throw a heuristic and let a human deal with the mess'. You're talking about the much more numerous expected failure cases that can be handled automatically under the spec. Indeed exactly the kinds of run of the mill system failures a distributed transaction protocol is designed to protect a user against. Intentionally shipping a non-spec-compliant XAResource implementation that will result in a support case for many of those common failures is borderline business suicide, see above.

> I'm pretty sure that if someone unplugs the ethernet cable of the transaction coordinator after prepare but before commit, there's going to have to be some manual cleanup.

Really? Got a test case for that? Other than the one a certain competitor wrote and we soundly refuted as FUD? Because I've got an extensive test suite that shows no such outcomes. Well, except for MS SQL Server and MySQL, neither of which is fully XA compliant at present. Ensuring clean transaction completion in crash situations is exactly what the transaction manager is for after all.
