I spent most of this week testing for race conditions related to the
change in behaviour in 4.2.x where a transaction timeout issues a
rollback() rather than setRollbackOnly().
I've found a number of problems:
1) The initial problem where a local resource adapter
was effectively allowing the user to continue
"in the next transaction" after the rollback.
http://jira.jboss.com/jira/browse/JBAS-5080
I initially fixed this last week by checking the status
of the transaction association before allowing
operations.
But this was incomplete, since there was
still a potential race where the timeout
could happen between the check and
the operation.
Thread 1: Check tx status (ok)
Thread 2: Rollback
Thread 1: do operation (status is now bad)
This has now been properly fixed by introducing
a lock such that an action, e.g. an SQL update,
will check the status and complete the operation
before allowing the rollback (and vice versa).
Race won by Thread 1:
Thread 1: lock
Thread 2: try to lock
Thread 1: do operation
Thread 1: unlock
Thread 2: lock
Thread 2: rollback
etc.
Race won by Thread 2:
Thread 2: lock
Thread 1: try to lock
Thread 2: rollback
Thread 2: unlock
Thread 1: lock
Thread 1: check status -> failure
etc.
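
To make the two interleavings above concrete, here is a minimal
sketch of the locking idea. It is not the actual JBAS-5080 patch:
TxGuard, doOperation() and the rolledBack flag are hypothetical
names made up for illustration. The only point is that the status
check and the operation happen under the same lock that the
rollback takes, so neither can slip in between the other's steps.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not the JBoss resource adapter code.
class TxGuard {
    private final ReentrantLock lock = new ReentrantLock();
    private boolean rolledBack = false; // guarded by lock

    // Thread 1: check the status and run the operation as one atomic step.
    void doOperation(Runnable operation) {
        lock.lock();
        try {
            if (rolledBack) {
                // Race won by Thread 2: the check fails, nothing is executed.
                throw new IllegalStateException("Transaction has been rolled back");
            }
            // Race won by Thread 1: rollback() blocks until we unlock.
            operation.run();
        } finally {
            lock.unlock();
        }
    }

    // Thread 2: the timeout-driven rollback waits for any in-flight operation.
    void rollback() {
        lock.lock();
        try {
            rolledBack = true;
        } finally {
            lock.unlock();
        }
    }
}
```

With only the earlier status check and no lock, the rollback could
land between the "if" and operation.run(), which is the window
described in the first interleaving.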
2) There is a race in the TxConnectionManager
between closing connections and tx rollback.
http://jira.jboss.com/jira/browse/JBAS-5095
I've put a fix in the JCA pool that avoids
the problem, but it still needs fixing
properly in the connection manager.
3) While doing this testing I discovered
what looks like a race in the transaction
manager.
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4114750#...
I found this when I was trying to emulate
the asynchronous tx timeout -> rollback().
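
For anyone wanting to reproduce this kind of test, the following
is a rough sketch of one way to emulate an asynchronous timeout
that calls rollback() from another thread. It is an assumption
about the general approach, not the test I actually ran: FakeTx
and TimeoutEmulator are hypothetical stand-ins and do not use the
transaction manager API at all.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; not the JBoss transaction manager.
class TimeoutEmulator {
    // Stand-in for a transaction whose rollback() may be called from any thread.
    static class FakeTx {
        private final AtomicBoolean rolledBack = new AtomicBoolean(false);
        void rollback() { rolledBack.set(true); }
        boolean isRolledBack() { return rolledBack.get(); }
    }

    // Schedules rollback() on a timer thread, waits for it, reports the outcome.
    static boolean demo() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            FakeTx tx = new FakeTx();
            Runnable timeoutAction = tx::rollback;
            // The rollback runs on the timer thread, not the caller's thread.
            ScheduledFuture<?> pending =
                timer.schedule(timeoutAction, 10, TimeUnit.MILLISECONDS);
            pending.get(5, TimeUnit.SECONDS); // wait for the "timeout" to fire
            return tx.isRolledBack();
        } catch (Exception e) {
            return false;
        } finally {
            timer.shutdownNow();
        }
    }
}
```

In a real race test the application thread would keep working on
the transaction while the timer fires, rather than just waiting
for it as demo() does here.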
All in all, I think we need to do a lot more
testing (and thinking about other possible races?)
around this changed semantic.
Here are two other bugs it caused:
http://jira.jboss.com/jira/browse/JBAS-4487
http://viewvc.jboss.org/cgi-bin/viewvc.cgi/jbossas?view=rev&revision=...
And there's still been no work on fixing the
noisy logging :-(
http://jira.jboss.com/jira/browse/JBAS-3633
i.e. lots of stacktraces for one problem -
a transaction timeout.
--
xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Adrian Brock
Chief Scientist
JBoss, a division of Red Hat
xxxxxxxxxxxxxxxxxxxxxxxxxxxx