[
https://issues.redhat.com/browse/WFWIP-218?page=com.atlassian.jira.plugin...
]
Ondrej Chaloupka commented on WFWIP-218:
----------------------------------------
OK, after some time investigating I was able to reproduce the issue in a standalone
(non-Kubernetes) setup (it turned out to be a little complicated, at least for me, to
understand what is happening here). The updated test case is at:
https://gitlab.mw.lab.eng.bos.redhat.com/ochaloup/tests-transactions/comm...
One way to run it is, e.g.:
{code}
export JBOSS_HOME=...
mvn clean verify -am -pl jbossts -DfailIfNoTests=false -Djbossts.noJTS \
  -Dtest=TxPropagationJMSCrashRecoveryTestCase#injectRmfailAtServerCommit
{code}
The issue is caused by the fact that the recovery `commit` call has no information about
the file system record. Top-down recovery runs through {{AtomicActionRecoveryModule}}, which
has no notion of the {{XA}} specifics of the {{XAResource}}. The module just takes the
serialized form of the resource and runs commit on it. When the serialized resource is read
back, it carries only information about the remote resource, so the commit may succeed.
The xa-recovery file is removed only at the second periodic recovery attempt, when the
bottom-up {{XAResourceRecoveryModule}} calls {{recover}} and removes the record.
This is a problem when the remote server is already gone (as it is after the scale down)
and the recover call keeps failing. In that case WFTC has no way to remove the data
on the second round of recovery during the {{XAResource.recover()}} call. The recover call
fails forever, and the WFTC XA recovery records stay in place forever as well.
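The sequence can be illustrated with a minimal, self-contained sketch in plain Java. It uses no Narayana or WFTC classes: {{RecoveryRecordSketch}}, {{topDownCommit}} and {{bottomUpRecover}} are hypothetical names that only model the behaviour described above, under the assumption that a file record is cleaned up only after a successful {{recover}} call.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the two recovery passes; not actual Narayana or WFTC code. */
public class RecoveryRecordSketch {
    /** Stands in for the files under data/ejb-xa-recovery on the client. */
    private final Set<String> fileRecords = new HashSet<>();
    /** Transactions already committed on the remote side. */
    private final Set<String> committed = new HashSet<>();
    /** Whether the remote server can still be reached. */
    private boolean remoteReachable = true;

    /** Prepare phase: the client writes a file record for the transaction. */
    public void prepare(String xid) {
        fileRecords.add(xid);
    }

    public void setRemoteReachable(boolean reachable) {
        remoteReachable = reachable;
    }

    /** Top-down pass: commits on the remote resource and never touches the file record. */
    public boolean topDownCommit(String xid) {
        if (!remoteReachable) {
            return false;
        }
        committed.add(xid);
        return true;
    }

    /** Bottom-up pass: only a successful recover() lets finished records be removed. */
    public boolean bottomUpRecover() {
        if (!remoteReachable) {
            return false; // recover() fails, so nothing is cleaned up
        }
        fileRecords.removeAll(committed);
        return true;
    }

    public int leftoverRecords() {
        return fileRecords.size();
    }

    public static void main(String[] args) {
        RecoveryRecordSketch client = new RecoveryRecordSketch();
        client.prepare("tx-1");
        client.topDownCommit("tx-1");      // first recovery cycle: commit succeeds
        client.setRemoteReachable(false);  // server is scaled down
        for (int cycle = 0; cycle < 3; cycle++) {
            client.bottomUpRecover();      // every later cycle fails
        }
        System.out.println("leftover records: " + client.leftoverRecords());
    }
}
```

Run as written, the record for {{tx-1}} is still present after three failed cycles. If the server stays reachable, the first {{bottomUpRecover}} call removes it.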
server scale down keeps data in client's data/ejb-xa-recovery and
transactions on client aren't committed
--------------------------------------------------------------------------------------------------------
Key: WFWIP-218
URL:
https://issues.redhat.com/browse/WFWIP-218
Project: WildFly WIP
Issue Type: Bug
Components: OpenShift
Reporter: Martin Simka
Assignee: Ondrej Chaloupka
Priority: Major
This follows up on WFWIP-206.
While testing tx recovery in OpenShift I see that the scale down of a pod that has an
in-doubt transaction on it isn't successful.
Scenario:
*ejb client* (app tx-client, pod tx-client-0):
* EJB business method
** lookup remote EJB
** enlist XA resource 1 to transaction
** enlist XA resource 2 to transaction
** call remote EJB
*ejb server* (app tx-server, pod tx-server-0):
* EJB business method
** enlist XA resource 1 to transaction
** enlist XA resource 2 to transaction
*testTxStatelessServerSecondCommitThrowRmFail*
ejb server XA resource 2 fails with {{XAException(XAException.XAER_RMFAIL)}}
Then the test scales down the tx-server pod (size from 1 to 0). The server scale down
completes, but sometimes some records are left in
{{<JBOSS_HOME>/standalone/data/ejb-xa-recovery}} on tx-client and the transactions on the
client aren't committed.
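For readers who want to imitate the failing resource outside the test suite, the following is a minimal sketch of an {{XAResource}} whose commit throws {{XAException(XAException.XAER_RMFAIL)}}. The class name {{RmFailOnCommitXAResource}} is hypothetical and this is not the resource used by the actual test. It also assumes that every commit attempt fails, which the description above does not specify.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical test resource: prepares fine, then fails every commit with XAER_RMFAIL. */
public class RmFailOnCommitXAResource implements XAResource {
    private int timeout = 0;
    private int commitAttempts = 0;

    @Override
    public void start(Xid xid, int flags) { }

    @Override
    public void end(Xid xid, int flags) { }

    @Override
    public int prepare(Xid xid) {
        return XA_OK; // vote to commit so the transaction becomes in-doubt
    }

    @Override
    public void commit(Xid xid, boolean onePhase) throws XAException {
        commitAttempts++;
        // Assumption: the resource manager is reported as unavailable on every attempt.
        throw new XAException(XAException.XAER_RMFAIL);
    }

    @Override
    public void rollback(Xid xid) { }

    @Override
    public void forget(Xid xid) { }

    @Override
    public Xid[] recover(int flag) {
        return new Xid[0]; // this sketch reports no in-doubt branches
    }

    @Override
    public boolean isSameRM(XAResource other) {
        return other == this;
    }

    @Override
    public int getTransactionTimeout() {
        return timeout;
    }

    @Override
    public boolean setTransactionTimeout(int seconds) {
        timeout = seconds;
        return true;
    }

    /** Number of commit calls seen so far, useful for assertions in a test. */
    public int getCommitAttempts() {
        return commitAttempts;
    }
}
```

Enlisting such a resource as the second participant on the server side leaves the transaction in-doubt after prepare, which is the state the scale down then has to resolve.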