[jboss-jira] [JBoss JIRA] (WFWIP-218) server scale down keeps data in client's data/ejb-xa-recovery and transactions on client aren't commited

Ondrej Chaloupka (Jira) issues at jboss.org
Mon Dec 16 07:40:57 EST 2019


    [ https://issues.redhat.com/browse/WFWIP-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934992#comment-13934992 ] 

Ondrej Chaloupka commented on WFWIP-218:
----------------------------------------

OK, after some investigation I was able to reproduce the issue in a standalone (non-Kubernetes) setup. (It turned out to be a bit complicated, at least for me, to understand what is happening here.) The updated test case is at:
https://gitlab.mw.lab.eng.bos.redhat.com/ochaloup/tests-transactions/commits/WFWIP-218-rmfail-on-remote-commit
One way to run it is, for example:
{code}
export JBOSS_HOME=...
mvn clean verify -am -pl jbossts -DfailIfNoTests=false -Djbossts.noJTS -Dtest=TxPropagationJMSCrashRecoveryTestCase#injectRmfailAtServerCommit
{code}

The issue is caused by the fact that the recovery `commit` call has no information about the xa-recovery record on the file system. The top-down recovery runs with {{AtomicActionRecoveryModule}}, which has no notion of the {{XA}} specifics of the {{XAResource}}. The module just takes the serialized version of the resource and runs commit on it. When the serialized resource is read back, it carries only information about the remote resource, and thus the commit may succeed. The xa-recovery file is removed only at the second periodic recovery attempt, when the bottom-up {{XAResourceRecoveryModule}} calls {{recover}} and removes the record.
This is a problem when the remote server is already gone (as is the case for the scale-down) and the recover call keeps failing. In that case the WFTC has no way to remove the data on the second round of recovery during the {{XAResource.recover()}} call. The recover call fails forever, and the WFTC xa-recovery records stay in place forever as well.
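
To make the sequence easier to follow, here is a minimal, self-contained sketch of the behaviour described in this comment. It is only a simulation under simplifying assumptions: {{RecoverySketch}}, {{RemoteServer}}, {{topDownCommit}}, {{bottomUpRecover}} and {{leftoverRecords}} are hypothetical names made up for illustration, not Narayana or WFTC classes, and the recovery directory is modelled as a plain in-memory set.

```java
import java.util.HashSet;
import java.util.Set;

public class RecoverySketch {

    /** Hypothetical stand-in for the remote EJB server (not a WildFly or Narayana class). */
    static final class RemoteServer {
        boolean available = true;

        void commit(String xid) {
            if (!available) {
                throw new IllegalStateException("XAER_RMFAIL on commit of " + xid);
            }
        }

        void recover() {
            if (!available) {
                throw new IllegalStateException("XAER_RMFAIL on recover");
            }
        }
    }

    /** Top-down pass: commits the deserialized resource and never touches the recovery directory. */
    static boolean topDownCommit(RemoteServer server, String xid) {
        try {
            server.commit(xid);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Bottom-up pass: the record is removed only if recover() on the remote side succeeds. */
    static void bottomUpRecover(RemoteServer server, Set<String> recoveryDir) {
        try {
            server.recover();
            recoveryDir.clear();
        } catch (IllegalStateException e) {
            // recover() failed, so the record stays where it is
        }
    }

    /** Returns how many records remain in the simulated client recovery directory. */
    public static int leftoverRecords(boolean scaleDownAfterCommit, int laterPasses) {
        RemoteServer server = new RemoteServer();
        Set<String> recoveryDir = new HashSet<>();
        recoveryDir.add("xid-1"); // stands in for a file in data/ejb-xa-recovery

        // First periodic recovery pass: the commit succeeds, the record is left untouched.
        topDownCommit(server, "xid-1");

        if (scaleDownAfterCommit) {
            server.available = false; // the server pod is gone from now on
        }

        // Later periodic recovery passes: only a successful recover() removes the record.
        for (int i = 0; i < laterPasses; i++) {
            bottomUpRecover(server, recoveryDir);
        }
        return recoveryDir.size();
    }

    public static void main(String[] args) {
        System.out.println("server scaled down: " + leftoverRecords(true, 3) + " record(s) left");
        System.out.println("server still up:    " + leftoverRecords(false, 3) + " record(s) left");
    }
}
```

In this simplified model the record survives any number of later passes once the server is unavailable, which matches the leftover files seen in the client's data/ejb-xa-recovery.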

> server scale down keeps data in client's data/ejb-xa-recovery and transactions on client aren't commited
> --------------------------------------------------------------------------------------------------------
>
>                 Key: WFWIP-218
>                 URL: https://issues.redhat.com/browse/WFWIP-218
>             Project: WildFly WIP
>          Issue Type: Bug
>          Components: OpenShift
>            Reporter: Martin Simka
>            Assignee: Ondrej Chaloupka
>            Priority: Major
>
> this follows up on WFWIP-206
> While testing tx recovery in OpenShift I see that scale-down of a pod that has an in-doubt transaction on it isn't successful
> Scenario:
> *ejb client* (app tx-client, pod tx-client-0):
> * EJB business method
>   ** lookup remote EJB 
>   ** enlist XA resource 1 to transaction
>   ** enlist XA resource 2 to transaction
>   ** call remote EJB
> *ejb server* (app tx-server, pod tx-server-0):
> * EJB business method
>   ** enlist XA resource 1 to transaction
>   ** enlist XA resource 2 to transaction
> *testTxStatelessServerSecondCommitThrowRmFail*
> ejb server XA resource 2 fails with {{XAException(XAException.XAER_RMFAIL)}}
> Then the test calls scale down (size from 1 to 0) on the tx-server pod. Server scale-down completes, but sometimes there are some records left in {{<JBOSS_HOME>/standalone/data/ejb-xa-recovery}} on tx-client and transactions on the client aren't committed.




