[JBoss JIRA] (WFWIP-218) server scale down keeps data in client's data/ejb-xa-recovery and transactions on client aren't committed

Monday, 16 December 2019

[https://issues.redhat.com/browse/WFWIP-218?page=com.atlassian.jira.plugin...]

Ondrej Chaloupka commented on WFWIP-218:
----------------------------------------

OK, after some investigation I was able to reproduce the issue in a standalone
(non-Kubernetes) setup. It turned out to be a bit complicated, at least for me, to
understand what is happening here. The updated test case is at:
https://gitlab.mw.lab.eng.bos.redhat.com/ochaloup/tests-transactions/comm...
It can be run, for example, with:
{code}
export JBOSS_HOME=...
mvn clean verify -am -pl jbossts -DfailIfNoTests=false -Djbossts.noJTS \
  -Dtest=TxPropagationJMSCrashRecoveryTestCase#injectRmfailAtServerCommit
{code}

The issue is caused by the fact that the recovery `commit` call has no information about
the file system. Top-down recovery runs in {{AtomicActionRecoveryModule}}, which has
no notion of the {{XA}} specifics of the {{XAResource}}. The module just takes the
serialized version of the resource and runs commit on it. When the serialized resource is
read, the only information available is about the remote resource, so the commit
may succeed. The WFTC xa-recovery file is removed only on the second periodic
recovery pass, when the bottom-up {{XAResourceRecoveryModule}} calls {{recover}} and removes
the record.

This is a problem when the remote server is already gone (as it is after the scale down)
and the {{recover}} call keeps failing. In that case WFTC has no opportunity to remove its data
in the second round of recovery during the {{XAResource.recover()}} call. The recover call
fails forever, and the WFTC xa-recovery records stay in place forever as well.
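To make the two recovery passes concrete, here is a minimal Java sketch built on a much simplified, hypothetical model. It is not the Narayana or WFTC code. The names {{RecoveryPassesSketch}}, {{RemoteResourceStub}}, {{actionLog}} and {{xaRecoveryRecords}} are invented for illustration, with {{xaRecoveryRecords}} standing in for the files under {{data/ejb-xa-recovery}}. The rule that a successful {{recover()}} drops every record the server no longer reports is also an assumption.

```java
import java.util.HashSet;
import java.util.Set;
import javax.transaction.xa.XAException;

// Hypothetical, simplified model of the two periodic recovery passes;
// NOT the Narayana/WFTC implementation.
public class RecoveryPassesSketch {

    // Hypothetical stand-in for the client's view of the remote EJB server resource.
    static class RemoteResourceStub {
        boolean serverReachable = true;

        void commit(String xid) throws XAException {
            if (!serverReachable) {
                throw new XAException(XAException.XAER_RMFAIL);
            }
        }

        // Returns the xids the remote server still reports as in-doubt.
        Set<String> recover() throws XAException {
            if (!serverReachable) {
                throw new XAException(XAException.XAER_RMFAIL);
            }
            return new HashSet<>(); // assumption: nothing is in-doubt remotely any more
        }
    }

    final Set<String> actionLog = new HashSet<>();         // hypothetical top-down action records
    final Set<String> xaRecoveryRecords = new HashSet<>(); // hypothetical data/ejb-xa-recovery files
    final RemoteResourceStub remote = new RemoteResourceStub();

    // Top-down pass: replays commit on the deserialized resource only.
    void topDownPass() {
        for (String xid : new HashSet<>(actionLog)) {
            try {
                remote.commit(xid);
                actionLog.remove(xid); // the action is finished...
                // ...but xaRecoveryRecords is deliberately not touched here
            } catch (XAException e) {
                // keep the action for the next periodic recovery pass
            }
        }
    }

    // Bottom-up pass: the only place where xa-recovery records are cleaned up.
    void bottomUpPass() {
        try {
            Set<String> inDoubt = remote.recover();
            xaRecoveryRecords.retainAll(inDoubt); // drop records the server no longer reports
        } catch (XAException e) {
            // recover() failed (e.g. XAER_RMFAIL): nothing can be cleaned up
        }
    }

    public static void main(String[] args) {
        RecoveryPassesSketch rm = new RecoveryPassesSketch();
        rm.actionLog.add("xid-1");
        rm.xaRecoveryRecords.add("xid-1");

        rm.topDownPass();                  // commit succeeds while the server is still up
        rm.remote.serverReachable = false; // the tx-server pod is scaled down
        for (int i = 0; i < 3; i++) {
            rm.bottomUpPass();             // recover() fails on every pass
        }

        System.out.println("action log: " + rm.actionLog);
        System.out.println("xa-recovery records: " + rm.xaRecoveryRecords);
    }
}
```

Running {{main}} prints an empty action log but still lists {{xid-1}} among the xa-recovery records. In this model the top-down pass has finished the commit, and the only cleanup path goes through a {{recover()}} call that can no longer succeed.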

...
 server scale down keeps data in client's data/ejb-xa-recovery and
transactions on client aren't committed

--------------------------------------------------------------------------------------------------------

                 Key: WFWIP-218
                 URL: https://issues.redhat.com/browse/WFWIP-218
             Project: WildFly WIP
          Issue Type: Bug
          Components: OpenShift
            Reporter: Martin Simka
            Assignee: Ondrej Chaloupka
            Priority: Major

 This follows up on WFWIP-206.
 While testing transaction recovery in OpenShift, I see that scaling down a pod that has an
in-doubt transaction on it is not successful.
 Scenario:
 *ejb client* (app tx-client, pod tx-client-0):
 * EJB business method
   ** look up remote EJB
   ** enlist XA resource 1 in the transaction
   ** enlist XA resource 2 in the transaction
   ** call remote EJB
 *ejb server* (app tx-server, pod tx-server-0):
 * EJB business method
   ** enlist XA resource 1 in the transaction
   ** enlist XA resource 2 in the transaction
 *testTxStatelessServerSecondCommitThrowRmFail*
 On the ejb server, XA resource 2 fails with {{XAException(XAException.XAER_RMFAIL)}}.
 The test then scales down the tx-server pod (from 1 replica to 0). The server scale down
completes, but sometimes some records are left in
{{<JBOSS_HOME>/standalone/data/ejb-xa-recovery}} on tx-client, and the transactions on the
client are not committed.
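One way to inject the server-side failure in this scenario is to enlist a test resource that fails at commit. The following is a hypothetical Java sketch, not the resource used by the test suite. The class name {{RmFailOnCommitXAResource}} and the "fail only the first commit" behaviour are assumptions. It uses only the standard {{javax.transaction.xa}} interfaces. The resource votes OK on prepare and throws {{XAER_RMFAIL}} on the first commit. That leaves the transaction in-doubt, so periodic recovery has to finish it.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical test resource (assumption): prepares fine, fails the first
// commit with XAER_RMFAIL, and lets any later (recovery) commit succeed.
public class RmFailOnCommitXAResource implements XAResource {
    private int commitCalls = 0;

    @Override public void start(Xid xid, int flags) { }
    @Override public void end(Xid xid, int flags) { }
    @Override public int prepare(Xid xid) { return XA_OK; }

    @Override
    public void commit(Xid xid, boolean onePhase) throws XAException {
        commitCalls++;
        if (commitCalls == 1) {
            // simulate the resource manager being unavailable at commit time
            throw new XAException(XAException.XAER_RMFAIL);
        }
    }

    @Override public void rollback(Xid xid) { }
    @Override public void forget(Xid xid) { }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }

    public int getCommitCalls() { return commitCalls; }

    public static void main(String[] args) throws XAException {
        RmFailOnCommitXAResource res = new RmFailOnCommitXAResource();
        // this sketch ignores the Xid, so null is acceptable here
        try {
            res.commit(null, false);
        } catch (XAException e) {
            System.out.println("first commit failed, errorCode=" + e.errorCode);
        }
        res.commit(null, false); // a later recovery commit succeeds
        System.out.println("commit calls: " + res.getCommitCalls());
    }
}
```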

--
This message was sent by Atlassian Jira
(v7.13.8#713008)
