[jboss-jira] [JBoss JIRA] (WFWIP-203) Transaction recovery may hit a wrong server when remote side works with multiple pods

Ondrej Chaloupka (Jira) issues at jboss.org
Tue Sep 17 03:25:00 EDT 2019


    [ https://issues.jboss.org/browse/WFWIP-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785125#comment-13785125 ] 

Ondrej Chaloupka commented on WFWIP-203:
----------------------------------------

The issue could be caused by a wrong set-up of the StatefulSet in the operator - https://github.com/wildfly/wildfly-operator/pull/71

> Transaction recovery may hit a wrong server when remote side works with multiple pods
> -------------------------------------------------------------------------------------
>
>                 Key: WFWIP-203
>                 URL: https://issues.jboss.org/browse/WFWIP-203
>             Project: WildFly WIP
>          Issue Type: Bug
>          Components: OpenShift
>            Reporter: Ondrej Chaloupka
>            Assignee: Ondrej Chaloupka
>            Priority: Blocker
>
> When server-to-server EJB remote calls propagate the transaction context, the EJB call can be routed to one pod while the recovery call may be directed to a different pod.
> Such a situation causes a consistency issue.
> Consider this scenario: the first server (let's call it `tx-client`) makes a remote EJB call to a remote server that is one of the servers joined in a cluster, named `tx-server-0` and `tx-server-1`. The `tx-client` calls `tx-server-1`. Processing continues up to the start of the 2PC, and then `tx-server-1` crashes (or the host goes down, a network issue happens...).
> `tx-client` understands that the process was not successful and asks the recovery manager to retry and finish.
> The recovery manager starts calling the remote server based on the data saved in the object store of `tx-client`.
> But unfortunately the recovery remote call goes *not* to `tx-server-1` but to `tx-server-0`. The `tx-client` gets the error code `XAException.XAER_NOTA` (`-4`), removes the data from its object store (`/opt/eap/standalone/data/tx-object-store/`, `/opt/eap/standalone/data/ejb-xa-recovery`), and then never finishes the in-doubt transactions at `tx-server-1`.
> It is unclear whether this is an issue of the OpenShift configuration or a problem in the WFTC/EJB/remoting layer in WildFly.
> This is tested with WFLY Operator from 2019-09-26 `@90a2b3b`.
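
A minimal sketch of the failure mode described in the scenario above, written as a toy in-memory model and not as WildFly or Narayana code. The `Pod` and `Client` classes, the `xid-1` identifier and the `strandedAfterRecovery` helper are hypothetical names introduced only for illustration; the pod names and the `XAER_NOTA` value (`-4`) are the only details taken from the report.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RecoveryRoutingSketch {
    // Same numeric value the report quotes for XAException.XAER_NOTA.
    static final int XAER_NOTA = -4;
    static final int XA_OK = 0;

    /** Hypothetical stand-in for one tx-server pod holding in-doubt transactions. */
    static class Pod {
        final String name;
        final Set<String> inDoubt = new HashSet<>();

        Pod(String name) { this.name = name; }

        // A pod can only finish a transaction it knows about;
        // for an unknown xid it answers XAER_NOTA.
        int commit(String xid) {
            return inDoubt.remove(xid) ? XA_OK : XAER_NOTA;
        }
    }

    /** Hypothetical stand-in for tx-client and its object store (xid -> pod name). */
    static class Client {
        final Map<String, String> objectStore = new HashMap<>();

        // One recovery attempt for one xid against whichever pod the call reached.
        void recover(String xid, Pod routedTo) {
            int rc = routedTo.commit(xid);
            if (rc == XA_OK || rc == XAER_NOTA) {
                // Assumption modelled from the report: XAER_NOTA is treated as
                // "nothing left to finish", so the record is dropped.
                objectStore.remove(xid);
            }
        }
    }

    /** Returns true when the client forgot the xid while tx-server-1 still holds it. */
    static boolean strandedAfterRecovery(boolean routeToOwner) {
        Pod server0 = new Pod("tx-server-0");
        Pod server1 = new Pod("tx-server-1");
        Client client = new Client();

        // tx-server-1 did the work and is left in doubt after the crash.
        server1.inDoubt.add("xid-1");
        client.objectStore.put("xid-1", server1.name);

        // The recovery call lands either on the owning pod or on the other one.
        client.recover("xid-1", routeToOwner ? server1 : server0);

        return !client.objectStore.containsKey("xid-1")
                && server1.inDoubt.contains("xid-1");
    }

    public static void main(String[] args) {
        System.out.println("misrouted, stranded: " + strandedAfterRecovery(false));
        System.out.println("routed to owner, stranded: " + strandedAfterRecovery(true));
    }
}
```

In this model the misrouted call leaves `xid-1` in doubt on `tx-server-1` with no record left on the client, while the call routed to the owning pod finishes cleanly. Whether the real routing problem sits in the OpenShift set-up or in the remoting layer is outside what the sketch can show.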




