[jboss-jira] [JBoss JIRA] (WFWIP-203) Transaction recovery may hit a wrong server when remote side works with multiple pods

Jeff Mesnil (Jira) issues at jboss.org
Fri Sep 20 04:20:01 EDT 2019


     [ https://issues.jboss.org/browse/WFWIP-203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Mesnil updated WFWIP-203:
------------------------------
    Labels: operator  (was: )


> Transaction recovery may hit a wrong server when remote side works with multiple pods
> -------------------------------------------------------------------------------------
>
>                 Key: WFWIP-203
>                 URL: https://issues.jboss.org/browse/WFWIP-203
>             Project: WildFly WIP
>          Issue Type: Bug
>          Components: OpenShift
>            Reporter: Ondrej Chaloupka
>            Assignee: Ondrej Chaloupka
>            Priority: Blocker
>              Labels: operator
>         Attachments: tx-client-0.log, tx-server-0.log, tx-server-1.log
>
>
> When a server-to-server remote EJB call propagates the transaction context, the EJB call can be routed to one pod while the later recovery call may be directed to a different pod.
> Such a situation causes a consistency issue.
> Consider this scenario: the first server (let's call it `tx-client`) makes a remote EJB call to a remote server, which is one of the servers joined in a cluster, named `tx-server-0` and `tx-server-1`. The `tx-client` calls `tx-server-1`. Processing continues up to the start of the 2PC, and then `tx-server-1` crashes (or the host goes down, a network issue happens...).
> `tx-client` understands that the process was not successful and asks the recovery manager to retry and finish.
> The recovery manager starts to call the remote server based on data saved in the object store of `tx-client`.
> But unfortunately the recovery remote call goes *not* to `tx-server-1` but to `tx-server-0`. The `tx-client` gets error code `XAException.XAER_NOTA` (`-4`), removes the data from its object store (`/opt/eap/standalone/data/tx-object-store/`, `/opt/eap/standalone/data/ejb-xa-recovery`), and then never finishes the in-doubt transactions at `tx-server-1`.
> It is unclear whether this is an issue of the OpenShift configuration or a problem in the WFTC/EJB/remoting layer in WildFly.
> This is tested with WFLY Operator from 2019-09-26 `@90a2b3b`.
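
The failure mode described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not WildFly or Narayana code: the `Pod` class, the `recoverOnce` method and the transaction id `tx-1` are hypothetical names introduced for illustration, and the client's reaction to `XAER_NOTA` is reduced to "drop the record", as the description reports. Only the numeric value `-4` for `XAER_NOTA` comes from the issue text.

```java
import java.util.HashSet;
import java.util.Set;

class RecoveryRoutingSketch {
    // Same numeric value as XAException.XAER_NOTA quoted in the issue.
    static final int XAER_NOTA = -4;
    static final int XA_OK = 0;

    /** Hypothetical stand-in for one pod and the in-doubt transactions it holds. */
    static final class Pod {
        final String name;
        final Set<String> inDoubt = new HashSet<>();

        Pod(String name) {
            this.name = name;
        }

        /** Completes the transaction if this pod knows it, otherwise reports XAER_NOTA. */
        int complete(String txId) {
            return inDoubt.remove(txId) ? XA_OK : XAER_NOTA;
        }
    }

    /**
     * One simplified recovery pass from the client side (assumption: both a
     * successful completion and XAER_NOTA make the client forget the record).
     */
    static int recoverOnce(Set<String> clientObjectStore, String txId, Pod routedTo) {
        int rc = routedTo.complete(txId);
        if (rc == XA_OK || rc == XAER_NOTA) {
            clientObjectStore.remove(txId);
        }
        return rc;
    }

    public static void main(String[] args) {
        Pod server0 = new Pod("tx-server-0");
        Pod server1 = new Pod("tx-server-1");
        server1.inDoubt.add("tx-1"); // tx-server-1 crashed with this transaction prepared

        Set<String> clientObjectStore = new HashSet<>();
        clientObjectStore.add("tx-1");

        // The recovery call is routed to the wrong pod.
        int rc = recoverOnce(clientObjectStore, "tx-1", server0);

        System.out.println("return code from " + server0.name + ": " + rc);
        System.out.println("client still has record: " + clientObjectStore.contains("tx-1"));
        System.out.println(server1.name + " still in doubt: " + server1.inDoubt.contains("tx-1"));
    }
}
```

In this model, once the recovery call lands on `tx-server-0`, the client has no record left that would trigger another attempt against `tx-server-1`, which matches the inconsistency reported above.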



