[jboss-jira] [JBoss JIRA] (WFWIP-203) Transaction recovery may hit a wrong server when remote side works with multiple pods
Ondrej Chaloupka (Jira)
issues at jboss.org
Mon Sep 16 09:25:01 EDT 2019
Ondrej Chaloupka created WFWIP-203:
--------------------------------------
Summary: Transaction recovery may hit a wrong server when remote side works with multiple pods
Key: WFWIP-203
URL: https://issues.jboss.org/browse/WFWIP-203
Project: WildFly WIP
Issue Type: Bug
Components: OpenShift
Reporter: Ondrej Chaloupka
Assignee: Ondrej Chaloupka
When a server-to-server remote EJB call propagates the transaction context, the EJB call can be routed to one pod while the recovery call may be directed to a different pod.
Such a situation causes a consistency issue.
Consider this scenario: the first server (let's call it `tx-client`) makes a remote EJB call to a remote server that is one of the servers joined in a cluster, named `tx-server-0` and `tx-server-1`. The `tx-client` calls `tx-server-1`. Processing continues up to the start of the 2PC, and then `tx-server-1` crashes (or the host goes down, a network issue happens...).
`tx-client` understands that the process was not successful and asks the recovery manager to retry and finish it.
The recovery manager starts to call the remote server based on data saved in the object store of `tx-client`.
But unfortunately the remote recovery call goes *not* to `tx-server-1` but to `tx-server-0`. The `tx-client` gets the error code `XAException.XAER_NOTA` (`-4`), removes the data from its object store (`/opt/eap/standalone/data/tx-object-store/`, `/opt/eap/standalone/data/ejb-xa-recovery`), and then never finishes the in-doubt transactions at `tx-server-1`.
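
The failure sequence can be modeled with a minimal, self-contained sketch. It is not WildFly or Narayana code: `Pod`, `recover`, and the id `xid-1` are hypothetical names used only for illustration, and the only value taken from a real API is `-4`, the value of `XAException.XAER_NOTA`. The sketch assumes, as reported above, that the client drops its recovery record when the remote side answers `XAER_NOTA`.

```java
import java.util.HashSet;
import java.util.Set;

public class RecoveryRoutingSketch {
    // Same numeric value as javax.transaction.xa.XAException.XAER_NOTA.
    public static final int XAER_NOTA = -4;
    public static final int XA_OK = 0;

    /** Hypothetical stand-in for one pod: it only knows transactions it prepared itself. */
    public static final class Pod {
        public final String name;
        public final Set<String> inDoubt = new HashSet<>();

        public Pod(String name) {
            this.name = name;
        }

        /** Commits a known in-doubt transaction, or reports XAER_NOTA for an unknown one. */
        public int commit(String xid) {
            return inDoubt.remove(xid) ? XA_OK : XAER_NOTA;
        }
    }

    /**
     * Hypothetical client-side recovery pass. On XA_OK or XAER_NOTA the client
     * forgets the transaction, which mirrors the behavior described in the report.
     * Returns true only when the target pod really finished the transaction.
     */
    public static boolean recover(Set<String> clientObjectStore, String xid, Pod target) {
        int rc = target.commit(xid);
        if (rc == XA_OK || rc == XAER_NOTA) {
            clientObjectStore.remove(xid);
        }
        return rc == XA_OK;
    }

    public static void main(String[] args) {
        Pod server0 = new Pod("tx-server-0");
        Pod server1 = new Pod("tx-server-1");
        server1.inDoubt.add("xid-1"); // tx-server-1 crashed after prepare

        Set<String> clientStore = new HashSet<>();
        clientStore.add("xid-1"); // record kept by tx-client

        // The recovery call is routed to the wrong pod.
        boolean finished = recover(clientStore, "xid-1", server0);

        System.out.println("finished=" + finished);
        System.out.println("client record kept=" + clientStore.contains("xid-1"));
        System.out.println("tx-server-1 still in doubt=" + server1.inDoubt.contains("xid-1"));
    }
}
```

In this model the run ends with the client record gone while `tx-server-1` still holds the in-doubt transaction, which is the inconsistency this issue reports. Passing `server1` as the target instead finishes the transaction on both sides.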
It's unclear whether this is an issue of the OpenShift configuration or a problem in the WFTC/EJB/remoting layer in WildFly.
This was tested with the WFLY Operator from 2019-09-26 `@90a2b3b`.
--
This message was sent by Atlassian Jira
(v7.13.5#713005)