[JBoss JIRA] (WFWIP-222) when client is scaled down with in-doubt transactions, tx participants on server are resolved with delay
by Ondrej Chaloupka (Jira)
[ https://issues.jboss.org/browse/WFWIP-222?page=com.atlassian.jira.plugin.... ]
Ondrej Chaloupka closed WFWIP-222.
----------------------------------
Resolution: Rejected
The behaviour is expected; the transaction resolution is in the hands of a different component than the recovery manager. The transactions on the server were started but never moved to the {{PREPARED}} state, so they are in-flight and held only in memory. It is the responsibility of the transaction reaper to finish them. It does so, but the reaper does not depend on the recovery processing: it rolls the transactions back once the transaction timeout expires. If the reaper is not able to do that for the remote resources (e.g. WildFly goes down in the meantime), then the database/JMS/... rolls the transaction back on its own after the timeout (prepare was never called, so the remote resource made no promise about the outcome and simply times the transaction out).
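To illustrate the reaper behaviour described above, here is a minimal standalone sketch, assuming the Narayana JTA implementation is on the classpath (only {{com.arjuna.ats.jta.TransactionManager}} is Narayana-specific; the rest is the plain JTA API):
{code:java}
import javax.transaction.RollbackException;
import javax.transaction.TransactionManager;

public class ReaperTimeoutSketch {

    public static void main(String[] args) throws Exception {
        // Narayana's standalone transaction manager; inside WildFly the TM is injected or looked up instead
        TransactionManager tm = com.arjuna.ats.jta.TransactionManager.transactionManager();

        tm.setTransactionTimeout(5); // seconds; the reaper fires once this elapses
        tm.begin();

        // an in-flight transaction: prepare is never called, nothing is written to the object store
        Thread.sleep(10_000);

        try {
            tm.commit(); // the reaper has already rolled the transaction back in the meantime
        } catch (RollbackException expected) {
            System.out.println("rolled back by the transaction reaper after the timeout expired");
        }
    }
}
{code}
Inside WildFly the same timeout is what the {{default-timeout}} attribute of the transactions subsystem's {{coordinator-environment}} configures.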
> when client is scaled down with in-doubt transactions, tx participants on server are resolved with delay
> --------------------------------------------------------------------------------------------------------
>
> Key: WFWIP-222
> URL: https://issues.jboss.org/browse/WFWIP-222
> Project: WildFly WIP
> Issue Type: Bug
> Reporter: Martin Simka
> Assignee: Ondrej Chaloupka
> Priority: Blocker
> Attachments: tx-server.log
>
>
> this follows up on WFWIP-206
> While testing tx recovery in OpenShift I see that when the client is scaled down with in-doubt transactions, the tx participants on the server are resolved with a delay.
> Scenario:
> *ejb client* (app tx-client, pod tx-client-0):
> * EJB business method
> ** lookup remote EJB
> ** enlist XA resource 1 to transaction
> ** enlist XA resource 2 to transaction
> ** call remote EJB
> *ejb server* (app tx-server, pod tx-server-0):
> * EJB business method
> ** enlist XA resource 1 to transaction
> ** enlist XA resource 2 to transaction
> *testTxStatelessClientSecondPrepareJvmHalt*
> The JVM on the client crashes in the PREPARE phase of the second XA resource on the client. OpenShift then restarts the pod, but the pod is immediately scaled down. Transactions on the client pod are rolled back during scale down, but on the server they are rolled back some time later. I'm not sure whether it is the periodic recovery or the tx timeout.
> Server log with {{com.arjuna}} trace attached.
> Feel free to reject if it is expected behavior.
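For orientation, a hypothetical sketch of the tx-client business method described in the scenario above; {{RemoteBusiness}}, {{TestXAResource}}, the bean name and the JNDI names are placeholders for the test suite's own classes, not the actual test code:
{code:java}
import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.naming.InitialContext;
import javax.transaction.TransactionManager;

@Stateless
public class ClientBean {

    // WildFly binds the JTA TransactionManager at java:jboss/TransactionManager
    @Resource(lookup = "java:jboss/TransactionManager")
    private TransactionManager transactionManager;

    public void businessMethod() throws Exception {
        // lookup of the remote EJB exposed by tx-server (ejb:<app>/<module>/<distinct>/<bean>!<view>)
        RemoteBusiness remote = (RemoteBusiness) new InitialContext()
                .lookup("ejb:/tx-server//ServerBean!org.example.RemoteBusiness");

        // enlist XA resource 1 and XA resource 2 into the active transaction
        transactionManager.getTransaction().enlistResource(new TestXAResource("xa-1"));
        transactionManager.getTransaction().enlistResource(new TestXAResource("xa-2"));

        // call the remote EJB; the transaction context is propagated to tx-server,
        // which enlists its own two XA resources the same way
        remote.businessMethod();
    }
}
{code}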
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (WFWIP-223) disableHTTPRoute = true in runtime does not clean CR.status.hosts
by Jeff Mesnil (Jira)
[ https://issues.jboss.org/browse/WFWIP-223?page=com.atlassian.jira.plugin.... ]
Jeff Mesnil commented on WFWIP-223:
-----------------------------------
fixed upstream in https://github.com/wildfly/wildfly-operator/pull/95
> disableHTTPRoute = true in runtime does not clean CR.status.hosts
> -----------------------------------------------------------------
>
> Key: WFWIP-223
> URL: https://issues.jboss.org/browse/WFWIP-223
> Project: WildFly WIP
> Issue Type: Bug
> Components: OpenShift
> Reporter: Martin Choma
> Assignee: Jeff Mesnil
> Priority: Blocker
> Labels: operator
>
> Reproducer:
> # create CR
> {code}
> apiVersion: wildfly.org/v1alpha1
> kind: WildFlyServer
> metadata:
>   generation: 1
>   name: eap-cd
>   namespace: default
> spec:
>   applicationImage: >-
>     brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/jboss-eap-7-tech-preview/eap-cd-openshift-rhel8:17.0-4
>   size: 1
>   disableHTTPRoute: false
> {code}
> # Edit the CR with disableHTTPRoute: true. I would expect the Route object to be deleted.
> {code}
> apiVersion: wildfly.org/v1alpha1
> kind: WildFlyServer
> metadata:
>   generation: 1
>   name: eap-cd
>   namespace: default
> spec:
>   applicationImage: >-
>     brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/jboss-eap-7-tech-preview/eap-cd-openshift-rhel8:17.0-4
>   size: 1
>   disableHTTPRoute: true
> {code}
> # Route object eap-cd is not deleted but status.hosts is still filled
> {code:yaml}
> apiVersion: wildfly.org/v1alpha1
> kind: WildFlyServer
> metadata:
>   creationTimestamp: '2019-10-01T17:20:34Z'
>   generation: 2
>   name: eap-cd
>   namespace: mchoma
>   resourceVersion: '629943'
>   selfLink: /apis/wildfly.org/v1alpha1/namespaces/mchoma/wildflyservers/eap-cd
>   uid: c7535f3f-e46f-11e9-b4ad-02e6008f3048
> spec:
>   applicationImage: >-
>     image-registry.openshift-image-registry.svc:5000/eapcd-suite-builds/eap-cd-openshift-rhel8:17.0-4
>   disableHTTPRoute: false
>   env: []
>   envFrom: []
>   replicas: 1
> status:
>   hosts:
>     - >-
>       eap-cd-route-mchoma.apps.eap-qe-cluster105.eap-qe-cluster105.fw.rhcloud.com
>   pods:
>     - name: eap-cd-0
>       podIP: 10.131.0.246
>       state: ACTIVE
>   replicas: 1
>   scalingdownPods: 0
> {code}
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (WFWIP-218) server scale down keeps data in client's data/ejb-xa-recovery and transactions on client aren't committed
by Tomasz Adamski (Jira)
[ https://issues.jboss.org/browse/WFWIP-218?page=com.atlassian.jira.plugin.... ]
Tomasz Adamski commented on WFWIP-218:
--------------------------------------
The problem in this issue is as follows:
1. the transaction log on the server is cleared correctly
2. the server's stateful-set is scaled down (the pod referenced by the client disappears)
3. the client checks the in-doubt resource but cannot connect to the server
In an OpenShift environment with the wildfly-operator, we can be sure that if the server's stateful-set was scaled down correctly, then its logs must have been cleared. As a result, the client records can be discarded. This does not hold in general (bare metal), where the server may go down and still have records in its log.
In the context of the above I would propose:
1. workaround - provide the customer with a manual procedure (if a connection error to the server's pod occurs but the pod was scaled down correctly, remove the log records). This is not elegant, but it is an emergency measure and I don't expect it to be used often.
2. solution - if, while operating within OpenShift, the client cannot connect to one of the server's pods, the client checks with the OpenShift API whether the server's pod was scaled down; if it was, the record can be discarded (a sketch of such a check follows below).
I would suggest downgrading this issue with the workaround provided and working on the target solution later (it would require OpenShift integration and has to be researched).
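A minimal sketch of the check proposed in point 2, assuming the fabric8 kubernetes-client (6.x or newer) is available on the client side; the StatefulSet/pod names and the surrounding recovery hook are hypothetical and this is not a shipped integration:
{code:java}
import io.fabric8.kubernetes.api.model.apps.StatefulSet;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class ScaleDownCheck {

    /**
     * Decides whether an unreachable server pod (e.g. "tx-server-3") was removed by a
     * deliberate scale-down of its StatefulSet, in which case -- per the operator guarantee
     * described above -- its transaction log was already cleared and the client-side
     * ejb-xa-recovery record can be discarded.
     */
    static boolean podWasScaledDown(KubernetesClient client, String namespace, String podName) {
        // StatefulSet pods are named <statefulset>-<ordinal>
        int dash = podName.lastIndexOf('-');
        int ordinal = Integer.parseInt(podName.substring(dash + 1));
        String statefulSetName = podName.substring(0, dash);

        StatefulSet set = client.apps().statefulSets()
                .inNamespace(namespace).withName(statefulSetName).get();
        if (set == null) {
            return true; // the whole StatefulSet is gone, so the pod was removed deliberately
        }

        Integer replicas = set.getSpec().getReplicas();
        int desired = replicas == null ? 1 : replicas; // Kubernetes defaults replicas to 1

        // ordinals at or above the desired replica count belong to scaled-down pods
        return ordinal >= desired;
    }

    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            if (podWasScaledDown(client, "myproject", "tx-server-0")) {
                System.out.println("pod scaled down deliberately; in-doubt record can be discarded");
            }
        }
    }
}
{code}
The check leans entirely on the operator guarantee stated above: an ordinal at or beyond the desired replica count can only mean a deliberate, completed scale-down, so the corresponding ejb-xa-recovery record is safe to discard.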
> server scale down keeps data in client's data/ejb-xa-recovery and transactions on client aren't committed
> --------------------------------------------------------------------------------------------------------
>
> Key: WFWIP-218
> URL: https://issues.jboss.org/browse/WFWIP-218
> Project: WildFly WIP
> Issue Type: Bug
> Components: OpenShift
> Reporter: Martin Simka
> Assignee: Ondrej Chaloupka
> Priority: Blocker
>
> this follows up on WFWIP-206
> While testing tx recovery in OpenShift I see that the scale down of a pod that has an in-doubt transaction on it isn't successful.
> Scenario:
> *ejb client* (app tx-client, pod tx-client-0):
> * EJB business method
> ** lookup remote EJB
> ** enlist XA resource 1 to transaction
> ** enlist XA resource 2 to transaction
> ** call remote EJB
> *ejb server* (app tx-server, pod tx-server-0):
> * EJB business method
> ** enlist XA resource 1 to transaction
> ** enlist XA resource 2 to transaction
> *testTxStatelessServerSecondCommitThrowRmFail*
> ejb server XA resource 2 fails with {{XAException(XAException.XAER_RMFAIL)}}
> Then the test calls scale down (size from 1 to 0) on the tx-server pod. The server scale down completes, but sometimes there are some records left in {{<JBOSS_HOME>/standalone/data/ejb-xa-recovery}} on tx-client and the transactions on the client aren't committed.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)