[jboss-jira] [JBoss JIRA] (WFWIP-205) tx recovery intermittently fails after jvm crash

Tue Sep 17 08:44:00 EDT 2019

     [ https://issues.jboss.org/browse/WFWIP-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Simka updated WFWIP-205:
-------------------------------
    Description: 
While testing transaction recovery on OpenShift, I see that recovery after a JVM crash intermittently fails.

Scenario:

*ejb client* (app tx-client, pod tx-client-0):
* EJB business method
  ** lookup remote EJB 
  ** enlist XA resource 1 to transaction
  ** enlist XA resource 2 to transaction
  ** call remote EJB

*ejb server* (app tx-server, pod tx-server-0):
* EJB business method
  ** enlist XA resource 1 to transaction
  ** enlist XA resource 2 to transaction

XA resource 2 on the EJB server crashes the JVM during the commit phase.
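
The crash step can be illustrated with a minimal Java sketch, assuming a test-only XA resource. The class name {{CrashingXAResource}} and the injectable {{crashAction}} are hypothetical and not taken from the actual test suite; only the {{javax.transaction.xa.XAResource}} interface is a real API. By default the resource halts the JVM inside {{commit}}, after {{prepare}} has voted OK, which is the point at which a transaction log entry would be expected to remain for recovery.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical test-only resource; not the class used in the actual test suite.
final class CrashingXAResource implements XAResource {
    private final Runnable crashAction;

    CrashingXAResource() {
        // halt() skips shutdown hooks, approximating an abrupt JVM crash.
        this(() -> Runtime.getRuntime().halt(1));
    }

    // Allows a harmless action to be injected so the class can be unit tested.
    CrashingXAResource(Runnable crashAction) {
        this.crashAction = crashAction;
    }

    // Vote OK so that the transaction manager proceeds to the commit phase.
    @Override public int prepare(Xid xid) throws XAException { return XA_OK; }

    // The crash happens here, during the second phase of two-phase commit.
    @Override public void commit(Xid xid, boolean onePhase) throws XAException {
        crashAction.run();
    }

    // The remaining methods are no-ops that only satisfy the interface.
    @Override public void rollback(Xid xid) throws XAException { }
    @Override public void start(Xid xid, int flags) throws XAException { }
    @Override public void end(Xid xid, int flags) throws XAException { }
    @Override public void forget(Xid xid) throws XAException { }
    @Override public Xid[] recover(int flag) throws XAException { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) throws XAException { return other == this; }
    @Override public int getTransactionTimeout() throws XAException { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) throws XAException { return false; }
}
```

A resource used in a real recovery test would also need {{recover}} to report in-doubt Xids after the restart; this sketch leaves that out.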

The test waits until the crashed pod is restarted, forces periodic recovery twice, and then checks that the transaction log store is empty. It is not empty.
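
The verification step described above can be sketched as follows. This is only an outline under stated assumptions: {{RecoveryCheck}}, {{podReady}}, {{forceRecoveryScan}} and {{logStoreSize}} are hypothetical hooks that stand in for whatever the real test uses to query the pod, trigger a recovery scan and read the log store; none of them is an actual WildFly or OpenShift API.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

// Hypothetical helper; the three hooks are placeholders, not real WildFly/OpenShift APIs.
final class RecoveryCheck {

    // Polls the condition until it holds or the timeout elapses.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    // Returns true only if the pod came back and the log store is empty
    // after two forced recovery scans.
    static boolean recoveryClearsLogStore(BooleanSupplier podReady,
                                          Runnable forceRecoveryScan,
                                          IntSupplier logStoreSize,
                                          Duration restartTimeout) {
        if (!waitUntil(podReady, restartTimeout)) {
            return false; // the pod never became ready again
        }
        // Two scans, as in the test described above.
        forceRecoveryScan.run();
        forceRecoveryScan.run();
        return logStoreSize.getAsInt() == 0;
    }
}
```

In the failing runs reported here, the final check is the one that does not hold.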

Attached are logs from client and server pods. 

The failure can apparently be partially mitigated by clearing the OpenShift namespace before the test ({{oc delete all --all}}), but that only makes it less frequent.

  was:
While testing tx recovery in OpenShift I see that recovery after JVM crash intermittently fails

Scenario:

*ejb client* (app tx-client, pod tx-client-0):
* EJB business method
  ** lookup remote EJB 
  ** enlist XA resource 1 to transaction
  ** enlist XA resource 2 to transaction
  ** call remote EJB

*ejb server* (app tx-server, pod tx-server-0):
* EJB business method
  **  enlist XA resource 1 to transaction
  ** enlist XA resource 2 to transaction

ejb server XA resource 2 crashes JVM in commit method phase. 

Test waits until crashed pod is restarted, then forces periodic recovery twice and then checks that transaction log store is empty. But it is not empty.

Attached are logs from client and server pods. 

It seems that it can be partially mitigated by clear openshift namespace before test ({{oc delete all --all}}). But it makes it just less frequent. 

> tx recovery intermittently fails after jvm crash
> ------------------------------------------------
>
>                 Key: WFWIP-205
>                 URL: https://issues.jboss.org/browse/WFWIP-205
>             Project: WildFly WIP
>          Issue Type: Bug
>          Components: OpenShift
>         Environment: image: 
> {noformat}
> docker-registry.engineering.redhat.com/ochaloup/wildfly18-snapshot:190909-d4ddf04cc2-wfcore-10.0.0.Beta7-SNAPSHOT
> {noformat}
> operator: 
> {noformat}
> docker-registry.engineering.redhat.com/jbossqe-eap/wildfly-operator:EAP7-1192-txn-recovery-issue70
> {noformat}
> (operator built from https://github.com/ochaloup/wildfly-operator/tree/issue70-statefulset-headless-service, head 8925e7f64b6fc02b4694da63d93c0a8ce03a566d)
>            Reporter: Martin Simka
>            Assignee: Ondrej Chaloupka
>            Priority: Blocker
>         Attachments: tx-client-0.log, tx-server-0.log, tx-server-1.log, wildfly-operator-668fd79fb5-8chs8.log
>
>
> While testing transaction recovery on OpenShift, I see that recovery after a JVM crash intermittently fails.
> Scenario:
> *ejb client* (app tx-client, pod tx-client-0):
> * EJB business method
>   ** lookup remote EJB 
>   ** enlist XA resource 1 to transaction
>   ** enlist XA resource 2 to transaction
>   ** call remote EJB
> *ejb server* (app tx-server, pod tx-server-0):
> * EJB business method
>   ** enlist XA resource 1 to transaction
>   ** enlist XA resource 2 to transaction
> XA resource 2 on the EJB server crashes the JVM during the commit phase.
> The test waits until the crashed pod is restarted, forces periodic recovery twice, and then checks that the transaction log store is empty. It is not empty.
> Logs from the client and server pods are attached.
> The failure can apparently be partially mitigated by clearing the OpenShift namespace before the test ({{oc delete all --all}}), but that only makes it less frequent.
