]
Martin Simka edited comment on WFWIP-206 at 10/1/19 9:46 AM:
-------------------------------------------------------------
update:
* failure in {{testTxStatelessServerSecondCommitThrowRmFail}} -seems to be caused by old
EAP image missing some fixes, I'll retest it with fresh image from EAP7-1216- with new
image it fails with similar symptoms as {{testTxStatelessClientSecondCommitThrowRmFail}},
so WFWIP-218
* {{testTxStatelessClientSecondCommitThrowRmFail}} fails intermittently, I created
WFWIP-218
* {{testTxStatelessServerSecondPrepareJvmHalt}} pending retest
* {{testTxStatelessClientSecondPrepareJvmHalt}} pending retest
was (Author: simkam):
update:
* failure in {{testTxStatelessServerSecondCommitThrowRmFail}} seems to be caused by old
EAP image missing some fixes, I'll retest it with fresh image from EAP7-1216
* {{testTxStatelessClientSecondCommitThrowRmFail}} fails intermittently, I created
WFWIP-218
* {{testTxStatelessServerSecondPrepareJvmHalt}} pending retest
* {{testTxStatelessClientSecondPrepareJvmHalt}} pending retest
scale down isn't successful when there are in-doubt transaction
on pod
----------------------------------------------------------------------
Key: WFWIP-206
URL:
https://issues.jboss.org/browse/WFWIP-206
Project: WildFly WIP
Issue Type: Bug
Components: OpenShift
Environment: operator, built from
[
1984d98154f11a7473be065e4ae7f54b1812c9b0|https://github.com/wildfly/wildf...]
:
{noformat}
docker-registry.engineering.redhat.com/jbossqe-eap/wildfly-operator:latest
{noformat}
EAP image:
{noformat}
docker-registry.engineering.redhat.com/ochaloup/wildfly18-snapshot:190909...
{noformat}
Reporter: Martin Simka
Assignee: Ondrej Chaloupka
Priority: Blocker
Labels: operator
While testing tx recovery in OpenShift I see that scale down of pod that has transaction
in-doubt on it isn't successful
Scenario:
*ejb client* (app tx-client, pod tx-client-0):
* EJB business method
** lookup remote EJB
** enlist XA resource 1 to transaction
** enlist XA resource 2 to transaction
** call remote EJB
*ejb server* (app tx-server, pod tx-server-0):
* EJB business method
** enlist XA resource 1 to transaction
** enlist XA resource 2 to transaction
ejb server XA resource 2 fails with {{XAException(XAException.XAER_RMFAIL)}}
Then the test calls scale down (size from 1 to 0) on tx-server pod. But scale down never
completes.
Log from operator:
{noformat}
{"level":"info","ts":1568905676.6303623,"logger":"controller_wildflyserver","msg":"Transaction
recovery scaledown
processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod
Name":"tx-server-0","IP Address":"172.17.0.10"}
{"level":"info","ts":1568905676.7313502,"logger":"controller_wildflyserver","msg":"Enabling
recovery listener for processing scaledown at
tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server"}
{"level":"info","ts":1568905679.4325309,"logger":"controller_wildflyserver","msg":"Query
to find the transaction recovery port to force scan at pod
tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server"}
{"level":"info","ts":1568905686.7914035,"logger":"controller_wildflyserver","msg":"Executing
recovery scan at
tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod
IP":"172.17.0.10","Recovery port":4712}
{"level":"info","ts":1568905702.0583296,"logger":"controller_wildflyserver","msg":"In-doubt
transactions in object
store","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod
Name":"tx-server-0","Message":"Recovery scan to be invoked
as the transaction log storage is not empty for pod scaling down pod tx-server-0,
transaction list: map[0:ffffac11000a:991c183:5d8399a1:13:map[age-in-seconds:23
id:0:ffffac11000a:991c183:5d8399a1:13 jmx-name:<nil>
participants:map[java:/MockXAResource:map[eis-product-name:MockXAResource Test
eis-product-version:0.1.Mock jmx-name:<nil> jndi-name:java:/MockXAResource
status:PREPARED type:/StateManager/AbstractRecord/XAResourceRecord]]
type:StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction/SubordinateAtomicAction/JCA]]"}
{"level":"info","ts":1568905711.1034026,"logger":"controller_wildflyserver","msg":"Reconciling
WildFlyServer","Request.Namespace":"msimka-namespace","Request.Name":"tx-server"}
{"level":"info","ts":1568905711.103548,"logger":"controller_wildflyserver","msg":"Transaction
recovery scaledown
processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod
Name":"tx-server-0","IP Address":"172.17.0.10"}
{"level":"info","ts":1568905711.109706,"logger":"controller_wildflyserver","msg":"Executing
recovery scan at
tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod
IP":"172.17.0.10","Recovery port":4712}
{"level":"error","ts":1568905711.2608829,"logger":"controller_wildflyserver","msg":"Failures
during scaling down recovery
processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Desired
replica size":0,"Number of pods to be
removed":1,"error":"Found 1 errors:\n [[Failed to run transaction
recovery scan for scaling down pod tx-server-0. Please, verify the pod log file. Error:
Error to get response for command SCAN sending to 172.17.0.10:4712, error:
EOF]],","stacktrace":"github.com/go-logr/zapr.(*zapLo...
{"level":"info","ts":1568905711.2609518,"logger":"controller_wildflyserver","msg":"Scaling
down statefulset by verification if pods are clean by
recovery","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
{"level":"info","ts":1568905711.2609615,"logger":"controller_wildflyserver","msg":"Statefulset
was not fully scaled to the desired replica size 0 while StatefulSet is to be at size 1.
Some pods were not cleaned by recovery. Verify status of the WildflyServer
tx-server","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
{"level":"info","ts":1568905711.2795022,"logger":"controller_wildflyserver","msg":"Updating
StatefulSet to be up to date with the WildFlyServer
Spec","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
{"level":"info","ts":1568905711.2795491,"logger":"controller_wildflyserver","msg":"Reconciling
WildFlyServer","Request.Namespace":"msimka-namespace","Request.Name":"tx-server"}
{"level":"info","ts":1568905711.2796504,"logger":"controller_wildflyserver","msg":"Transaction
recovery scaledown
processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod
Name":"tx-server-0","IP Address":"172.17.0.10"}
{"level":"info","ts":1568905711.2937052,"logger":"controller_wildflyserver","msg":"Executing
recovery scan at
tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod
IP":"172.17.0.10","Recovery port":4712}
{"level":"error","ts":1568905711.294249,"logger":"controller_wildflyserver","msg":"Failures
during scaling down recovery
processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Desired
replica size":0,"Number of pods to be
removed":1,"error":"Found 1 errors:\n [[Failed to run transaction
recovery scan for scaling down pod tx-server-0. Please, verify the pod log file. Error:
Cannot process TCP connection to 172.17.0.10:4712, error: dial tcp 172.17.0.10:4712:
connect: connection
refused]],","stacktrace":"github.com/go-logr/zapr.(*z...
{"level":"info","ts":1568905711.294342,"logger":"controller_wildflyserver","msg":"Scaling
down statefulset by verification if pods are clean by
recovery","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
{"level":"info","ts":1568905711.294417,"logger":"controller_wildflyserver","msg":"Statefulset
was not fully scaled to the desired replica size 0 while StatefulSet is to be at size 1.
Some pods were not cleaned by recovery. Verify status of the WildflyServer
tx-server","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
{"level":"error","ts":1568905711.311673,"logger":"controller_wildflyserver","msg":"Failed
to Update
StatefulSet.","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server","error":"Operation
cannot be fulfilled on statefulsets.apps \"tx-server\": the object has been
modified; please apply your changes to the latest version and try
again","stacktrace":"github.com/go-logr/zapr.(*zapLog...
{"level":"error","ts":1568905711.311745,"logger":"kubebuilder.controller","msg":"Reconciler
error","controller":"wildflyserver-controller","request":"msimka-namespace/tx-server","error":"Operation
cannot be fulfilled on statefulsets.apps \"tx-server\": the object has been
modified; please apply your changes to the latest version and try
again","stacktrace":"github.com/go-logr/zapr.(*zapLog...
{"level":"info","ts":1568905712.3137681,"logger":"controller_wildflyserver","msg":"Reconciling
WildFlyServer","Request.Namespace":"msimka-namespace","Request.Name":"tx-server"}
{"level":"info","ts":1568905712.3139439,"logger":"controller_wildflyserver","msg":"Transaction
recovery scaledown
processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod
Name":"tx-server-0","IP Address":"172.17.0.10"}
{"level":"info","ts":1568905712.3253288,"logger":"controller_wildflyserver","msg":"Executing
recovery scan at
tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod
IP":"172.17.0.10","Recovery port":4712}
{"level":"error","ts":1568905712.3255754,"logger":"controller_wildflyserver","msg":"Failures
during scaling down recovery
processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Desired
replica size":0,"Number of pods to be
removed":1,"error":"Found 1 errors:\n [[Failed to run transaction
recovery scan for scaling down pod tx-server-0. Please, verify the pod log file. Error:
Cannot process TCP connection to 172.17.0.10:4712, error: dial tcp 172.17.0.10:4712:
connect: connection
refused]],","stacktrace":"github.com/go-logr/zapr.(*z...
{"level":"info","ts":1568905712.3256311,"logger":"controller_wildflyserver","msg":"Scaling
down statefulset by verification if pods are clean by
recovery","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
{"level":"info","ts":1568905712.3256419,"logger":"controller_wildflyserver","msg":"Statefulset
was not fully scaled to the desired replica size 0 while StatefulSet is to be at size 1.
Some pods were not cleaned by recovery. Verify status of the WildflyServer
tx-server","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
{noformat}