[jboss-jira] [JBoss JIRA] (WFWIP-176) Pod restarted because of failing liveness/readiness Probe

Mon Aug 19 05:40:00 EDT 2019

     [ https://issues.jboss.org/browse/WFWIP-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Choma updated WFWIP-176:
-------------------------------
    Description: 
While testing the EAP 7.3 image, I came across a test case that covers a real corner case [0].

The test does not use templates for deployment.
In the tested scenario, the liveness/readiness probes fail. With CD 17 and EAP 7.3, the pod is restarted. With CD 16, however, there were no liveness/readiness failures in the events, and the pod was not restarted.

I don't see any differences between the pod YAML for the CD 16 case:
{code}
      livenessProbe:
        exec:
          command:
            - /bin/bash
            - '-c'
            - /opt/eap/bin/livenessProbe.sh
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      name: weirdusername
      readinessProbe:
        exec:
          command:
            - /bin/bash
            - '-c'
            - /opt/eap/bin/readinessProbe.sh
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
{code}

and the CD 17 case:
{code}
      livenessProbe:
        exec:
          command:
            - /bin/bash
            - '-c'
            - /opt/eap/bin/livenessProbe.sh
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      name: weirdusername
      readinessProbe:
        exec:
          command:
            - /bin/bash
            - '-c'
            - /opt/eap/bin/readinessProbe.sh
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
{code}
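Both pod specs use identical probe settings, and in the CD 16 run no probe failures appeared in the events, so the difference presumably lies in whether the probe scripts fail, not in how Kubernetes reacts to a failure. Under standard Kubernetes probe semantics, a failing liveness probe restarts the container once `failureThreshold` consecutive checks have failed, while a failing readiness probe only marks the pod as not ready. The Python sketch below is a simplified model of that rule, assuming checks run exactly every `periodSeconds` (it is not the kubelet code, ignores jitter and `timeoutSeconds`, and `ProbeSpec` and `first_liveness_restart` are hypothetical names). With the values above (`failureThreshold: 3`, `periodSeconds: 10`, no `initialDelaySeconds`), an always-failing liveness probe triggers a restart at the third check, 20 seconds after the first one.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ProbeSpec:
    # Hypothetical model of the probe fields shown in the pod YAML.
    failure_threshold: int = 3        # failureThreshold
    period_seconds: int = 10          # periodSeconds
    initial_delay_seconds: int = 0    # initialDelaySeconds (not set above; defaults to 0)


def first_liveness_restart(spec: ProbeSpec, results: Iterable[bool]) -> Optional[int]:
    """Return the time (seconds after container start) of the check at which a
    liveness probe with the given pass/fail results would trigger a restart,
    or None if it never would.

    Simplified model: check i runs at initial_delay + i * period, and a restart
    is triggered as soon as failure_threshold consecutive checks have failed.
    A readiness probe with the same results would only mark the pod not ready.
    """
    consecutive_failures = 0
    for i, passed in enumerate(results):
        t = spec.initial_delay_seconds + i * spec.period_seconds
        if passed:
            consecutive_failures = 0  # any success resets the failure streak
        else:
            consecutive_failures += 1
            if consecutive_failures >= spec.failure_threshold:
                return t
    return None


if __name__ == "__main__":
    spec = ProbeSpec()  # failureThreshold: 3, periodSeconds: 10, as in both pod specs
    print(first_liveness_restart(spec, [False] * 5))               # 20
    print(first_liveness_restart(spec, [False, False, True] * 3))  # None
```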

What could cause this change in behaviour?

[0] https://issues.jboss.org/browse/CLOUD-1988

  was:
While testing the EAP 7.3 image, I came across a test case that covers a real corner case [0].
I believe the recent change to probe timeouts [1] changed the behaviour slightly, and I want to make sure this is expected.
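The change in [1] is described here only as a change to probe timeouts, so the following does not describe what it actually did. It is a minimal Python sketch of the general mechanism, assuming an exec probe passes only if its command exits with status 0 before a timeout: introducing or shortening such a timeout can turn a slow but eventually passing probe script into a failing one. `run_exec_probe` is a hypothetical helper, and `timeout_seconds` stands in for `timeoutSeconds` in the pod YAML. The Kubernetes documentation notes that before Kubernetes 1.20 `timeoutSeconds` was not respected for exec probes, so on clusters of that era any effective timeout would have to come from the probe scripts themselves.

```python
import subprocess
import sys
from typing import Sequence


def run_exec_probe(command: Sequence[str], timeout_seconds: float) -> bool:
    """Hypothetical exec-probe check: passes only if the command exits with
    status 0 within timeout_seconds. subprocess.run kills a command that
    overruns the timeout and raises TimeoutExpired, counted here as a failure."""
    try:
        completed = subprocess.run(
            list(command),
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in for a probe script that eventually succeeds but takes about 2 s.
    slow_probe = [sys.executable, "-c", "import time; time.sleep(2)"]
    print(run_exec_probe(slow_probe, timeout_seconds=1))   # False: exceeds the timeout
    print(run_exec_probe(slow_probe, timeout_seconds=10))  # True: exits 0 in time
```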

In the tested scenario, the liveness/readiness probes fail. With CD 17 and EAP 7.3, the pod is restarted. With CD 16, however, the pod was not restarted. Is this expected?

Is this a realistic scenario? A user has so far not cared about the liveness/readiness probes, which were failing while the application kept working. After migrating to the CD 17/EAP 7.3 image, the pod will keep restarting and the application will be down.

[0] https://issues.jboss.org/browse/CLOUD-1988
[1] https://issues.jboss.org/browse/CLOUD-3248

> Pod restarted because of failing liveness/readiness Probe
> ---------------------------------------------------------
>
>                 Key: WFWIP-176
>                 URL: https://issues.jboss.org/browse/WFWIP-176
>             Project: WildFly WIP
>          Issue Type: Bug
>          Components: OpenShift
>            Reporter: Martin Choma
>            Assignee: Ken Wills
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v7.12.1#712002)