[jboss-jira] [JBoss JIRA] (WFLY-12952) MP Health returns UP when checks are expected but not installed yet.

Ivan Straka (Jira) issues at jboss.org
Thu Jan 9 11:41:40 EST 2020


     [ https://issues.redhat.com/browse/WFLY-12952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Straka updated WFLY-12952:
-------------------------------
    Description: 
The MicroProfile Health specification ([link|https://github.com/eclipse/microprofile-health/blob/master/spec/src/main/asciidoc/protocol-wireformat.adoc]) says:
* A producer MUST support custom, application level health check procedures
* A producer SHOULD support reasonable out-of-the-box procedures
* A producer with no health check procedures expected or installed MUST return positive overall status (i.e. HTTP 200)
* A producer with health check procedures expected but not yet installed MUST return negative overall status (i.e. HTTP 503)
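
As a rough illustration of the last two bullets, the expected overall status can be written as a small decision function. This is only a sketch: {{overall_http_status}} and its two arguments are hypothetical names, not anything provided by WildFly or the specification, and it assumes that every check which is already installed reports UP.

```shell
# Hypothetical helper, not part of WildFly or the MicroProfile Health API.
#   $1 = number of health check procedures that are expected
#   $2 = number of health check procedures installed so far
# Assumption: every installed procedure itself reports UP.
overall_http_status() {
  expected=$1
  installed=$2
  if [ "$installed" -lt "$expected" ]; then
    # Procedures are expected but not yet installed: negative overall status.
    echo 503
  else
    # Nothing expected or installed, or everything expected is installed.
    echo 200
  fi
}
```

For example, {{overall_http_status 0 0}} corresponds to a server with no deployment, and {{overall_http_status 1 0}} to a deployment whose readiness probe has not been installed yet.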

When I deploy an application with a readiness probe before WildFly is started, then from my point of view, and notably from OpenShift's, the health check procedure is expected from the very beginning.
_Let me note that on OpenShift, starting the server should mean starting the service._
Hence I expect a negative overall status until the probe is ready and able to provide a response.

However, WildFly with the default settings responds with status UP:

{code:bash}
while true; do echo $(date +"%T.%3N") ;  curl   localhost:9990/health/ready; echo ""; done

17:17:56.438 curl: (7) Failed to connect to localhost port 9990: Connection refused
17:17:56.452 curl: (7) Failed to connect to localhost port 9990: Connection refused
17:17:56.466 {"status":"UP","checks":[]}
...
17:18:01.121 {"status":"UP","checks":[]}
17:18:01.133 {"status":"DOWN","checks":[{"name":"delayed-readiness","status":"DOWN"}]}
{code}
This violates the fourth bullet in the specification.

WildFly provides an option to set the global status returned when no probes are defined ([documentation|https://doc-stage.usersys.redhat.com/documentation/en-us/red_hat_jboss_enterprise_application_platform/7.3/html-single/configuration_guide/index#global_statuses_undefined_probes]).

Setting it to _DOWN_ would make this scenario behave according to the specification, yet the default setting violates it.

If the default value were _DOWN_, we would run into an issue when WildFly is used without any deployment (for example as a backup for AMQ). The status would always be _DOWN_:
{noformat}
17:22:41.719
{"status":"DOWN","checks":[]}
{noformat}
And that would violate the third bullet in the specification.
The TCK tests do not cover this scenario well.

Is there a way to return _DOWN_ until WildFly has scanned the deployments and, if no health check is found (and thus none is expected), then start returning _UP_? The scan should happen during MP Health initialization.
If there is no deployment, then _UP_ it is.
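
To make the request concrete, here is a minimal sketch of the behavior I have in mind. The function {{readiness_status}} and its three arguments are purely hypothetical; I am not claiming this is how the MP Health subsystem is or should be implemented.

```shell
# Hypothetical sketch of the proposed readiness logic; all names are illustrative.
#   $1 = "yes" once the deployment scan has finished, anything else before that
#   $2 = number of readiness checks found by the scan
#   $3 = number of those checks currently reporting DOWN
readiness_status() {
  scan_done=$1
  found=$2
  failing=$3
  if [ "$scan_done" != "yes" ]; then
    echo DOWN   # checks may still be expected, so stay negative
  elif [ "$found" -eq 0 ]; then
    echo UP     # nothing expected and nothing installed
  elif [ "$failing" -gt 0 ]; then
    echo DOWN   # at least one installed check reports DOWN
  else
    echo UP     # all installed checks report UP
  fi
}
```

With this logic, a server without any deployment still ends up _UP_ once the scan is done, while a server that is still starting never reports _UP_ with an empty list of checks.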

Setting the priority to Blocker since WildFly 19 will become EAP 7.3.0.CD19, which is supposed to run on OpenShift. With this behavior the health check is not very useful because:
* OpenShift starts the service, waits for some time, and then starts asking for the health status
* WildFly responds _UP_ although the application health check is not installed yet
* With health status _UP_, OpenShift proceeds
* At this point the application is ready and its health status is _DOWN_ (for example, a DB is down), but the OpenShift flow has already moved on

> MP Health returns UP when checks are expected but not installed yet.
> --------------------------------------------------------------------
>
>                 Key: WFLY-12952
>                 URL: https://issues.redhat.com/browse/WFLY-12952
>             Project: WildFly
>          Issue Type: Bug
>          Components: MP Health
>    Affects Versions: 18.0.0.Final
>            Reporter: Ivan Straka
>            Assignee: Jeff Mesnil
>            Priority: Blocker



--
This message was sent by Atlassian Jira
(v7.13.8#713008)

