[wildfly-dev] A look at Eclipse MicroProfile Healthcheck

Jeff Mesnil jmesnil at redhat.com
Thu Jul 6 09:37:31 EDT 2017


> On 6 Jul 2017, at 15:00, David Lloyd <dlloyd at redhat.com> wrote:
> 
>> To better integrate WildFly with OpenShift, we should provide a way to let OpenShift check the health of WildFly. The MPHC spec is a good candidate to provide such a feature.
>> It is worth exploring how we could leverage it for user deployments and also for WildFly internals (when that makes sense).
>> Swarm provides an implementation of MPHC; we also need to see how WildFly and Swarm can collaborate to avoid duplicating code and effort in providing the same feature to our users.
> 
> I like the idea of having a WildFly health API that can bridge to MPHC
> via a subsystem; this is consistent with what we've done in other
> areas.  I'm not so sure about having (more?) APIs which drive
> services.  It might be better to use cap/req to have a health
> capability to which other systems can be registered.  This might allow
> multiple independent health check resources to be defined, for systems
> which perform more than one function; downstream health providers
> could reference the resource(s) to register with by capability name.

You are right.
If we provide our own health API, it will rely on req/cap to bind everything.
My idea was to provide an API that hides the req/cap plumbing but is built on top of it.
It’d be similar to what I’m doing in the messaging-activemq subsystem, where I almost always hide the installation of services in static install() methods such as [1], which require only a few parameters and hide all the dependency/capability service names and injection.

[1]https://github.com/wildfly/wildfly/blob/master/messaging-activemq/src/main/java/org/wildfly/extension/messaging/activemq/HTTPUpgradeService.java#L91

> Is this a polling-only service, or is there a "push" mechanism?

Polling only.
The container (OpenShift) will call the HTTP endpoint regularly to check the application’s health.
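To illustrate the probe contract (not the MPHC wire format; the class and payload below are hypothetical), the polling model boils down to the container periodically issuing a GET and treating HTTP 200 as "up":

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Minimal sketch of a polled health endpoint using only the JDK's
// built-in HTTP server. A real MPHC implementation would aggregate
// registered checks; here the server always reports itself as up.
class HealthEndpoint {
    static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> {
            byte[] body = "{\"outcome\":\"UP\"}".getBytes(StandardCharsets.UTF_8);
            // 200 => healthy; an unhealthy server would answer 503 instead
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The container never holds a connection open; it simply repeats the GET at its configured probe interval.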

> Just brainstorming, I can think of a few more potentially useful
> health checks beyond what you've listed:
> 
> • EJB failure rate (if an EJB starts failing more than some percentage
> of the last, say 50 or 100 invocations, it could report an "unhealthy"
> condition)
> • Database failure rate (something with JDBC exceptions maybe)

That one is interesting.
When Heiko and I talked about the API, I proposed a health check that pings a JDBC connection, and he told me it might be a bad idea after all.
If the database fails, the application will not function as expected. But restarting the application will not make the problem go away (it’s likely the DB that has the problem).
Having health checks that cross service boundaries (such as "my app" <—> "DB") may have a snowballing effect where one unhealthy service (the DB) propagates its unhealthiness to other services ("my app").
In that case, the DB should be probed and restarted asap, but there is nothing that should be done in the app server.
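By contrast, an in-process failure-rate check like the EJB one David suggests stays within the service boundary. A minimal sketch (class and parameter names are invented, not any WildFly API) could track the last N invocation outcomes in a sliding window:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window failure-rate check: record the outcome of
// each invocation and report unhealthy once the failure ratio over the
// last `size` invocations reaches `threshold`.
class FailureRateCheck {
    private final Deque<Boolean> window = new ArrayDeque<>();
    private final int size;
    private final double threshold;

    FailureRateCheck(int size, double threshold) {
        this.size = size;
        this.threshold = threshold;
    }

    synchronized void record(boolean failed) {
        window.addLast(failed);
        if (window.size() > size) {
            window.removeFirst();   // keep only the last `size` outcomes
        }
    }

    synchronized boolean isHealthy() {
        long failures = window.stream().filter(f -> f).count();
        // stay healthy until the window is full and the ratio is reached
        return window.size() < size || (double) failures / size < threshold;
    }
}
```

Because it only observes the server’s own invocations, restarting the server is a plausible remedy when this check fails, which is exactly what a liveness probe assumes.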

We would need guidelines to determine which health checks actually make sense for WildFly extensions.

Caucho has an interesting list of health checks[1] that could make sense for WildFly.
There are the usual suspects (memory, CPU) and some more interesting ones:
* JVM deadlock check
* transaction failure rate

We’d have to be careful implementing a JVM deadlock health check though.
These health checks should not impact the app server runtime too much and should be fast (by default Kubernetes has a 1 second timeout for its liveness probe).
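For the deadlock case, the JDK’s standard management API already does the heavy lifting; a minimal sketch (the wrapper class is hypothetical) only needs to ask whether any threads are deadlocked:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch of a JVM deadlock check using the standard ThreadMXBean.
// findDeadlockedThreads() itself is relatively cheap; fetching full
// ThreadInfo for the reported ids is the expensive part, so a probe
// running under a tight timeout should stick to the boolean answer.
class DeadlockCheck {
    static boolean isHealthy() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads();
        // null means no threads are deadlocked on monitors
        // or ownable synchronizers
        return deadlocked == null;
    }
}
```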

jeff

[1] http://www.caucho.com/resin-4.0/admin/health-checking.xtp#Defaulthealthconfiguration

-- 
Jeff Mesnil
JBoss, a division of Red Hat
http://jmesnil.net/



