[Hawkular-dev] Identification of WildFly in container in a Kube/Openshift env

Heiko W.Rupp hrupp at redhat.com
Fri Jul 22 11:30:08 EDT 2016


Now that I know that I should write about Cattle :) I want to pick this
conversation up again.

In my Docker setup 
http://pilhuhn.blogspot.de/2016/06/using-hawkular-services-via-docker.html
I have a WildFly in a container ("hawkfly") which I can start
with `docker run pilhuhn/hawkfly`. When the container stops, I could
restart it (with `docker start` or by using a --restart=always policy).
Or I just start a new one with `docker run` as above. Actually, I can
start dozens that way to scale up and down.
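To make that concrete, here is a minimal sketch of those two variants,
assuming the Docker SDK for Python (only the image name comes from the
post above, the rest is illustrative):

    import docker

    client = docker.from_env()

    # Variant 1: one long-lived container that Docker restarts for me (a pet)
    pet = client.containers.run("pilhuhn/hawkfly", detach=True,
                                restart_policy={"Name": "always"})

    # Variant 2: just start a few fresh containers to scale up (cattle)
    replicas = [client.containers.run("pilhuhn/hawkfly", detach=True)
                for _ in range(3)]

Each container in the second variant is a brand-new one, which is exactly
what leads to the inventory problem below.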

In the latter case I end up with dozens of 'dead' WildFly servers in
Hawkular inventory (and thus also in MiQ inventory, as the inventory sync
can't remove them if they are still present in Hawkular inventory).

Before I continue I want to list a few use cases that I identified:
a) plain WildFly on bare metal/VM. Always the same instance, gets started
and stopped many times, keeps state.
   This is probably what we always did and do (= a pet)
b) WildFly in a container
b1) managed by some orchestration system (= cattle)
b2) started in some more ad-hoc way (e.g. docker-compose, manual
docker-run commands) (= in between cattle and pets)

We also need to keep in mind that for a single application a bunch of
app servers may run the same code for load balancing / fault tolerance
reasons.

Now about the relationship to our inventory:
For a) it is pretty clear that users see the individual app servers as
long-living installations. When one crashes, they restart it, but it
stays the same install. So we can easily list and keep it in inventory
and also have some command to clean it up "manually" once the user
decides that the installation is really no longer needed.
Also, as the AS (usually) has full access to the file system, the feed-id
is preserved over restarts.

For b) the situation is different, as images are immutable and containers
have some "local storage" that is valid only for the same container. So
the WF can only remember its feed id across restarts of the same
container, but not when I `docker run` a new one as a replacement for an
old one.

Now with both Docker and (most probably) also k8s it is possible to get
a stream of events when containers are stopped and removed. So we could
connect to that stream and make use of the information.
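As a rough sketch of what listening to that event stream could look like
(again the Docker SDK for Python; what we do with the information is just
a placeholder):

    import docker

    client = docker.from_env()

    # only interested in containers going away
    filters = {"type": "container", "event": ["die", "destroy"]}

    for event in client.events(decode=True, filters=filters):
        container_id = event.get("id")
        action = event.get("Action")
        # here we would look up which feed / inventory entry belongs to
        # container_id and mark it as gone or clean it up in inventory
        print("container %s: %s" % (container_id, action))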

The other aspect here is what we do with the collected data. Right now
our approach is very pet-centric, which is fine for use case a) above.

For containers, we don't want to keep dead cattle in inventory forever.
We may also not want to remove the collected metrics, as they can still
be important.
For these use cases we should probably abstract this to the level of a
flock. We want to monitor the size of the flocks and also when flock
members die and are born, but no longer try to identify individual
members. We brand them to denote being part of the flock.

With the flock we can still have individual cattle report their metrics
individually, but we need a way to aggregate over the flock. Similar for
alerting purposes, where we set up the alert definitions on the flock and
not on individual members. Alert definitions then need to be applied to
all new members.
For inventory, when we learn about a member dying, we can just remove it
from the flock, adjust counters, and record an event about it. Similar
for new members.
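To make the flock idea a bit more concrete, here is a tiny sketch in
plain Python (all names made up) of what such an abstraction could track:
membership, size, birth/death events, and an aggregate over the members'
metrics:

    class Flock:
        def __init__(self, name):
            self.name = name
            self.members = {}   # feed id -> latest reported metric value
            self.events = []    # birth/death log instead of per-pet inventory

        def member_born(self, feed_id):
            self.members[feed_id] = None
            self.events.append(("born", feed_id))

        def member_died(self, feed_id):
            self.members.pop(feed_id, None)
            self.events.append(("died", feed_id))

        def report(self, feed_id, value):
            self.members[feed_id] = value

        def size(self):
            return len(self.members)

        def aggregate(self):
            # e.g. average heap usage over all members that reported a value
            values = [v for v in self.members.values() if v is not None]
            return sum(values) / len(values) if values else None

Alert definitions would then hang off the flock (its size or its
aggregate) rather than off individual feed ids.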

Now the question is how do we learn about cattle in the flock? When
building images with the agent inside, we can pass an env-variable or
agent setting (see the sketch below) that
a) indicates that this is an agent inside a Docker container
b) indicates which flock this belongs to.
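A sketch of both sides of that hand-over, again with the Docker SDK for
Python; the variable names HAWKULAR_AGENT_IN_CONTAINER and
HAWKULAR_FLOCK_ID are made up and would need to be agreed on:

    import os
    import docker

    # operator side: brand the container as a flock member when starting it
    client = docker.from_env()
    client.containers.run("pilhuhn/hawkfly", detach=True,
                          environment={"HAWKULAR_AGENT_IN_CONTAINER": "true",
                                       "HAWKULAR_FLOCK_ID": "hawkfly-demo"})

    # agent side (inside the container): pick the branding up at start-up
    in_container = os.environ.get("HAWKULAR_AGENT_IN_CONTAINER") == "true"
    flock_id = os.environ.get("HAWKULAR_FLOCK_ID")   # None -> treat as a pet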

Does that make sense?
   Heiko


On 3 Jul 2016, at 14:14, John Mazzitelli wrote:

> In case you didn't understand the analogy, I believe Heiko meant to 
> use the word "Cattle" not "Kettle" :-)
>
> I had to look it up - I've not heard the "cattle vs. pets" analogy 
> before - but I get it now!
>
> ----- Original Message -----
>> Hey,
>>
>> [ CC to Federico as he may have some ideas from the Kube/OS side ]
>>
>> Our QE has opened an interesting case:
>>
>> https://github.com/ManageIQ/manageiq/issues/9556
>>
>> where I first thought WTF with that title.
>>
>> But then when reading further it got more interesting.
>> Basically what happens is that especially in environments like
>> Kube/Openshift,
>> individual containers/appservers are Kettle and not Pets: one goes 
>> down,
>> gets
>> killed, you start a new one somewhere else.
>>
>> Now the interesting questions for us are (first purely on the Hawkular
>> side):
>> - how can we detect that such a container is down and will never come 
>> up
>> with that id again (-> we need to clean it up in inventory)
>> - can we learn that for a killed container A, a freshly started
>> container A' is
>> the replacement to e.g. continue with performance monitoring of the 
>> app
>> or to re-associate relationships with other items in inventory?
>> (Is that even something we want - again that is Kettle and not Pets
>> anymore)
>> - Could eap+embedded agent perhaps store some token in Kube which
>> is then passed when A' is started so that A' knows it is the new A 
>> (e.g.
>> feed id).
>>    - I guess that would not make much sense anyway, as for an app 
>> with
>>     three app servers all would get that same token.
>>
>> Perhaps we should ignore that use case for now completely and tackle
>> that differently in the sense that we don't care about 'real' app
>> servers,
>> but rather introduce the concept of a 'virtual' server where we only
>> know
>> via Kube that it exists and how many of them for a certain 
>> application
>> (which is identified via some tag in Kube). Those virtual servers
>> deliver
>> data, but we don't really try to do anything with them 'personally',
>> but indirectly via Kube interactions (i.e. map the incoming data to 
>> the
>> app and not to an individual server). We would also not store
>> the individual server in inventory, so there is no need to clean it
>> up (again, no pet but kettle).
>> In fact we could just use the feed-id as kube token (or vice versa).
>> We still need a way to detect that one of those kettle-as is on Kube
>> and possibly either disable or re-route some of the lifecycle events
>> onto Kubernetes (start in any case, stop probably does not matter
>> if the container dies because the appserver inside stops or if kube
>> just kills it).

