Now that I know that I should write about Cattle :) I want to pick this
conversation up again.
In my Docker setup
http://pilhuhn.blogspot.de/2016/06/using-hawkular-services-via-docker.html
I have a WildFly in a container ("hawkfly") which I can start
with `docker run pilhuhn/hawkfly`. When the container stops, I can
restart it (via docker start or by using a --restart=always policy).
Or I can just start a new one with `docker run` as above. Actually I can
start dozens that way to scale up and down.
In the latter case I end up with dozens of 'dead' WildFly servers in
Hawkular inventory (and thus also in MiQ inventory, as the inventory sync
can't remove them if they are still present in Hawkular inventory).
Before I continue, I want to list a few use cases that I identified:
a) plain WildFly on bare metal/VM. Always the same instance, gets started
and stopped many times, keeps state.
This is probably what we always did and do (= a pet)
b) WildFly in a container
b1) managed by some orchestration system (= cattle)
b2) started in some more ad-hoc way (e.g. docker-compose, manual
docker-run commands) (= in between cattle and pets)
We also need to keep in mind that for applications, a bunch of app
servers may run the same code for load balancing / fault tolerance
reasons.
Now about the relationship to our inventory:
For a) it is pretty clear that users see the individual app servers as
long-living installations. When one crashes, they restart it, but it
stays the same install. So we can easily list and keep it in inventory
and also have some command to "manually" clean it up once the user
decides that the installation is really no longer needed.
Also, as the AS (usually) has full access to the file system, the feed-id
is preserved over restarts.
For b) the situation is different, as images are immutable and containers
only have some "local storage" that is valid for the lifetime of that
container. So WildFly can only remember its feed-id across restarts of
the same container, but not when I docker-run a new one as a replacement
for an old one.
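To make the difference concrete, here is a minimal sketch (Python, with a
made-up feed-id file location - the real agent keeps it in its own data
dir) of the "reuse the feed-id if the storage survived" logic:

```python
import os
import uuid

# Hypothetical location; the actual agent stores its feed-id elsewhere.
FEED_ID_FILE = "/opt/agent/data/feed-id"

def get_feed_id():
    """Return the persisted feed-id, or create a new one if none survives.

    On bare metal / a VM (case a) the file survives restarts, so the same
    feed-id is reused. In a fresh container (case b) the file is gone and
    a brand new feed-id is generated, which is why a replacement shows up
    as a new entry in inventory.
    """
    if os.path.exists(FEED_ID_FILE):
        with open(FEED_ID_FILE) as f:
            return f.read().strip()
    feed_id = str(uuid.uuid4())
    os.makedirs(os.path.dirname(FEED_ID_FILE), exist_ok=True)
    with open(FEED_ID_FILE, "w") as f:
        f.write(feed_id)
    return feed_id
```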
Now with both Docker and (most probably) also k8s it is possible to get a
stream of events when containers are stopped and removed. So we could
connect to that stream and make use of the information.
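For Docker this could be a small consumer along these lines (Python
Docker SDK; the handle_dead_container callback is made up and stands for
whatever inventory cleanup we end up doing):

```python
import docker

def handle_dead_container(container_id, attributes):
    # Hypothetical hook: here we would remove the corresponding feed from
    # inventory, or adjust the flock (see below).
    print("container gone:", container_id, attributes.get("name"))

client = docker.from_env()

# Stream container lifecycle events; 'die' and 'destroy' tell us that a
# WildFly container will not come back under the same id.
for event in client.events(decode=True,
                           filters={"type": "container",
                                    "event": ["die", "destroy"]}):
    handle_dead_container(event["id"], event["Actor"]["Attributes"])
```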
The other aspect here is what we do with the collected data. Right now
our approach is very pet-centric, which is fine for use case a) above.
For containers, we don't want to keep dead cattle in inventory forever.
We may also not want to remove the collected metrics, as they can still
be important.
For these use cases we should probably abstract this to the level of a
flock. We want to monitor the size of the flock and also when flock
members die and are born, but no longer try to identify individual
members. We brand them to denote that they are part of the flock.
With the flock we can still have individual cattle report their metrics
individually, but we need a way to aggregate over the flock. Similar for
alerting purposes, where we set up the alert definitions on the flock and
not on individual members. Alert definitions then need to be applied to
all new members.
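As a rough sketch of what "aggregate over the flock" could mean (the
fetch_metric function and the names are made up for illustration, this is
not an existing API):

```python
def fetch_metric(feed_id, metric_name):
    """Hypothetical: fetch the latest value of one metric for one flock
    member from the metrics store."""
    raise NotImplementedError

def flock_metric(flock_members, metric_name, aggregate=sum):
    """Aggregate one metric over all current members of a flock.

    flock_members is the list of feed-ids currently branded as part of
    the flock; 'aggregate' could also be max, min, or an average.
    """
    values = [fetch_metric(feed_id, metric_name) for feed_id in flock_members]
    return aggregate(values) if values else None
```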
For inventory, when we learn about a member dying, we can just remove it
from the flock, adjust counters, and record an event about it. Similar
for new members.
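The bookkeeping could be as simple as this sketch (a Flock class is an
assumption, not something we have today):

```python
class Flock:
    """Hypothetical flock record: we track membership and counters,
    not the life story of each member."""

    def __init__(self, name):
        self.name = name
        self.members = set()   # feed-ids of live members
        self.events = []       # (event, feed_id) history

    def member_born(self, feed_id):
        self.members.add(feed_id)
        self.events.append(("born", feed_id))

    def member_died(self, feed_id):
        self.members.discard(feed_id)
        self.events.append(("died", feed_id))

    def size(self):
        return len(self.members)
```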
Now the question is how do we learn about cattle in the flock? When
building images with the agent inside, we can pass an env-variable or
agent setting that (see the sketch below)
a) indicates that this is an agent inside a docker container
b) indicates which flock this belongs to.
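On the agent side that could boil down to something like the following at
startup (HAWKULAR_IN_CONTAINER and HAWKULAR_FLOCK are made-up variable
names, not existing agent settings):

```python
import os

def flock_settings():
    """Read the hypothetical env-variables that brand this agent as
    a) running inside a container and b) belonging to a flock."""
    in_container = os.environ.get("HAWKULAR_IN_CONTAINER", "false") == "true"
    flock = os.environ.get("HAWKULAR_FLOCK")  # None means: treat as a pet
    return in_container, flock
```

They would then be set at start time, e.g. via
`docker run -e HAWKULAR_IN_CONTAINER=true -e HAWKULAR_FLOCK=my-app pilhuhn/hawkfly`.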
Does that make sense?
Heiko
On 3 Jul 2016, at 14:14, John Mazzitelli wrote:
In case you didn't understand the analogy, I believe Heiko meant to
use the word "Cattle" not "Kettle" :-)
I had to look it up - I've not heard the "cattle vs. pets" analogy
before - but I get it now!
----- Original Message -----
> Hey,
>
> [ CC to Federico as he may have some ideas from the Kube/OS side ]
>
> Our QE has opened an interesting case:
>
> https://github.com/ManageIQ/manageiq/issues/9556
>
> where I first thought WTF with that title.
>
> But then when reading further it got more interesting.
> Basically what happens is that especially in environments like
> Kube/Openshift, individual containers/appservers are Kettle and not
> Pets: one goes down, gets killed, you start a new one somewhere else.
>
> Now the interesting questions for us are (first purely on the Hawkular
> side):
> - how can we detect that such a container is down and will never come
> up with that id again (-> we need to clean it up in inventory)
> - can we learn that for a killed container A, a freshly started
> container A' is the replacement, e.g. to continue with performance
> monitoring of the app or to re-associate relationships with other
> items in inventory?
> (Is that even something we want - again that is Kettle and not Pets
> anymore)
> - Could eap+embedded agent perhaps store some token in Kube which
> is then passed when A' is started so that A' knows it is the new A
> (e.g. the feed id).
> - I guess that would not make much sense anyway, as for an app with
> three app servers all would get that same token.
>
> Perhaps we should ignore that use case completely for now and tackle
> it differently, in the sense that we don't care about 'real' app
> servers, but rather introduce the concept of a 'virtual' server where
> we only know via Kube that it exists and how many of them there are
> for a certain application (which is identified via some tag in Kube).
> Those virtual servers deliver data, but we don't really try to do
> anything with them 'personally', but indirectly via Kube interactions
> (i.e. map the incoming data to the app and not to an individual
> server). We would also not store the individual server in inventory,
> so there is no need to clean it up (again, no pet but kettle).
> In fact we could just use the feed-id as kube token (or vice versa).
> We still need a way to detect that one of those kettle-AS is on Kube
> and possibly either disable or re-route some of the lifecycle events
> onto Kubernetes (start in any case, stop probably does not matter
> if the container dies because the appserver inside stops or if kube
> just kills it).