RfC: Remove / rename servers from / in inventory

Heiko W.Rupp

Tuesday, 15 September 2015

10:01 a.m.

Hey,

currently we can add (WF) servers to inventory easily, but we can't get rid of them anymore. Similarly, it is not really clear what should happen if a server changes its name.

In WildFly 10, supposedly a new uuid kind of thing should appear that would identify each WF uniquely and could serve as the "primary key", so that the name is just a display attribute that can be changed at will. With that, the renaming piece is sort of solved.

Now to the semantics of "uninventory" (not deletion of physical servers).

In RHQ we had uninventory, which removed the server from inventory. After a discovery scan happened, it showed up again and could be re-inventoried. During uninventory, all collected metrics etc. and alert definitions for the resource were wiped (or prepared for wiping). This wiping of recorded metrics has been mentioned more than once as undesired, e.g. in case the uninventory happened by accident.

Also in RHQ there were issues when e.g. the machine on which a WF server was running got renamed.

Some questions that come to mind:
- Should we wipe data when a server gets uninventoried?
- How can we prevent it from entering inventory directly after an uninventory again?
- what states do we need (data wiping may take too long to be done synchronously)?
- would users perhaps want a data dump to external storage before wiping?
- How can we make sure that servers without that uuid can still be identified even after rename (of the machine)?
- How can we make sure that servers without that uuid can still be identified even when they are moved to a different pod?

Heiko
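To make the "uuid as primary key, name as display attribute" idea concrete, here is a minimal sketch. The class and field names (`Server`, `Inventory`, `server_id`) are hypothetical and not taken from Hawkular or WildFly; the only assumption is that each server supplies some stable identifier.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Stable identity; hypothetical stand-in for the per-server uuid.
    server_id: str
    # Display attribute only; may change at will.
    name: str


class Inventory:
    def __init__(self):
        # Keyed by server_id, never by name.
        self._servers = {}

    def register(self, server_id, name):
        # Re-registering a known id only refreshes the display name.
        server = self._servers.get(server_id)
        if server is None:
            server = Server(server_id, name)
            self._servers[server_id] = server
        else:
            server.name = name
        return server

    def __len__(self):
        return len(self._servers)
```

With this shape, a rename is just a second `register` call with the same id, and nothing keyed on the id (metrics, triggers) needs to move.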

Jay Shaughnessy

Tuesday, 15 September

12:52 p.m.

On 9/15/2015 11:01 AM, Heiko W.Rupp wrote:

...

Before answering the questions below: I'm not so sure uninventory should be a thing. In RHQ it meant a wipe of server-side data. In Hawkular I wonder whether it just means turning off the feed of that data and letting it disappear from the recent time windows. Filtering, if necessary, should maybe be handled in the presentation layer.

Having a feed be able to report the set of resources it *can* report on, and then supplying it a list of the resources it *should* report on, seems like a general mechanism we could somehow build out. Or it could be less granular, just down to type level. The initial set of types should be small.
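The can-report / should-report mechanism could look roughly like the sketch below. The function name, the tuple shape, and the type strings are assumptions for illustration only; this is not an existing Hawkular feed API.

```python
def select_reported(available, enabled_types, ignored_ids=frozenset()):
    """Return the resources a feed should report on.

    available:     iterable of (resource_id, resource_type) the feed *can* report.
    enabled_types: types the server asked for (type-level granularity).
    ignored_ids:   individual resources the server asked the feed to skip.
    """
    return [
        (rid, rtype)
        for rid, rtype in available
        if rtype in enabled_types and rid not in ignored_ids
    ]
```

Keeping `enabled_types` small by default matches the "initial set of types should be small" point, and `ignored_ids` is what would keep an uninventoried server from being reported again.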

...

Some questions that come to mind: - Should we wipe data when a server gets uninventoried?

No. I think data should just live on until some TTL perhaps kicks in. I do sometimes wonder how important really old data is. Maybe useful for trend analysis but not so much for problem resolution, which is probably more reliant on say the last 12-48 hours.

...

- How can we prevent it from entering inventory directly after an uninventory again?

If we are able to supply a feed with a list of types/resources to not report on then it's not a problem.

...

- what states do we need (data wiping may take too long to be done synchronously)?

Don't wipe data, let it be purged later and just be irrelevant based on the query windows being looked at.

...

- would users perhaps want a data dump to external storage before wiping?

Not a problem if we keep the data.

...

- How can we make sure that servers without that uuid can still be identified even after rename (of the machine)?

I think this is a problem to ignore. If the feed name/resource path changes then it's basically a new resource. I don't think we need to be this complex. If we maintain our feed name, I think the resource path should remain the same for the resources it is monitoring. If we don't do that already then we should look at doing that.

...

- How can we make sure that servers without that uuid can still be identified even when they are moved to a different pod

Maybe the same as above, I'm not sure.

Heiko W.Rupp

2:12 p.m.

On 15 Sep 2015, at 19:52, Jay Shaughnessy wrote:

...

No. I think data should just live on until some TTL perhaps kicks in.

That would work for metrics (out of the box). What about other data like alert trigger definitions? Or group memberships?

...

I think this is a problem to ignore. If the feed name/resource path changes then it's basically a new resource. I don't think

Do I understand you correctly that you say that a feed name once set will be the same no matter where the (WF-)server moves to? So that a resource under that feed would be the same no matter where the feed is located. For a WF-Server with the embedded agent, this is certainly true.

I may be wrong here: in OpenShift3 / Kubernetes (K8s), K8s may kill a container at any time and start it on a different machine. If an application consists of e.g. 3 servers, having one killed and restarted would imply to me that all the previous "settings" would still apply to the freshly started instance no matter where it is then located. In this case, we should probably record the fact of the restart, but the resource would at least logically (from my PoV) be the same one of 3 servers of that app.

(Of course that last paragraph does not apply to explicit manual uninventory.)

I also recall from RHQ that we had a situation where a user was deploying my-super-app-v1.war and later my-super-app-v2.war, where both deployments are the same app and, even with a different name, just two different versions of the same resource and not different resources. Also in this case, the user wants all metrics, trigger definitions etc. to survive the deployment of v2.

Jay Shaughnessy

3:04 p.m.

On 9/15/2015 3:12 PM, Heiko W.Rupp wrote:

...

On 15 Sep 2015, at 19:52, Jay Shaughnessy wrote:
> No. I think data should just live on until some TTL perhaps kicks in.

That would work for metrics (out of the box). What about other data like alert trigger definitions? Or group memberships?

Perhaps we look at a reaper sort of approach where we look for "dead" resources in inventory, based on some sort of criteria, and then perform clean-up.

...

> I think this is a problem to ignore. If the feed name/resource path
> changes then it's basically a new resource. I don't think

Do I understand you correctly that you say that a feed name once set will be the same no matter where the (WF-)server moves to? So that a resource under that feed would be the same no matter where the feed is located. For a WF-Server with the embedded agent, this is certainly true.

I may be wrong here: in OpenShift3 / Kubernetes (K8s), K8s may kill a container at any time and start it on a different machine. If an application consists of e.g. 3 servers, having one killed and restarted would imply to me that all the previous "settings" would still apply to the freshly started instance no matter where it is then located. In this case, we should probably record the fact of the restart, but the resource would at least logically (from my PoV) be the same one of 3 servers of that app.

I defer to others to answer this question, but if feedIds don't survive a machine change I think we're in trouble. I rather thought a feed would just report its machine in some way, so that machine metrics could be correlated to app performance.

...

(of course that last paragraph does not apply to explicit manual uninventory). I also recall from RHQ that we had a situation where a user was deploying my-super-app-v1.war and later my-super-app-v2.war, where both deployments are the same app and even with a different name just two different versions of the same resource and not different resources. Also in this case, the user wants all metrics, trigger definitions etc. to survive the deployment of v2.

You are right about this and I hadn't forgotten about it. But this feature is a b*tch. It was really hard in RHQ and may be nigh impossible in Hawkular unless we basically do the same thing and start stripping versions out of resource paths. Or maybe we're already doing that, I don't know. It depends on whether the agent is using the path as-is from wfly, which afaik is still using the artifact name in the path.
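For illustration, stripping a version out of an artifact name might be sketched as below. The naming convention it assumes (a trailing `-v1` or `-1.2.3` token right before the file extension) is a guess based on the my-super-app-v1.war example; real deployments can use other schemes, which is a large part of why this is hard.

```python
import re

# Assumed convention: "<name>-v<digits>.<ext>" or "<name>-<digits>.<digits>....<ext>".
_VERSION = re.compile(r"-v?\d+(?:\.\d+)*(?=\.[A-Za-z]+$)")


def strip_version(artifact):
    """Drop a trailing version token so that v1 and v2 map to the same path segment."""
    return _VERSION.sub("", artifact)
```

Names that do not follow the assumed convention pass through unchanged, so two differently versioned deployments only collapse into one resource path when the pattern actually matches.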

Larry O'Leary

9:10 p.m.

On Tue, Sep 15, 2015 at 3:04 PM, Jay Shaughnessy <jshaughn(a)redhat.com> wrote:

...

On 9/15/2015 3:12 PM, Heiko W.Rupp wrote:
> On 15 Sep 2015, at 19:52, Jay Shaughnessy wrote:
>> No. I think data should just live on until some TTL perhaps kicks in.
> That would work for metrics (out of the box).
> What about other data like alert trigger definitions?
> Or group memberships?

Perhaps we look at a reaper sort of approach where we look for "dead" resources in inventory, based on some sort of criteria, and then perform clean-up.

...

From my PoV the data should just be kept and the legacy concept of uninventory should just become a hybrid of disable and ignore. Collection of new data should not occur as the resource has been "de-activated". Alert trigger defs and group membership should be made invisible. All historic data is retained until its TTL expires. And all configuration defs/settings, collection schedules and the like would just remain.

Because the resource is "disabled", re-discovery shouldn't be an issue. If the user wants to "re-activate" it, then everything remains intact.

The downside to this would be that a user could not "reset" the resource and start from scratch. But I am not really sure that this is a problem, assuming default configuration/settings/templates can be reapplied.
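The disable/ignore hybrid described above can be sketched as a flag on the resource rather than a delete. The `Resource` class and its method names are hypothetical; the point is only that de-activation hides and pauses things without removing them.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    resource_id: str
    active: bool = True
    # Trigger definitions must survive de-activation (illustrative field).
    triggers: list = field(default_factory=list)

    def deactivate(self):
        # Nothing is deleted; the resource is only flagged.
        self.active = False

    def reactivate(self):
        self.active = True

    def accepts_new_data(self):
        # Collection stops while the resource is de-activated.
        return self.active

    def visible_triggers(self):
        # Trigger definitions are hidden, not removed.
        return list(self.triggers) if self.active else []
```

Re-activation is then a single state change, and historic data is untouched because it was never tied to the flag.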

...

> I also recall from RHQ that we had a situation where a user was
> deploying my-super-app-v1.war and later my-super-app-v2.war,
> where both deployments are the same app and even with a different
> name just two different versions of the same resource and not
> different resources. Also in this case, the user wants all metrics,
> trigger definitions etc. to survive the deployment of v2.

You are right about this and I hadn't forgotten about it. But this feature is a b*tch. It was really hard in RHQ and may be nigh impossible in Hawkular unless we basically do the same thing and start stripping versions out of resource paths. Or maybe we're already doing that, I don't know. It depends on whether the agent is using the path as-is from wfly, which afaik is still using the artifact name in the path.

I think it would be best just to provide a resource reconciliation function that allows a user to perform operations such as:
- Resource A and Resource B are the same
- Merge Resource A and Resource B as Resource A | B | C
- Resource B is version 2 of Resource A

Because resources would no longer be deleted/removed from inventory, it allows the merging of updated/new resources without any issue, assuming the types are the same.

So, from your original questions:

Some questions that come to mind:

...

- Should we wipe data when a server gets uninventoried?

No.

...

- How can we prevent it from entering inventory directly after an uninventory again?

If it is simply disabled/ignored, we won't have to. Only need a way to allow the user to allow it to be "re-activated". This would be similar to what was done in RHQ with re-discovery.

...

- what states do we need (data wiping may take too long to be done synchronously)?

Not necessary. Data should just expire on its own based on data retention settings.

...

- would users perhaps want a data dump to external storage before wiping?

Yes. For the resource that was "de-activated" the user should still be able to get to its data/metric views and grab the data that has not yet expired.

...

- How can we make sure that servers without that uuid can still be identified even after rename (of the machine)?
- How can we make sure that servers without that uuid can still be identified even when they are moved to a different pod

As Jay indicated in his response, I am not sure this is really a problem. Wouldn't a new UUID equal a new resource? Assuming there was a reconciliation feature, the user could re-link a new UUID to an existing resource with a different UUID if they are confident that it is the same resource. It would be preferred that this could happen automatically, but ideally the user knows best and should have the ability to link/merge the resources.

--
Larry O'Leary
https://plus.google.com/+LarryOLeary
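One way to picture the reconciliation feature is an alias table that maps every id a resource has ever had to a single canonical id. This is a sketch under that assumption; `Reconciler`, `merge`, and `canonical` are made-up names, and it does not check that the merged resources have the same type.

```python
class Reconciler:
    """Maps every known resource id to the canonical id it was merged into."""

    def __init__(self):
        self._alias = {}

    def canonical(self, resource_id):
        # Follow alias links until an unmerged id is reached.
        while resource_id in self._alias:
            resource_id = self._alias[resource_id]
        return resource_id

    def merge(self, keep, duplicate):
        """Declare that `duplicate` is the same resource as `keep`."""
        keep, duplicate = self.canonical(keep), self.canonical(duplicate)
        if keep != duplicate:
            self._alias[duplicate] = keep
```

Because nothing is deleted, a user-driven "Resource B is version 2 of Resource A" becomes `merge("A", "B")`, and lookups by the new UUID resolve to the existing resource and its history.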

Lukas Krejci

Thursday, 17 September

10:14 a.m.

I agree 100% with what Jay and Larry suggest here, but I want to point out one other thing that Jay mentioned on the last water-cooler call.

One of the problems with discovery and (un)inventory in RHQ was that by default we discovered and inventoried A LOT. This then required some users to go back and uninventory unnecessary stuff and ignore it in the discovery queue.

What Jay suggested, and I thought was a brilliant idea, was to "start small" and only discover a very "crude picture" of the resource tree, with a hint of what could be discovered in addition. For example: discover an EAP resource with some basic stats and no child resources like deployments, subsystems, etc., with the hint of what could be discovered coming from the hierarchical resource types (granted, those are not in inventory right now, but that is easily changed).

On Tuesday, September 15, 2015 21:10:14 Larry O'Leary wrote:

...



hawkular-dev@lists.jboss.org
