sequencing of feed-comm
by John Mazzitelli
Just sending this to document it :) Can we put these PlantUML documents somewhere so they can render? GitHub or someplace?
Anyway, this is the current implementation: the responses coming back from the agent pass directly through the server to the UI websocket. This will not work long term -- see below.
It will not work because the UI that submitted the request may not be connected to the same server that the feed is connected to. So we need to put the incoming response from the feed on the bus as a message, and a UI listener will be listening on the bus and send the messages on to its UI. This may be difficult to do since UIs do not have an identifier like feeds do (feed ID) - there are session IDs, but they change for each websocket connection. I'll figure something out :) Anyway, this is how it should work:
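In code terms, a rough sketch of that bus-based routing could look like this (this is not the actual feed-comm implementation; the topic name, the "uiId" correlation property, and the class names are all placeholders, and JMS 2.0 is assumed):

```java
// Sketch only: the feed websocket endpoint publishes each response to a bus topic,
// and every server runs a listener that forwards bus messages to the UI websocket
// sessions it actually holds.
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.jms.Topic;

public class FeedResponseRelay {

    // UI websocket sessions connected to *this* server, keyed by whatever
    // identifier we end up giving UIs (session id, user id, ...).
    private final Map<String, javax.websocket.Session> uiSessions = new ConcurrentHashMap<>();

    private final ConnectionFactory connectionFactory;
    private final Topic feedResponseTopic; // hypothetical topic for feed responses

    public FeedResponseRelay(ConnectionFactory cf, Topic topic) {
        this.connectionFactory = cf;
        this.feedResponseTopic = topic;
    }

    // Called from the feed websocket endpoint when a response arrives from an agent.
    public void publishFeedResponse(String uiCorrelationId, String responseJson) throws JMSException {
        try (Connection c = connectionFactory.createConnection()) { // JMS 2.0: Connection is AutoCloseable
            Session s = c.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = s.createProducer(feedResponseTopic);
            TextMessage msg = s.createTextMessage(responseJson);
            msg.setStringProperty("uiId", uiCorrelationId); // hypothetical correlation header
            producer.send(msg);
        }
    }

    // Wired up as a bus listener on every server: forward the message to the UI
    // websocket session only if that UI is connected *here*.
    public void onBusMessage(Message message) throws JMSException, IOException {
        String uiId = message.getStringProperty("uiId");
        javax.websocket.Session ui = uiSessions.get(uiId);
        if (ui != null && ui.isOpen()) {
            ui.getBasicRemote().sendText(((TextMessage) message).getText());
        }
    }
}
```

However the UI correlation identifier ends up being defined, the key point is that the forwarding decision happens on whichever server holds the UI's websocket, not on the server the feed happens to be connected to.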
9 years, 2 months
Could not connect to Cassandra ... - does it have to be a WARN with a stack trace?
by Peter Palaga
Hi *,
There are several occurrences of this in every Hawkular start log:
WARN [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle]
(metricsservice-lifecycle-thread) Could not connect to Cassandra cluster
- assuming its not up yet:
com.datastax.driver.core.exceptions.NoHostAvailableException: All
host(s) tried for query failed (tried: /127.0.0.1:9042
(com.datastax.driver.core.TransportException: [/127.0.0.1:9042] Cannot
connect))
plus the stack trace.
So given that this happens during every HK startup, could we not
classify it as normal and change it to INFO without the stack trace?
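Something along these lines is what I have in mind (just a sketch, not the actual MetricsServiceLifecycle code, and using SLF4J purely for illustration): log a one-line INFO with the exception message and keep the full stack trace at DEBUG.

```java
// Sketch: catch the driver exception, log a one-line INFO, keep the trace at DEBUG.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.NoHostAvailableException;

public class CassandraConnector {
    private static final Logger log = LoggerFactory.getLogger(CassandraConnector.class);

    // Hypothetical connect attempt; the real lifecycle thread schedules retries on its own.
    Session connect(Cluster cluster) {
        try {
            return cluster.connect();
        } catch (NoHostAvailableException e) {
            log.info("Could not connect to Cassandra cluster - assuming it is not up yet: {}",
                    e.getMessage());
            log.debug("Cassandra connection failure details", e);
            return null; // caller retries later
        }
    }
}
```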
I am ready to prepare a PR unless somebody raises a hand against that.
Thanks,
Peter
9 years, 2 months
IFTTT alert notifications
by Jiri Kremser
Hello,
if you don't know IFTTT, it's a simple site where you can define a trigger condition and then some action that runs when the condition is met. It's closed-source software (no need to pay, though), but the number of possible actions is enormous. I'd say it's the industry standard for this kind of problem. They provide a way to trigger the action manually by sending an HTTP POST to their API.
It's described here:
https://ifttt.com/maker
The post request looks like this:
`curl -X POST -H "Content-Type: application/json" -d '{"value1":"1","value2":"2","value3":"foo"}' https://maker.ifttt.com/trigger/sdf/with/key/aabbccddeeffgghh`
where sdf is the name of the event (it must be defined via their website), aabbcc.. is the user's secret token, and the valueN fields in the JSON carry arbitrary data you can then use inside the actions, for instance in the subject of the email or whatever the action allows.
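For the record, the equivalent call from Java, using only the JDK, could look roughly like this (a sketch only; the event name and key are the placeholders from the curl example above, and a real notification plugin would take them and the values from the alert action properties):

```java
// Minimal sketch of the HTTP call a notification plugin could make to the Maker channel.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class IftttMakerNotifier {

    public static int trigger(String event, String key, String value1, String value2, String value3)
            throws Exception {
        URL url = new URL("https://maker.ifttt.com/trigger/" + event + "/with/key/" + key);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        String json = String.format("{\"value1\":\"%s\",\"value2\":\"%s\",\"value3\":\"%s\"}",
                value1, value2, value3);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        // e.g. an alert about high heap usage on some server (made-up values)
        int status = trigger("sdf", "aabbccddeeffgghh", "HeapUsed", "92%", "myserver01");
        System.out.println("IFTTT responded with HTTP " + status);
    }
}
```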
So, if we add a new alert notification that can do such a POST request, we get all of these actions for free:
https://ifttt.com/recipes/do
Again, the if-condition-then-action rule must be defined via their website (the 'if' condition is called 'maker' in this case), and the actions use OAuth, so it asks for permission if needed. For instance, I use it for Pushbullet notifications, so it asked Pushbullet for authorization via its API.
wdyt?
jk
9 years, 2 months
Parent POM and Wildfly BOM
by Thomas Segismont
Hi everyone,
I've been working on the changes needed in Metrics for the parent POM
upgrade to version 16 (the one that introduces Wildfly 9).
There are three things I noticed which I believe are worth sharing.
Firstly, beware that the Wildfly guys have changed their philosophy about
the BOM: they now force the "provided" scope in the BOM and exclude all
the dependencies they think you shouldn't care about as an EE7
application developer.
On the one hand, this frees you from adding the provided scope declaration
in your application POM. On the other hand, if you use one of those
artifacts in tests, dependency resolution can suddenly break.
Secondly, our parent POM does not only declare the Wildfly BOM in its
dependency management section, it also imports it. That means all
our projects get the forced versioning and scope, even if they are not
Wildfly based.
Thirdly, a minor issue: the Wildfly Maven plugin does not configure a
default Wildfly version, which means we are all forced to declare one in
the component parent POMs, like this in Metrics:
https://github.com/hawkular/hawkular-metrics/blob/master/pom.xml#L190-L196
Going forward, I propose that we no longer "import" the BOM in the Hawkular
parent and instead let components do it where needed, and that we declare
in the parent the Wildfly version the Wildfly Maven plugin should start with.
Regards,
Thomas
9 years, 2 months
Availability revisited
by Heiko W.Rupp
Hey,
we did talk about Availability and computed state in the past.
Now triggered by https://issues.jboss.org/browse/HAWKULAR-401
and also https://issues.jboss.org/browse/HAWKULAR-407
we need to revisit this and finally start including it in the code base.
In -407 we have the issue that the server currently cannot detect that
a feed is down. For the WF-agent, this is likely to be solved with the new
feed-comm system, which can see disconnect messages [1] and act accordingly
(i.e. the server side adds a synthetic "down" event into the availability
data stream). Of course, other feeds can also use that mechanism.
A generic feed, though, that is sending availability records from time to
time is most probably not sending a "down" event when it goes down or
crashes. So we need a periodic job looking for feeds that have not talked
to us for a longer period of time.
This also implies that at least the in-memory state for feed availability
needs to be updated with a last-seen record, as Micke described some time
ago (that last-seen record should probably be flushed to C* from time to
time).
We would also need to require "generic" feeds to do heartbeats by
sending their availability at least once per minute.
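A rough sketch of such a periodic job (none of this is existing code; the listener interface and the "two missed heartbeats" threshold are just assumptions to illustrate the idea):

```java
// Sketch: walk the in-memory last-seen map and, for any feed silent for too long,
// let a listener push a synthetic DOWN record into the availability stream.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FeedLivenessChecker {

    public interface FeedAvailabilityListener {
        void feedDown(String feedId, long lastSeenMillis); // e.g. write a synthetic "down" availability point
    }

    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>(); // feedId -> last contact (millis)
    private final long heartbeatIntervalMillis = TimeUnit.MINUTES.toMillis(1); // "once per minute at least"
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start(FeedAvailabilityListener listener) {
        scheduler.scheduleAtFixedRate(() -> {
            long now = System.currentTimeMillis();
            for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
                // allow two missed heartbeats before declaring the feed down (arbitrary choice)
                if (now - e.getValue() > 2 * heartbeatIntervalMillis) {
                    listener.feedDown(e.getKey(), e.getValue());
                }
            }
        }, 1, 1, TimeUnit.MINUTES);
    }

    // Called whenever any message/availability record arrives from a feed; this is the
    // in-memory last-seen state that would be flushed to Cassandra from time to time.
    public void touch(String feedId) {
        lastSeen.put(feedId, System.currentTimeMillis());
    }
}
```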
Now for -401, which is trickier. If e.g. a WildFly is in state
'reload-needed', it is technically up, but its configuration has pending
changes. So we would need "up" availability, and then another (sub) state
indicating the pending change.
And then we may have a state like "maintenance mode", where a resource
may be up or down without impacting e.g. alerting or any SLA computation.
From those raw input variables we would then compute the resource state:
http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000413.html
While this could be up/down/unknown/(mixed for groups), it also means
that we need to convey the other information to the user. If e.g. a
resource is in maintenance mode, the user should be informed why alerts
on the resource do not fire.
Likewise for reload-needed: the user needs to know why the recent changes
he or she made did not change the way the appserver works.
Treating reload-needed as just "down" is wrong, as the server continues
to work and serve requests.
The above of course has an impact on storage. Right now we only store
up/down/unknown (as text) for availability, but we would certainly need
to also store the sub-state.
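Purely to illustrate the storage point, the stored record could carry the raw state plus an optional sub-state, something like this (all names made up, not existing code):

```java
// Sketch of an availability point that keeps up/down/unknown plus an optional sub-state,
// instead of a single text value.
public class AvailabilityDataPoint {

    public enum State { UP, DOWN, UNKNOWN }

    public enum SubState { NONE, RELOAD_NEEDED /* more as they come up */ }

    private final long timestamp;
    private final State state;
    private final SubState subState;

    public AvailabilityDataPoint(long timestamp, State state, SubState subState) {
        this.timestamp = timestamp;
        this.state = state;
        this.subState = subState;
    }

    public long getTimestamp() { return timestamp; }
    public State getState() { return state; }
    public SubState getSubState() { return subState; }
}
```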
The maintenance mode is orthogonal to all of the above and should
probably be a "flag" on a graph of resources.
Heiko
[1] @OnClose is called with a code of 1006 on client crash/abnormal
termination.
See http://tools.ietf.org/html/rfc6455#section-7.4
9 years, 2 months
[Metrics] Unified query support brainstorming
by Thomas Segismont
Hi everyone,
Right now, when you query data in Metrics, you can, given a time range:
- get raw data
- get raw data having some tags
- get a "bucketed" view of raw data with on-the fly aggregates
- get periods for which a condition on the value holds true
But you can't mix these capabilities.
For example, you can't ask for the periods where the average of a gauge,
computed over 1-minute buckets, is greater than a threshold.
Nor is it possible to express AND/OR in tag or condition queries.
Searching by tags is useful, but users have to tag data manually (unless
the collector adds some predefined tags). It would be useful to be able
to operate on data coming from different metrics with similar names:
like "tell me the average CPU usage over all my web hosts", provided the
metrics are _webhost*.cpu.usage_.
When aggregating data, users should be able to provide the name of the
function, be it a built-in or a user-defined function.
Last but not least, when rollups are implemented, users should probably
not have to care about where the data is stored, whether they ask for
the 1-month average of a metric over the past year or the 30-second
average over the past five minutes (think of the UI zooming in on graphs).
These examples are not completely fabricated; they are feedback I got while
presenting Hawkular. I understand Stefan heard similar comments during
Summit.
I'm starting this thread to gather inputs so that we can build a
powerful, unified query API. Feel free to provide more use cases and/or
ideas for implementation.
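To make the brainstorming a bit more concrete, here is one strawman (every class and method in it is hypothetical, not a proposal for the final API): a single query that combines a name pattern, bucketing, an aggregate function and a condition on the aggregated value.

```java
// Strawman only - illustrates combining the capabilities listed above in one query.
public class UnifiedQueryStrawman {

    enum Op { GT, LT }

    static class GaugeQuery {
        private String namePattern;
        private long start, end, bucketMillis;
        private String aggregate;
        private Op op;
        private double threshold;

        static GaugeQuery gauges(String namePattern) {
            GaugeQuery q = new GaugeQuery();
            q.namePattern = namePattern;
            return q;
        }
        GaugeQuery between(long start, long end) { this.start = start; this.end = end; return this; }
        GaugeQuery buckets(long bucketMillis) { this.bucketMillis = bucketMillis; return this; }
        GaugeQuery aggregate(String fn) { this.aggregate = fn; return this; }   // built-in or user-defined
        GaugeQuery where(Op op, double threshold) { this.op = op; this.threshold = threshold; return this; }

        @Override public String toString() {
            return String.format("gauges(%s) [%d..%d] buckets=%dms fn=%s where value %s %.1f",
                    namePattern, start, end, bucketMillis, aggregate, op, threshold);
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // "periods where the 1-minute average of webhost*.cpu.usage was above 80% in the last day"
        GaugeQuery q = GaugeQuery.gauges("webhost*.cpu.usage")
                .between(now - 24L * 60 * 60 * 1000, now)
                .buckets(60_000L)
                .aggregate("avg")
                .where(Op.GT, 80.0);
        System.out.println(q);
    }
}
```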
Thanks!
Thomas
9 years, 2 months
glue code - where does it go?
by John Mazzitelli
We have a fairly annoying problem with our current glue code location.
Right now, we have the "glue code" known as "feed-comm" inside the hawkular repo - specifically, the "feed-comm-api" artifact contains the Java code that Java-based feeds need in order to talk to the server (it also includes some JSON schemas for non-Java feeds to use).
But since this is in the hawkular repo, my agent needs a hawkular release in order to pull in the feed-comm-api artifact! Well, obviously, that's what we are moving toward for next week - an alpha hawkular release. But until that happens, I can't release the agent.
But, interestingly enough, because hawkular (kettle) contains the agent, hawkular can't be released until it has an agent release to pull in.
Round and round we go.
I really hate saying we need yet another repo, but how else can we get glue code that is common between the server and feeds/agents and that isn't dependent upon a hawkular release being made?
9 years, 2 months
Release cadence
by Heiko W.Rupp
Hey
I have observed that our current Hawkular cadence of 4 weeks,
with similar cadences for components, makes us end up with
long-lived integration branches and a big rush near the
end to integrate them, test them in CI for the first time,
and even test them in the real world for the first time.
In one of the last releases there was a changed implementation
in one component that basically turned out to be a no-op yet
still returned a "200 OK" code, so clients thought everything was
fine, but it was not. We found the issue (through people looking
at the UI) and solved it, but it was a rush.
This certainly goes against all the ideas of "release early, release
often", "cut small slices", "changes go into CI/CD and go live quickly".
Remember the coin flipping ?
Ideally we would always be able to integrate changes from
components into Hawkular (main), but I understand that, with the
way Maven and its release process to Central work, it is also not
ideal to release many versions per day.
With all of the above in mind, I propose that we move to an
"at least once per week" model, where we do a component release
at least once per week (*), and these then form a new Alpha release
in the four-week stream. The smaller releases do not need release notes,
and I don't care whether we use the micro number or an .AlphaY designator
on them, but they should be real releases, not (named) snapshots.
This will still allow us to keep the release effort low, while
being (more) agile, integrating earlier, and thus having fewer
long-lived integration branches.
On top of that, we need to provide new and/or changed APIs (**)
early in the four-week cadence so that other components can
already start calling them, even if they are not yet functionally
complete.
*) Of course only if a change to the component has been made.
**) Ideally, with changed APIs, we keep the old version around for
a bit and offer the new version alongside it. Remember that, especially
with non-compile-time bindings, we cannot know which client is
at which API version.
9 years, 2 months
"Generated UI" ?
by Heiko W.Rupp
hey,
for upcoming tasks like
"define and manage jdbc driver"
"define and manage data source"
and others,
I wonder if we could speed up the UI development a bit by
creating some skeletons for the HTML and also the TypeScript code
by running some scripts over the resource type
definitions we have.
This is not meant to be the 100% final UI, but rather a starting point
that the UI folks can improve on and which could give us a (working)
head start.
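To illustrate what I mean, a toy sketch of such a script (nothing Hawkular-specific; the resource type and property names are made up): it takes a resource type name and its configuration property names and spits out a skeleton HTML form, and a similar template could emit the TypeScript controller.

```java
// Toy generator: resource type + config property names in, skeleton HTML form out.
import java.util.Arrays;
import java.util.List;

public class UiSkeletonGenerator {

    static String htmlFormFor(String resourceType, List<String> properties) {
        StringBuilder html = new StringBuilder();
        html.append("<form name=\"").append(resourceType).append("Form\">\n");
        for (String prop : properties) {
            html.append("  <label for=\"").append(prop).append("\">").append(prop).append("</label>\n");
            html.append("  <input id=\"").append(prop).append("\" ng-model=\"vm.")
                .append(prop).append("\"/>\n");
        }
        html.append("  <button ng-click=\"vm.save()\">Save</button>\n");
        html.append("</form>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // e.g. a "Datasource" resource type with a few made-up config properties
        System.out.print(htmlFormFor("Datasource",
                Arrays.asList("jndiName", "connectionUrl", "driverName", "userName")));
    }
}
```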
Does that make any sense?
Heiko
9 years, 2 months