Availability revisited
by Heiko W.Rupp
Hey,
we talked about availability and computed state in the past.
Now, triggered by https://issues.jboss.org/browse/HAWKULAR-401
and also https://issues.jboss.org/browse/HAWKULAR-407,
we need to revisit this and finally start including it in the code base.
In -407 we have the issue that the server currently cannot detect that
a feed is down. For the WF-agent, this is likely to be solved with the
new feed-comm system, which can see disconnect messages [1] and act
accordingly (i.e. server side, add a synthetic "down" event into the
availability data stream).
Of course other feeds can also use that mechanism.
A generic feed though, that is sending availability records from time to
time, is most probably not sending a "down" event in the case that it is
going down or crashing. So we need to have a periodic job looking for feeds
that did not talk to us for a longer period of time.
This also implies that at least the in-memory state for feed availability
needs to be updated with a last-seen record, as Micke described some time
ago (that last-seen record should probably be flushed to C* from time to
time).
Also we would need to require "generic" feeds to do some heartbeats by
sending their availability once per minute at least.
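A minimal sketch of the last-seen bookkeeping and the periodic check described above. The class and method names are hypothetical and not part of the Hawkular code base, and the timeout value is an assumption; the thread only asks for a heartbeat at least once per minute.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Hypothetical in-memory last-seen tracker; names are illustrative only. */
class FeedLivenessTracker {
    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();
    private final Duration timeout;

    FeedLivenessTracker(Duration timeout) {
        this.timeout = timeout;
    }

    /** Called whenever a feed sends an availability record (its heartbeat). */
    void recordHeartbeat(String feedId, Instant when) {
        // Keep the most recent timestamp even if records arrive out of order.
        lastSeen.merge(feedId, when, (old, cur) -> cur.isAfter(old) ? cur : old);
    }

    /** Called by the periodic job: feeds that have been silent longer than the timeout. */
    List<String> findSilentFeeds(Instant now) {
        return lastSeen.entrySet().stream()
                .filter(e -> Duration.between(e.getValue(), now).compareTo(timeout) > 0)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

For every feed returned by findSilentFeeds, the server could then add the synthetic "down" event into the availability stream. Flushing the map to C* from time to time is left out of the sketch.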
Now for -401, which is trickier. If e.g. a WildFly is in state
'reload-needed', it is technically up, but its configuration has pending
changes. So we would need "up" availability, and then another (sub) state
indicating the pending change.
And then we may have state like "maintenance mode", where a resource
may be up or down without impacting e.g. alerting or any SLA
computation.
From those raw input variables we would then compute the resource state
http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000413.html
While this could be up/down/unknown/(mixed for groups), it will also mean
that we need to convey the other information to the user. If e.g. a resource
is in maintenance mode, the user should be informed why alerts on the
resource do not fire.
Likewise for reload-needed: the user needs to know why the recent changes
he or she made did not change the way the appserver works.
Treating reload-needed as just "down" is wrong, as the server continues to
work and serve requests.
The above of course has an impact on storage. Right now we only store
up/down/unknown (as text) for availability, but we certainly would need
to also store sub-state.
For the maintenance-mode, this is orthogonal to all the above and should
probably be a "flag" on a graph of resources.
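To make the idea of raw inputs plus a computed view concrete, here is a small sketch. All type and method names are hypothetical; only the values up/down/unknown, the reload-needed sub-state and the maintenance flag come from the discussion above.

```java
import java.util.Optional;

/** Raw availability as stored today: up/down/unknown. */
enum RawAvailability { UP, DOWN, UNKNOWN }

/** Hypothetical sub-states; only RELOAD_NEEDED is named in this thread. */
enum SubState { NONE, RELOAD_NEEDED }

/** Hypothetical computed view of one resource; names are illustrative only. */
final class ResourceState {
    final RawAvailability availability;
    final SubState subState;
    final boolean maintenance; // orthogonal flag, not another availability value

    ResourceState(RawAvailability availability, SubState subState, boolean maintenance) {
        this.availability = availability;
        this.subState = subState;
        this.maintenance = maintenance;
    }

    /** Alerting and SLA computation skip resources in maintenance mode. */
    boolean alertsEnabled() {
        return !maintenance;
    }

    /** Extra information for the user when up/down alone does not tell the whole story. */
    Optional<String> userHint() {
        if (maintenance) {
            return Optional.of("In maintenance mode: alerts are suppressed");
        }
        if (availability == RawAvailability.UP && subState == SubState.RELOAD_NEEDED) {
            return Optional.of("Up, but a reload is needed before recent changes take effect");
        }
        return Optional.empty();
    }
}
```

Note that reload-needed never turns the availability into "down" here; it only adds a hint on top of "up".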
Heiko
[1] @OnClose is called with a code of 1006 on client crash/abnormal
termination.
See http://tools.ietf.org/html/rfc6455#section-7.4
9 years, 8 months
[Metrics] Unified query support brainstorming
by Thomas Segismont
Hi everyone,
Right now, when you query data in Metrics, you can, given a time range:
- get raw data
- get raw data having some tags
- get a "bucketed" view of raw data with on-the-fly aggregates
- get periods for which a condition on the value holds true
But you can't mix these capabilities.
For example, you can't ask for periods where the avg of a gauge,
computed over 1 min buckets, is greater than a threshold.
It's not possible either to express AND/OR in tags or condition queries.
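As an illustration of the combined query from the example (periods where the average over fixed-size buckets exceeds a threshold), here is a rough in-memory sketch. It is not the Metrics API; the class name, method name and the array-based input are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper combining bucketing and a condition; not the Metrics API. */
class BucketedThresholdQuery {
    /** Returns [start, end) periods in ms where the per-bucket average exceeds the threshold. */
    static List<long[]> periodsAbove(long[] timestamps, double[] values,
                                     long bucketMillis, double threshold) {
        // Step 1: group values into fixed-size buckets keyed by bucket start.
        TreeMap<Long, double[]> buckets = new TreeMap<>(); // value = {sum, count}
        for (int i = 0; i < timestamps.length; i++) {
            long start = Math.floorDiv(timestamps[i], bucketMillis) * bucketMillis;
            double[] acc = buckets.computeIfAbsent(start, k -> new double[2]);
            acc[0] += values[i];
            acc[1] += 1;
        }
        // Step 2: test each bucket average and merge directly adjacent matching buckets.
        List<long[]> periods = new ArrayList<>();
        for (Map.Entry<Long, double[]> e : buckets.entrySet()) {
            double avg = e.getValue()[0] / e.getValue()[1];
            if (avg <= threshold) {
                continue;
            }
            long start = e.getKey();
            long end = start + bucketMillis;
            if (!periods.isEmpty() && periods.get(periods.size() - 1)[1] == start) {
                periods.get(periods.size() - 1)[1] = end;
            } else {
                periods.add(new long[] {start, end});
            }
        }
        return periods;
    }
}
```

Buckets without any data points are simply skipped in this sketch, so a gap in the data also ends a period.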
Searching by tags is useful, but users have to tag data manually (unless
the collector adds some predefined tags). It would be useful to be able
to operate on data coming from different metrics having similar names:
like tell me the average cpu usage over all my web hosts, provided the
metrics are _webhost*.cpu.usage_
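The name-based selection could look roughly like the sketch below. It assumes the pattern is webhost*.cpu.usage with '*' as the only wildcard, and it averages one latest value per metric; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.OptionalDouble;
import java.util.regex.Pattern;

/** Hypothetical helper: average over all metrics whose name matches a glob. */
class WildcardAverage {
    /** Turns a glob such as "webhost*.cpu.usage" into a regex; only '*' is special. */
    static Pattern globToPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        String[] parts = glob.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            // Quote the literal parts so that '.' in metric names is not a regex wildcard.
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    /** Averages the given value of every metric whose name matches the glob. */
    static OptionalDouble average(Map<String, Double> latestByMetric, String glob) {
        Pattern pattern = globToPattern(glob);
        return latestByMetric.entrySet().stream()
                .filter(e -> pattern.matcher(e.getKey()).matches())
                .mapToDouble(e -> e.getValue())
                .average();
    }
}
```

The result is empty when no metric name matches, which lets the caller distinguish "no such hosts" from an average of zero.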
When aggregating data, users should be able to provide the name of the
function, be it a builtin or user-defined function.
Last but not least, when rollups are implemented, users should
probably not have to care about where the data is stored, if they ask for
the 1-month average of a metric over the past year, or the 30-second
average over the past five minutes (think about the UI zooming graphs).
The examples are not completely fabricated; they come from feedback I got
while presenting Hawkular. I understand Stefan heard similar comments during
Summit.
I'm starting this thread to gather inputs so that we can build a
powerful, unified query API. Feel free to provide more use cases and/or
ideas for implementation.
Thanks!
Thomas
9 years, 8 months
glue code - where does it go?
by John Mazzitelli
We have a fairly annoying problem with our current glue code location.
Right now, we have the "glue code" that is known as "feed-comm" inside the hawkular repo - there is specifically the "feed-comm-api" artifact that contains some Java code that is required for use by Java-based feeds in order to talk to the server (also includes some JSON schemas for non-Java feeds to use).
But since this is in the hawkular repo, my agent needs a hawkular release in order to pull in the feed-comm-api artifact!
Well, obviously, that's what we are moving toward for next week - an alpha hawkular release. But until that happens, I can't release the agent.
But, interestingly enough, because hawkular (kettle) contains the agent, hawkular can't be released until it has an agent release to pull in.
Round and round we go.
I really hate saying we need yet another repo, but how else can we get glue code that is common between server and feeds/agents that isn't dependent upon a hawkular release being made?
9 years, 8 months
Release cadence
by Heiko W.Rupp
Hey
I have observed that our current Hawkular cadence of 4 weeks,
with similar cadences of components, makes us end up with
long-lived integration branches and a larger rush near the
end to integrate them, get them tested for the first time in CI
and even tested for the first time in the real world.
In one of the last releases there was a changed implementation
in one component that basically turned out to be a no-op and
still returned a "200 OK" code, so clients thought everything was
fine, but it was not. We found the issue (through people looking
at the UI) and solved it, but it was in a rush.
This certainly goes against all the ideas of "release early, release
often", "cut small slices", "changes go into CI/CD and go live quickly".
Remember the coin flipping?
Ideally we would always be able to integrate changes from
components into Hawkular (main), but I understand that with the
way maven and its release process to central works, it is also not
ideal to release many versions per day.
With all of the above in mind, I propose that we move to an
"at least once per week" model, where we do a component release
at least once per week(*); over the four-week stream these then form
a new Alpha release. The smaller releases do not need release notes.
I don't care if we use the micro number or a .AlphaY designator on
them, but they should be a release that is not a (named) snapshot.
This will allow us to still spend less effort on releases, but
stay (more) agile and have earlier integrations and thus
fewer long-lived integration branches.
On top of that, we need to provide new and/or changed apis(**)
early on in the 4 weeks cadence so that other components can
already start calling them, even if they are not yet functionally
complete.
*) Of course only if a change to the component has been made.
**) Ideally with changed apis, we keep the old version around for
a bit and offer the new version on top. Remember, that especially
with non-compiletime bindings, we can not know which client is
at what api version.
9 years, 8 months
"Generated UI" ?
by Heiko W.Rupp
hey,
for upcoming tasks like
"define and manage jdbc driver"
"define and manage data source"
and others
I wonder if we could speed up the UI development a bit by
creating some skeletons for html, and also ts-code,
by running some scripts over the resource type
definitions we have.
This is not meant to be the 100% UI, but rather a start that
the UI folks can improve on, and which could give us a (working)
head start.
Does that make any sense?
Heiko
9 years, 8 months
[Metrics] Pluggable aggregation functions: next steps
by Thomas Segismont
Hi,
I have looked at Aakarsh's repo:
https://github.com/Akki5/hawkular_plugin/
It's a good start with an interface describing a doubles to double
function, a classloader for implementation loading and a set of initial
implementations.
In order to integrate this work into Metrics, I think we should take
the following steps:
=====
#1 Change the contract
Doubles to double works great for avg/min/max/... functions on gauge
metrics. But we need to consider other metric types.
Also, the interface should not only accept data point values, but whole
data points, because some functions need the timestamp to compute the
result. % of up availability is a good example.
And functions may return different types: Double, Long, AvailabilityType.
#2 Update configuration options to let the user set a plugins directory
Metrics doc will have to be updated.
#3 Create a function repository for each metric type
We can build on JDK's service loader + Aakarsh's classloader implementation.
#4 Add builtin aggregate functions
Extract existing Metrics code (min, max, avg, % of up avail, downtime
duration) into builtin functions.
#5 Document the process of implementing a pluggable function
We need to think about function naming as well. Should we use a prefix
to identify a builtin function?
=====
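A rough sketch of what the changed contract from step #1 and a per-metric-type repository from step #3 could look like. All names here are hypothetical and this is not Aakarsh's interface or existing Metrics code. The percent-up function assumes data points sorted by ascending timestamp and that each availability value holds until the next data point.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical data point: a timestamp plus a typed value. */
final class DataPoint<V> {
    final long timestamp;
    final V value;

    DataPoint(long timestamp, V value) {
        this.timestamp = timestamp;
        this.value = value;
    }
}

enum AvailabilityType { UP, DOWN, UNKNOWN }

/** Hypothetical contract: whole data points in, a typed result out. */
interface AggregationFunction<V, R> {
    String name();

    R apply(List<DataPoint<V>> points, long rangeEnd);
}

/** Percentage of the covered time spent UP; needs timestamps, not only values. */
class PercentUp implements AggregationFunction<AvailabilityType, Double> {
    public String name() {
        return "percent_up";
    }

    public Double apply(List<DataPoint<AvailabilityType>> points, long rangeEnd) {
        if (points.isEmpty()) {
            return Double.NaN;
        }
        long up = 0;
        for (int i = 0; i < points.size(); i++) {
            // Each value is assumed to hold until the next point (or the range end).
            long next = (i + 1 < points.size()) ? points.get(i + 1).timestamp : rangeEnd;
            if (points.get(i).value == AvailabilityType.UP) {
                up += next - points.get(i).timestamp;
            }
        }
        long total = rangeEnd - points.get(0).timestamp;
        return total <= 0 ? Double.NaN : 100.0 * up / total;
    }
}

/** One repository per metric type, keyed by function name. */
class FunctionRepository<V> {
    private final Map<String, AggregationFunction<V, ?>> functions = new HashMap<>();

    void register(AggregationFunction<V, ?> function) {
        functions.put(function.name(), function);
    }

    AggregationFunction<V, ?> lookup(String name) {
        return functions.get(name);
    }
}
```

Builtin functions would be registered at startup, and implementations found in the plugins directory (for example via java.util.ServiceLoader) could be registered the same way. How builtin and user-defined names are kept apart is the open naming question from step #5.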
I will start another thread to discuss REST and Core API data query changes.
Thoughts?
Thanks,
Thomas
9 years, 8 months
Hawkular BTM 0.2.0.Final released
by Gary Brown
Hi all
I'm pleased to announce the release of version 0.2.0.Final of Hawkular BTM. The main focus for this release has been on enhancing the business transaction collection capabilities.
A quick demo of this version, showing monitoring of two Vertx applications, can be found here: https://youtu.be/TtAXiYhqTSk
Highlights of this release:
* URI inclusion/exclusion support, allowing business transactions to be filtered based on initial URIs of interest.
* Propagate business transaction name, identified based on inclusion URI, through subsequent fragments for the same business transaction instance.
* Child node suppression - provide a mechanism for ignoring child nodes where they add no value. The specific case that prompted this mechanism was when instrumenting JDBC prepared statements.
* Provide mechanism for capturing header values from different message types, for use where a simple map is not available
* Define instrumentation rules for Vertx (HTTP and EventBus).
* Administration REST service, responsible for providing the collector configuration. This means that the configuration no longer needs to be defined in the client (execution) environment.
* Batch reporting of business transactions to the server.
* Configuration switch to determine if only named business transactions should be reported. Default is false, to enable discovery of business transaction (fragments) available from the execution environment(s) being monitored, but when in a production environment, we would only want the fragments of interest to be reported.
* Instrumentation rule versioning mechanism. This will enable rules that are only applicable up until a certain version of a technology to be superseded by newer versions of the rule.
The release can be found here: https://github.com/hawkular/hawkular-btm/releases/tag/0.2.0.Final
The detailed release notes can be found here: https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12316120&versi...
Feature requests and bugs should be reported in our project jira: https://issues.jboss.org/browse/HWKBTM
Regards
Gary
9 years, 8 months
metric data type
by John Mazzitelli
OK, folks... how do we solve the following?
There are now two independent enums to define metric data type - one in inventory and one in metrics.
org.hawkular.inventory.api.model.MetricDataType
org.hawkular.metrics.client.common.MetricType
From an agent or feed perspective, I now have to decide which one I want. Pretty annoying, but OK I can translate between the two if I need to (if the agent is talking to inventory, it will use their enum; if talking to metrics, use their enum). In the agent configuration, <metric-dmr> will need to use the common values between the two in order to support both. But this leads to a more difficult problem to come to grips with - inventory and metric enums for metric type have different values!
Inventory has GAUGE, AVAILABILITY, COUNTER, COUNTER_RATE.
Metrics API has GAUGE, COUNTER, TIMING, SIMPLE.
Right now, the wildfly agent only supports gauge and counter (and inventory availability).
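To show where the translation breaks down, here is a sketch using two local stand-in enums that only mirror the values listed above. They are not the real org.hawkular classes, and the translator class is hypothetical.

```java
import java.util.Optional;

/** Local stand-in mirroring the inventory values listed above. */
enum InventoryMetricDataType { GAUGE, AVAILABILITY, COUNTER, COUNTER_RATE }

/** Local stand-in mirroring the metrics values listed above. */
enum MetricsMetricType { GAUGE, COUNTER, TIMING, SIMPLE }

/** Hypothetical agent-side translator between the two enums. */
class MetricTypeTranslator {
    /** Only GAUGE and COUNTER exist on both sides; everything else has no counterpart. */
    static Optional<MetricsMetricType> toMetrics(InventoryMetricDataType type) {
        switch (type) {
            case GAUGE:
                return Optional.of(MetricsMetricType.GAUGE);
            case COUNTER:
                return Optional.of(MetricsMetricType.COUNTER);
            default:
                return Optional.empty();
        }
    }

    static Optional<InventoryMetricDataType> toInventory(MetricsMetricType type) {
        switch (type) {
            case GAUGE:
                return Optional.of(InventoryMetricDataType.GAUGE);
            case COUNTER:
                return Optional.of(InventoryMetricDataType.COUNTER);
            default:
                return Optional.empty();
        }
    }
}
```

The empty results for AVAILABILITY, COUNTER_RATE, TIMING and SIMPLE are exactly the cases that need a decision.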
9 years, 8 months
Inventory 0.2.0.Alpha1 released
by Peter Palaga
Hi *,
Most important changes:
* Titan graph DB backend instead of Tinkergraph
* "Canonical path", which is a path going down from the tenant to the
entity in question following the "contains" relationships. Canonical
paths are guaranteed to be unique per inventory installation.
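As a rough illustration of the canonical path idea, the sketch below walks the "contains" relationship upwards to the tenant and joins the ids top-down. The entity class and the "/"-separated output format are assumptions made for the example, not the actual Inventory model or path syntax.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical entity with at most one "contains" parent; the tenant has none. */
final class InventoryEntity {
    final String id;
    final InventoryEntity containedIn;

    InventoryEntity(String id, InventoryEntity containedIn) {
        this.id = id;
        this.containedIn = containedIn;
    }
}

class CanonicalPaths {
    /** Walks "contains" upwards to the tenant and joins the ids from the top down. */
    static String canonicalPath(InventoryEntity entity) {
        Deque<String> ids = new ArrayDeque<>();
        for (InventoryEntity e = entity; e != null; e = e.containedIn) {
            ids.addFirst(e.id);
        }
        return "/" + String.join("/", ids);
    }
}
```

Because every entity has exactly one containing parent in this sketch, the resulting path is unique as long as ids are unique within their parent.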
Full list of resolved Jiras:
https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12315923&versi...
We hope that Inventory 0.2.0.Alpha1 reaches Hawkular master soon. The
changes needed in Hawkular main are ready in the dev/inventory-0.2.0.Alpha1
branch, and the ones for Agent should be accepted soon
(https://github.com/hawkular/hawkular-agent/pull/23). The last unsolved
blocker is that the relevant Agent branch depends on Metrics
0.5.0-SNAPSHOT and we have no info on when it is going to be released.
It was me who released Inventory this time, just to make
sure that the Bus Factor for performing Inventory releases is higher than 1.
Best,
Peter
9 years, 8 months