To move inventory forward, we need to port it to Tinkerpop3 - a new(ish) and
actively maintained version of the Tinkerpop graph API.
Apart from the huge improvement in the API's expressiveness and capabilities,
the important thing is that it comes with a variety of backends, two of which
are of particular interest to us at the moment: the Titan backend (with Titan
in version 1.0) and the SQL backend (using the sqlg library).
The SQL backend is a much improved (yet still unfinished in terms of
optimizations and some corner-case features) version of the toy SQL backend.
Back in March I ran performance comparisons for SQL/postgres and Titan (0.5.4)
on Tinkerpop2 and concluded that Titan was the best choice then.
After completing a simplistic port of inventory to Tinkerpop3 (not taking
advantage of any new features or opportunities to simplify the inventory
codebase), I've run the performance tests again for the two new backends -
Titan 1.0 and Sqlg (on Postgres).
This time the results are not as clear-cut as last time.
From the charts you can see that Postgres is actually quite a bit faster
on reads and can better handle concurrent read access, while Titan shines in
writes (arguably thanks to Cassandra as its storage).
Of course, I can imagine that the read performance advantage of Postgres would
decrease with the growing amount of data stored (the tests ran with the
inventory size of ~10k entities) but I am quite positive we'd get competitive
read performance from both solutions up to the sizes of inventory we
anticipate (100k-1M entities).
Now the question is whether the insert performance is something we should be
too worried about in Postgres. IMHO, there should be some room for
improvement in Sqlg, and our move to /sync for agent synchronization would
also make this less of a problem (because there would not be that many
initial imports creating vast amounts of entities).
Nevertheless, I currently cannot say who the "winner" is here. Each backend
has its pros and cons:

Titan:
- high write throughput
- backed by Cassandra
- slower reads
- project virtually dead
- complex codebase (self-made fixes unlikely)

Sqlg:
- small codebase
- everybody knows SQL
- faster reads
- faster concurrent reads
- slow writes
- another backend needed (Postgres)
Therefore my intention here is to go forward with a "proper" port to
Tinkerpop3 with Titan still enabled but focus primarily on Sqlg to see if we
can do anything with the write performance.
IMHO, any choice we make is "workable" as it is even today, but we need to
weigh in the productization requirements. For those, Sqlg with its small
dependency footprint and Postgres backend seems preferable to the huge
dependency mess of Titan.
I'm looking for the owner of the OpenShift namespace 'hawkular'. To
check if you own it, open the following URL and check the "Domains"
part. In my case, I see only "jpkroehling".
[ CC to Federico as he may have some ideas from the Kube/OS side ]
Our QE has opened an interesting case:
where I first thought "WTF" at that title.
But reading further, it got more interesting.
Basically what happens is that, especially in environments like
Kubernetes/OpenShift, individual containers/app servers are Cattle and not
Pets: if one goes down or is killed, you start a new one somewhere else.
Now the interesting questions for us are (first purely on the Hawkular
side):
- How can we detect that such a container is down and will never come up
with that id again (-> we need to clean it up in inventory)?
- Can we learn that for a killed container A, a freshly started container A'
is the replacement, e.g. to continue with performance monitoring of the app
or to re-associate relationships with other items in inventory? (Is that
even something we want - again, that is Cattle and not Pets.)
- Could EAP + embedded agent perhaps store some token in Kube which is then
passed when A' is started, so that A' knows it is the new A? I guess that
would not make much sense anyway, as for an app with three app servers, all
would get that same token.
Perhaps we should completely ignore that use case for now and tackle it
differently, in the sense that we don't care about 'real' app servers but
rather introduce the concept of a 'virtual' server, where we only know via
Kube that it exists and how many of them there are for a certain application
(which is identified via some tag in Kube). Those virtual servers would
still deliver data, but we don't really try to do anything with them
'personally' - only indirectly via Kube interactions (i.e. we map the
incoming data to the app and not to an individual server). We would also not
store the individual server in inventory, so there is no need to clean it
up (again, Cattle, not Pets).
In fact, we could just use the feed-id as the Kube token (or vice versa).
We still need a way to detect that one of those Cattle app servers is on
Kube, and possibly either disable or re-route some of the lifecycle events
onto Kubernetes (start in any case; for stop it probably does not matter
whether the container dies because the app server inside stops or because
Kube just kills it).
Hawkular APM is currently built as a separate distribution independent from other Hawkular components. However in the near future we will want to explore integration with other components, such as Alerts, Metrics and Inventory.
Therefore I wanted to explore the options we have for building an integrated environment, to provide the basis for such integration work, without impacting the more immediate plans for Hawkular Services.
The two possible approaches are:
1) Provide a maven profile as part of the Hawkular Services build, that will include the APM server. The UI could be deployed separately as a war, or possibly integrated into the UI build?
2) As suggested by Juca, the APM distribution could be built upon the hawkular-services distribution.
There are pros/cons with both approaches:
My preference is option (1) as it moves us closer to a fully integrated hawkular-services solution, but relies on a separate build using the profile (not sure if that would result in a separate release distribution).
Option (2) would provide the full distribution as a release, but the downside is the size of the distribution (and its dependencies, such as Cassandra) when a user is only interested in APM. It is unclear whether a standalone APM distribution will still be required in the future - at present the website is structured to support this.
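For option (1), the profile could look roughly like this in the hawkular-services parent pom; the profile id and module names here are placeholders I made up, not the actual build layout:

```xml
<!-- Hypothetical sketch only: profile id and module names are assumptions, -->
<!-- not the real hawkular-services build structure. -->
<profiles>
  <profile>
    <id>apm</id>
    <modules>
      <!-- pulled in only when building with -Papm -->
      <module>hawkular-apm-server</module>
      <module>hawkular-apm-ui</module>
    </modules>
  </profile>
</profiles>
```

Building with `mvn clean install -Papm` would then produce the combined distribution, while the default build stays unchanged.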
tl;dr: This probably only concerns Mazz and Austin :)
The subject is a little bit cryptic, so let me explain - this deals with
inventory sync and what to consider a change worth syncing on an entity.
Today, whether an entity is updated during sync depends on whether some of
its "vital", or rather "identifying", properties change. Namely:
* Feed: only the ID and the hashes of child entities are considered
* ResourceType: only the ID and the hashes of configs and child operation
types are considered
* MetricType: ID + data type + unit
* OperationType: ID + hashes of contained configs (return type and param
types)
* Resource: ID + hashes of contained metrics, contained resources and config
From the above, one can see that not all changes to an entity will result in
the change being synchronized during the /sync call, because, for example,
adding a new generic property to a metric doesn't change its identity hash.
I am starting to think this is not precisely what we want to happen during
/sync.
On one hand, I think it is good that we can still claim two resources to be
identical because their "structure" is the same, regardless of what the
generic properties on them look like (because anyone can add arbitrary
properties to them). This enables us to do the ../identical/.. magic in
traversals.
On the other hand, the recent discussion about attaching an h-metric ID as a
generic property to a metric iff it differs from its id/path in inventory got
me thinking. In the current setup, if the agent reported that it changed the
h-metric ID for some metric, the change would not be persisted, because /sync
would see the metric as the same (changing a generic property doesn't change
the identity hash of the metric).
I can see 3 solutions to this:
* formalize the h-metric ID in some kind of dedicated structure in inventory
that would contribute to the identity hash (i.e. similar to the "also-known-
as" map I proposed in the thread about h-metric ID)
* change the way we compute the identity hash and make it consider everything
on an entity to contribute (I'm not sure I like this since it would limit the
usefulness of ../identical/.. traversals).
* compute 2 hashes - one for tracking the identity (i.e. the one we have
today) and a second one for tracking changes in content (i.e. one that would
consider everything on the entity)
Fortunately, none of the above is a huge change. The scaffolding is all
there, so any of the approaches would amount to only a couple of days of
work.
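To illustrate the third option, here is a minimal sketch of keeping two hashes side by side; the entity fields and hash inputs are simplified assumptions of mine, not inventory's actual hashing scheme:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

// Simplified sketch: a metric with "vital" fields plus free-form generic
// properties. Field names are illustrative only.
public class MetricHashes {
    static String sha256(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Identity hash: only the identifying fields (id + data type + unit),
    // so ../identical/.. traversals keep matching across property changes.
    static String identityHash(String id, String dataType, String unit) {
        return sha256(id + "|" + dataType + "|" + unit);
    }

    // Content hash: additionally mixes in all generic properties (sorted,
    // so the hash is stable), so /sync can detect e.g. a changed h-metric
    // ID stored as a generic property.
    static String contentHash(String id, String dataType, String unit,
                              Map<String, String> props) {
        TreeMap<String, String> sorted = new TreeMap<>(props);
        return sha256(identityHash(id, dataType, unit) + "|" + sorted);
    }
}
```

With this split, /sync could compare content hashes to decide whether to persist a change, while the identity hash stays as it is today.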
I was wondering whether we are going to provide an APK of the Hawkular
Android client through the Google Play store, or whether users will have to
compile it themselves. I assume the Play store case.
And if that is the case, then clients cannot use their own Google account
to set up push notifications for alerts, as a configuration file needs to be
bundled with the app.
I suggest that Hawkular provide one Firebase account instance for this, and
all Hawkular servers will use the same one.
With the workflow I suggest, there will be no need to set up a Unified Push
server to provide notifications.
- With any user creation on any Hawkular server, a 32-byte ID is created
that we can assume to be unique.
- Any client that signs in to that user will retrieve that string and
register to it as a topic subscription.
- Whenever a new alert is created, it will fire an HTTP request to
Firebase with the unique ID as the topic and the server key provided by
Hawkular.
- The rest of the work of handling the received alert will be done on the
client side.
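A rough sketch of the first and third steps (the FCM endpoint and header in the comments are the real legacy Firebase HTTP API, but the payload shape and names here are my assumptions):

```java
import java.security.SecureRandom;

// Sketch of the proposed workflow. The topic id generation is concrete;
// the HTTP call to Firebase is only described in comments (the server key
// would be the one provided by Hawkular).
public class AlertPush {
    // Step 1: on user creation, generate a 32-byte id (hex-encoded) that
    // clients subscribe to as an FCM topic.
    static String newTopicId() {
        byte[] raw = new byte[32];
        new SecureRandom().nextBytes(raw);
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Step 3: when an alert fires, build the JSON payload that would be
    // POSTed to https://fcm.googleapis.com/fcm/send with the header
    // "Authorization: key=<SERVER_KEY>". The "data" keys are placeholders.
    static String alertPayload(String topicId, String alertText) {
        return "{\"to\": \"/topics/" + topicId + "\","
             + " \"data\": {\"alert\": \"" + alertText + "\"}}";
    }
}
```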
Please write your views on this.
Hi, I was trying to create about 6 resource types at the same time,
but one of the responses is:
"errorMsg" : "Local lock contention"
Is there a limit on the number of resource types that can be created
at a time?
I am happy to announce the release of Hawkular Inventory 0.17.3.Final.
There are no new features, but a couple of important bugfixes:
* transactions are now properly handled during /hawkular/inventory/traversal.
It should no longer happen that a traversal would return stale data.
* [HAWKULAR-1099] transaction retries actually work with Titan now (Titan
closes a transaction on failure, while Inventory tried to roll back the
failed transaction, leading Titan to complain about trying to work with a
closed transaction)
* update to /traversal queries to handle /traversal/recursive (which would
give you all the entities in inventory, so don't do that ;) )
Lately there has been some discussion on the AOS scalability lists about our storage usage when used in OpenShift. While we can scale, the issue is that some customers do not wish to allocate large amounts of storage for storing metrics, as I assume they view metrics and monitoring as secondary functions (now that's a whole other discussion..)
To the numbers: they're predicting that at maximum scale, Hawkular-Metrics would use close to ~4TB of disk for one week of data. This is clearly too much, and we currently don't apply any compression other than LZ4, which according to my tests is quite bad for our data model. So I created a small prototype that reads our current data model, compresses it and stores it to a new data model (and verifies that the returned data equals the sent data). For testing I used a ~55MB extract from the MiQ instance that QE was running. One caveat here, of course: the QE instance is not under heavy usage. For the following results, I decided to remove COUNTER-type data, as the values looked to be "0" in most cases and compression would basically get rid of all of them, giving too rosy a picture.
When storing to our current data model, the disk space taken by the "data" table was 74MB. My prototype uses the method of Facebook's Gorilla paper (the same as, for example, Prometheus uses), and in this test I used a one-day block size (storing one metric's one-day data in one row inside Cassandra). The end result was 3.1MB of storage space used. Code can be found at bitbucket.org/burmanm/compress_proto (Golang). I know Prometheus advertises an estimated 1.3 bytes per timestamped value, but those numbers require a certain sort of test data that does not represent anything I have (the compression scheme's efficiency depends on the timestamp deltas and the value deltas and delta-deltas). The prototype lacks certain features; for example, I want it to encode the compression type into the first byte of the header for each row - so we could add more compression types in the future for different workloads - and availabilities would probably compress better if we changed their disk representation to something bit-based.
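For illustration, here is the timestamp half of the Gorilla idea in a few lines. This is a sketch only: the real scheme bit-packs these deltas into variable-width fields and XOR-compresses the values as well, both of which are deliberately skipped here.

```java
import java.util.ArrayList;
import java.util.List;

// Delta-of-delta timestamp encoding: store the first timestamp, then only
// the change in delta between consecutive samples. For regularly scheduled
// collection the delta-of-delta is ~0, which is what makes the scheme so
// compact once bit-packed.
public class DeltaOfDelta {
    static List<Long> encode(long[] timestamps) {
        List<Long> out = new ArrayList<>();
        out.add(timestamps[0]); // full first timestamp
        long prevDelta = 0;
        for (int i = 1; i < timestamps.length; i++) {
            long delta = timestamps[i] - timestamps[i - 1];
            out.add(delta - prevDelta); // usually 0 => compresses very well
            prevDelta = delta;
        }
        return out;
    }

    static long[] decode(List<Long> encoded) {
        long[] ts = new long[encoded.size()];
        ts[0] = encoded.get(0);
        long delta = 0;
        for (int i = 1; i < encoded.size(); i++) {
            delta += encoded.get(i); // rebuild the running delta
            ts[i] = ts[i - 1] + delta;
        }
        return ts;
    }
}
```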
** Read performance
John brought up the first question - now that we store large amount of datapoints in a single row, what happens to our performance when we want to read only some parts of the data?
- We need to read rows we don't need and then discard those
+ We reduce the amount of rows read from the Cassandra (less overhead for driver & server)
+ Reduced disk usage means we'll store more of the data in memory caches
How does this affect the end result? I'll skip the last part of the advantage in my testing now and make sure all the reads for both scenarios are happening from the in-memory SSTables or at least disk cache (the testing machine has enough memory to keep everything in memory). For this scenario I stored 1024 datapoints for a single metric, storing them inside one block of data, thus trying to maximize the impact of unnecessary reads. I'm only interested in the first 360 datapoints.
In this scenario, our current method requests 360 rows from Cassandra and then processes them. In the compressed mode, we request 1 row (which has 1024 stored datapoints) and then filter out those we don't need in the client. Results:
BenchmarkCompressedPartialReadSpeed-4 275371 ns/op
BenchmarkUncompressedPartialReadSpeed-4 1303088 ns/op
As we can see, filtering on the HWKMETRICS side yields quite a large speedup compared to letting Cassandra read so many rows (all of the rows were from the same partition in this test).
** Storing data
Next, let's address some issues we're going to face because of the distributed nature of our solution. We have two issues compared to, for example, Prometheus (I use it as an example as it was used by one OpenShift PM): we let data arrive out-of-order and we must deal with the distributed nature of our data storage. We are also stricter when it comes to syncing to the storage, while Prometheus allows some data to be lost between writes. I can get back to optimization targets later.
To apply this sort of compression when storing the data, we would always need to know the previously stored value. To do that, we would need a read-before-write path to Cassandra, and this is exactly one of the weaknesses of Cassandra's design (in performance and consistency). Clearly we need to overcome this issue somehow, while still keeping the properties that give us our advantages.
** First phase of integration
For the first phase, I would propose that we keep our current data model for short-term storage. We would store the data there as it arrives and then later rewrite it into the compressed scheme in a different table. For reads we would request data from both tables and merge the results. This should not be visible to the users at all, and it's a simple approach to the issue. A job framework such as the one John is currently developing is required.
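The read-side merge for this two-table phase could be sketched like this, using plain timestamps as stand-ins for full datapoints; the dedup policy (preferring the compressed, already-rewritten copy) is my assumption:

```java
import java.util.ArrayList;
import java.util.List;

// Merge results from the compressed long-term table and the uncompressed
// short-term table into one time-ordered series. Both inputs are assumed
// sorted by timestamp; duplicates (a point already rewritten into the
// compressed table but not yet purged from the short-term one) are dropped
// in favor of the compressed copy.
public class MergingReader {
    static long[] merge(long[] compressed, long[] recent) {
        List<Long> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < compressed.length || j < recent.length) {
            if (j >= recent.length
                    || (i < compressed.length && compressed[i] <= recent[j])) {
                if (j < recent.length && compressed[i] == recent[j]) j++; // dedup
                out.add(compressed[i++]);
            } else {
                out.add(recent[j++]);
            }
        }
        long[] result = new long[out.size()];
        for (int k = 0; k < result.length; k++) result[k] = out.get(k);
        return result;
    }
}
```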
There are some open questions here, and I hope some of you have great ideas I didn't think of. Please read the optimization part as well, in case I happened to mention your idea as a future path.
- How often do we process the data and do we restrict the out-of-order capabilities to certain timeslice? If we would use something like 4 hour blocks as default, should we start compressing rows after one hour of block closing? While we can technically reopen the row and reindex the whole block, it does not make sense to do this too often. If we decide to go with the reindexing scenario, in that case we could start writing the next block before it closes (like every 15 minutes we would re-encode the currently open blocks if they have new incoming data). We have to be careful here as to not overwhelm our processing power and Cassandra's. This is a tradeoff between minimum disk space usage or minimum CPU/memory usage.
- Compression block size changes. The user could configure this - increasing it on the fly is no problem for reads, but reducing it is a slightly more complex scenario. If the user increases the size of the block, our query would just pick up some extra rows that are instantly discarded, but nothing would break. However, decreasing the size would confuse our Cassandra reads unless we know the time of the block size change and adjust queries accordingly for times before and after this event.
** Optimizing the solution
The following optimizations would increase the performance of Hawkular-Metrics' ingestion rate a lot and as such are probably worth investigating at some point. But they're also complex, and I would want to refrain from implementing them in the first phase so that we can get compression into the product quicker - so that we would not miss certain deadlines.
- Stop writing to Cassandra in the first phase. Instead we write to something more ephemeral, such as an mmap-backed memory cache that is distributed among the Hawkular nodes. It would also need some sort of processing locality (for example, directing the write to the node that controls the hash of the metricId - sort of like HBase does), unless we want to employ locks to prevent ordering issues if we already encode in memory. From memory we would then store blocks to the permanent Cassandra store. The clients need to be token/hash-method aware to send data to the correct node.
The benefit of that solution is increased write speed, as such a backend easily reaches a million writes per second, and the only bottleneck would be our JSON parsing performance. Reads could be served from both storages without much overhead. This optimization would be worth it even without the compression layer, but I would say this is not our most urgent issue (if the write ingestion speed does become an issue, though, this is the best way to increase it, and it's used in many Cassandra solutions; for time series I think SignalFX uses somewhat the same approach, although they first write to Kafka).
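The processing-locality part (routing each metricId to a fixed owner node) could be sketched with consistent hashing; the node names, virtual-node count and ring construction here are illustrative assumptions, not a proposal for the actual implementation:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Consistent-hash ring: every write for a given metricId is routed to the
// same owner node, so that node can encode the metric's open block in its
// local memory cache without cross-node locking.
public class MetricRouter {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    MetricRouter(String[] nodes, int vnodesPerNode) {
        // Several virtual nodes per physical node smooth the distribution.
        for (String node : nodes)
            for (int v = 0; v < vnodesPerNode; v++)
                ring.put((node + "#" + v).hashCode(), node);
    }

    // Owner = first ring entry at or after the metric's hash, wrapping
    // around to the start of the ring if necessary.
    String ownerOf(String metricId) {
        SortedMap<Integer, String> tail = ring.tailMap(metricId.hashCode());
        SortedMap<Integer, String> pick = tail.isEmpty() ? ring : tail;
        return pick.get(pick.firstKey());
    }
}
```

Token-aware clients would then call the equivalent of `ownerOf(metricId)` to pick the node to send each datapoint to.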