[Hawkular-dev] Low-impact clients and never dropping any events

John Sanda jsanda at redhat.com
Wed Feb 18 07:43:52 EST 2015


> On Feb 18, 2015, at 4:57 AM, Thomas Segismont <tsegismo at redhat.com> wrote:
> 
> On 17/02/2015 14:59, Randall Hauch wrote:
>>>> How will the work be split up across the cluster? Are the incoming
>>>> messages stored before they’re processed?
>>>> 
>>> 
>>> A Hawkular Metrics server is a web server (WildFly) in front of a Cassandra cluster.
>>> 
>>> So you'd distribute the work with a load balancer.
>>> 
>>> I'm not sure I understand your second question.
>>> 
>> My question is simply this: is the raw incoming data persisted before any other computations or analyses are done? The answer dictates whether Hawkular has the potential to lose data in a crash. The only reason I bring this up is that monitoring (like data storage) is one of those use cases where you can’t really afford to drop or lose any data, since for monitoring that lost data might be the critical evidence you need to understand why something bad happened to your monitored system.
>> 
>> Let’s say that a collector sends some data that will ultimately be processed by Hawkular, including updating various windows (e.g., minutely, hourly, daily, etc.), before persisting. If Hawkular crashes before that data is completely processed, then that data will be lost forever. If Hawkular performs some of the work (e.g., just updating the minutely window) before crashing, then the data appears in only some of the generated/derived information. BTW, in this traditional approach, using a transaction will simply ensure that all of your derived/generated data is consistent; it doesn’t ensure that your data is processed.
>> 
>> If, however, Hawkular were to persist the collected data *before* processing anything, then as long as the incoming data were persisted, Hawkular could crash, be restarted, and simply continue where it left off, without losing any data or computations. There are lots of ways of ensuring this, but it’s not likely they will all perform well.
>> 
>> The stream-oriented architectures make it possible not to lose any data, because every computation or bit of processing is done in services/jobs that read from and write to (usually persistent) streams. The architecture can ensure at-least-once (and sometimes exactly-once) semantics for every bit of data passing through each stream. Of course, this is done without transactions, and often with partitioned streams to allow things to scale quite well and handle very large volumes of data.
>> 
>> 
> 
> Yes, Hawkular metrics persists data first. Computations happen later.
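Persist first, compute later is what makes the crash-recovery Randall describes possible. A minimal sketch of the idea (hypothetical code, not Hawkular's actual implementation): append the raw event to a durable log and fsync before touching any derived state, so that after a crash, processing can simply be replayed from the log.

```python
import json
import os

def ingest(raw_event, log_path, process):
    """Persist the raw event durably BEFORE any derived computation.

    If the server crashes after the append but before (or during)
    processing, the event survives in the log and can be replayed.
    """
    with open(log_path, "a") as log:
        log.write(json.dumps(raw_event) + "\n")
        log.flush()
        os.fsync(log.fileno())  # durable before we touch rollups/windows
    process(raw_event)          # derived computations happen only afterwards

def replay(log_path, process):
    """On restart, re-run processing over every persisted raw event."""
    with open(log_path) as log:
        for line in log:
            process(json.loads(line))
```

With this ordering, a transaction around the derived writes is unnecessary for durability; the raw log is the source of truth and the derived data can always be regenerated from it.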
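The at-least-once guarantee Randall mentions usually comes down to commit ordering: a consumer advances its offset only AFTER an event is processed, so a crash between processing and commit produces a duplicate delivery rather than a loss. A toy illustration with in-memory stand-ins (a real system would use a persistent partitioned log such as Kafka):

```python
class Stream:
    """Append-only log stand-in (persistent in a real system)."""
    def __init__(self):
        self.log = []

    def append(self, event):
        self.log.append(event)

class Consumer:
    """At-least-once consumer: offset committed only after processing."""
    def __init__(self, stream):
        self.stream = stream
        self.committed = 0  # would be stored durably in a real system

    def poll(self, process):
        offset = self.committed
        while offset < len(self.stream.log):
            process(self.stream.log[offset])
            offset += 1
            self.committed = offset  # commit AFTER processing succeeds
```

If the commit were moved before `process(...)`, the same crash window would instead give at-most-once semantics, i.e. possible loss; exactly-once requires deduplication or idempotent processing on top of this.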

I think that Spark's streaming API, particularly the window operations, could be an effective way to do computations in real time as data is ingested. The problem, though, with introducing something like Spark or Storm is that it increases the operational complexity of Hawkular itself, which I think we want to avoid if possible.
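For a sense of what those window operations compute, here is a dependency-free sketch of a tumbling-window average over (timestamp, value) pairs — just the shape of the aggregation, not Spark's actual API:

```python
from collections import defaultdict

def tumbling_window_avg(events, window_seconds=60):
    """Average (timestamp, value) pairs per fixed-size window.

    Mimics a streaming window aggregation in plain Python; Spark's
    window operations do this incrementally over a live stream.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        window_start = ts - ts % window_seconds  # e.g. minute boundary
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in buckets.items()}
```

The same bucketing generalizes to the minutely/hourly/daily rollups discussed above by varying `window_seconds`.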

