[Hawkular-dev] Low-impact clients and never dropping any events

Tue Feb 17 08:59:45 EST 2015

> On Feb 16, 2015, at 4:03 AM, Thomas Segismont <tsegismo at redhat.com> wrote:
> 
> Le 13/02/2015 18:43, Randall Hauch a écrit :
>> How will the work be split up across the cluster? Are the incoming
>> messages stored before they’re processed?
>> 
> 
> A Hawkular metrics server is a web server (Wildfly) in front of a Cassandra cluster.
> 
> So you'd distribute the work with a load balancer.
> 
> I'm not sure I understand your second question.
> 

My question is simply this: is the raw incoming data persisted before any other computations or analyses are done? The answer dictates whether Hawkular has the potential to lose data in a crash. I bring this up only because monitoring (like data storage) is one of those use cases where you can’t really afford to drop or lose any data: the lost data might be the critical evidence you need to understand why something bad happened to your monitored system.

Let’s say that a collector sends some data that Hawkular will process, including updating various windows (e.g., minutely, hourly, daily), before persisting anything. If Hawkular crashes before that data is completely processed, then that data is lost forever. If Hawkular performs only some of the work (e.g., just updating the minutely window) before crashing, then the data appears in only some of the generated/derived information. BTW, in this traditional approach, using a transaction simply ensures that all of your derived/generated data is consistent; it doesn’t ensure that your data is actually processed.

If, however, Hawkular were to persist the collected data *before* processing anything, then after a crash and restart it might be able to just continue where it left off, without losing any data or computations. There are lots of ways of ensuring this, but they are not all likely to perform well.
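
To make the persist-first idea concrete, here is a minimal Python sketch. It is not Hawkular code: RawEventLog, WindowProcessor, the JSON file layout, and the per-minute sum are all hypothetical names and choices made up for illustration. Raw events are forced to disk before anything else happens, and the processor keeps a checkpoint (derived state plus the position of the next unprocessed record) that it replaces atomically, so a restart resumes from the checkpoint.

```python
import json
import os
import tempfile


class RawEventLog:
    """Append-only file of raw events (hypothetical stand-in for durable storage)."""

    def __init__(self, path):
        self.path = path

    def append(self, event):
        # Persist the raw event and force it to disk before doing anything else.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def read_from(self, position):
        # Yield (position, event) pairs starting at a 0-based record index.
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for index, line in enumerate(f):
                if index >= position:
                    yield index, json.loads(line)


class WindowProcessor:
    """Derives per-minute sums from the raw log, resuming from a checkpoint."""

    def __init__(self, log, checkpoint_path):
        self.log = log
        self.checkpoint_path = checkpoint_path
        self.next_position = 0
        self.minute_sums = {}
        if os.path.exists(checkpoint_path):
            with open(checkpoint_path, encoding="utf-8") as f:
                saved = json.load(f)
            self.next_position = saved["next_position"]
            self.minute_sums = saved["minute_sums"]

    def _save_checkpoint(self):
        # Write to a temp file, then atomically swap it in, so the derived
        # state and the log position always change together.
        tmp = self.checkpoint_path + ".tmp"
        state = {"next_position": self.next_position,
                 "minute_sums": self.minute_sums}
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.checkpoint_path)

    def catch_up(self):
        for position, event in self.log.read_from(self.next_position):
            minute = str(event["timestamp"] // 60)  # JSON object keys are strings
            self.minute_sums[minute] = self.minute_sums.get(minute, 0) + event["value"]
            self.next_position = position + 1
            self._save_checkpoint()


def demo():
    workdir = tempfile.mkdtemp()
    log = RawEventLog(os.path.join(workdir, "raw.log"))
    checkpoint = os.path.join(workdir, "checkpoint.json")

    log.append({"timestamp": 0, "value": 1})
    log.append({"timestamp": 30, "value": 2})
    WindowProcessor(log, checkpoint).catch_up()

    # Simulated crash: this event is persisted but never processed.
    log.append({"timestamp": 61, "value": 5})

    # A fresh processor picks up from the checkpoint and handles the rest.
    restarted = WindowProcessor(log, checkpoint)
    restarted.catch_up()
    return restarted.minute_sums
```

Calling demo() shows the restarted processor picking up the event that was persisted but unprocessed at the time of the simulated crash. Checkpointing after every record is the simplest thing that works and is exactly the kind of approach that would not perform well at volume; batching the checkpoints trades some reprocessing after a crash for throughput.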

Stream-oriented architectures make it possible to not lose any data, because every computation or bit of processing is done in services/jobs that read from and write to (usually persistent) streams. The architecture can ensure at-least-once (and sometimes exactly-once) semantics for every bit of data passing through each stream. Of course, this is done without transactions and often with partitioned streams, which lets things scale quite well and handle very large volumes of data.
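
As a rough illustration of at-least-once delivery over a partitioned stream, here is a small in-memory Python sketch. Everything in it is hypothetical (PartitionedStream, Consumer, and the dict used as a sink are not from any real streaming system), and the committed offset is simply assumed to be durable. The consumer commits an offset only after the record has been processed, so a crash between processing and commit causes a redelivery rather than a loss. Because the sink is keyed by (partition, offset), the redelivered record overwrites itself instead of being counted twice.

```python
import zlib


class PartitionedStream:
    """In-memory stand-in for a persistent, partitioned stream (hypothetical)."""

    def __init__(self, partitions):
        self.partitions = [[] for _ in range(partitions)]

    def publish(self, key, value):
        # The same key always maps to the same partition, preserving per-key order.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


class Consumer:
    """Reads one partition and commits an offset only after processing."""

    def __init__(self, stream, partition, sink):
        self.stream = stream
        self.partition = partition
        self.sink = sink
        self.committed = 0   # next offset to process; assumed to be stored durably
        self.deliveries = 0  # how many times any record was handed to processing

    def poll(self, crash_before_commit=False):
        records = self.stream.partitions[self.partition]
        while self.committed < len(records):
            offset = self.committed
            key, value = records[offset]
            # Idempotent write: a redelivered record lands on the same sink key.
            self.sink[(self.partition, offset)] = (key, value)
            self.deliveries += 1
            if crash_before_commit:
                return  # simulate a crash after processing but before the commit
            self.committed = offset + 1


def demo():
    stream = PartitionedStream(partitions=2)
    for value in range(3):
        stream.publish("cpu.load", value)
    partition = stream.publish("cpu.load", 3)

    sink = {}
    consumer = Consumer(stream, partition, sink)
    consumer.poll(crash_before_commit=True)  # offset 0 processed, never committed
    consumer.poll()                          # offset 0 redelivered, then 1..3
    values = sorted(value for _key, value in sink.values())
    return consumer.deliveries, values
```

Here demo() reports five deliveries for four records: the first record is delivered twice, yet the sink ends up with each value exactly once. That is the usual trade: the stream guarantees at-least-once, and the job makes its writes idempotent to get an effectively-once result.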

>>> 
>>> Most collectors buffer data to be sent and operate in separate threads.
>>> So if the metrics ingestion rate decreases, they'll consume more memory.
>>> Other than that, it should have limited impact on your service.
>> 
>> Sounds good.
>> 
>>> 
>>>> 
>>>> Also, how do you plan to ensure that, no matter what happens to the
>>>> Hawkular system or anything it depends upon, no client information is
>>>> ever lost or dropped?
>>> 
>>> Usually collectors will drop data once buffers are full. If you want to
>>> make sure no data is lost, then you need to build a custom sender.
>>> Hawkular metrics has an HTTP interface so the response code should tell
>>> you if a metric was successfully persisted.
>> 
>> So I understand that the collectors will drop any buffered data, since
>> that will be unsent if the monitored system (or external collectors)
>> crash. But what happens if Hawkular suffers a partial or total crash?
>> What data is lost? What happens to data that was in-process when the
>> crash occurred? Stream-based architectures are interesting because they
>> often can handle more load, partition it more effectively, and are more
>> durable.
> 
> If you lose Wildfly servers or Cassandra nodes then the rest of the cluster should continue to work.
> 
> As a Hawkular metrics client, you can't say what happened to your data if you lose the connection before you get a response with a status code. When a Wildfly server crashes, your data may still be in the middle of unmarshalling, or might have reached the Cassandra driver, or might have been written to the Cassandra logs.
> 
> We have no plan ATM to implement log-based event processors ourselves. In the beginning, we'll focus on existing and widespread collectors (generally buffer-based) and in-house collectors like the wildfly-monitor.
> 
>> 
>>> 
>>>> 
>>>> Finally, is the plan to make Hawkular embeddable (excluding the stuff
>>>> that has to be embedded in monitored clients/systems/services), or
>>>> only a separate turn-key (i.e., install-and-run-and-use) system?
>>> 
>>> Hawkular metrics comes in two forms:
>>> * a Java library (metrics-core)
>>> * a Java EE web application (built on top of the library)
>>> 
>>> metrics-core can be embedded in any sort of JVM application but it
>>> expects to find a Cassandra cluster somewhere.
>>> 
>>> 
>>> 
>>> I hope it helps. Feel free to ask for details.
>>> 
>>> And welcome to Hawkular!
>>> 
>>> Thomas
>>> _______________________________________________
>>> hawkular-dev mailing list
>>> hawkular-dev at lists.jboss.org <mailto:hawkular-dev at lists.jboss.org>
>>> https://lists.jboss.org/mailman/listinfo/hawkular-dev
>> 
>