[Hawkular-dev] Low-impact clients and never dropping any events

Thomas Heute theute at redhat.com
Thu Feb 19 03:51:46 EST 2015


On 02/18/2015 04:17 PM, Randall Hauch wrote:
>
>> On Feb 18, 2015, at 6:50 AM, Heiko Braun <ike.braun at googlemail.com 
>> <mailto:ike.braun at googlemail.com>> wrote:
>>
>>
>>> On 18 Feb 2015, at 13:43, John Sanda <jsanda at redhat.com 
>>> <mailto:jsanda at redhat.com>> wrote:
>>>
>>> I think that Spark's streaming API, particularly the window 
>>> operations, could be an effective way to do computations in real 
>>> time as data is ingested
>>
>> +1
>>
>> not only for processing the streams, but also for any kind of 
>> post-processing needed. Plus it would supply the abstractions to run 
>> computations across a large number of nodes.
>>
>
> Exactly. Using Spark Streaming or even Storm would increase the 
> installation and operational complexity
We have plenty of experience with users being turned away by 
installation and operational complexity, so no matter the "but" part 
(even though that sounds interesting), we would need to find a proper 
way to remove or reduce that complexity.

It needs to be easy to install in small environments and able to scale 
when needed/wanted. Scaling by adding homogeneous nodes would help.

I have no experience with Spark/Storm; what is the burden in terms of 
installation and operational complexity?

Thomas



> but it would give you a lot of really easy management of your 
> workflow. Each of the stream processors consume a stream and do one 
> thing (e.g., aggregate the last 60 seconds of raw data, or aggregate 
> the last 60 minutes of either the raw data or the 60-second windows, 
> etc.). The different outputs can still get written to whatever storage 
> you want; stream-oriented processing just changes *how* you process 
> the incoming data.
>
> An alternative approach is to use Apache Kafka directly. There are 
> pros and cons, but the benefit is that the services that do the 
> computations would be microservices (no, really - just really small, 
> single-threaded processes that do a single thing) that could be easily 
> deployed across the cluster. If anyone is interested in this approach, 
> ping me and I’ll point you to a prototype that does this (not for 
> analytics).
>
> BTW, a stream-processing approach does not limit you to live data. In 
> fact, quite the opposite. Many people use stream processing for 
> ingesting large volumes of live data, but lots of other people use it 
> in “big data” as an alternative to batch processing (often map-reduce).
>
>
>
> _______________________________________________
> hawkular-dev mailing list
> hawkular-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hawkular-dev
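The windowed aggregation the thread describes (e.g. averaging the last 60 seconds of raw data) can be sketched without any Spark or Storm dependency. This is a minimal illustration of the tumbling-window idea only, not Spark's actual API; the function and parameter names are made up for this sketch:

```python
from collections import defaultdict

def aggregate_windows(events, window_seconds=60):
    """Bucket (timestamp, value) events into tumbling windows and
    average each window.

    A stand-in for what a streaming window operation would compute;
    'events' is any iterable of (epoch_seconds, numeric_value) pairs.
    """
    windows = defaultdict(list)
    for ts, value in events:
        # Bucket each event by the start time of the window it falls into.
        window_start = int(ts // window_seconds) * window_seconds
        windows[window_start].append(value)
    # One aggregate per window; here the mean, but it could be min/max/count.
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(windows.items())}
```

A real stream processor would evaluate this incrementally as events arrive rather than over a finished collection, and the 60-minute roll-ups mentioned above could be computed the same way over the 60-second window outputs instead of the raw data.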

