Hi all,
it would be interesting to give CEP capabilities to Infinispan caches.
I have some comments:
On 17 March 2014 13:00, Jonathan Halliday <jonathan.halliday(a)redhat.com> wrote:
Alongside recent talk of integrating infinispan with hadoop batch
processing, there has been some discussion of using the data grid
alongside an event stream processing system.
There are several directions we could consider here. In approximate
order of increasing complexity these are:
- Allow bi-directional flow of events, such that listeners on the cache
can be used to cause events in the processing engine, or events in the
processing engine can update the cache.
To catch events from the cache, I propose developing a simple
infinispanSource for Flume ( http://flume.apache.org ). Using this
infinispanSource, one can listen to any cache for updates or inserts and
redirect these events to either a CEP engine or another destination.
Updating the cache would be similar: we could have an infinispanSink for
Flume, and any application that needs to update a cache by sending events
could use the infinispanSink.
In fact, by developing such Flume components we would have a change data
capture (CDC) tool ( http://en.wikipedia.org/wiki/Change_data_capture ) for
Infinispan. CDC tools are vital for complex event processing integrations,
and I think this would be a good starting point.
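To make the change data capture idea concrete, here is a minimal sketch in
plain Java. It does not use the Infinispan or Flume APIs: the class
ChangeCapturingCache, its ChangeEvent type and the downstream consumer are
all hypothetical names of mine. In a real infinispanSource the callback
would presumably come from a cache listener and the consumer would hand the
event to a Flume channel.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical stand-in for a cache whose writes are captured as change events. */
final class ChangeCapturingCache<K, V> {

    /** Kind of change; only inserts and updates, as discussed above. */
    enum Type { INSERT, UPDATE }

    /** Immutable description of one captured write. */
    static final class ChangeEvent<K, V> {
        final Type type;
        final K key;
        final V value;

        ChangeEvent(Type type, K key, V value) {
            this.type = type;
            this.key = key;
            this.value = value;
        }
    }

    private final Map<K, V> store = new ConcurrentHashMap<>();
    // Assumption: the downstream is any consumer (a CEP engine, a queue, a log).
    private final Consumer<ChangeEvent<K, V>> downstream;

    ChangeCapturingCache(Consumer<ChangeEvent<K, V>> downstream) {
        this.downstream = downstream;
    }

    /** Writes the entry, then forwards an INSERT or UPDATE event downstream. */
    void put(K key, V value) {
        V previous = store.put(key, value);
        Type type = (previous == null) ? Type.INSERT : Type.UPDATE;
        downstream.accept(new ChangeEvent<>(type, key, value));
    }

    V get(K key) {
        return store.get(key);
    }
}
```

The sink direction would be the mirror image: a consumer of events that
calls put() on the cache for each event it receives.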
- Allow the cache to be used to hold lookup data for reference from user
code running the processing engine, to speed up joining streamed events
to what would otherwise be data tables on disk.
Actually, in such systems it is important to cache some RDBMS tables in
memory and to sync the cache periodically from the RDBMS tables so that it
stays up to date.
I think this requirement can be achieved via Infinispan's cache loaders.
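As a rough illustration of the periodic sync, here is a sketch in plain
Java. It is not Infinispan's cache loader API; RefreshingLookupCache and the
tableLoader supplier are hypothetical, and the supplier stands in for
whatever query reads the table from the database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical in-memory copy of a lookup table, reloaded on a schedule. */
final class RefreshingLookupCache<K, V> implements AutoCloseable {

    // Assumption: the supplier returns the full current content of the table.
    private final Supplier<Map<K, V>> tableLoader;

    // Daemon thread so an unclosed cache does not keep the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(task -> {
                Thread t = new Thread(task, "lookup-refresh");
                t.setDaemon(true);
                return t;
            });

    // Readers always see a complete snapshot; refresh swaps the reference.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    RefreshingLookupCache(Supplier<Map<K, V>> tableLoader) {
        this.tableLoader = tableLoader;
        refresh(); // initial load
    }

    /** Reloads the whole table and replaces the snapshot in one step. */
    void refresh() {
        snapshot = Collections.unmodifiableMap(new HashMap<>(tableLoader.get()));
    }

    /** Schedules refresh() at a fixed rate; a failed reload keeps the old snapshot. */
    void startPeriodicRefresh(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Swallowed so the schedule is not cancelled by one failure.
            }
        }, period, period, unit);
    }

    V lookup(K key) {
        return snapshot.get(key);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Between two refreshes the lookups can be stale, which is the usual trade-off
of a periodic sync compared with pushing each change as it happens.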
- Integrate with the processing engine itself, such that infinispan can
be used to store items that would otherwise occupy precious RAM. This
one is probably only viable with the cooperation of the stream
processing system, so I'll base further discussion on Drools Fusion.
The engine uses memory for a) rules, i.e. processing logic. Some of this
is infrequently accessed. Think of a decision tree in which some
branches are traversed more than others. So, opportunities to swap bits
out to cache perhaps. b) state, particularly sliding windows. Again
some data is infrequently accessed. For many sliding window calculations
in particular (e.g. running average), only the head and tail of the
window are actually used. The events in-between can be swapped out.
Holding state is the most important case. For this requirement an off-heap
cache will be a must. (As you may know, Ben Coton is implementing Peter
Lawrey's hugemaps in Infinispan for off-heap caching.)
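The point about a running average only touching the head and tail of the
window can be sketched in a few lines of Java. This is only an illustration
and not how Drools Fusion implements windows; SlidingAverage is a
hypothetical class, and it uses a count-based window for simplicity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical count-based sliding window that maintains a running average. */
final class SlidingAverage {

    private final int capacity;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum;

    SlidingAverage(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /**
     * Appends a value at the tail and, if the window is full, drops the head.
     * Only those two elements are read; the values in between are never
     * touched, so they are the candidates for being swapped out.
     */
    double add(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > capacity) {
            sum -= window.removeFirst();
        }
        return sum / window.size();
    }
}
```

Here the deque keeps everything on the heap; the idea in the discussion
would be to keep only the two ends in RAM and the middle in the cache.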
Of course these integrations require the stream processing engine to be
written to support such operations - careful handling of object
references is needed. Currently the engine doesn't work that way -
everything is focussed on speed at the expense of memory.
- Borrow some ideas from the event processing DSLs, such that the data
grid query engine can independently support continuous (standing)
queries rather than just one-off queries. Arguably this is reinventing
the wheel, but for simple use cases it may be preferable to run the
stream processing logic directly in the grid rather than deploying a
dedicated event stream processing system.
I think it's probably going to
require supporting lists as a first class construct alongside maps
though. There are various kludges possible here, including the brute
force approach of faking continuous query by re-executing a one-off
query on each mutation, but they tend to be inefficient. There is also
the thorny problem of supporting a (potentially distributed) clock,
since a lot of use cases need to reference the passage of time in the
query e.g. 'send event to listener if avg in last N minutes > x'.
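For a single node, the example query 'send event to listener if avg in last
N minutes > x' could look roughly like the Java sketch below. Everything
here is hypothetical (the WindowedAverageQuery class, the injected clock,
the listener callback), and it deliberately ignores the distributed clock
problem: the clock is just a supplier of milliseconds, which also makes the
logic testable. It updates incrementally on each mutation rather than
re-running a one-off query.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.DoubleConsumer;
import java.util.function.LongSupplier;

/** Hypothetical standing query: notify when the windowed average exceeds a threshold. */
final class WindowedAverageQuery {

    private static final class Sample {
        final long timeMillis;
        final double value;

        Sample(long timeMillis, double value) {
            this.timeMillis = timeMillis;
            this.value = value;
        }
    }

    private final Deque<Sample> samples = new ArrayDeque<>();
    private final long windowMillis;
    private final double threshold;
    private final LongSupplier clock;      // assumption: local, monotonic enough
    private final DoubleConsumer listener; // receives the average when it fires
    private double sum;

    WindowedAverageQuery(long windowMillis, double threshold,
                         LongSupplier clock, DoubleConsumer listener) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
        this.threshold = threshold;
        this.clock = clock;
        this.listener = listener;
    }

    /** Called on each cache mutation; evicts expired samples, then evaluates. */
    void onMutation(double value) {
        long now = clock.getAsLong();
        samples.addLast(new Sample(now, value));
        sum += value;
        // Drop samples that are windowMillis old or older.
        while (!samples.isEmpty() && samples.peekFirst().timeMillis <= now - windowMillis) {
            sum -= samples.removeFirst().value;
        }
        double average = sum / samples.size();
        if (average > threshold) {
            listener.accept(average);
        }
    }
}
```

Note that this only re-evaluates when a mutation arrives. A query that must
also react to time passing with no new data would need a timer as well,
which is where the clock question becomes harder.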
regards
Yavuz Gökırmak
-
tr.linkedin.com/pub/yavuz-gokirmak/20/a11/23b/
Jonathan Halliday
Core developer, JBoss.