[infinispan-dev] event processing integration

Jonathan Halliday jonathan.halliday at redhat.com
Mon Mar 17 07:00:32 EDT 2014


Alongside recent talk of integrating infinispan with hadoop batch 
processing, there has been some discussion of using the data grid 
alongside an event stream processing system.

There are several directions we could consider here. In approximate 
order of increasing complexity these are:

- Allow bi-directional flow of events, such that listeners on the cache 
can be used to cause events in the processing engine, or events in the 
processing engine can update the cache.

- Allow the cache to be used to hold lookup data for reference from user 
code running the processing engine, to speed up joining streamed events 
to what would otherwise be data tables on disk.

- Integrate with the processing engine itself, such that infinispan can 
be used to store items that would otherwise occupy precious RAM.  This 
one is probably only viable with the cooperation of the stream 
processing system, so I'll base further discussion on Drools Fusion.

The engine uses memory for a) rules, i.e. processing logic. Some of this 
is infrequently accessed. Think of a decision tree in which some 
branches are traversed more than others. So, opportunities to swap bits 
out to cache perhaps.  b) state, particularly sliding windows. Again 
some data is infrequently accessed. For many sliding window calculations 
in particular (e.g. running average), only the head and tail of the 
window are actually used. The events in-between can be swapped out.

Of course these integrations require the stream processing engine to be 
written to support such operations - careful handling of object 
references is needed. Currently the engine doesn't work that way - 
everything is focussed on speed at the expense of memory.

- Borrow some ideas from the event processing DSLs, such that the data 
grid query engine can independently support continuous (standing) 
queries rather than just one-off queries. Arguably this is reinventing 
the wheel, but for simple use cases it may be preferable to run the 
stream processing logic directly in the grid rather than deploying a 
dedicated event stream processing system. I think it's probably going to 
require supporting lists as a first class construct alongside maps 
though.   There are various cludges possible here, including the brute 
force approach of faking continuous query by re-executing a one-off 
query on each mutation, but they tend to be inefficient. There is also 
the thorny problem of supporting a (potentially distributed) clock, 
since a lot of use cases need to reference the passage of time in the 
query e.g. 'send event to listener if avg in last N minutes > x'.



Jonathan Halliday
Core developer, JBoss.

-- 
Registered in England and Wales under Company Registration No. 03798903 
Directors: Michael Cunningham (USA), Paul Hickey (Ireland), Matt Parson
(USA), Charlie Peters (USA)


More information about the infinispan-dev mailing list