[infinispan-dev] Architectural question about a system built on Infinispan

Thu May 26 15:38:25 EDT 2011

Hello all,

I'm new to Infinispan, sorry for any stupid questions. I'm evaluating it
for a medium-sized processing platform and would like to get some
feedback about the feasibility of the architecture I've come up with
after reading the documentation I found.

The system will have two components:
- a GUI component which displays (a subset of) data and generates commands
- a datastore / processing component which holds the data and changes it 
by reacting to the commands sent by the GUI

Important considerations are:
- high availability in the datastore tier
- low latency
- optimal data transfer from the data store to the GUI (i.e. only deltas
/ changed elements should be transferred)

My current ideas are the following:
- use a set of hotrod servers with DIST mode and the number of copies 
set to a value I would be comfortable with (I'm thinking 2 or 3 currently)
- use these servers to store both the current state and the commands 
(this works out nicely, since I need to keep the commands for later 
auditing)
- make hashing such that commands and objects on which the commands 
operate get to the same subset of servers
     Question: how can I control this? I don't want to control the 
specific node, but just to ensure that objects A and B get to the same 
subset of servers
- on each HotRod server add custom interceptors [1] which listen for the
command objects and, when one is intercepted, modify the corresponding
object accordingly
- the GUI would write the commands to the correct HotRod servers through
topology-aware clients
- the GUI would contain a local cache with a subset of objects. These
objects would be synchronised with the HotRod servers (i.e. when the
objects change in the datastore tier / HotRod tier, the change is
propagated to the GUI)
     Question: what is the best way to achieve this? (to synchronise a
local cache with a subset of data from a set of HotRod servers). The
only option I'm currently aware of is continuous queries [4]
- inside the data tier there would be "supporting" information which is
needed by nodes, but may not necessarily be on the local node (think
for example of configuration which can be updated at runtime, but also more
dynamic information). From what I've read, the L1 cache feature [5]
would be perfect for this, except for the fact that it uses invalidation
when the data changes, rather than sending an update (i.e. if the data
changes, it is invalidated and the non-local nodes have to fetch it again)
     Question: is it possible to configure the L1 cache mechanism, such 
that the original node sends updates when the data changes rather than 
invalidations?
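
To make the co-location question above concrete, here is a minimal sketch
in plain Java. It does not use any Infinispan API; the key convention
("<groupId>/<kind>/<id>"), the class name and the simple "consecutive
nodes" owner selection are all assumptions made only for illustration. The
idea it shows is that if only the group part of a key is hashed, a command
and the object it operates on resolve to the same subset of servers
without naming a specific node.

```java
import java.util.ArrayList;
import java.util.List;

public class CoLocationSketch {

    // Hypothetical key convention: "<groupId>/<kind>/<id>", e.g. "acct-17/cmd/991".
    // Only the part before the first '/' takes part in hashing.
    static String groupOf(String key) {
        int slash = key.indexOf('/');
        return slash < 0 ? key : key.substring(0, slash);
    }

    // Picks numOwners consecutive node indexes, starting at a position
    // derived from the group part of the key only. This is a stand-in for
    // a real consistent hash, not how Infinispan selects owners.
    static List<Integer> ownersFor(String key, int numNodes, int numOwners) {
        int start = Math.floorMod(groupOf(key).hashCode(), numNodes);
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i < Math.min(numOwners, numNodes); i++) {
            owners.add((start + i) % numNodes);
        }
        return owners;
    }

    public static void main(String[] args) {
        // Both keys share the group "acct-17", so both lines print the same owners.
        System.out.println(ownersFor("acct-17/obj/state", 5, 2));
        System.out.println(ownersFor("acct-17/cmd/991", 5, 2));
    }
}
```

Whether the grid lets the hashed part of a key be customised in this way is
exactly what the question asks; the sketch only shows the intended effect.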

How optimal is the solution which I came up with? How could it be improved?
I've read about the Distributed Data Stream Processing Framework in
Infinispan [3], but it seems to be more of a one-off solution (i.e. generate
a report about all the existing objects at a given moment) rather than
something which reacts to a new command as soon as it is written to the
cache.
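
As an illustration of "reacting to a command as soon as it is written",
here is a minimal single-process sketch in plain Java. It is not an
Infinispan interceptor: the Store wrapper and the AddCommand type are
hypothetical names, and a real interceptor would sit in the cache's command
chain rather than wrap a map. It only shows the intended behaviour: the
write of a command is stored (for auditing) and immediately applied to the
object it targets.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CommandOnWriteSketch {

    // Hypothetical command: add `delta` to the counter stored under `targetKey`.
    static final class AddCommand {
        final String targetKey;
        final long delta;

        AddCommand(String targetKey, long delta) {
            this.targetKey = targetKey;
            this.delta = delta;
        }
    }

    // Stand-in for a cache whose write path is intercepted.
    static final class Store {
        private final Map<String, Object> data = new ConcurrentHashMap<>();

        void put(String key, Object value) {
            data.put(key, value); // the command itself is kept for later auditing
            if (value instanceof AddCommand) {
                AddCommand cmd = (AddCommand) value;
                // Apply the command to its target atomically for that key;
                // a missing target starts from the delta itself.
                data.merge(cmd.targetKey, Long.valueOf(cmd.delta),
                        (old, d) -> Long.valueOf((Long) old + (Long) d));
            }
        }

        Object get(String key) {
            return data.get(key);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.put("counter", 10L);
        store.put("cmd-1", new AddCommand("counter", 5));
        System.out.println(store.get("counter"));
    }
}
```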

I'm looking to implement a data grid, where each node contains the data
and the code to operate on the data. I will also be evaluating Hazelcast
and GigaSpaces, but currently Infinispan seems to be the better 
alternative since it could be reused in multiple places in the 
architecture, making it easier to maintain and to understand. The JBoss 
Data Grid [2] also sounds interesting, but unfortunately it's not 
available yet.

Best regards,
Attila Balazs

[1] http://community.jboss.org/wiki/InfinispanCustomInterceptors
[2] http://www.jboss.com/edg6-early-access/
[3] http://community.jboss.org/wiki/DistributedDataStreamProcessingFrameworkInInfinispan
[4] http://community.jboss.org/wiki/ContinuousQueryWithInfinispan
[5] http://community.jboss.org/wiki/ClusteringModes#L1