Re: [infinispan-dev] Distributed execution Framework
by Bowers, Ronald (Civ, ARL/SLAD)
Hi Vladimir, Malik, and Jeff,
I too have been working on a Rio-based distributed execution framework and
have also done some work deploying Infinispan under Rio. This project
(Gomez) is available at
https://rio-substrates.dev.java.net/svn/rio-substrates/branches/maven-gomez.
Although not quite production ready, it is mostly stable and reliable.
Gomez is used as the basis of a distributed simulation system. Our
simulation problem is easily parallelizable, and we use the fork/join (F/J)
pattern extensively.
The Gomez distributed execution framework consists of three types of
services:
1. Worker Service - executes tasks
2. Task Space - a JavaSpace that is used to distribute tasks and
events
3. Job Monitor - watches the progress of a job and acts on certain
events.
The collection of these services is referred to as a Sim Pool. Multiple worker
services and task spaces can be deployed to nodes in the cluster. There is
only one Job Monitor per Sim Pool.
Distributed F/J is implemented in Gomez in the following fashion. An initial
task is dropped into a Task Space by the client. This task is taken from the
space by a Worker, which executes it. The task can fork into multiple
subordinate tasks. These subordinate tasks are then written to the task
space to be executed by any worker in the sim pool. The initial (parent)
task then generates a join task. The join task is attached to a message that
is sent through the task space to the Job Monitor. The message tells the Job
Monitor the number of subordinate tasks and an identifier by which they can
be recognized. This identifier is necessary because the worker can be
expanding and distributing multiple F/J hierarchies concurrently. After
writing the message to the Job Monitor, the parent task returns.
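As a rough sketch of the fork step described above (this is not Gomez code):
a parent task writes its subordinate tasks to the task space, then writes a
message for the Job Monitor carrying the join task, the subtask count, and the
shared identifier. A java.util.concurrent BlockingQueue stands in for the
JavaSpace here, and every name (ForkSketch, SubTask, JoinMessage, fork) is
hypothetical.

```java
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ForkSketch {
    /** Anything that can be written to the stand-in task space. */
    public interface Entry {}

    /** A subordinate task, tagged with the identifier of the fork it belongs to. */
    public record SubTask(String forkId, int index) implements Entry {}

    /** Message for the Job Monitor: fork identifier, subtask count, attached join task. */
    public record JoinMessage(String forkId, int expected, Runnable joinTask) implements Entry {}

    /**
     * Performs the fork step of a parent task: writes the subtasks, then the
     * join message, then returns without waiting for any of them.
     */
    public static String fork(BlockingQueue<Entry> taskSpace, int subTaskCount, Runnable joinTask) {
        // One identifier per fork, so concurrent F/J hierarchies can be told apart.
        String forkId = UUID.randomUUID().toString();
        for (int i = 0; i < subTaskCount; i++) {
            taskSpace.add(new SubTask(forkId, i));
        }
        taskSpace.add(new JoinMessage(forkId, subTaskCount, joinTask));
        return forkId;
    }

    public static void main(String[] args) {
        BlockingQueue<Entry> space = new LinkedBlockingQueue<>();
        String id = fork(space, 3, () -> System.out.println("join"));
        System.out.println(space.size() + " entries written for fork " + id);
    }
}
```

A real JavaSpace would use write() with a lease and template-based take()
rather than a FIFO queue; the queue only illustrates the order of the writes.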
Results from tasks are written to a separate distributed cache. Currently it
is also implemented using JavaSpaces.
The Job Monitor listens for the completion of the subordinate tasks. When
all subordinate tasks have completed, the Job Monitor submits the "join"
task. This task gathers the results from the subordinate tasks and performs
whatever actions (usually statistics) are required. When the join task has
completed, it also notifies the Job Monitor, and the simulation can continue
to roll back up.
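The bookkeeping the Job Monitor does can be sketched roughly as follows. This
is an illustration under assumptions, not the Gomez implementation: the class
and method names are hypothetical, and the "submitter" callback stands in for
writing the join task back into the task space. The sketch also assumes that a
completion event may reach the monitor before the join message does, so it
counts completions independently of the expected total.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class JobMonitorSketch {
    private final Map<String, Integer> completed = new HashMap<>();
    private final Map<String, Integer> expected = new HashMap<>();
    private final Map<String, Runnable> joinTasks = new HashMap<>();
    // Hypothetical stand-in for "submit the join task to the task space".
    private final Consumer<Runnable> submitter;

    public JobMonitorSketch(Consumer<Runnable> submitter) {
        this.submitter = submitter;
    }

    /** Called when the parent's message (subtask count plus join task) arrives. */
    public synchronized void onJoinMessage(String forkId, int subTaskCount, Runnable joinTask) {
        expected.put(forkId, subTaskCount);
        joinTasks.put(forkId, joinTask);
        submitIfComplete(forkId);
    }

    /** Called once for each subordinate task that reports completion. */
    public synchronized void onSubTaskCompleted(String forkId) {
        completed.merge(forkId, 1, Integer::sum);
        submitIfComplete(forkId);
    }

    /** Submits the join task once all expected completions have been seen. */
    private void submitIfComplete(String forkId) {
        Integer want = expected.get(forkId);
        if (want != null && completed.getOrDefault(forkId, 0) >= want) {
            expected.remove(forkId);
            completed.remove(forkId);
            submitter.accept(joinTasks.remove(forkId));
        }
    }

    public static void main(String[] args) {
        // Runnable::run executes the join task directly instead of redistributing it.
        JobMonitorSketch monitor = new JobMonitorSketch(Runnable::run);
        monitor.onSubTaskCompleted("fork-1");   // may arrive before the join message
        monitor.onJoinMessage("fork-1", 2, () -> System.out.println("join task runs"));
        monitor.onSubTaskCompleted("fork-1");   // second completion triggers the join
    }
}
```

Keying everything by the fork identifier is what lets one monitor track
several F/J hierarchies at once.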
In my case I have only started investigating Infinispan in the role of a
distributed cache for results. I have not considered it for task
distribution since I couldn't see a clear way of mapping the JavaSpace
take() pattern to the Cache API.
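To make the gap concrete, here is a minimal sketch of the closest thing I can
write against a plain java.util.concurrent.ConcurrentMap, assuming the cache
can be treated as one. It is only an illustration (TakeSketch and tryTake are
hypothetical names, and a ConcurrentHashMap stands in for the cache): an entry
is claimed atomically with remove(key, value), but unlike JavaSpace take()
there is no blocking and no template matching, so a worker would have to poll.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TakeSketch {
    /**
     * Tries to claim one entry from the map. Returns empty if nothing could be
     * claimed on this pass; it never blocks waiting for a task to appear.
     */
    public static <K, V> Optional<V> tryTake(ConcurrentMap<K, V> tasks) {
        for (Map.Entry<K, V> e : tasks.entrySet()) {
            V value = e.getValue();
            // remove(key, value) is atomic, so only one caller can claim this entry.
            if (tasks.remove(e.getKey(), value)) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> tasks = new ConcurrentHashMap<>();
        tasks.put("task-1", "simulate batch 1");
        System.out.println(tryTake(tasks)); // claims the only entry
        System.out.println(tryTake(tasks)); // nothing left to claim
    }
}
```

Scanning the entry set on every attempt is unlikely to be practical on a
distributed cache, which is part of why I have not pursued this.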
-Ron