Hi all,
I know this has been discussed in the past (by Tristan, I think), but I don't know how
concrete the plans have become since then.
One major issue with all the distributed execution code interfaces we have is that they
require both the implementation of these interfaces and the class files for the keys and
values being processed to be on the classpath of each node. My understanding is that this
is true of distexec, Map / Reduce and (clustered) listeners.
Evangelos from the LEADS project sort of worked around this problem by creating a
specialized version of his distexec that loads the necessary JARs from the grid itself
(from a set of keys) and creates a classloader that references these JARs. As a sequence,
it conceptually looks like this:
- have the generic classloader-based distexec on the classpath of each grid node at
start time
- when a new remote execution is required, store each necessary JAR under a specific key
in a specific cache
- the generic distexec receives the necessary keys, loads each JAR and creates a
classloader out of them
- the generic distexec loads and launches the specific code that needs to be executed
(based on the FQCN of the code to execute) from the created classloader
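
To make the last two steps concrete, here is a minimal sketch of what the generic side
could look like. This is not Evangelos's actual code: the class name GridJarClassLoader
and the execute helper are hypothetical, a plain Map<String, byte[]> stands in for the
cache holding one JAR per key, and I assume the code to execute implements Runnable and
has a no-arg constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;

/** Hypothetical sketch: a classloader fed from JAR bytes stored under cache keys. */
final class GridJarClassLoader extends ClassLoader {

    // Entry name inside a JAR (e.g. "com/acme/Task.class") -> raw bytes of that entry.
    private final Map<String, byte[]> entries = new HashMap<>();

    /**
     * @param cache   stand-in for the grid cache that holds one JAR per key (assumption)
     * @param jarKeys keys of the JARs needed by this execution
     * @param parent  parent classloader, consulted first for classes
     */
    GridJarClassLoader(Map<String, byte[]> cache, List<String> jarKeys, ClassLoader parent)
            throws IOException {
        super(parent);
        for (String key : jarKeys) {
            byte[] jar = cache.get(key);
            if (jar == null) {
                throw new IOException("No JAR stored under key " + key);
            }
            index(jar);
        }
    }

    // Unpacks one JAR in memory; the first JAR that defines an entry wins.
    private void index(byte[] jar) throws IOException {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jar))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                if (!entry.isDirectory()) {
                    entries.putIfAbsent(entry.getName(), readAll(in));
                }
            }
        }
    }

    // Reads the current JAR entry to its end.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Called by loadClass only after the parent classloader failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = entries.get(name.replace('.', '/') + ".class");
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytes, 0, bytes.length);
    }

    // Resources are looked up in the loaded JARs first, then in the parent.
    @Override
    public InputStream getResourceAsStream(String name) {
        byte[] bytes = entries.get(name);
        return bytes != null ? new ByteArrayInputStream(bytes) : super.getResourceAsStream(name);
    }

    /** Loads the class by FQCN from the given JAR keys and runs it (assumes a Runnable). */
    static void execute(Map<String, byte[]> cache, List<String> jarKeys, String fqcn)
            throws Exception {
        ClassLoader loader =
                new GridJarClassLoader(cache, jarKeys, GridJarClassLoader.class.getClassLoader());
        Runnable task = (Runnable) loader.loadClass(fqcn).getDeclaredConstructor().newInstance();
        task.run();
    }
}
```

Note that this sketch keeps every unpacked entry on the heap, so it does nothing to help
with large JARs, and it defines whatever bytecode it finds under the given keys without
any verification.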
There are a few problems with this approach, including:
- it requires a lot of manual work from the user
- big JARs make the key / value per JAR logic explode a bit. The algorithms LEADS uses have
300 MB JARs
- God knows what security holes this can lead to
So I wondered whether we have a better alternative, or plans for one, and whether there is
a wiki page discussing the needs and potential approaches.
As an intermediate step, we could turn this approach into a tutorial or a set of helper
classes that people can borrow from for each of the use cases.
Emmanuel