Hey Mircea,
You're on the right track, but not exactly. I do not use a separate thread per
key. Input keys for map/combine/reduce are split into a List of Lists,
and then each list is submitted to the Executor as a separate Runnable.
Have a look at the submitToExecutor method in
https://github.com/vblagoje/infinispan/tree/t_2284_new
What I intend to do is move this code into the DataContainer, because
iterating keys directly inside the container, with minimal overhead, and
doing that iteration in parallel should give us a further edge.
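For illustration, the chunking described above might look roughly like the sketch below. This is not the code from the t_2284_new branch. `ChunkedSubmit`, `partition` and `runInChunks` are hypothetical names, and a plain `java.util.concurrent.ExecutorService` stands in for whatever executor the M/R task is actually given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not Infinispan code: one Runnable per sublist of keys.
class ChunkedSubmit {

    // Split the input keys into at most `chunks` contiguous sublists (a List of Lists).
    static <K> List<List<K>> partition(List<K> keys, int chunks) {
        if (chunks <= 0) {
            throw new IllegalArgumentException("chunks must be positive");
        }
        List<List<K>> result = new ArrayList<>();
        int size = (keys.size() + chunks - 1) / chunks; // ceiling division
        for (int i = 0; i < keys.size(); i += size) {
            result.add(new ArrayList<>(keys.subList(i, Math.min(i + size, keys.size()))));
        }
        return result;
    }

    // Submit each sublist as a single Runnable, then wait for all of them.
    // A failure in any chunk is rethrown by Future.get() as an ExecutionException.
    static <K> void runInChunks(ExecutorService executor, List<K> keys, int chunks,
                                Consumer<K> perKeyWork)
            throws InterruptedException, ExecutionException {
        List<Future<?>> futures = new ArrayList<>();
        for (List<K> chunk : partition(keys, chunks)) {
            futures.add(executor.submit(() -> chunk.forEach(perKeyWork)));
        }
        for (Future<?> f : futures) {
            f.get();
        }
    }
}
```

Submitting one Runnable per sublist keeps the number of task objects proportional to the number of chunks rather than the number of keys, although building the key list itself is still a sequential pass.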
Stay tuned,
Vladimir
On 12/6/2013, 11:18 AM, Mircea Markus wrote:
Thanks Vladimir, I like the hands on approach!
Adding -dev; there's a lot of interest around the parallel M/R, so I think others will
have some thoughts on it as well.
So what you're basically doing in your branch is iterating over all the keys in the
cache and then, for each key, invoking the mapping in a separate thread. Whilst this would
work, I think it has some drawbacks:
- the iteration over the keys in the container happens in sequence, although the mapping
phases happen in parallel. This speeds things up a bit, but not as much as having the
iteration itself happen in parallel, especially when the mapper is fast, which I think is
pretty common.
- a StatelessTask plus some smaller objects are created for each iterated key.
That's a lot of noise for the GC, IMO.
I think delegating the parallel iteration to the DataContainer (similar to
AdvancedCacheLoader.process(Executor)) would be a better approach:
- the logic is reusable by other components as well, such as querying (to implement
full-scan-like search) or a general-purpose parallel iterator over the keys
- object creation is reduced
- the DefaultDataContainer uses an EquivalentConcurrentHashMapV8 for holding the entries,
which already supports parallel iteration, so the heavy lifting is already in place
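A minimal sketch of the container-side approach, under stated assumptions: Java 8's `java.util.concurrent.ConcurrentHashMap` stands in for EquivalentConcurrentHashMapV8, since both offer the same style of bulk parallel operations. `KeyFilter`, `ParallelContainer` and `process` are hypothetical names here, not Infinispan's actual DataContainer or KeyFilter API. Unlike the Executor-taking `AdvancedCacheLoader.process` style, this version lets the map run its own fork/join tasks on `ForkJoinPool.commonPool()`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical generic key filter, assumed for illustration only.
interface KeyFilter<K> {
    boolean accept(K key);
}

// Hypothetical container; the JDK ConcurrentHashMap stands in for EquivalentConcurrentHashMapV8.
class ParallelContainer<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();

    void put(K key, V value) {
        entries.put(key, value);
    }

    // Apply `action` to every entry whose key passes `filter`. When the map holds at
    // least `parallelismThreshold` entries, it splits its own table into fork/join
    // subtasks on ForkJoinPool.commonPool() (a handful of splits, not one task per key)
    // and returns only after all entries have been visited.
    void process(KeyFilter<? super K> filter, BiConsumer<? super K, ? super V> action,
                 long parallelismThreshold) {
        entries.forEach(parallelismThreshold, (k, v) -> {
            if (filter.accept(k)) {
                action.accept(k, v);
            }
        });
    }
}
```

Because the filter and the action are plain parameters, the same `process` call could serve an M/R mapping phase or a full-scan query; the action must be thread-safe, since it may run concurrently on several threads.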
On Dec 4, 2013, at 5:16 PM, Vladimir Blagojevic <vblagoje(a)redhat.com> wrote:
> Here is my M/R parallel execution solution updated to master:
> https://github.com/vblagoje/infinispan/tree/t_2284_new
>
> Now I'll work on your solution, which, actually, I am starting to like the more I
> think about it. I have to admit, though, that I would move some of your interfaces,
> like these KeyFilters, into more prominent packages so we can all use the same interfaces.
> I would also see whether we can genericize some of your interfaces and implementations.
>
> Will keep you updated.
>
> Vladimir
Cheers,