[infinispan-dev] Parallel M/R

Vladimir Blagojevic vblagoje at redhat.com
Fri Dec 6 11:36:52 EST 2013


Hey Mircea,

You're on the right track, but not exactly. I do not use a separate thread
per key. The input keys for map/combine/reduce are split into a List of
Lists, and each list is then submitted to the Executor as a separate
Runnable. Have a look at the submitToExecutor method in
https://github.com/vblagoje/infinispan/tree/t_2284_new
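A minimal sketch of the batching idea described above, assuming a plain
java.util.concurrent.ExecutorService and a generic per-key action. The
names BatchedKeyProcessor, partition and processInBatches are hypothetical
and are not taken from the submitToExecutor code in the branch; the sketch
only shows one Runnable being submitted per sublist rather than per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper illustrating "split keys into a List of Lists and
// submit each list as a Runnable"; not the actual branch implementation.
class BatchedKeyProcessor {

    // Split keys into consecutive sublists of at most batchSize elements.
    static <K> List<List<K>> partition(List<K> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    // Submit one task per batch (not per key), then wait for all of them.
    // The action may run on several threads at once, so it must be thread-safe.
    static <K> void processInBatches(List<K> keys, int batchSize,
                                     ExecutorService executor,
                                     Consumer<K> action) throws Exception {
        List<Future<?>> futures = new ArrayList<>();
        for (List<K> batch : partition(keys, batchSize)) {
            futures.add(executor.submit(() -> batch.forEach(action)));
        }
        for (Future<?> f : futures) {
            f.get(); // rethrows (wrapped) any exception raised inside a batch
        }
    }
}
```

Fewer, larger batches mean fewer task objects for the GC, at the cost of
coarser load balancing when some keys are much more expensive to process
than others.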

What I intend to do is move this code into the DataContainer, because
iterating over the keys directly inside the container, and doing so in
parallel, has minimal overhead and should give us a further edge.
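The container-level approach can be sketched with Java 8's
java.util.concurrent.ConcurrentHashMap, used here as a stand-in for the
EquivalentConcurrentHashMapV8 mentioned in the quoted reply below, on the
assumption that it offers comparable bulk operations. ParallelDataContainer
and its process method are hypothetical; forEach(long parallelismThreshold,
BiConsumer) is a real JDK 8 method that splits the traversal itself across
ForkJoinPool.commonPool(), so the caller builds neither a key list nor a
task per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Hypothetical container sketch: traversal is delegated to the map's own
// parallel bulk operation instead of iterating keys sequentially.
class ParallelDataContainer<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();

    void put(K key, V value) {
        entries.put(key, value);
    }

    // Applies action to every entry whose key passes filter.
    // parallelismThreshold is the estimated element count below which the
    // map runs sequentially; the call returns after all entries are visited.
    // The action may run concurrently on several threads.
    void process(Predicate<? super K> filter,
                 BiConsumer<? super K, ? super V> action,
                 long parallelismThreshold) {
        entries.forEach(parallelismThreshold, (k, v) -> {
            if (filter.test(k)) {
                action.accept(k, v);
            }
        });
    }
}
```

A threshold of 1 asks for maximal parallelism, while Long.MAX_VALUE makes
the same call run sequentially, which keeps one code path for both modes.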

Stay tuned,
Vladimir

On 12/6/2013, 11:18 AM, Mircea Markus wrote:
> Thanks Vladimir, I like the hands-on approach!
> Adding -dev; there's a lot of interest around the parallel M/R, so I think others will have some thoughts on it as well.
>
> So what you're basically doing in your branch is iterate over all the keys in the cache and then for each key invoke the mapping in a separate thread. Whilst this would work, I think it has some drawbacks:
> - the iteration over the keys in the container happens in sequence, even though the mapping phases happen in parallel. This speeds things up a bit, but not as much as having the iteration
> happen in parallel, especially when the mapper is fast, which I think is pretty common.
> - the StatelessTask + some smaller objects are being created for each iterated key. That's a lot of noise for the GC imo
>
> I think delegating the parallel iteration to the DataContainer (similar to AdvancedCacheLoader.process(Executor)) would be a better approach:
> - the logic is reusable by other components as well, such as querying (to implement full-scan-like search), or as a general-purpose parallel iterator over the keys
> - object creation is reduced
> - the DefaultDataContainer uses an EquivalentConcurrentHashMapV8 for holding the entries, which already supports parallel iteration, so the heavy lifting is already in place
>
> On Dec 4, 2013, at 5:16 PM, Vladimir Blagojevic <vblagoje at redhat.com> wrote:
>
>> Here is my M/R parallel execution solution updated to master https://github.com/vblagoje/infinispan/tree/t_2284_new
>>
>> Now I'll work on your solution, which I like more the more I think about it. Although I have to admit that I would extract some of your interfaces, such as these KeyFilters, into more prominent packages so we can all use the same interfaces. I would also see whether we can genericize some of your interfaces and implementations.
>>
>> Will keep you updated.
>>
>> Vladimir
> Cheers,