[infinispan-dev] Parallel M/R

Mircea Markus mmarkus at redhat.com
Fri Dec 6 11:40:39 EST 2013


On Dec 6, 2013, at 4:36 PM, Vladimir Blagojevic <vblagoje at redhat.com> wrote:

> Hey Mircea,
> 
> On the right track but not exactly. I do not use separate thread per 
> key. Input keys for map/combine/reduce are split into a List of Lists 
> and then each list is submitted to Executor as a separate Runnable. 
> Have a look at submitToExecutor method on 
> https://github.com/vblagoje/infinispan/tree/t_2284_new

Ah right. Still, a StatelessTask instance is created for each key though.
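
For readers following along, the "List of Lists" approach described above can be sketched roughly as follows. This is only an illustration, not the code from the branch: the class name PartitionedSubmit and the methods partition and processInParallel are hypothetical, and a plain Consumer stands in for the map/combine/reduce work done per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration; names are not taken from the Infinispan code base.
public class PartitionedSubmit {

    // Split the keys into at most `parts` contiguous sublists.
    static <K> List<List<K>> partition(List<K> keys, int parts) {
        List<List<K>> result = new ArrayList<>();
        if (keys.isEmpty() || parts <= 0) {
            return result;
        }
        int chunk = (keys.size() + parts - 1) / parts; // ceiling division
        for (int i = 0; i < keys.size(); i += chunk) {
            result.add(keys.subList(i, Math.min(i + chunk, keys.size())));
        }
        return result;
    }

    // Submit one Runnable per sublist (not one per key) and wait for all of them.
    static <K> void processInParallel(List<K> keys, int parts,
                                      ExecutorService executor,
                                      Consumer<K> perKey) throws Exception {
        List<Future<?>> futures = new ArrayList<>();
        for (List<K> part : partition(keys, parts)) {
            Runnable task = () -> part.forEach(perKey);
            futures.add(executor.submit(task));
        }
        for (Future<?> f : futures) {
            f.get(); // propagates any failure from the worker
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> keys = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        AtomicInteger sum = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            processInParallel(keys, 4, executor, sum::addAndGet);
        } finally {
            executor.shutdown();
        }
        System.out.println(sum.get()); // prints 55
    }
}
```

Note that the key list is still built sequentially before it is split, and the per-key callback has to be thread-safe.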

> 
> What I intend to do is move this code to DataContainer because the 
> minimal overhead of directly iterating keys inside container and 
> iterating in parallel should give a further edge.

Hmm I think you could leverage the parallel iteration from the EquivalentConcurrentHashMapV8 there instead of writing it yourself ;)
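
To make the idea concrete, here is a minimal sketch of what map-provided parallel iteration looks like. It assumes the JDK 8 java.util.concurrent.ConcurrentHashMap as a stand-in for EquivalentConcurrentHashMapV8 (whose exact method signatures are not shown in this thread); the class ParallelContainerScan and the method forEachParallel are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.BiConsumer;

// Hypothetical illustration; uses the JDK 8 map, not Infinispan's own class.
public class ParallelContainerScan {

    // Let the map split the work itself instead of collecting the keys first.
    static <K, V> void forEachParallel(ConcurrentHashMap<K, V> entries,
                                       BiConsumer<? super K, ? super V> action) {
        // A parallelismThreshold of 1 asks for as much parallelism as possible;
        // the JDK runs the subtasks on ForkJoinPool.commonPool().
        entries.forEach(1L, action);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> entries = new ConcurrentHashMap<>();
        for (int i = 0; i < 1000; i++) {
            entries.put("k" + i, i);
        }
        // The action may run on several threads, so it must be thread-safe.
        LongAdder total = new LongAdder();
        forEachParallel(entries, (key, value) -> total.add(value));
        System.out.println(total.sum()); // prints 499500
    }
}
```

No intermediate key list and no per-key task object is created here; the only per-entry work is the callback itself.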

> 
> Stay tuned,
> Vladimir
> 
> On 12/6/2013, 11:18 AM, Mircea Markus wrote:
>> Thanks Vladimir, I like the hands on approach!
>> Adding -dev, there's a lot of interest around the parallel M/R so I think others will have some thoughts on it as well.
>> 
>> So what you're basically doing in your branch is iterate over all the keys in the cache and then for each key invoke the mapping in a separate thread. Whilst this would work, I think it has some drawbacks:
>> - the iteration over the keys in the container happens in sequence, although the mapping phases happen in parallel. This speeds things up a bit, but not as much as iterating in parallel would, especially when the mapper is fast, which I think is pretty common.
>> - the StatelessTask + some smaller objects are being created for each iterated key. That's a lot of noise for the GC imo
>> 
>> I think delegating the parallel iteration to the DataContainer (similar to AdvancedCacheLoader.process(Executor)) would be a better approach:
>> - the logic is reusable for other components as well, such as querying (to implement full-scan-like search), or as a general-purpose parallel iterator over the keys
>> - object creation is reduced
>> - the DefaultDataContainer uses an EquivalentConcurrentHashMapV8 for holding the entries, which already supports parallel iteration, so the heavy lifting is already in place
>> 
>> On Dec 4, 2013, at 5:16 PM, Vladimir Blagojevic <vblagoje at redhat.com> wrote:
>> 
>>> Here is my M/R parallel execution solution updated to master https://github.com/vblagoje/infinispan/tree/t_2284_new
>>> 
>>> Now, I'll work on your solution, which I'm actually starting to like the more I think about it. Although I have to admit that I would pull some of your interfaces, like these KeyFilters, out into more prominent packages so we can all use the same interfaces. Also I would see if we can genericize some of your interfaces and implementations.
>>> 
>>> Will keep you updated.
>>> 
>>> Vladimir
>> Cheers,
> 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)
