[infinispan-dev] Parallel M/R

Mircea Markus mmarkus at redhat.com
Fri Dec 13 05:06:06 EST 2013


>> On 9 Dec 2013, at 16:45, Radim Vansa <rvansa at redhat.com> wrote:
>> 
>> On 12/09/2013 04:21 PM, Vladimir Blagojevic wrote:
>> Radim, these are some very good ideas. And I think we should put them on
>> the roadmap.
> Do you have any JIRA where this could be marked down?
>> 
>> Also, I like your ExecutorAllCompletionService, however, I think it will
>> not work in this case as we often do not have exclusive access to the
>> underlying executor service used in ExecutorAllCompletionService. There
>> might be some other Infinispan part using the same executor service
>> which will screw up the logic of incrementing and decrementing tasks. It
>> seems like you need to do the counting based on some ID or something similar.
> You're right - I am not really sure which threadpools M/R uses. And
> in fact, multiple executions using the persistence executor might
> already interfere with each other. I should fix the service to add
> the ID and multiplex internally! Created [1] for that.
> Shouldn't there be separate threadpools for map and reduce tasks? I
> mean two threadpools, because a fully used mapper pool should not block
> reduce tasks from being executed (otherwise there might be a deadlock
> between the threadpools and the bounded queue in the collector).

For the mapper at least (I think the reducer as well) we're planning to rely on the Fork/Join framework. This doesn't allow configuring an external pool, which is not necessarily a bad thing, considering the large number of pool configs we provide to the user.
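
To make the Fork/Join idea concrete, here is a minimal sketch of a map phase written as a RecursiveTask. It is not Infinispan code: the class name MapPhase, the word-count mapper, and the THRESHOLD value are all assumptions made up for illustration, and it runs on the JDK's common pool (Java 8+) rather than on any pool Infinispan configures.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical word-count map phase; names and threshold are illustrative only.
final class MapPhase extends RecursiveTask<Map<String, Integer>> {
    private static final int THRESHOLD = 2; // max values mapped sequentially by one task
    private final List<String> values;

    MapPhase(List<String> values) {
        this.values = values;
    }

    @Override
    protected Map<String, Integer> compute() {
        if (values.size() <= THRESHOLD) {
            // Small enough: map the values directly in this worker thread.
            Map<String, Integer> out = new HashMap<>();
            for (String value : values) {
                for (String word : value.split("\\s+")) {
                    if (!word.isEmpty()) {
                        out.merge(word, 1, Integer::sum);
                    }
                }
            }
            return out;
        }
        // Otherwise split in half; fork one half and compute the other inline.
        int mid = values.size() / 2;
        MapPhase left = new MapPhase(values.subList(0, mid));
        MapPhase right = new MapPhase(values.subList(mid, values.size()));
        left.fork();
        Map<String, Integer> result = right.compute();
        left.join().forEach((word, n) -> result.merge(word, n, Integer::sum));
        return result;
    }

    // Runs the task on the JDK common pool, so the caller passes no executor.
    static Map<String, Integer> run(List<String> values) {
        return ForkJoinPool.commonPool().invoke(new MapPhase(values));
    }
}
```

Calling MapPhase.run(list) blocks until every subtask has joined, so no separate bookkeeping of futures is needed in this style.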

> 
> Radim
> 
> [1] https://issues.jboss.org/browse/ISPN-3800
>> 
>>> On 12/9/2013, 3:09 AM, Radim Vansa wrote:
>>> There is one thing I really don't like about the current implementation:
>>> DefaultCollector. And any other collection that keeps one (or more)
>>> object per entry.
>>> We can't assume that if you double the number of objects in memory (and
>>> in fact, if you map an entry to a bigger object, you do that), they'd still
>>> fit into it. Even less so if you map the objects from the cache store as well.
>>> I believe we have to use a Collector implemented as a bounded queue, and
>>> start the reduction phase on the entries that have already been mapped, in
>>> parallel with the mapper phase. Otherwise, say hello to OOME.
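
The bounded-queue idea can be sketched with a plain java.util.concurrent.BlockingQueue. This is only an illustration under assumptions, not the DefaultCollector or any Infinispan API: the class BoundedCollectorDemo, the sum-of-squares workload, and the POISON end marker are invented here. The point it shows is that the mapper blocks once the queue is full, so at most `capacity` mapped values are held in memory while the reducer drains them concurrently.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo: mapper and reducer connected by a bounded queue.
final class BoundedCollectorDemo {
    // End-of-stream marker; assumes Long.MIN_VALUE is never a real mapped value.
    private static final long POISON = Long.MIN_VALUE;

    static long sumOfSquares(int entries, int capacity) throws InterruptedException {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(capacity);
        long[] result = new long[1];

        // Reducer runs on its own thread, so a blocked mapper can never starve it.
        Thread reducer = new Thread(() -> {
            long sum = 0;
            try {
                for (long v = queue.take(); v != POISON; v = queue.take()) {
                    sum += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            result[0] = sum;
        });
        reducer.start();

        // Mapper: put() blocks while the queue is full, which bounds memory use.
        for (long i = 1; i <= entries; i++) {
            queue.put(i * i);
        }
        queue.put(POISON);

        reducer.join(); // join() makes the reducer's write to result[0] visible here
        return result[0];
    }
}
```

Running the reducer on a thread the mapper cannot occupy is what avoids the deadlock discussed elsewhere in this thread: if both phases shared one saturated pool, mappers blocked on a full queue could prevent any reducer from ever being scheduled.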
>>> 
>>> Cheers
>>> 
>>> Radim
>>> 
>>> PS: And don't keep all the futures just to check that all tasks have
>>> been finished - use ExecutorAllCompletionService instead.
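
As a rough illustration of waiting for completion without retaining futures, the sketch below counts outstanding tasks per tracker instance. It is not the actual ExecutorAllCompletionService: BatchTracker and its methods are hypothetical names. Because each instance owns its counter, several instances can share one executor without disturbing each other's counts, which is the concern raised in the replies about non-exclusive executors.

```java
import java.util.concurrent.Executor;

// Hypothetical helper: tracks only the tasks submitted through this instance.
final class BatchTracker {
    private final Executor shared;
    private int pending = 0; // guarded by "this"

    BatchTracker(Executor shared) {
        this.shared = shared;
    }

    synchronized void submit(Runnable task) {
        pending++;
        try {
            shared.execute(() -> {
                try {
                    task.run();
                } finally {
                    taskDone(); // count down even if the task throws
                }
            });
        } catch (RuntimeException e) {
            pending--; // the task was rejected and will never run
            throw e;
        }
    }

    private synchronized void taskDone() {
        if (--pending == 0) {
            notifyAll();
        }
    }

    // Blocks until every task submitted through this instance has finished.
    synchronized void awaitAll() throws InterruptedException {
        while (pending > 0) {
            wait();
        }
    }
}
```

Two M/R executions would each create their own BatchTracker around the same executor; awaitAll() on one returns as soon as its own tasks are done, regardless of what else is queued on the pool.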
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 
> 
> -- 
> Radim Vansa <rvansa at redhat.com>
> JBoss DataGrid QA
> 