[infinispan-dev] Parallel M/R

Radim Vansa rvansa at redhat.com
Mon Dec 9 11:38:53 EST 2013


On 12/09/2013 05:33 PM, Radim Vansa wrote:
> On 12/09/2013 04:21 PM, Vladimir Blagojevic wrote:
>> Radim, these are some very good ideas. And I think we should put them on
>> the roadmap.
> Do you have any JIRA where this could be marked down?
>> Also, I like your ExecutorAllCompletionService, however, I think it will
>> not work in this case as we often do not have exclusive access to the
>> underlying executor service used in ExecutorAllCompletionService. There
>> might be some other Infinispan part using the same executor service
>> which will screw up the logic of incrementing and decrementing tasks. It
>> seems like you need to do the counting logic based on some ID or something similar.
> You're right - I am not really sure which threadpools M/R uses. And
> in fact it might happen that currently multiple executions using the
> persistence executor would interfere. I should fix the service to add
> the ID and multiplex internally! Created [1] for that.
Aah, I got confused... Only the executor is shared; the underlying
ExecutorCompletionService (with the queue of futures) is private to each
EACS, so it's safe to share the Executor itself.
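
To illustrate the point with plain JDK classes (a minimal sketch, not the EACS code itself; the class and method names are made up for this example): two ExecutorCompletionService instances wrap the same ExecutorService, yet each one only ever sees completions of the tasks submitted through it, because each owns its own completion queue.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; not part of Infinispan.
class SharedExecutorSketch {

    // Takes exactly `expected` results from one completion service and checks
    // that every result carries that service's own prefix.
    static int drain(ExecutorCompletionService<String> ecs, int expected, String prefix) {
        int seen = 0;
        try {
            for (int i = 0; i < expected; i++) {
                String result = ecs.take().get();
                if (!result.startsWith(prefix)) {
                    throw new IllegalStateException("foreign result: " + result);
                }
                seen++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        // Only our own tasks feed this queue, so it must be empty now.
        if (ecs.poll() != null) {
            throw new IllegalStateException("unexpected extra completion");
        }
        return seen;
    }

    static int[] demo() {
        // One executor shared by both completion services.
        ExecutorService shared = Executors.newFixedThreadPool(4);
        try {
            ExecutorCompletionService<String> a = new ExecutorCompletionService<>(shared);
            ExecutorCompletionService<String> b = new ExecutorCompletionService<>(shared);
            // Interleave submissions so both batches run on the same threads.
            for (int i = 0; i < 5; i++) {
                final int id = i;
                a.submit(() -> "A-" + id);
                b.submit(() -> "B-" + id);
            }
            return new int[] { drain(a, 5, "A-"), drain(b, 5, "B-") };
        } finally {
            shared.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

Counting completions per service therefore needs no extra ID as long as every execution creates its own completion service.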

Radim

> Shouldn't there be separate threadpools for map and reduce tasks? I
> mean two threadpools, because a fully used mapper pool should not block
> reduce tasks from being executed (otherwise there might be a deadlock
> between the threadpools and the bounded queue in the collector).
>
> Radim
>
> [1] https://issues.jboss.org/browse/ISPN-3800
>> On 12/9/2013, 3:09 AM, Radim Vansa wrote:
>>> There is one thing I really don't like about the current implementation:
>>> DefaultCollector, and any other collection that keeps one (or more)
>>> objects per entry.
>>> We can't assume that if you double the number of objects in memory (and
>>> in fact, if you map an entry to a bigger object, you do that), they'd
>>> still fit into it, even less so if you map the objects from the cache
>>> store as well.
>>> I believe we have to use a Collector implemented as a bounded queue, and
>>> start the reduction phase on the entries that have already been mapped,
>>> in parallel with the mapper phase. Otherwise, say hello to OOME.
>>>
>>> Cheers
>>>
>>> Radim
>>>
>>> PS: And don't keep all the futures just to check that all tasks have
>>> been finished - use ExecutorAllCompletionService instead.
>>>
>>>
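
The bounded-collector idea quoted above could look roughly like the following. This is only a sketch with plain JDK classes under my own assumptions: the class name, the END marker, the "map an entry to its length, reduce by summing" job and the pool sizes are all invented for illustration and are not Infinispan's Collector API. Mappers block on put() when the queue is full, and the reducer runs in its own pool, so it keeps draining the queue while the mappers are still working.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Infinispan.
class BoundedCollectorSketch {
    // End-of-stream marker; safe because mapped lengths are never negative.
    private static final int END = -1;

    static long sumOfLengths(List<String> entries, int capacity) {
        BlockingQueue<Integer> collected = new ArrayBlockingQueue<>(capacity);
        final int mappers = 2;
        // Separate pools: busy mappers can never starve the reducer.
        ExecutorService mapPool = Executors.newFixedThreadPool(mappers);
        ExecutorService reducePool = Executors.newSingleThreadExecutor();
        try {
            // Reduction starts before mapping and consumes as values arrive.
            Future<Long> reduced = reducePool.submit(() -> {
                long sum = 0;
                while (true) {
                    int value = collected.take();
                    if (value == END) {
                        return sum;
                    }
                    sum += value;
                }
            });
            for (int m = 0; m < mappers; m++) {
                final int offset = m;
                mapPool.execute(() -> {
                    try {
                        // Each mapper handles every `mappers`-th entry.
                        for (int i = offset; i < entries.size(); i += mappers) {
                            collected.put(entries.get(i).length()); // blocks while full
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            mapPool.shutdown();
            if (!mapPool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("mappers timed out");
            }
            collected.put(END); // all mapped values are already queued or reduced
            return reduced.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            mapPool.shutdownNow();
            reducePool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfLengths(Arrays.asList("a", "bb", "ccc", "dddd"), 2));
    }
}
```

At most `capacity` mapped values sit in memory at any time, regardless of how many entries are mapped.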


-- 
Radim Vansa <rvansa at redhat.com>
JBoss DataGrid QA


