On Dec 6, 2013, at 4:36 PM, Vladimir Blagojevic <vblagoje(a)redhat.com> wrote:
Hey Mircea,
On the right track but not exactly. I do not use separate thread per
key. Input keys for map/combine/reduce are split into a List of Lists
and then each list is submitted to the Executor as a separate Runnable.
Have a look at submitToExecutor method on
https://github.com/vblagoje/infinispan/tree/t_2284_new
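
To make the chunking idea above concrete, here is a minimal sketch of splitting keys into a List of Lists and submitting one Runnable per sublist. It is not the code from the branch: the class name ChunkedSubmit and the methods partition and submitInChunks are hypothetical, and the mapper is modelled as a plain Consumer rather than an Infinispan Mapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration; names are assumptions, not Infinispan API.
class ChunkedSubmit {

    // Split keys into at most `parts` sublists of near-equal size.
    static <K> List<List<K>> partition(List<K> keys, int parts) {
        List<List<K>> result = new ArrayList<>();
        int chunk = Math.max(1, (keys.size() + parts - 1) / parts);
        for (int i = 0; i < keys.size(); i += chunk) {
            int end = Math.min(i + chunk, keys.size());
            result.add(new ArrayList<>(keys.subList(i, end)));
        }
        return result;
    }

    // Submit one Runnable per sublist, then wait for all of them to finish.
    static <K> void submitInChunks(List<K> keys, int parts,
                                   ExecutorService executor, Consumer<K> mapper) {
        List<Future<?>> futures = new ArrayList<>();
        for (List<K> sub : partition(keys, parts)) {
            Runnable task = () -> {
                for (K key : sub) {
                    mapper.accept(key);
                }
            };
            futures.add(executor.submit(task));
        }
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add(i);
        }
        ExecutorService executor = Executors.newFixedThreadPool(4);
        AtomicInteger sum = new AtomicInteger();
        submitInChunks(keys, 4, executor, sum::addAndGet);
        executor.shutdown();
        System.out.println(sum.get());
    }
}
```

With this shape, one task object is allocated per sublist rather than per key, although the key list itself is still built sequentially before any work is submitted.
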
Ah right. Still, for each key a StatelessTask instance is created though.
What I intend to do is move this code to DataContainer because the
minimal overhead of directly iterating keys inside the container and
iterating in parallel should give a further edge.
Hmm I think you could leverage the parallel iteration from the
EquivalentConcurrentHashMapV8 there instead of writing it yourself ;)
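
As a rough illustration of leaning on the map's own parallel iteration, the sketch below uses the JDK 8 java.util.concurrent.ConcurrentHashMap as a stand-in. The assumption is that EquivalentConcurrentHashMapV8 offers comparable bulk operations; its exact API is not shown here, and the class ParallelContainerScan with its methods is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.BiConsumer;

// Hypothetical illustration using the JDK map, not Infinispan's DataContainer.
class ParallelContainerScan {

    // A parallelism threshold of 1 asks the map to split the traversal
    // across the common ForkJoinPool as much as possible. The call returns
    // only after every entry has been visited.
    static <K, V> void forEachParallel(ConcurrentHashMap<K, V> entries,
                                       BiConsumer<K, V> mapper) {
        entries.forEach(1L, mapper);
    }

    // Example "mapper": sum the lengths of all values. LongAdder is used
    // because the callback may run on several threads at once.
    static long totalLength(ConcurrentHashMap<String, String> entries) {
        LongAdder total = new LongAdder();
        forEachParallel(entries, (key, value) -> total.add(value.length()));
        return total.sum();
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, String> entries = new ConcurrentHashMap<>();
        entries.put("a", "xx");
        entries.put("b", "yyy");
        System.out.println(totalLength(entries));
    }
}
```

Here no intermediate key list and no per-key task objects are created by the caller; the map itself decides how to split the traversal.
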
Stay tuned,
Vladimir
On 12/6/2013, 11:18 AM, Mircea Markus wrote:
> Thanks Vladimir, I like the hands on approach!
> Adding -dev, there's a lot of interest around the parallel M/R so I think others
> will have some thoughts on it as well.
>
> So what you're basically doing in your branch is iterate over all the keys in the
> cache and then for each key invoke the mapping in a separate thread. Whilst this would
> work, I think it has some drawbacks:
> - the iteration over the keys in the container happens in sequence, even though the
> mapping phases happen in parallel. This speeds things up a bit, but not as much as
> having the iteration happen in parallel, especially when the mapper is fast, which
> I think is pretty common.
> - the StatelessTask + some smaller objects are being created for each iterated key.
> That's a lot of noise for the GC imo
>
> I think delegating the parallel iteration to the DataContainer (similar to
> AdvancedCacheLoader.process(Executor)) would be a better approach:
> - the logic is reusable for other components as well, such as querying (to implement
> full-scan-like search) or a general purpose parallel iterator over the keys
> - object creation is reduced
> - the DefaultDataContainer uses an EquivalentConcurrentHashMapV8 for holding the
> entries, which already supports parallel iteration, so the heavy lifting is already in
> place
>
> On Dec 4, 2013, at 5:16 PM, Vladimir Blagojevic <vblagoje(a)redhat.com> wrote:
>
>> Here is my M/R parallel execution solution updated to master
>> https://github.com/vblagoje/infinispan/tree/t_2284_new
>>
>> Now, I'll work on your solution, which I am actually starting to like the more
>> I think about it. Although I have to admit that I would eviscerate some of your interfaces
>> like these KeyFilters into more prominent packages so we can all use the same interfaces.
>> Also I would see if we can genericize some of your interfaces and implementations.
>>
>> Will keep you updated.
>>
>> Vladimir
> Cheers,
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)