[infinispan-dev] TopologySafe Map / Reduce

Dan Berindei dan.berindei at gmail.com
Fri Oct 10 12:06:53 EDT 2014


On Fri, Oct 10, 2014 at 6:13 PM, Vladimir Blagojevic <vblagoje at redhat.com>
wrote:

> On 2014-10-10, 3:03 AM, Dan Berindei wrote:
> >
> > The problem is that the intermediate keys aren't in the same segment:
> > we want the reduce phase to access only keys local to the reducing
> > node, and keys in different input segments can yield values for the
> > same intermediate key. So like you say, we'd have to retry on every
> > topology change in the intermediary cache, not just the ones affecting
> > segment _i_.
> >
> If we have to retry for all segments on every topology change than I am
> not sure why would it make sense to work on this optimization and
> topology handling mechanism at all. We have to handle the cases where
> one node might have completed map phase and inserted deltas, while the
> other only started inserting deltas, and the third one is still doing
> map phase and has not inserted any deltas at all. The same thing with
> reduce portion. It seems to me that in the end any algorithm we come up
> with will not be not much better than: detect topology change, retry
> map/reduce job.
>

Initially that was my thinking as well. But if the originator invokes the
map/combine phase for only one segment at a time, it will have to retry
only one segment per cluster node, not all the segments. And each node
would write to separate keys in the intermediate cache, making it easy to
clean up only one node's work. So it would still be worth it, as usually
numSegments >> clusterSize.

Plus we don't need this broad retry strategy if the intermediate cache is
transactional (I think).

The biggest downside I see is that it would be horribly slow if the cache
store doesn't support efficient iteration of a single segment. So we might
want to implement a full retry strategy as well, if some cache stores can't
support that.

Cheers
Dan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141010/4191fde3/attachment.html 


More information about the infinispan-dev mailing list