On Fri, Oct 10, 2014 at 6:13 PM, Vladimir Blagojevic <vblagoje@redhat.com> wrote:
On 2014-10-10, 3:03 AM, Dan Berindei wrote:
>
> The problem is that the intermediate keys aren't in the same segment:
> we want the reduce phase to access only keys local to the reducing
> node, and keys in different input segments can yield values for the
> same intermediate key. So like you say, we'd have to retry on every
> topology change in the intermediary cache, not just the ones affecting
> segment _i_.
>
If we have to retry all segments on every topology change, then I am not
sure it makes sense to work on this optimization and topology-handling
mechanism at all. We have to handle the cases where one node has
completed the map phase and inserted its deltas, another has only
started inserting deltas, and a third is still in the map phase and has
not inserted any deltas at all. The same goes for the reduce portion. It
seems to me that in the end any algorithm we come up with will not be
much better than: detect topology change, retry the map/reduce job.

Initially that was my thinking as well. But if the originator invokes the map/combine phase for only one segment at a time, it will have to retry only one segment per cluster node, not all the segments. And each node would write to separate keys in the intermediate cache, making it easy to clean up only one node's work. So it would still be worth it, as usually numSegments >> clusterSize. 
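To make the idea concrete, here is a minimal sketch of what driving the map/combine phase one segment at a time could look like. All of the names (SegmentMapper, TopologyChangedException, etc.) are illustrative, not real Infinispan API; the point is only that a topology change forces a retry of the in-flight segment, while already-completed segments keep their intermediate values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: the originator invokes map/combine for one segment at
// a time; a topology change only re-queues that one segment.
public class PerSegmentMapDriver {

    static class TopologyChangedException extends Exception {}

    interface SegmentMapper {
        // Runs map/combine over one segment on its primary owner and writes
        // the result under a (nodeId, segment)-scoped intermediate key.
        void mapSegment(int segment) throws TopologyChangedException;
    }

    private final SegmentMapper mapper;
    private final int numSegments;

    PerSegmentMapDriver(SegmentMapper mapper, int numSegments) {
        this.mapper = mapper;
        this.numSegments = numSegments;
    }

    void runMapPhase() {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int s = 0; s < numSegments; s++) pending.add(s);
        while (!pending.isEmpty()) {
            int segment = pending.poll();
            try {
                mapper.mapSegment(segment);
            } catch (TopologyChangedException e) {
                // Only this segment is retried (on its new primary owner);
                // work already done for other segments is untouched.
                pending.add(segment);
            }
        }
    }
}
```

Because each node writes under its own intermediate keys, cleaning up after a failed segment is just deleting that one (node, segment) entry, which is what makes the per-segment retry cheap compared to restarting the whole job.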

Plus we don't need this broad retry strategy if the intermediate cache is transactional (I think).

The biggest downside I see is that it would be horribly slow if the cache store doesn't support efficient iteration of a single segment. So we might want to implement a full retry strategy as well, if some cache stores can't support that.
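A rough sketch of the fallback concern, with a hypothetical store interface (not the real cache store SPI): if the store can iterate a single segment we use that directly, otherwise we have to scan everything and filter, which costs O(totalEntries) for every segment and is the "horribly slow" case above.

```java
import java.util.Map;
import java.util.function.ToIntFunction;
import java.util.stream.Stream;

// Illustrative only: a store either supports per-segment iteration or we
// fall back to a full scan filtered by segment.
class SegmentIteration {

    interface Store<K, V> {
        Stream<Map.Entry<K, V>> entries();                  // always available
        default boolean supportsSegmentIteration() { return false; }
        default Stream<Map.Entry<K, V>> entries(int segment) {
            throw new UnsupportedOperationException();
        }
    }

    static <K, V> Stream<Map.Entry<K, V>> entriesForSegment(
            Store<K, V> store, int segment, ToIntFunction<K> segmentOf) {
        if (store.supportsSegmentIteration()) {
            return store.entries(segment);
        }
        // Full-scan fallback: every per-segment map phase pays for the whole
        // store, which is why a whole-job retry strategy may still be needed
        // for such stores.
        return store.entries()
                    .filter(e -> segmentOf.applyAsInt(e.getKey()) == segment);
    }
}
```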

Cheers
Dan