On 10 Oct 2014, at 18:06, Dan Berindei <dan.berindei@gmail.com> wrote:


The biggest downside I see is that it would be horribly slow if the cache store doesn't support efficient iteration of a single segment. So we might want to implement a full retry strategy as well, if some cache stores can't support that.


My understanding from a discussion with Pedro (in a hard, cold and sinister place, but that’s another story) is that *today* M/R is already kinda horrible for global cache stores, since they have to do the per-node key filtering dance anyway. So this would not be significantly worse.
Plus, I said we should do the work per segment, but in reality, if you send 5 Map segment tasks to the same node, you can optimize and run a single loop over the store while still making them look like separate units of work.
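
To make the single-loop idea concrete, here is a rough sketch (in Python for brevity, not Infinispan code). Everything in it is an assumption for illustration: `NUM_SEGMENTS`, `segment_of` (a CRC32 stand-in for the real consistent hash) and `map_segments_single_pass` are hypothetical names. The store is iterated once, but results are bucketed per segment, so each segment can still be reported, discarded or retried on its own.

```python
import zlib

NUM_SEGMENTS = 8  # hypothetical; the real value would come from the cache configuration


def segment_of(key, num_segments=NUM_SEGMENTS):
    # Stand-in for the consistent hash: a stable CRC32 of the key, modulo the segment count.
    return zlib.crc32(key.encode("utf-8")) % num_segments


def map_segments_single_pass(entries, requested_segments, mapper,
                             num_segments=NUM_SEGMENTS):
    """Run `mapper` for several segments with one iteration over the store.

    `entries` is any iterable of (key, value) pairs, e.g. a cache store scan.
    Returns {segment: [mapper results]}, one bucket per requested segment.
    """
    wanted = set(requested_segments)
    results = {seg: [] for seg in wanted}
    for key, value in entries:  # the only loop over the store
        seg = segment_of(key, num_segments)
        if seg in wanted:
            results[seg].append(mapper(key, value))
    return results


if __name__ == "__main__":
    store = [("key-%d" % i, i) for i in range(20)]
    per_segment = map_segments_single_pass(store, [0, 3, 5], lambda k, v: (k, v * 2))
    for seg in sorted(per_segment):
        print(seg, per_segment[seg])
```

From the caller's point of view there are still three separate segment results, so if segment 3 moves to another node mid-task, only that bucket has to be thrown away and redone.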