<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 2/25/2014, 7:33 AM, Dan Berindei
      wrote:<br>
    </div>
    <blockquote
cite="mid:CA+nfvwQohMEkMx+zrJYpw8KavbAysfgQeBOON13z01tFempaiw@mail.gmail.com"
      type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>Do we really need special support for distributed tasks
              to write results to another cache? We already allow a task
              to do<br>
              <br>
              cache.getCacheManager().getCache("outputCache").put(k, v)<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    Yeah, very good point, Dan. Thanks for being the sanity check. Mircea? <br>
    <br>
    <blockquote
cite="mid:CA+nfvwQohMEkMx+zrJYpw8KavbAysfgQeBOON13z01tFempaiw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>&nbsp;</div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div class="">&gt;<br>
                &gt;&gt; I was thinking we add a concept of<br>
                &gt;&gt; DistributedTaskExecutionListener which can be
                specified in<br>
                &gt;&gt; DistributedTaskBuilder:<br>
                &gt;&gt;<br>
                &gt;&gt; DistributedTaskBuilder&lt;T&gt;<br>
                &gt;&gt;
                executionListener(DistributedTaskExecutionListener&lt;K,
                T&gt; listener);<br>
                &gt;&gt;<br>
                &gt;&gt;<br>
                &gt;&gt; We needed DistributedTaskExecutionListener
                anyway. All distributed tasks<br>
                &gt;&gt; might use some feedback about task progress,
                completion/failure and so on.<br>
                &gt;&gt; My proposal is roughly:<br>
                &gt;&gt;<br>
                &gt;&gt;<br>
                &gt;&gt; public interface
                DistributedTaskExecutionListener&lt;K, T&gt; {<br>
                &gt;&gt;<br>
                &gt;&gt; &nbsp; &nbsp; void subtaskSent(Address node, Set&lt;K&gt;
                inputKeys);<br>
                &gt;&gt; &nbsp; &nbsp; void subtaskFailed(Address node,
                Set&lt;K&gt; inputKeys, Exception e);<br>
                &gt;&gt; &nbsp; &nbsp; void subtaskSucceeded(Address node,
                Set&lt;K&gt; inputKeys, T result);<br>
                &gt;&gt; &nbsp; &nbsp; void allSubtasksCompleted();<br>
                &gt;&gt;<br>
                &gt;&gt; }<br>
                &gt;&gt;<br>
                &gt;&gt; So much for that.<br>
                &gt; I think it would make sense to add this logic
                for monitoring, plus additional info such as average
                execution time etc. I'm not sure if this is a generally
                useful API though, unless there were people asking for
                it already?<br>
              </div>
              Ok, noted. If you remember any references about this let
              me know and<br>
              I'll incorporate what people actually asked for rather
              than guess.<br>
            </blockquote>
            <div><br>
            </div>
            <div>Ok, let's wait until we get some actual requests from
              users then. TBH I don't think distributed tasks with
              subtasks are something that users care about. E.g. with
              Map/Reduce the reduce tasks are not subtasks of the
              map/combine tasks, so this API wouldn't help.<br>
              <br>
            </div>
            <div>Hadoop has a Reporter interface that allows you to
              report "ticks" and increment counters, maybe we should add
              something like that instead?<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    The subtask I am referring to here simply denotes one part of a
    distributed task initiated using the distributed executors. This
    interface (maybe extended a bit with ideas from Reporter) could be
    used both for monitoring and for more application-specific logic
    such as task re-execution.<br>
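    <br>
    As a rough illustration only, here is how application code might use
    such callbacks to decide on re-execution. This is a simplified,
    hypothetical stand-in, not the proposed API itself: plain Strings
    replace Address, the type parameters and input key sets are dropped,
    and SubtaskListener, RetryTrackingListener and needsRetry are
    made-up names.<br>
    <br>
```java
// Hypothetical, simplified stand-in for the listener discussed in this thread.
// Node addresses are plain Strings; the real proposal uses Address and type parameters.
interface SubtaskListener {
    void subtaskSent(String node);
    void subtaskFailed(String node, Exception e);
    void subtaskSucceeded(String node, Object result);
    void allSubtasksCompleted();
}

// Counts outcomes and remembers the last failed node so a caller could resubmit.
class RetryTrackingListener implements SubtaskListener {
    int sent;
    int failed;
    int succeeded;
    boolean done;
    String lastFailedNode;

    public void subtaskSent(String node) { sent++; }

    public void subtaskFailed(String node, Exception e) {
        failed++;
        lastFailedNode = node;
    }

    public void subtaskSucceeded(String node, Object result) { succeeded++; }

    public void allSubtasksCompleted() { done = true; }

    // True when at least one subtask failed, so a retry might be worthwhile.
    boolean needsRetry() { return failed != 0; }
}
```
    <br>
    The same counters would also cover the plain monitoring case, since
    the caller can read them once allSubtasksCompleted() has fired.<br>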
    <br>
    <br>
    <blockquote
cite="mid:CA+nfvwQohMEkMx+zrJYpw8KavbAysfgQeBOON13z01tFempaiw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>&nbsp;<br>
            </div>
            <div><br>
            </div>
            <div>I think we should allow each distributed task to deal
              with output in its own way, the existing API should be
              enough.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Yes, I can see your point. Mircea?<br>
    <blockquote
cite="mid:CA+nfvwQohMEkMx+zrJYpw8KavbAysfgQeBOON13z01tFempaiw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>&nbsp;</div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div>
                <div class="h5">
                  <br>
                  &gt;&gt; public interface
                  MapReduceTaskExecutionListener {<br>
                  &gt;&gt;<br>
                  &gt;&gt; &nbsp; &nbsp; void mapTaskInitialized(Address
                  executionAddress);<br>
                  &gt;&gt; &nbsp; &nbsp; void mapTaskSucceeded(Address
                  executionAddress);<br>
                  &gt;&gt; &nbsp; &nbsp; void mapTaskFailed(Address
                  executionTarget, Exception cause);<br>
                  &gt;&gt; &nbsp; &nbsp; void mapPhaseCompleted();<br>
                  &gt;&gt;<br>
                  &gt;&gt; &nbsp; &nbsp; void reduceTaskInitialized(Address
                  executionAddress);<br>
                  &gt;&gt; &nbsp; &nbsp; void reduceTaskSucceeded(Address
                  executionAddress);<br>
                  &gt;&gt; &nbsp; &nbsp; void reduceTaskFailed(Address address,
                  Exception cause);<br>
                  &gt;&gt; &nbsp; &nbsp; void reducePhaseCompleted();<br>
                  &gt;&gt;<br>
                  &gt;&gt; }<br>
                  &gt; IMO - in the first stage at least - I would rather
                  use a simpler (Notifying)Future, on which the user can
                  wait till the computation happens: it's simpler and
                  more aligned with the rest of our async API.<br>
                  &gt;<br>
                </div>
              </div>
              What do you mean? We already have futures in MapReduceTask
              API. This API<br>
              is more fine grained and allows monitoring/reporting of
              task progress.<br>
              Please clarify.<br>
            </blockquote>
            <div><br>
            </div>
            <div>I'm not sure about the usefulness of an API like this
              either... if the intention is to allow the user to collect
              statistics about duration of various phases, then I think
              exposing the durations via MapReduceTasks would be better.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    How would you design that API, Dan? Something other than a
    listener/callback interface?<br>
    <br>
    <blockquote
cite="mid:CA+nfvwQohMEkMx+zrJYpw8KavbAysfgQeBOON13z01tFempaiw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>
            </div>
            <div>&nbsp;</div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div class=""><br>
                &gt;&gt; while MapReduceTask would have an additional
                method:<br>
                &gt;&gt;<br>
                &gt;&gt; public void execute(Cache&lt;KOut, VOut&gt;
                resultsCache);<br>
                &gt; you could overload it with cache name only method.<br>
              </div>
              Yeah, good idea. Same for usingIntermediateCache? I
              actually asked you<br>
              this here <a moz-do-not-send="true"
                href="https://issues.jboss.org/browse/ISPN-4021"
                target="_blank">https://issues.jboss.org/browse/ISPN-4021</a><br>
            </blockquote>
            <div><br>
            </div>
            <div>+1 to allow a cache name only. For the intermediate
              cache I don't think it makes sense to allow a Cache
              version at all.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    Ok good. Deal.<br>
    <br>
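    To make the agreed shape concrete, a minimal sketch follows. It
    assumes made-up stand-in classes (NamedCache, CacheRegistry,
    SketchTask) in place of the real Cache, cache manager and
    MapReduceTask, and only shows the overloads: a name-only execute
    that resolves the cache and delegates, and an intermediate cache
    that is set by name only.<br>
    <br>
```java
// Hypothetical stand-in for a cache; only its name matters in this sketch.
class NamedCache {
    final String name;
    NamedCache(String name) { this.name = name; }
}

// Hypothetical stand-in for a cache manager that resolves caches by name.
class CacheRegistry {
    NamedCache getCache(String name) { return new NamedCache(name); }
}

// Hypothetical stand-in for the task; it records its targets instead of running anything.
class SketchTask {
    private final CacheRegistry registry;
    String lastResultsCache;
    String intermediateCacheName;

    SketchTask(CacheRegistry registry) { this.registry = registry; }

    // Variant that takes the cache instance.
    void execute(NamedCache resultsCache) { lastResultsCache = resultsCache.name; }

    // Name-only overload: resolve the cache by name, then delegate.
    void execute(String resultsCacheName) { execute(registry.getCache(resultsCacheName)); }

    // The intermediate cache is configured by name only; there is no instance variant.
    SketchTask usingIntermediateCache(String cacheName) {
        intermediateCacheName = cacheName;
        return this;
    }
}
```
    <br>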
    <br>
    Thanks,<br>
    Vladimir<br>
  </body>
</html>