<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 2/25/2014, 7:33 AM, Dan Berindei

      wrote:<br>

    </div>

    <blockquote

cite="mid:CA+nfvwQohMEkMx+zrJYpw8KavbAysfgQeBOON13z01tFempaiw@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>Do we really need special support for distributed tasks

              to write results to another cache? We already allow a task

              to do<br>

              <br>

              cache.getCacheManager().getCache("outputCache").put(k, v)<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    Yeah, very good point, Dan. Thanks for being the sanity check. Mircea? <br>

    <br>

    <blockquote

cite="mid:CA+nfvwQohMEkMx+zrJYpw8KavbAysfgQeBOON13z01tFempaiw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>&nbsp;</div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div class="">&gt;<br>

                &gt;&gt; I was thinking we add a concept of<br>

                &gt;&gt; DistributedTaskExecutionListener which can be

                specified in<br>

                &gt;&gt; DistributedTaskBuilder:<br>

                &gt;&gt;<br>

                &gt;&gt; DistributedTaskBuilder&lt;T&gt;<br>

                &gt;&gt;

                executionListener(DistributedTaskExecutionListener&lt;K,

                T&gt; listener);<br>

                &gt;&gt;<br>

                &gt;&gt;<br>

                &gt;&gt; We needed DistributedTaskExecutionListener

                anyway. All distributed tasks<br>

                &gt;&gt; might use some feedback about task progress,

                completion/failure and so on.<br>

                &gt;&gt; My proposal is roughly:<br>

                &gt;&gt;<br>

                &gt;&gt;<br>

                &gt;&gt; public interface

                DistributedTaskExecutionListener&lt;K, T&gt; {<br>

                &gt;&gt;<br>

                &gt;&gt; &nbsp; &nbsp; void subtaskSent(Address node, Set&lt;K&gt;

                inputKeys);<br>

                &gt;&gt; &nbsp; &nbsp; void subtaskFailed(Address node,

                Set&lt;K&gt; inputKeys, Exception e);<br>

                &gt;&gt; &nbsp; &nbsp; void subtaskSucceeded(Address node,

                Set&lt;K&gt; inputKeys, T result);<br>

                &gt;&gt; &nbsp; &nbsp; void allSubtasksCompleted();<br>

                &gt;&gt;<br>

                &gt;&gt; }<br>

                &gt;&gt;<br>

                &gt;&gt; So much for that.<br>

                &gt; I think it would make sense to add this logic

                for monitoring, + additional info such as average

                execution time etc. I'm not sure if this is a generally

                useful API though, unless there were people asking for

                it already?<br>

              </div>

              Ok, noted. If you remember any references about this let

              me know and<br>

              I'll incorporate what people actually asked for rather

              than guess.<br>

            </blockquote>

            <div><br>

            </div>

            <div>Ok, let's wait until we get some actual requests from

              users then. TBH I don't think distributed tasks with

              subtasks are something that users care about. E.g. with

              Map/Reduce the reduce tasks are not subtasks of the

              map/combine tasks, so this API wouldn't help.<br>

              <br>

            </div>

            <div>Hadoop has a Reporter interface that allows you to

              report "ticks" and increment counters, maybe we should add

              something like that instead?<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    By "subtask" I just mean one part of a distributed task initiated
    through the distributed executors. This interface (maybe extended a
    bit with ideas from Reporter) could be used both for monitoring and
    for more application-specific logic such as task re-execution.<br>
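    <br>
    To make that concrete, here is a rough sketch of a listener that does both jobs: it tracks progress and applies a simple retry budget. It is only an illustration under assumptions. SubtaskListener, ProgressTrackingListener, pendingCount() and shouldRetry() are made-up names, the type parameters are dropped, and a plain String stands in for Address so that it compiles without Infinispan.<br>
    <pre>
```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified listener: no type parameters, and the node
// address is a plain String so the sketch compiles on its own.
interface SubtaskListener {
    void subtaskSent(String node, String[] inputKeys);
    void subtaskFailed(String node, String[] inputKeys, Exception e);
    void subtaskSucceeded(String node, String[] inputKeys, Object result);
    void allSubtasksCompleted();
}

// Counts subtask events for monitoring and applies a simple retry budget.
class ProgressTrackingListener implements SubtaskListener {
    private final AtomicInteger sent = new AtomicInteger();
    private final AtomicInteger succeeded = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();
    private final int maxFailures;
    private volatile boolean completed;

    ProgressTrackingListener(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    @Override
    public void subtaskSent(String node, String[] inputKeys) {
        sent.incrementAndGet();
    }

    @Override
    public void subtaskFailed(String node, String[] inputKeys, Exception e) {
        failed.incrementAndGet();
    }

    @Override
    public void subtaskSucceeded(String node, String[] inputKeys, Object result) {
        succeeded.incrementAndGet();
    }

    @Override
    public void allSubtasksCompleted() {
        completed = true;
    }

    // Monitoring: subtasks sent but not yet reported back.
    int pendingCount() {
        return sent.get() - succeeded.get() - failed.get();
    }

    // Application-specific policy: retry while failures stay within budget.
    boolean shouldRetry() {
        return maxFailures >= failed.get();
    }

    boolean isCompleted() {
        return completed;
    }
}
```
    </pre>
    The retry policy is deliberately trivial. The point is only that the same callbacks can feed both monitoring and re-execution decisions.<br>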

    <br>

    <br>

    <blockquote

cite="mid:CA+nfvwQohMEkMx+zrJYpw8KavbAysfgQeBOON13z01tFempaiw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>&nbsp;<br>

            </div>

            <div><br>

            </div>

            <div>I think we should allow each distributed task to deal

              with output in its own way, the existing API should be

              enough.<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Yes, I can see your point. Mircea?<br>

    <blockquote

cite="mid:CA+nfvwQohMEkMx+zrJYpw8KavbAysfgQeBOON13z01tFempaiw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>&nbsp;</div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div>

                <div class="h5">

                  <br>

                  &gt;&gt; public interface

                  MapReduceTaskExecutionListener {<br>

                  &gt;&gt;<br>

                  &gt;&gt; &nbsp; &nbsp; void mapTaskInitialized(Address

                  executionAddress);<br>

                  &gt;&gt; &nbsp; &nbsp; void mapTaskSucceeded(Address

                  executionAddress);<br>

                  &gt;&gt; &nbsp; &nbsp; void mapTaskFailed(Address

                  executionTarget, Exception cause);<br>

                  &gt;&gt; &nbsp; &nbsp; void mapPhaseCompleted();<br>

                  &gt;&gt;<br>

                  &gt;&gt; &nbsp; &nbsp; void reduceTaskInitialized(Address

                  executionAddress);<br>

                  &gt;&gt; &nbsp; &nbsp; void reduceTaskSucceeded(Address

                  executionAddress);<br>

                  &gt;&gt; &nbsp; &nbsp; void reduceTaskFailed(Address address,

                  Exception cause);<br>

                  &gt;&gt; &nbsp; &nbsp; void reducePhaseCompleted();<br>

                  &gt;&gt;<br>

                  &gt;&gt; }<br>

                  &gt; IMO - in the first stage at least - I would rather

                  use a simpler (Notifying)Future, on which the user can

                  wait till the computation happens: it's simpler and

                  more aligned with the rest of our async API.<br>

                  &gt;<br>

                </div>

              </div>

              What do you mean? We already have futures in the
              MapReduceTask API. This API<br>
              is finer-grained and allows monitoring/reporting of
              task progress.<br>
              Please clarify.<br>

            </blockquote>

            <div><br>

            </div>

            <div>I'm not sure about the usefulness of an API like this

              either... if the intention is to allow the user to collect

              statistics about duration of various phases, then I think

              exposing the durations via MapReduceTasks would be better.<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    How would you design that API, Dan? Something other than a
    listener/callback interface?<br>

    <br>

    <blockquote

cite="mid:CA+nfvwQohMEkMx+zrJYpw8KavbAysfgQeBOON13z01tFempaiw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>

            </div>

            <div>&nbsp;</div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div class=""><br>

                &gt;&gt; while MapReduceTask would have an additional

                method:<br>

                &gt;&gt;<br>

                &gt;&gt; public void execute(Cache&lt;KOut, VOut&gt;

                resultsCache);<br>

                &gt; you could overload it with a cache-name-only method.<br>

              </div>

              Yeah, good idea. Same for usingIntermediateCache? I

              actually asked you<br>

              this here <a moz-do-not-send="true"

                href="https://issues.jboss.org/browse/ISPN-4021"

                target="_blank">https://issues.jboss.org/browse/ISPN-4021</a><br>

            </blockquote>

            <div><br>

            </div>

            <div>+1 to allow a cache name only. For the intermediate

              cache I don't think it makes sense to allow a Cache

              version at all.<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    Ok good. Deal.<br>
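    <br>
    Roughly what I have in mind for the name-only overload, as a sketch and not the final API: the name is resolved to a cache and the call is delegated to the cache-taking variant. ResultCache, CacheLookup, SingleEntryCache and SketchTask are hypothetical stand-ins (no generics, no Infinispan types) so that the snippet compiles on its own, and the value written is a placeholder.<br>
    <pre>
```java
// Hypothetical stand-in for a results cache.
interface ResultCache {
    void put(String key, Object value);
    Object get(String key);
}

// Hypothetical stand-in for looking a cache up by name.
interface CacheLookup {
    ResultCache getCache(String name);
}

// Toy cache holding a single entry, just enough for the sketch.
class SingleEntryCache implements ResultCache {
    private String key;
    private Object value;

    @Override
    public void put(String key, Object value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public Object get(String key) {
        return key.equals(this.key) ? value : null;
    }
}

class SketchTask {
    private final CacheLookup lookup;

    SketchTask(CacheLookup lookup) {
        this.lookup = lookup;
    }

    // Variant taking the cache instance.
    void execute(ResultCache resultsCache) {
        resultsCache.put("word", 42); // placeholder for real reduce output
    }

    // Cache-name-only overload: resolve the name, then delegate.
    void execute(String resultsCacheName) {
        ResultCache cache = lookup.getCache(resultsCacheName);
        if (cache == null) {
            throw new IllegalArgumentException("No such cache: " + resultsCacheName);
        }
        execute(cache);
    }
}
```
    </pre>
    Failing fast on an unknown cache name is my assumption here; we could just as well decide to create the cache on demand.<br>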

    <br>

    <br>

    Thanks,<br>

    Vladimir<br>

  </body>

</html>