<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 11/11/15 15:36, Stian Thorgersen
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAJgngAeLDWnyE=OZ5k+0uZT9gV+jNvs=r_2WA4acXQhCEMxF1g@mail.gmail.com"
      type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On 11 November 2015 at 15:23, Marek
            Posolda <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"><span class="">
                  <div>On 11/11/15 09:01, Stian Thorgersen wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr"><br>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">On 10 November 2015 at
                          16:11, Marek Posolda <span dir="ltr">&lt;<a
                              moz-do-not-send="true"
                              href="mailto:mposolda@redhat.com"
                              target="_blank">mposolda@redhat.com</a>&gt;</span>
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF"><span>
                                <div>On 09/11/15 14:09, Stian Thorgersen
                                  wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr"><br>
                                    <div class="gmail_extra"><br>
                                      <div class="gmail_quote">On 9
                                        November 2015 at 13:35,
                                        Sebastien Blanc <span dir="ltr">&lt;<a
                                            moz-do-not-send="true"
                                            href="mailto:sblanc@redhat.com"
                                            target="_blank">sblanc@redhat.com</a>&gt;</span>
                                        wrote:<br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex">
                                          <div dir="ltr">
                                            <div>That would be really
                                              nice indeed!<br>
                                            </div>
                                            But aren't the marker files
                                            enough, instead of also
                                            having a table in the DB?<br>
                                          </div>
                                        </blockquote>
                                        <div><br>
                                        </div>
                                        <div>We need a way to prevent
                                          multiple nodes in a cluster
                                          from importing the same file. For
                                          example on Kerberos you end up
                                          spinning up multiple instances
                                          of the same Docker image. <br>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </span> I bet you meant 'Kubernetes' <span><span>
                                  :-) </span></span></div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>Yup</div>
                          <div> </div>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF"><span><span>
                                </span></span><br>
                              <br>
                              +1 for the improvements. Besides those, I
                              think that sooner or later we will need
                              to solve long-running export+import where
                              you want to import 100,000 users. <br>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>+1</div>
                          <div> </div>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF"> <br>
                              As I mentioned in another mail a few weeks
                              ago, we can have:<br>
                              <br>
                              1) A table with the progress (51,000 users
                              already imported, around 49,000 remaining,
                              etc.)<br>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>We would still need to split into
                            multiple files in either case. Having a
                            single json file with 100K users is probably
                            not going to perform very well. So what I
                            proposed would actually work for
                            long-running import as well. If each file
                            has a manageable number of users (say ~5 min
                            to import) then each file will be marked as
                            imported or failed. At least for now I don't
                            think we should do smaller batches than one
                            file. As long as one file is imported within
                            the same TX then it's an all or nothing
                            import.</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div> </div>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF"> 2)
                              Concurrency and dividing the work among
                              cluster nodes (node1 will import 50,000
                              users and node2 another 50,000 users)<br>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>This would be solved as well. Each node
                            picks up a file that's not processed yet,
                            marks it in the DB and then gets to process
                            it.</div>
                          <div> </div>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF"> 3)
                              Failover (the import won't be completely
                              broken if a cluster node crashes after
                              importing 90,000 users, but can continue on
                              other cluster nodes)<br>
                              <br>
                              I think the stuff I did recently for
                              pre-loading offline sessions at startup
                              could be reused for this too, and it
                              can handle (2) and (3). It can also
                              handle parallel imports triggered from
                              multiple cluster nodes. <br>
                              <br>
                              For example: currently, if you trigger
                              Kubernetes with 2 cluster nodes, both
                              nodes will start to import the same file at
                              the same time, because the import triggered by
                              node1 is not yet finished when node2 is
                              started, so there is no DB record yet
                              saying that the file is already imported. With
                              the stuff I did, just the coordinator
                              (node1) will start the import. Node2 will
                              wait until the import triggered by node1 is
                              finished, but at the same time it can
                              "help" to import some users (pages) if the
                              coordinator asks it to do so. This impl
                              is based on the Infinispan distributed
                              executor service <a
                                moz-do-not-send="true"
href="http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework"
                                target="_blank">http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework</a>
                              .</div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>The DB record needs to be created before
                            a node tries to import it, including a
                            timestamp when it started the import. It
                            should then be updated once the import is
                            completed, with the result. Using the
                            distributed execution framework sounds like
                            a good idea though. How do you prevent
                            scheduling the same job multiple times? For
                            example if all nodes on startup scan the
                            import folder and simply import everything
                            they find, then there will be multiple instances of
                            the same job. Not really a big deal, as the
                            first thing the job should do is check if
                            there's a record in the DB already.</div>
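                          <div><br>
                          </div>
                          <div>A minimal sketch of the "create the record
                            before importing" idea, assuming a table keyed
                            by the file checksum. It uses an in-memory
                            SQLite database purely for illustration; the
                            table name, column names and both helper
                            functions are hypothetical and not taken from
                            any existing implementation.</div>
<pre>
```python
import sqlite3
import time


def try_claim(conn, checksum, node):
    """Insert the import record before importing; True if this node won the claim."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO import_record (checksum, node, started_at, result) "
                "VALUES (?, ?, ?, NULL)",
                (checksum, node, time.time()),
            )
        return True
    except sqlite3.IntegrityError:
        # A record for this checksum already exists, so another job has the file.
        return False


def finish(conn, checksum, result):
    """Update the record with the outcome once the import has completed."""
    with conn:
        conn.execute(
            "UPDATE import_record SET result = ? WHERE checksum = ?",
            (result, checksum),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE import_record ("
    "checksum TEXT PRIMARY KEY, node TEXT, started_at REAL, result TEXT)"
)
```
</pre>
                          <div>Because the checksum is the primary key, a
                            duplicate job scheduled by a second node fails
                            its insert and can simply skip the file.</div>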
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </span> With the distributed executor, it's the cluster
                coordinator that decides which node imports
                what. It will send messages to cluster nodes like "Hey,
                please import the file testrealm-users-3.json with
                timestamp abcd123". <br>
                <br>
                After a node finishes the job, it notifies the coordinator, and
                the coordinator will insert the DB record and mark it as
                finished. So no DB record is inserted before a node
                starts the import, because the whole coordination is handled by
                the coordinator. Also, the same file will never be
                imported more than once by different cluster nodes. <br>
                <br>
                The only exception would be if a cluster node crashes before
                the import is finished. Then the file needs to be reimported by
                another cluster node, but that's the case with DB locks as
                well.<br>
                <br>
                IMO the DB locks approach doesn't handle the crash of
                a cluster node well. For example, when node2 crashes
                unexpectedly while it's importing the file
                testrealm-users-3.json, the DB lock is still held by this
                node, so other cluster nodes can't start importing
                the file (until a timeout occurs).<br>
                <br>
                On the other hand, the distributed executor approach may
                have issues if the content of the
                standalone/import directory is inconsistent among cluster nodes. However,
                this can be solved: each node sends the
                checksums of the files it has, and the coordinator
                ensures that the file with checksum "abcd123" is assigned
                only to a node which has this file.</div>
            </blockquote>
            <div><br>
            </div>
            <div>With Docker/Kubernetes all nodes would have the same
              files. At least initially. Would be nice if we could come
              up with a solution where you can just drop an additional
              file onto any node and have it imported.</div>
          </div>
        </div>
      </div>
    </blockquote>
    Exactly, I was thinking about Docker too. There we don't have any issue
    at all.<br>
    <br>
    The main question here is: do we want to support the scenario where
    various cluster nodes have different content? As I mentioned, the
    coordinator can handle it: each cluster node
    sends the checksums of the files it has, and the coordinator
    only assigns a node files whose checksums that node reported.<br>
    <br>
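    The checksum-based assignment described above could be sketched
    roughly as follows. This is only an illustration under assumptions:
    the function name, the input shape (node name mapped to the set of
    checksums it reported) and the least-loaded tie-breaking are
    hypothetical, and it does not use the Infinispan API.<br>
<pre>
```python
def assign_files(node_checksums):
    """Map each checksum to exactly one node that reported having that file.

    node_checksums: dict of node name -> set of file checksums on that node.
    The least-loaded eligible node is chosen so work is spread over the cluster.
    """
    load = {node: 0 for node in node_checksums}
    assignment = {}
    all_checksums = sorted(set().union(*node_checksums.values()))
    for checksum in all_checksums:
        # Only nodes that actually have the file are candidates.
        holders = sorted(n for n, sums in node_checksums.items() if checksum in sums)
        chosen = min(holders, key=lambda n: load[n])
        assignment[checksum] = chosen
        load[chosen] += 1
    return assignment
```
</pre>
    With node1 reporting "abc" and "def" and node2 reporting "def" and
    "ghi", the shared file "def" goes to whichever node has less work so
    far, while "abc" and "ghi" can only go to the node that holds them.<br>
    <br>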
    However, regardless of whether we use the distributed executor approach or the DB locks
    approach, there may still be issues. For example:<br>
    1) The file testrealm.json with checksum "abc" is triggered for
    import on node1.<br>
    2) At the same time, an admin makes some minor change to this file on
    node2 and saves it. This means that the checksum of the file on node2
    changes to "def".<br>
    3) Node2 triggers the import of that file. So we have both node1 and
    node2 importing the same file concurrently, because the previously
    acquired lock was for the "abc" checksum, but now the checksum is "def". <br>
    <br>
    This problem exists with both the DB lock and the DistributedExecutor
    approaches though...<br>
    <br>
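    The three steps above can be reproduced with a toy lock table keyed
    by checksum. This is a sketch only: the dictionary stands in for
    either the DB lock table or the coordinator's bookkeeping, and
    try_lock is a hypothetical helper.<br>
<pre>
```python
def try_lock(locks, checksum, node):
    """Claim a checksum; succeeds only if no node holds a lock for that checksum."""
    if checksum in locks:
        return False
    locks[checksum] = node
    return True


locks = {}
# Step 1: node1 starts importing testrealm.json while its checksum is "abc".
node1_ok = try_lock(locks, "abc", "node1")
# Step 2: an admin edits the same file on node2, so node2 computes "def".
# Step 3: node2 asks for a lock on "def", which nobody holds.
node2_ok = try_lock(locks, "def", "node2")
# Both claims succeed, so both nodes import testrealm.json concurrently.
both_importing = node1_ok and node2_ok
```
</pre>
    <br>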
    Marek<br>
    <blockquote
cite="mid:CAJgngAeLDWnyE=OZ5k+0uZT9gV+jNvs=r_2WA4acXQhCEMxF1g@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"><span class="HOEnZb"><font
                    color="#888888"><br>
                    <br>
                    Marek</font></span>
                <div>
                  <div class="h5"><br>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div> </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"><span><font
                                    color="#888888"><br>
                                    <br>
                                    Marek</font></span>
                                <div>
                                  <div><br>
                                    <br>
                                    <blockquote type="cite">
                                      <div dir="ltr">
                                        <div class="gmail_extra">
                                          <div class="gmail_quote">
                                            <div> </div>
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              <div dir="ltr"> <br>
                                              </div>
                                              <div class="gmail_extra"><br>
                                                <div class="gmail_quote">
                                                  <div>
                                                    <div>On Mon, Nov 9,
                                                      2015 at 1:20 PM,
                                                      Stian Thorgersen <span
                                                        dir="ltr">&lt;<a
moz-do-not-send="true" href="mailto:sthorger@redhat.com" target="_blank">sthorger@redhat.com</a>&gt;</span>
                                                      wrote:<br>
                                                    </div>
                                                  </div>
                                                  <blockquote
                                                    class="gmail_quote"
                                                    style="margin:0 0 0
                                                    .8ex;border-left:1px
                                                    #ccc
                                                    solid;padding-left:1ex">
                                                    <div>
                                                      <div>
                                                        <div dir="ltr">Currently
                                                          we support
                                                          importing a
                                                          complete realm
                                                          definition
                                                          using the
                                                          import/export
                                                          feature.
                                                          Issues with
                                                          the current
                                                          approach are:
                                                          <div><br>
                                                          </div>
                                                          <div>* Only
                                                          complete realm
                                                          - not possible
                                                          to add to an
                                                          existing realm</div>
                                                          <div>* No good
                                                          feedback if
                                                          import was
                                                          successful or
                                                          not</div>
                                                          <div>* Use of
                                                          system
                                                          properties to
                                                          initiate the
                                                          import is not
                                                          very user
                                                          friendly</div>
                                                          <div>* Not
                                                          very elegant
                                                          for
                                                          provisioning.
                                                          For example a
                                                          Docker image
                                                          that wants to
                                                          bundle some
                                                          initial setup
                                                          ends up always
                                                          running the
                                                          import of a
                                                          realm, which
                                                          is skipped if
                                                          realm exists</div>
                                                          <div><br>
                                                          </div>
                                                          <div>To solve
                                                          this I've come
                                                          up with the
                                                          following
                                                          proposal:</div>
                                                          <div><br>
                                                          </div>
                                                          <div>Allow
                                                          dropping
                                                          representations
                                                          to be imported
                                                          into
                                                          'standalone/import'.
                                                          This should
                                                          support
                                                          creating a new
                                                          realm as well
                                                          as importing
                                                          into an
                                                          existing
                                                          realm. When
                                                          importing into
                                                          an existing
                                                          realm we will
                                                          have an import
                                                          strategy that
                                                          is used to
                                                          configure what
                                                          happens if a
                                                          resource
                                                          exists (user,
                                                          role, identity
                                                          provider, user
                                                          federation
                                                          provider). The
                                                          import
                                                          strategies
                                                          are:</div>
                                                          <div><br>
                                                          </div>
                                                          <div>* Skip -
                                                          existing
                                                          resources are
                                                          skipped</div>
                                                          <div>* Fail -
                                                          if any
                                                          resource
                                                          exists nothing
                                                          is imported</div>
                                                          <div>*
                                                          Overwrite -
                                                          any existing
                                                          resources are
                                                          deleted.</div>
                                                          <div><br>
                                                          </div>
                                                          <div>The
                                                          directory will
                                                          be scanned at
                                                          startup, but
                                                          there will
                                                          also be an
                                                          option to
                                                          monitor this
                                                          directory at
                                                          runtime.</div>
                                                          <div><br>
                                                          </div>
                                                          <div>To
                                                          prevent a file
                                                          being imported
                                                          multiple times
                                                          (also to make
                                                          sure only one
                                                          node in a
                                                          cluster
                                                          imports) we
                                                          will have a
                                                          table in the
                                                          database that
                                                          records which
                                                          files were
                                                          imported, from
                                                          which node,
                                                          the date and
                                                          result
                                                          (including a
                                                          list of which
                                                          resources
                                                          were
                                                          imported,
                                                          which were not,
                                                          and stack
                                                          trace if
                                                          applicable).
                                                          The primary
                                                          key will be
                                                          the checksum
                                                          of the file.
                                                          We will also
                                                          add marker
                                                          files
                                                          (&lt;json
                                                          file&gt;.imported
                                                          or &lt;json
                                                          file&gt;.failed).
                                                          The contents
                                                          of the marker
                                                          files will be
                                                          a json object
                                                          with date
                                                          imported,
                                                          outcome
                                                          (including
                                                          stack trace if
                                                          applicable) as
                                                          well as a
                                                          complete list
                                                          of which
                                                          resources were
                                                          successfully
                                                          imported and which
                                                          were not.</div>
                                                          <div><br>
                                                          </div>
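                                                          <div>As a rough
                                                          illustration of the
                                                          marker files
                                                          described above,
                                                          here is a small
                                                          sketch. The JSON
                                                          field names and the
                                                          write_marker helper
                                                          are assumptions made
                                                          for the example; the
                                                          proposal does not
                                                          fix a schema.</div>
<pre>
```python
import json
from datetime import datetime, timezone


def write_marker(json_path, imported, not_imported, stack_trace=None):
    """Write the .imported or .failed marker next to json_path and return its path."""
    failed = stack_trace is not None
    marker_path = json_path + (".failed" if failed else ".imported")
    marker = {
        "date": datetime.now(timezone.utc).isoformat(),
        "outcome": "failed" if failed else "imported",
        "stackTrace": stack_trace,
        "imported": imported,        # resources that were imported
        "notImported": not_imported,  # resources that were skipped or failed
    }
    with open(marker_path, "w", encoding="utf-8") as fh:
        json.dump(marker, fh, indent=2)
    return marker_path
```
</pre>
                                                          <div><br>
                                                          </div>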
                                                          <div>The files
                                                          will also
                                                          allow
                                                          resolving
                                                          system
                                                          properties and
                                                          environment
                                                          variables. For
                                                          example:</div>
                                                          <div><br>
                                                          </div>
                                                          <div>{</div>
                                                          <div>   
                                                          "secret":
                                                          "${env.MYCLIENT_SECRET}"</div>
                                                          <div>}</div>
                                                          <div><br>
                                                          </div>
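                                                          <div>A minimal sketch of how such placeholders could be resolved, assuming that "${env.NAME}" reads an environment variable and any other "${name}" reads a system property. The class name PlaceholderResolver and the choice to leave unresolved placeholders untouched are assumptions made for illustration; neither is part of the proposal.</div>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an existing Keycloak class.
final class PlaceholderResolver {

    // Matches ${...}; the placeholder syntax follows the "secret" example above.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    private PlaceholderResolver() {
    }

    // Replaces every placeholder in the given text with its resolved value.
    static String resolve(String text) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            // Assumption: an "env." prefix selects environment variables,
            // anything else is looked up as a system property.
            String value = key.startsWith("env.")
                    ? System.getenv(key.substring("env.".length()))
                    : System.getProperty(key);
            // Assumption: an unresolved placeholder is kept as-is.
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

                                                          <div>The file content would be passed through resolve() before it is parsed as JSON, so that the import itself only ever sees the resolved values.</div>
                                                          <div><br>
                                                          </div>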
                                                          <div>This will
                                                          be very
                                                          convenient with
                                                          Docker, for
                                                          example, as it
                                                          would be
                                                          easy to create
                                                          a Docker image
                                                          that extends
                                                          ours to add a
                                                          few clients
                                                          and users.</div>
                                                          <div><br>
                                                          </div>
                                                          <div>It will
                                                          also be
                                                          convenient for
                                                          examples as it
                                                          will make it
                                                          possible to
                                                          add the
                                                          required
                                                          clients and
                                                          users to an
                                                          existing
                                                          realm.</div>
                                                          <div><br>
                                                          </div>
                                                          <div><br>
                                                          </div>
                                                        </div>
                                                        <br>
                                                      </div>
                                                    </div>
_______________________________________________<br>
                                                    keycloak-dev mailing
                                                    list<br>
                                                    <a
                                                      moz-do-not-send="true"
href="mailto:keycloak-dev@lists.jboss.org" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:keycloak-dev@lists.jboss.org">keycloak-dev@lists.jboss.org</a></a><br>
                                                    <a
                                                      moz-do-not-send="true"
href="https://lists.jboss.org/mailman/listinfo/keycloak-dev"
                                                      rel="noreferrer"
                                                      target="_blank"><a class="moz-txt-link-freetext" href="https://lists.jboss.org/mailman/listinfo/keycloak-dev">https://lists.jboss.org/mailman/listinfo/keycloak-dev</a></a><br>
                                                  </blockquote>
                                                </div>
                                                <br>
                                              </div>
                                            </blockquote>
                                          </div>
                                          <br>
                                        </div>
                                      </div>
                                      <br>
                                    </blockquote>
                                    <br>
                                  </div>
                                </div>
                              </div>
                            </blockquote>
                          </div>
                          <br>
                        </div>
                      </div>
                    </blockquote>
                    <br>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>