<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On 11 November 2015 at 15:23, Marek Posolda <span dir="ltr">&lt;<a href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF"><span class="">
    <div>On 11/11/15 09:01, Stian Thorgersen
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On 10 November 2015 at 16:11, Marek
            Posolda <span dir="ltr">&lt;<a href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"><span>
                  <div>On 09/11/15 14:09, Stian Thorgersen wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr"><br>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">On 9 November 2015 at
                          13:35, Sebastien Blanc <span dir="ltr">&lt;<a href="mailto:sblanc@redhat.com" target="_blank">sblanc@redhat.com</a>&gt;</span>
                          wrote:<br>
                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                            <div dir="ltr">
                              <div>That would be really nice indeed!<br>
                              </div>
                              But aren&#39;t the marker files enough on their own,
                              without also having a table in the DB?<br>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>We need a way to prevent multiple nodes
                            in a cluster from importing the same file. For
                            example, on Kerberos you end up spinning up
                            multiple instances of the same Docker image.
                            <br>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </span> I bet you meant &#39;Kubernetes&#39; <span><span> :-)
                  </span></span></div>
            </blockquote>
            <div><br>
            </div>
            <div>Yup</div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"><span><span> </span></span><br>
                <br>
                +1 for the improvements. Besides those, I think that
                sooner or later we will need to handle long-running
                export+import, where you want to import 100,000 users. <br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>+1</div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"> <br>
                As I mentioned in another mail a few weeks ago, we can
                have:<br>
                <br>
                1) A table with the progress (51,000 users already
                imported, around 49,000 remaining, etc.)<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>We would still need to split into multiple files in
              either case. Having a single JSON file with 100K users is
              probably not going to perform very well. So what I
              proposed would actually work for long-running imports as
              well. If each file has a manageable number of users (say
              ~5 min to import), then each file will be marked as
              imported or failed. At least for now I don&#39;t think we
              should do smaller batches than one file. As long as one
              file is imported within a single transaction, it&#39;s an
              all-or-nothing import.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"> 2) Concurrency and
                dividing the work among cluster nodes (node1 will import
                50,000 users and node2 another 50,000 users)<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>This would be solved as well. Each node picks up a file
              that&#39;s not processed yet, marks it in the DB, and then
              processes it.</div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"> 3) Failover (the import
                won&#39;t be completely broken if a cluster node crashes after
                importing 90,000 users, but can continue on other cluster nodes)<br>
                <br>
                I think the work I did recently for pre-loading offline
                sessions at startup could be reused here too, and it can
                handle (2) and (3). It can also handle parallel imports
                triggered from multiple cluster nodes. <br>
                <br>
                For example: currently, if you start Kubernetes with 2
                cluster nodes, both nodes will start to import the same file
                at the same time, because the import triggered by node1 has
                not finished before node2 starts, so there is not yet a DB
                record saying that the file is already imported. With the
                work I did, only the coordinator (node1) will start the
                import. Node2 will wait until the import triggered by node1
                is finished, but at the same time it can &quot;help&quot; to import
                some users (pages) if the coordinator asks it to do so. This
                implementation is based on the Infinispan distributed
                executor service <a href="http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework" target="_blank">http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework</a>.</div>
            </blockquote>
            <div><br>
            </div>
            <div>The DB record needs to be created before a node tries
              to import the file, and should include a timestamp of when
              the import started. It should then be updated with the
              result once the import is completed. Using the distributed
              execution framework sounds like a good idea though. How do
              you prevent scheduling the same job multiple times? For
              example, if all nodes scan the import folder on startup and
              simply import everything they find, then the same job will
              be scheduled multiple times. Not really a big deal, as the
              first thing the job should do is check if there&#39;s a record
              in the DB already.</div>
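            <div>The claim-before-import idea described above could look roughly like the following. This is only a sketch under stated assumptions: it uses SQLite in place of the real database, and the table name import_record, its columns, and the helper names open_store, try_claim and finish are all hypothetical. The primary key on the checksum is what makes a second claim for the same file fail.</div>
<pre>
```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open the (hypothetical) import_record table, creating it if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS import_record ("
        "checksum TEXT PRIMARY KEY, node TEXT, started REAL, result TEXT)"
    )
    return db


def try_claim(db, checksum, node):
    """Insert the record before importing; False means another node got there first."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO import_record (checksum, node, started) VALUES (?, ?, ?)",
                (checksum, node, time.time()),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists, so the file is claimed or done.
        return False


def finish(db, checksum, result):
    """Record the outcome once the import has completed or failed."""
    with db:
        db.execute(
            "UPDATE import_record SET result = ? WHERE checksum = ?",
            (result, checksum),
        )
```
</pre>
            <div>A job would call try_claim first and return immediately when it gets False, which covers the case where every node schedules the same file on startup. Handling a claim whose owner crashed (for example by expiring it based on the started timestamp) is deliberately left out of this sketch.</div>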
          </div>
        </div>
      </div>
    </blockquote></span>
    With the distributed executor, it&#39;s the cluster coordinator that
    decides which node imports what. It will send messages to
    cluster nodes like &quot;Hey, please import the file
    testrealm-users-3.json with timestamp abcd123&quot;. <br>
    <br>
    After a node finishes the job, it notifies the coordinator, and the
    coordinator inserts the DB record and marks it as finished. So no DB
    record is inserted before a node starts the import, because the whole
    coordination is handled by the coordinator. Also, the same file will
    never be imported multiple times by different cluster nodes. <br>
    <br>
    The only exception would be if a cluster node crashes before its import
    is finished. Then the file needs to be reimported by another cluster
    node, but that&#39;s the case with DB locks as well.<br>
    <br>
    IMO the DB locks approach doesn&#39;t handle the crash of a cluster
    node well. For example, when node2 crashes unexpectedly while it&#39;s
    importing the file testrealm-users-3.json, the DB lock is still held by
    this node, so other cluster nodes can&#39;t start importing the file
    (until a timeout occurs).<br>
    <br>
    On the other hand, the distributed executor approach may have issues if
    the content of the standalone/import directory is inconsistent
    among cluster nodes. However, this can be solved by having each node
    send checksums of the files it has, so that the coordinator can
    ensure that the file with checksum &quot;abcd123&quot; is assigned only to
    the node which has this file.</div></blockquote><div><br></div><div>With Docker/Kubernetes all nodes would have the same files. At least initially. Would be nice if we could come up with a solution where you can just drop an additional file onto any node and have it imported.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class="HOEnZb"><font color="#888888"><br>
    <br>
    Marek</font></span><div><div class="h5"><br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"><span><font color="#888888"><br>
                    <br>
                    Marek</font></span>
                <div>
                  <div><br>
                    <br>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div> </div>
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                              <div dir="ltr"> <br>
                              </div>
                              <div class="gmail_extra"><br>
                                <div class="gmail_quote">
                                  <div>
                                    <div>On Mon, Nov 9, 2015 at 1:20 PM,
                                      Stian Thorgersen <span dir="ltr">&lt;<a href="mailto:sthorger@redhat.com" target="_blank">sthorger@redhat.com</a>&gt;</span>
                                      wrote:<br>
                                    </div>
                                  </div>
                                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                    <div>
                                      <div>
                                        <div dir="ltr">Currently we
                                          support importing a complete
                                          realm definition using the
                                          import/export feature. The issues
                                          with the current approach are:
                                          <div><br>
                                          </div>
                                          <div>* Only complete realms -
                                            not possible to add to an
                                            existing realm</div>
                                          <div>* No good feedback on whether the
                                            import was successful or not</div>
                                          <div>* Use of system
                                            properties to initiate the
                                            import is not very user
                                            friendly</div>
                                          <div>* Not very elegant for
                                            provisioning. For example, a
                                            Docker image that wants to
                                            bundle some initial setup
                                            ends up always running the
                                            import of a realm, which is
                                            skipped if the realm exists</div>
                                          <div><br>
                                          </div>
                                          <div>To solve this I&#39;ve come
                                            up with the following
                                            proposal:</div>
                                          <div><br>
                                          </div>
                                          <div>Allow dropping
                                            representations to be
                                            imported into
                                            &#39;standalone/import&#39;. This
                                            should support creating a
                                            new realm as well as
                                            importing into an existing
                                            realm. When importing into
                                            an existing realm we will
                                            have an import strategy that
                                            is used to configure what
                                            happens if a resource already exists
                                            (user, role, identity
                                            provider, user federation
                                            provider). The import
                                            strategies are:</div>
                                          <div><br>
                                          </div>
                                          <div>* Skip - existing
                                            resources are skipped</div>
                                          <div>* Fail - if any resource
                                            exists, nothing is imported</div>
                                          <div>* Overwrite - any
                                            existing resources are
                                            deleted.</div>
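                                          <div><br>
                                          </div>
                                          <div>As an illustration only, the three strategies listed above could behave as in the sketch below. It models resources as a plain dictionary keyed by name, which is an assumption made for the example. The names Strategy and apply_import are hypothetical, and Overwrite is assumed to mean that the existing resource is deleted and replaced by the imported one.</div>
<pre>
```python
from enum import Enum


class Strategy(Enum):
    SKIP = "skip"
    FAIL = "fail"
    OVERWRITE = "overwrite"


def apply_import(existing, incoming, strategy):
    """Merge incoming resources (keyed by name) into a copy of existing."""
    conflicts = sorted(set(existing).intersection(incoming))
    if strategy is Strategy.FAIL and conflicts:
        # Fail: if any resource exists, nothing is imported.
        raise ValueError("resources already exist: " + ", ".join(conflicts))
    result = dict(existing)
    for name, definition in incoming.items():
        if name in existing and strategy is Strategy.SKIP:
            continue  # Skip: keep the existing resource untouched
        result[name] = definition  # new resource, or Overwrite replaces it
    return result
```
</pre>
                                          <div>The function returns a new dictionary and never mutates its input, so a failed import leaves the existing resources exactly as they were.</div>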
                                          <div><br>
                                          </div>
                                          <div>The directory will be
                                            scanned at startup, but
                                            there will also be an option
                                            to monitor this directory at
                                            runtime.</div>
                                          <div><br>
                                          </div>
                                          <div>To prevent a file being
                                            imported multiple times
                                            (and to make sure only one
                                            node in a cluster imports it),
                                            we will have a table in the
                                            database that records which
                                            files were imported, from
                                            which node, the date and the result
                                            (including a list of which
                                            resources were imported and
                                            which were not, and a stack
                                            trace if applicable). The
                                            primary key will be the
                                            checksum of the file. We
                                            will also add marker files
                                            (&lt;json file&gt;.imported
                                            or &lt;json
                                            file&gt;.failed). The
                                            contents of the marker files
                                            will be a JSON object with
                                            the date imported, the outcome
                                            (including a stack trace if
                                            applicable), as well as a
                                            complete list of which
                                            resources were successfully
                                            imported and which were not.</div>
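                                          <div><br>
                                          </div>
                                          <div>To make the checksum key and the marker files concrete, here is a small sketch. It assumes SHA-256 as the checksum, which the proposal does not specify. The helper names file_checksum and write_marker and the JSON field names are hypothetical too.</div>
<pre>
```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 hex digest of the file, usable as a primary key."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_marker(path, succeeded, imported, not_imported, error=None):
    """Write the .imported or .failed marker next to the file and return its path."""
    suffix = ".imported" if succeeded else ".failed"
    marker = Path(str(path) + suffix)
    body = {
        "date": datetime.now(timezone.utc).isoformat(),
        "outcome": "imported" if succeeded else "failed",
        "error": error,  # stack trace text, if applicable
        "imported": imported,
        "notImported": not_imported,
    }
    marker.write_text(json.dumps(body, indent=2))
    return marker
```
</pre>
                                          <div>For a file named testrealm-users-3.json, a successful import would leave testrealm-users-3.json.imported beside it.</div>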
                                          <div><br>
                                          </div>
                                          <div>The files will also allow
                                            resolving system properties
                                            and environment variables.
                                            For example:</div>
                                          <div><br>
                                          </div>
                                          <div>{</div>
                                          <div>    &quot;secret&quot;:
                                            &quot;${env.MYCLIENT_SECRET}&quot;</div>
                                          <div>}</div>
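                                          <div><br>
                                          </div>
                                          <div>A minimal sketch of how such placeholders could be resolved before the JSON is parsed. It covers only the ${env.NAME} form shown above and leaves system properties out. The function name resolve_env is hypothetical, and failing on an unset variable is an assumption rather than part of the proposal.</div>
<pre>
```python
import os
import re

# Matches ${env.NAME}, where NAME is an environment variable name.
PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(text, environ=None):
    """Replace each ${env.NAME} with the value of NAME, failing if it is unset."""
    environ = os.environ if environ is None else environ

    def substitute(match):
        name = match.group(1)
        if name not in environ:
            raise KeyError("environment variable not set: " + name)
        return environ[name]

    return PLACEHOLDER.sub(substitute, text)
```
</pre>
                                          <div>The substitution works on the raw text, so a value containing quotes or backslashes would need JSON escaping first. The sketch does not do that.</div>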
                                          <div><br>
                                          </div>
                                          <div>This will be very
                                            convenient for example with
                                            Docker as it would be very
                                            easy to create a Docker
                                            image that extends ours to
                                            add a few clients and users.</div>
                                          <div><br>
                                          </div>
                                          <div>It will also be
                                            convenient for examples as
                                            it will make it possible to
                                            add the required clients and
                                            users to an existing realm.</div>
                                          <div><br>
                                          </div>
                                          <div><br>
                                          </div>
                                        </div>
                                        <br>
                                      </div>
                                    </div>
_______________________________________________<br>
                                    keycloak-dev mailing list<br>
                                    <a href="mailto:keycloak-dev@lists.jboss.org" target="_blank">keycloak-dev@lists.jboss.org</a><br>
                                    <a href="https://lists.jboss.org/mailman/listinfo/keycloak-dev" rel="noreferrer" target="_blank">https://lists.jboss.org/mailman/listinfo/keycloak-dev</a><br>
                                  </blockquote>
                                </div>
                                <br>
                              </div>
                            </blockquote>
                          </div>
                          <br>
                        </div>
                      </div>
                      <br>
                    </blockquote>
                    <br>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div></div>