<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On 11 November 2015 at 15:23, Marek Posolda <span dir="ltr">&lt;<a href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div text="#000000" bgcolor="#FFFFFF"><span class="">

    <div>On 11/11/15 09:01, Stian Thorgersen

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On 10 November 2015 at 16:11, Marek

            Posolda <span dir="ltr">&lt;<a href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF"><span>

                  <div>On 09/11/15 14:09, Stian Thorgersen wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr"><br>

                      <div class="gmail_extra"><br>

                        <div class="gmail_quote">On 9 November 2015 at

                          13:35, Sebastien Blanc <span dir="ltr">&lt;<a href="mailto:sblanc@redhat.com" target="_blank">sblanc@redhat.com</a>&gt;</span>

                          wrote:<br>

                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                            <div dir="ltr">

                              <div>That would be really nice indeed!<br>

                              </div>

                              But aren&#39;t the marker files enough,
                              without also having a table in the DB?<br>

                            </div>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>We need a way to prevent multiple nodes
                            in a cluster from importing the same file. For
                            example on Kerberos you end up spinning up
                            multiple instances of the same Docker image.

                            <br>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </span> I bet you meant &#39;Kubernetes&#39; <span><span> :-)

                  </span></span></div>

            </blockquote>

            <div><br>

            </div>

            <div>Yup</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF"><span><span> </span></span><br>

                <br>

                +1 for the improvements. Beyond those, I think that
                sooner or later we will need to solve long-running
                export+import, where you want to import 100.000 users. <br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>+1</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF"> <br>

                As I mentioned in another mail a few weeks ago, we can
                have:<br>
                <br>
                1) A table tracking progress (51.000 users already
                imported, around 49.000 remaining, etc.)<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>We would still need to split into multiple files in
              either case. Having a single JSON file with 100K users is
              probably not going to perform very well. So what I
              proposed would actually work for long-running imports as
              well. If each file has a manageable number of users (say
              ~5 min to import), then each file will be marked as
              imported or failed. At least for now I don&#39;t think we
              should do smaller batches than one file. As long as each
              file is imported within a single TX, it&#39;s an all-or-nothing
              import.</div>
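            <div>A minimal sketch of the one-file-per-transaction idea above, written in Python with an in-memory SQLite database purely for illustration. The table names, the import_file helper and the duplicate user used to provoke a failure are all assumptions, not part of the proposal:</div>
<pre>
```python
import sqlite3


def import_file(conn, file_name, users):
    """Import all users from one file in a single transaction.

    Returns "imported" or "failed". On failure nothing from the file is kept.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for name in users:
                conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        status = "imported"
    except sqlite3.IntegrityError:
        status = "failed"
    # Record the outcome in its own transaction so it survives the rollback.
    with conn:
        conn.execute(
            "INSERT INTO import_log (file_name, status) VALUES (?, ?)",
            (file_name, status),
        )
    return status


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE import_log (file_name TEXT PRIMARY KEY, status TEXT)")

first = import_file(conn, "testrealm-users-0.json", ["alice", "bob"])
# The duplicate "carol" violates the primary key, so the whole file fails.
second = import_file(conn, "testrealm-users-1.json", ["carol", "carol"])
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
log_rows = conn.execute(
    "SELECT file_name, status FROM import_log ORDER BY file_name"
).fetchall()
```
</pre>
            <div>In this sketch the failed file leaves no users behind, while both files still get a status row, which is the all-or-nothing behaviour described above.</div>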

          </div>

        </div>

      </div>

    </blockquote>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF"> 2) Concurrency and

                dividing the work among cluster nodes (Node1 will import

                50.000 users and node2 another 50.000 users)<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>This would be solved as well. Each node picks up a file
              that&#39;s not processed yet, marks it in the DB, and then
              processes it.</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF"> 3) Failover (Import

                won&#39;t be completely broken if a cluster node crashes after
                importing 90.000 users, but can continue on other cluster nodes)<br>
                <br>
                I think the stuff I did recently for pre-loading offline
                sessions at startup could be reused for this too,
                and it can handle (2) and (3). It can also handle
                parallel imports triggered from multiple cluster nodes. <br>

                <br>

                For example: currently, if you run Kubernetes with 2
                cluster nodes, both nodes will start importing the same file
                at the same time, because the import triggered by node1 has
                not finished before node2 starts, so there is
                not yet a DB record saying that the file is already
                imported. With the stuff I did, only the coordinator
                (node1) will start the import. Node2 will wait until the
                import triggered by node1 is finished, but in the
                meantime it can &quot;help&quot; by importing some users (pages) if
                the coordinator asks it to do so. This impl is based on
                the Infinispan distributed executor service <a href="http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework" target="_blank">http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework</a>
                .</div>

            </blockquote>

            <div><br>

            </div>

            <div>The DB record needs to be created before a node tries
              to import the file, including a timestamp of when it started
              the import. It should then be updated with the result once
              the import is completed. Using the distributed
              execution framework sounds like a good idea though. How do
              you prevent scheduling the same job multiple times? For
              example, if all nodes scan the import folder on startup and
              simply import everything they find, then the same job will
              be scheduled multiple times. Not really a big deal, as the
              first thing the job should do is check if there&#39;s a record
              in the DB already.</div>
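            <div>A rough sketch of the &quot;create the record first&quot; idea, again in Python with SQLite for illustration only. The import_claims table, its columns and the try_claim/finish helpers are hypothetical names. In a real cluster each node would use its own connection to a shared database, and the primary key on the checksum is what rejects the second claim:</div>
<pre>
```python
import sqlite3
import time


def try_claim(conn, checksum, node):
    """Insert the claim row; return False if another node already holds it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO import_claims (checksum, node, started_at, result) "
                "VALUES (?, ?, ?, NULL)",
                (checksum, node, time.time()),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key on checksum rejected a second claim for the same file.
        return False


def finish(conn, checksum, result):
    """Update the existing claim row with the outcome of the import."""
    with conn:
        conn.execute(
            "UPDATE import_claims SET result = ? WHERE checksum = ?",
            (result, checksum),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE import_claims ("
    "checksum TEXT PRIMARY KEY, node TEXT, started_at REAL, result TEXT)"
)

node1_won = try_claim(conn, "abcd123", "node1")
node2_won = try_claim(conn, "abcd123", "node2")  # duplicate job, rejected
finish(conn, "abcd123", "imported")
row = conn.execute("SELECT node, result FROM import_claims").fetchone()
```
</pre>
            <div>With this shape a duplicate job is harmless: it fails to claim the file and exits without importing anything.</div>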

          </div>

        </div>

      </div>

    </blockquote></span>

    With the distributed executor, it&#39;s the cluster coordinator that
    decides which node imports what. It will send messages to
    cluster nodes like &quot;Hey, please import the file
    testrealm-users-3.json with timestamp abcd123&quot;. <br>

    <br>

    After a node finishes the job, it notifies the coordinator, and the
    coordinator will insert the DB record and mark it as finished. So no DB
    record is inserted before the node starts the import, because the whole
    coordination is handled by the coordinator. Also, the same file will
    never be imported more than once by different cluster nodes. <br>

    <br>

    The only exception would be if a cluster node crashes before its import
    is finished. Then the file needs to be reimported by another cluster
    node, but that&#39;s the case with DB locks as well.<br>

    <br>

    IMO the DB locks approach doesn&#39;t handle the crash of a cluster
    node well. For example, when node2 crashes unexpectedly while it&#39;s
    importing the file testrealm-users-3.json, the DB lock is still held by
    this node, so other cluster nodes can&#39;t start importing the file
    (until a timeout occurs).<br>

    <br>

    On the other hand, the distributed executor approach may have issues if
    the content of the standalone/import directory is inconsistent
    among cluster nodes. However, this can be solved: each node
    would send checksums of the files it has, and the coordinator would
    ensure that the file with checksum &quot;abcd123&quot; is assigned only to

    the node which has this file.</div></blockquote><div><br></div><div>With Docker/Kubernetes all nodes would have the same files. At least initially. Would be nice if we could come up with a solution where you can just drop an additional file onto any node and have it imported.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class="HOEnZb"><font color="#888888"><br>

    <br>

    Marek</font></span><div><div class="h5"><br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF"><span><font color="#888888"><br>

                    <br>

                    Marek</font></span>

                <div>

                  <div><br>

                    <br>

                    <blockquote type="cite">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div> </div>

                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                              <div dir="ltr"> <br>

                              </div>

                              <div class="gmail_extra"><br>

                                <div class="gmail_quote">

                                  <div>

                                    <div>On Mon, Nov 9, 2015 at 1:20 PM,

                                      Stian Thorgersen <span dir="ltr">&lt;<a href="mailto:sthorger@redhat.com" target="_blank">sthorger@redhat.com</a>&gt;</span>

                                      wrote:<br>

                                    </div>

                                  </div>

                                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                                    <div>

                                      <div>

                                        <div dir="ltr">Currently we

                                          support importing a complete

                                          realm definition using the

                                          import/export feature. The issues
                                          with the current approach are:

                                          <div><br>

                                          </div>

                                          <div>* Only complete realm -

                                            not possible to add to an

                                            existing realm</div>

                                          <div>* No good feedback if

                                            import was successful or not</div>

                                          <div>* Use of system

                                            properties to initiate the

                                            import is not very user

                                            friendly</div>

                                          <div>* Not very elegant for

                                            provisioning. For example a

                                            Docker image that wants to
                                            bundle some initial setup
                                            ends up always running the
                                            import of a realm, which is
                                            skipped if the realm exists</div>

                                          <div><br>

                                          </div>

                                          <div>To solve this I&#39;ve come

                                            up with the following

                                            proposal:</div>

                                          <div><br>

                                          </div>

                                          <div>Allow dropping

                                            representations to be

                                            imported into

                                            &#39;standalone/import&#39;. This

                                            should support creating a

                                            new realm as well as

                                            importing into an existing

                                            realm. When importing into

                                            an existing realm we will

                                            have an import strategy that

                                            is used to configure what

                                            happens if a resource exists

                                            (user, role, identity

                                            provider, user federation

                                            provider). The import

                                            strategies are:</div>

                                          <div><br>

                                          </div>

                                          <div>* Skip - existing

                                            resources are skipped</div>

                                          <div>* Fail - if any resource

                                            exists nothing is imported</div>

                                          <div>* Overwrite - any

                                            existing resources are

                                            deleted.</div>
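                                          <div>To make the three strategies concrete, here is a small Python sketch that treats resources as a dictionary keyed by name. The apply_import function and the example users are hypothetical. It also assumes that &quot;Overwrite&quot; means an existing resource is deleted and then re-created from the imported representation, which the proposal does not spell out:</div>
<pre>
```python
def apply_import(existing, incoming, strategy):
    """Return the resource map after importing `incoming` into `existing`."""
    conflicts = set(existing).intersection(incoming)
    if strategy == "fail":
        if conflicts:
            # Nothing is imported if any resource already exists.
            raise ValueError(
                "resources already exist: " + ", ".join(sorted(conflicts))
            )
        return {**existing, **incoming}
    if strategy == "skip":
        return {**incoming, **existing}  # existing entries win
    if strategy == "overwrite":
        return {**existing, **incoming}  # incoming entries win
    raise ValueError("unknown strategy: " + strategy)


existing = {"alice": {"email": "old@example.org"}}
incoming = {
    "alice": {"email": "new@example.org"},
    "bob": {"email": "bob@example.org"},
}

skipped = apply_import(existing, incoming, "skip")
overwritten = apply_import(existing, incoming, "overwrite")
```
</pre>
                                          <div>With &quot;fail&quot;, the same input raises an error and leaves the existing resources untouched.</div>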

                                          <div><br>

                                          </div>

                                          <div>The directory will be

                                            scanned at startup, but

                                            there will also be an option

                                            to monitor this directory at

                                            runtime.</div>

                                          <div><br>

                                          </div>

                                          <div>To prevent a file from being
                                            imported multiple times
                                            (and to make sure only one
                                            node in a cluster imports it)
                                            we will have a table in the
                                            database that records which
                                            files were imported, from
                                            which node, the date and the result
                                            (including a list of which
                                            resources were imported,
                                            which were not, and a stack
                                            trace if applicable). The
                                            primary key will be the
                                            checksum of the file. We
                                            will also add marker files
                                            (&lt;json file&gt;.imported
                                            or &lt;json
                                            file&gt;.failed). The
                                            contents of the marker files
                                            will be a JSON object with
                                            the date imported, the outcome
                                            (including a stack trace if
                                            applicable), as well as a
                                            complete list of which
                                            resources were successfully
                                            imported and which were not.</div>
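                                          <div>A sketch of how the checksum and a marker file might be produced, in Python for illustration. SHA-256 as the checksum algorithm, the JSON field names and the file_checksum/write_marker helpers are assumptions. The proposal only fixes the marker suffixes and the kind of information the marker contains:</div>
<pre>
```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_checksum(path):
    """Checksum of the file contents, usable as the table's primary key."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_marker(path, outcome, imported, not_imported, stack_trace=None):
    """Write the marker file next to the source file and return its path."""
    suffix = ".imported" if outcome == "success" else ".failed"
    marker = path.with_name(path.name + suffix)
    body = {
        "date": datetime.now(timezone.utc).isoformat(),
        "outcome": outcome,
        "stackTrace": stack_trace,
        "imported": imported,
        "notImported": not_imported,
    }
    marker.write_text(json.dumps(body, indent=2))
    return marker


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "testrealm-users-3.json"
    source.write_text('{"users": []}')
    checksum = file_checksum(source)
    marker = write_marker(source, "success", ["alice"], [])
    marker_name = marker.name
    marker_body = json.loads(marker.read_text())
```
</pre>
                                          <div>Keying on the content checksum rather than the file name means a renamed copy of an already imported file would be recognised as a duplicate.</div>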

                                          <div><br>

                                          </div>

                                          <div>The files will also allow

                                            resolving system properties

                                            and environment variables.

                                            For example:</div>

                                          <div><br>

                                          </div>

                                          <div>{</div>

                                          <div>    &quot;secret&quot;:

                                            &quot;${env.MYCLIENT_SECRET}&quot;</div>

                                          <div>}</div>
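                                          <div>A minimal Python sketch of resolving such placeholders before the file is parsed. It handles only the ${env.NAME} form shown above. The resolve_env helper is a hypothetical name, raising on an unset variable is an assumed policy, and system properties are left out:</div>
<pre>
```python
import os
import re

# Matches ${env.NAME} and captures NAME.
PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(text, env=None):
    """Replace every ${env.NAME} in text with the value of NAME."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError("environment variable not set: " + name)
        return env[name]

    return PLACEHOLDER.sub(replace, text)


resolved = resolve_env(
    '{"secret": "${env.MYCLIENT_SECRET}"}',
    {"MYCLIENT_SECRET": "s3cr3t"},
)
```
</pre>
                                          <div>This is plain text substitution, so a value containing quotes or backslashes would need JSON escaping before being inserted.</div>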

                                          <div><br>

                                          </div>

                                          <div>This will be very

                                            convenient for example with

                                            Docker as it would be very

                                            easy to create a Docker

                                            image that extends ours to

                                            add a few clients and users.</div>

                                          <div><br>

                                          </div>

                                          <div>It will also be

                                            convenient for examples as

                                            it will make it possible to

                                            add the required clients and

                                            users to an existing realm.</div>

                                          <div><br>

                                          </div>

                                          <div><br>

                                          </div>

                                        </div>

                                        <br>

                                      </div>

                                    </div>

_______________________________________________<br>

                                    keycloak-dev mailing list<br>

                                    <a href="mailto:keycloak-dev@lists.jboss.org" target="_blank">keycloak-dev@lists.jboss.org</a><br>

                                    <a href="https://lists.jboss.org/mailman/listinfo/keycloak-dev" rel="noreferrer" target="_blank">https://lists.jboss.org/mailman/listinfo/keycloak-dev</a><br>

                                  </blockquote>

                                </div>

                                <br>

                              </div>

                            </blockquote>

                          </div>

                          <br>

                        </div>

                      </div>

                      <br>


                    </blockquote>

                    <br>

                  </div>

                </div>

              </div>

            </blockquote>

          </div>

          <br>

        </div>

      </div>

    </blockquote>

    <br>

  </div></div></div>

</blockquote></div><br></div></div>