<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On 11 November 2015 at 15:51, Marek Posolda <span dir="ltr">&lt;<a href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div text="#000000" bgcolor="#FFFFFF"><div><div class="h5">

    <div>On 11/11/15 15:36, Stian Thorgersen

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On 11 November 2015 at 15:23, Marek

            Posolda <span dir="ltr">&lt;<a href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF"><span>

                  <div>On 11/11/15 09:01, Stian Thorgersen wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr"><br>

                      <div class="gmail_extra"><br>

                        <div class="gmail_quote">On 10 November 2015 at

                          16:11, Marek Posolda <span dir="ltr">&lt;<a href="mailto:mposolda@redhat.com" target="_blank"></a><a href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span>

                          wrote:<br>

                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                            <div text="#000000" bgcolor="#FFFFFF"><span>

                                <div>On 09/11/15 14:09, Stian Thorgersen

                                  wrote:<br>

                                </div>

                                <blockquote type="cite">

                                  <div dir="ltr"><br>

                                    <div class="gmail_extra"><br>

                                      <div class="gmail_quote">On 9

                                        November 2015 at 13:35,

                                        Sebastien Blanc <span dir="ltr">&lt;<a href="mailto:sblanc@redhat.com" target="_blank"></a><a href="mailto:sblanc@redhat.com" target="_blank">sblanc@redhat.com</a>&gt;</span>

                                        wrote:<br>

                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                                          <div dir="ltr">

                                            <div>That would be really

                                              nice indeed ! <br>

                                            </div>

                                            But are the markers files

                                            not enough, instead of also

                                            having a table in the DB ?<br>

                                          </div>

                                        </blockquote>

                                        <div><br>

                                        </div>

                                        <div>We need a way to prevent

                                          multiple nodes in a cluster from

                                          importing the same file. For

                                          example on Kerberos you end up

                                          spinning up multiple instances

                                          of the same Docker image. <br>

                                        </div>

                                      </div>

                                    </div>

                                  </div>

                                </blockquote>

                              </span> I bet you meant &#39;Kubernetes&#39; <span><span>

                                  :-) </span></span></div>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>Yup</div>

                          <div> </div>

                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                            <div text="#000000" bgcolor="#FFFFFF"><span><span>

                                </span></span><br>

                              <br>

                              +1 for the improvements. Besides those I

                              think that sooner or later we will need

                              to solve long-running export+import where

                              you want to import 100.000 users. <br>

                            </div>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>+1</div>

                          <div> </div>

                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                            <div text="#000000" bgcolor="#FFFFFF"> <br>

                              As I mentioned in another mail a few weeks

                              ago, we can have:<br>

                              <br>

                              1) Table with the progress (51.000 users

                              already imported, around 49.000 remaining

                              etc.)<br>

                            </div>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>We would still need to split into

                            multiple files in either case. Having a

                            single json file with 100K users is probably

                            not going to perform very well. So what I

                            proposed would actually work for

                            long-running import as well. If each file

                            has a manageable amount of users (say ~5 min

                            to import) then each file will be marked as

                            imported or failed. At least for now I don&#39;t

                            think we should do smaller batches than one

                            file. As long as one file is imported within

                            the same TX then it&#39;s an all or nothing

                            import.</div>
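<div>A minimal sketch of the splitting step described above, assuming users are identified by plain strings. The class name UserBatchSketch and the batch size parameter are hypothetical and not part of the existing import code; each resulting batch would be written to its own file and imported in a single transaction.</div>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the existing import/export code.
final class UserBatchSketch {

    // Splits the full user list into consecutive batches of at most batchSize
    // entries. Each batch is meant to become one import file (one transaction).
    static List<List<String>> split(List<String> users, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < users.size(); from += batchSize) {
            int to = Math.min(from + batchSize, users.size());
            // Copy the view so every batch is independent of the source list.
            batches.add(new ArrayList<>(users.subList(from, to)));
        }
        return batches;
    }
}
```

<div>The batch size would have to be tuned so that one file imports in a manageable time; the figure of about 5 minutes per file is the only guidance given here.</div>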

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div> </div>

                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                            <div text="#000000" bgcolor="#FFFFFF"> 2)

                              Concurrency and dividing the work among

                              cluster nodes (Node1 will import 50.000

                              users and node2 another 50.000 users)<br>

                            </div>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>This would be solved as well. Each node

                            picks up a file that&#39;s not processed yet.

                            Marks it in the DB and then gets to process

                            it.</div>
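<div>A rough sketch of the "mark it in the DB, then process it" step. It assumes the table uses the file checksum as its key, and it uses an in-memory map as a stand-in for the database, so a failed putIfAbsent plays the role of a duplicate-key error on INSERT. ImportClaimSketch and its method names are hypothetical.</div>

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the import table; a real version would use the DB.
final class ImportClaimSketch {

    // File checksum -> id of the node that claimed the file.
    private final ConcurrentMap<String, String> claims = new ConcurrentHashMap<>();

    // Returns true only for the first node that claims the checksum,
    // mimicking an INSERT that fails on a duplicate primary key.
    boolean tryClaim(String checksum, String nodeId) {
        return claims.putIfAbsent(checksum, nodeId) == null;
    }

    // Picks the first file this node manages to claim, if any is left.
    Optional<String> claimNext(List<String> checksums, String nodeId) {
        for (String checksum : checksums) {
            if (tryClaim(checksum, nodeId)) {
                return Optional.of(checksum);
            }
        }
        return Optional.empty();
    }
}
```

<div>With two nodes scanning the same directory, each call to claimNext hands out a different file, and a node that finds nothing left to claim simply has no work to do.</div>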

                          <div> </div>

                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                            <div text="#000000" bgcolor="#FFFFFF"> 3)

                              Failover (Import won&#39;t be completely

                              broken if cluster node crashes after

                              import 90.000, but can continue on other

                              cluster nodes)<br>

                              <br>

                              I think the stuff I did recently for

                              pre-loading offline sessions at startup

                              could be reused for this stuff too and it

                              can handle (2) and (3) . Also it can

                              handle parallel import triggered from more

                              cluster nodes. <br>

                              <br>

                              For example: currently if you trigger

                              kubernetes with 2 cluster nodes, both

                              nodes will start to import same file at

                              the same time because import triggered by

                              node1 is not yet finished before node2 is

                              started, so there is not yet existing DB

                              record that file is already imported. With

                              the stuff I did, just the coordinator

                              (node1) will start the import . Node2 will

                              wait until import triggered by node1 is

                              finished, but at the same time it can

                              &quot;help&quot; to import some users (pages) if

                              the coordinator asks it to do so. This impl

                              is based on infinispan distributed

                              executor service <a href="http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework" target="_blank"></a><a href="http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework" target="_blank">http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework</a>

                              .</div>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>The DB record needs to be created before

                            a node tries to import it, including a

                            timestamp when it started the import. It

                            should then be updated once the import is

                            completed, with the result. Using the

                            distributed execution framework sounds like

                            a good idea though. How do you prevent

                            scheduling the same job multiple times? For

                            example if all nodes on startup scan the

                            import folder and simply import everything

                            they find, then there will be multiple of

                            the same job. Not really a big deal as the

                            first thing the job should do is check if

                            there&#39;s a record in the DB already.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </span> With distributed executor, it&#39;s the cluster

                coordinator, which coordinates which node would import

                what. It will send messages to cluster nodes like &quot;Hey,

                please import the file testrealm-users-3.json with

                timestamp abcd123&quot; . <br>

                <br>

                After node finishes the job, it notifies coordinator and

                coordinator will insert DB record and mark it as

                finished. So there is no DB record inserted before node

                starts import, because whole coordination is handled by

                the coordinator. Also there will never be same file

                imported more times by different cluster nodes. <br>

                <br>

                Only exception would be if cluster node crashes before

                import is finished. Then it needs to be reimported by

                other cluster node, but that&#39;s the case with DB locks as

                well.<br>

                <br>

                IMO the DB locks approach doesn&#39;t handle the crash of

                a cluster node well. For example when node2 crashes

                unexpectedly when it&#39;s importing the file

                testrealm-users-3.json, the DB lock is held by this

                node, so other cluster nodes can&#39;t start on importing

                the file (until timeout occurs.)<br>

                <br>

                On the other hand, distributed executor approach may

                have issues if there is inconsistent content of the

                standalone/import directory among cluster nodes. However

                it can be solved, so that each node will need to send

                checksums of the files it has and coordinator will need

                to ensure that file with checksum &quot;abcd123&quot; is assigned

                just to the node which has this file.</div>
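<div>One way the checksum-based assignment could look, as a sketch only. It assumes every node reports the set of checksums it has on disk and that the coordinator prefers the least-loaded node among those holding a file. ChecksumAssignmentSketch is a hypothetical name, and nothing here uses the Infinispan API.</div>

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical coordinator-side helper; independent of Infinispan.
final class ChecksumAssignmentSketch {

    // reported: node id -> checksums of the files that node has on disk.
    // Returns checksum -> node chosen to import it. Every checksum is assigned
    // exactly once, and only to a node that reported having that file.
    static Map<String, String> assign(Map<String, Set<String>> reported) {
        Map<String, Set<String>> nodes = new TreeMap<>(reported); // stable order
        Set<String> checksums = new TreeSet<>();
        Map<String, Integer> load = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : nodes.entrySet()) {
            checksums.addAll(entry.getValue());
            load.put(entry.getKey(), 0);
        }
        Map<String, String> assignment = new TreeMap<>();
        for (String checksum : checksums) {
            String chosen = null;
            for (String node : nodes.keySet()) {
                boolean hasFile = nodes.get(node).contains(checksum);
                // Prefer the holder with the fewest files assigned so far.
                if (hasFile && (chosen == null || load.get(node) < load.get(chosen))) {
                    chosen = node;
                }
            }
            assignment.put(checksum, chosen);
            load.merge(chosen, 1, Integer::sum);
        }
        return assignment;
    }
}
```

<div>A file that exists on only one node is always assigned to that node; files present everywhere are spread across the cluster.</div>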

            </blockquote>

            <div><br>

            </div>

            <div>With Docker/Kubernetes all nodes would have the same

              files. At least initially. Would be nice if we could come

              up with a solution where you can just drop an additional

              file onto any node and have it imported.</div>

          </div>

        </div>

      </div>

    </blockquote></div></div>

    Exactly, was thinking about Docker too. Here we don&#39;t have any issue

    at all.<br>

    <br>

    The main question here is, do we want to support the scenario when

    various cluster nodes have different content? As I mentioned,

    distributed coordinator can handle it, so that each cluster node

    will send the checksums of the files it has and coordinator will

    always assign to node just the checksums, which it has.<br></div></blockquote><div><br></div><div>That would be a nice addition IMO</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">

    <br>

    However regardless of distributed executor approach or DB locks

    approach, there may still be issues. For example:<br>

    1) The file testrealm.json with checksum &quot;abc&quot; is triggered for

    import on node1<br>

    2) At the same time, admin will do some minor change in this file on

    node2 and save it. This will mean that checksum of the file on node2

    will be changed to &quot;def&quot;<br>

    3) Node2 will trigger import of that file. So we have both node1 and

    node2 importing same file concurrently because the previously

    retrieved lock was for &quot;abc&quot; checksum, but now checksum is &quot;def&quot; <br>

    <br>

    This problem will be with both DB lock and DistributedExecutor

    approaches though...</div></blockquote><div><br></div><div>Maybe a better approach is to elect a single node that can perform imports and only allow one import at a time?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class="HOEnZb"><font color="#888888"><br>

    <br>

    Marek</font></span><div><div class="h5"><br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF"><span><font color="#888888"><br>

                    <br>

                    Marek</font></span>

                <div>

                  <div><br>

                    <blockquote type="cite">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div> </div>

                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                              <div text="#000000" bgcolor="#FFFFFF"><span><font color="#888888"><br>

                                    <br>

                                    Marek</font></span>

                                <div>

                                  <div><br>

                                    <br>

                                    <blockquote type="cite">

                                      <div dir="ltr">

                                        <div class="gmail_extra">

                                          <div class="gmail_quote">

                                            <div> </div>

                                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                                              <div dir="ltr"> <br>

                                              </div>

                                              <div class="gmail_extra"><br>

                                                <div class="gmail_quote">

                                                  <div>

                                                    <div>On Mon, Nov 9,

                                                      2015 at 1:20 PM,

                                                      Stian Thorgersen <span dir="ltr">&lt;<a href="mailto:sthorger@redhat.com" target="_blank"></a><a href="mailto:sthorger@redhat.com" target="_blank">sthorger@redhat.com</a>&gt;</span>

                                                      wrote:<br>

                                                    </div>

                                                  </div>

                                                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                                                    <div>

                                                      <div>

                                                        <div dir="ltr">Currently

                                                          we support

                                                          importing a

                                                          complete realm

                                                          definition

                                                          using the

                                                          import/export

                                                          feature.

                                                          Issues with

                                                          the current

                                                          approach are:

                                                          <div><br>

                                                          </div>

                                                          <div>* Only

                                                          complete realm

                                                          - not possible

                                                          to add to an

                                                          existing realm</div>

                                                          <div>* No good

                                                          feedback if

                                                          import was

                                                          successful or

                                                          not</div>

                                                          <div>* Use of

                                                          system

                                                          properties to

                                                          initiate the

                                                          import is not

                                                          very user

                                                          friendly</div>

                                                          <div>* Not

                                                          very elegant

                                                          for

                                                          provisioning.

                                                          For example a

                                                          Docker image

                                                          that wants to

                                                          bundle some

                                                          initial setup

                                                          ends up always

                                                          running the

                                                          import of a

                                                          realm, which

                                                          is skipped if

                                                          realm exists</div>

                                                          <div><br>

                                                          </div>

                                                          <div>To solve

                                                          this I&#39;ve come

                                                          up with the

                                                          following

                                                          proposal:</div>

                                                          <div><br>

                                                          </div>

                                                          <div>Allow

                                                          dropping

                                                          representations

                                                          to be imported

                                                          into

                                                          &#39;standalone/import&#39;.

                                                          This should

                                                          support

                                                          creating a new

                                                          realm as well

                                                          as importing

                                                          into an

                                                          existing

                                                          realm. When

                                                          importing into

                                                          an existing

                                                          realm we will

                                                          have an import

                                                          strategy that

                                                          is used to

                                                          configure what

                                                          happens if a

                                                          resource

                                                          exists (user,

                                                          role, identity

                                                          provider, user

                                                          federation

                                                          provider). The

                                                          import

                                                          strategies

                                                          are:</div>

                                                          <div><br>

                                                          </div>

                                                          <div>* Skip -

                                                          existing

                                                          resources are

                                                          skipped,</div>

                                                          <div>* Fail -

                                                          if any

                                                          resource

                                                          exists nothing

                                                          is imported</div>

                                                          <div>*

                                                          Overwrite -

                                                          any existing

                                                          resources are

                                                          deleted.</div>
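<div>The three strategies could be modelled roughly as below. This is a sketch under assumptions: resources are reduced to a name-to-definition map, and Overwrite is read as "delete the existing resource, then create the incoming one". ImportStrategySketch and its enum are hypothetical names.</div>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the proposed import strategies.
final class ImportStrategySketch {

    enum Strategy { SKIP, FAIL, OVERWRITE }

    // Applies incoming resources (name -> definition) to a copy of the existing
    // ones and returns the result. The existing map is never modified, so a
    // failed import leaves the realm as it was.
    static Map<String, String> apply(Map<String, String> existing,
                                     Map<String, String> incoming,
                                     Strategy strategy) {
        if (strategy == Strategy.FAIL) {
            // Check everything up front so that nothing is imported on conflict.
            for (String name : incoming.keySet()) {
                if (existing.containsKey(name)) {
                    throw new IllegalStateException("Resource already exists: " + name);
                }
            }
        }
        Map<String, String> result = new LinkedHashMap<>(existing);
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            if (strategy == Strategy.SKIP) {
                result.putIfAbsent(entry.getKey(), entry.getValue()); // keep existing
            } else {
                result.put(entry.getKey(), entry.getValue()); // replace or create
            }
        }
        return result;
    }
}
```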

                                                          <div><br>

                                                          </div>

                                                          <div>The

                                                          directory will

                                                          be scanned at

                                                          startup, but

                                                          there will

                                                          also be an

                                                          option to

                                                          monitor this

                                                          directory at

                                                          runtime.</div>

                                                          <div><br>

                                                          </div>

                                                          <div>To

                                                          prevent a file

                                                          being imported

                                                          multiple times

                                                          (also to make

                                                          sure only one

                                                          node in a

                                                          cluster

                                                          imports) we

                                                          will have a

                                                          table in the

                                                          database that

                                                          contains what

                                                          files were

                                                          imported, from

                                                          what node,

                                                          date and

                                                          result

                                                          (including a

                                                          list of what

                                                          resources

                                                          were

                                                          imported,

                                                          which were not,

                                                          and stack

                                                          trace if

                                                          applicable).

                                                          The primary

                                                          key will be

                                                          the checksum

                                                          of the file.

                                                          We will also

                                                          add marker

                                                          files

                                                          (&lt;json

                                                          file&gt;.imported

                                                          or &lt;json

                                                          file&gt;.failed).

                                                          The contents

                                                          of the marker

                                                          files will be

                                                          a json object

                                                          with date

                                                          imported,

                                                          outcome

                                                          (including

                                                          stack trace if

                                                          applicable) as

                                                          well as a

                                                          complete list

                                                          of what

                                                          resources were

                                                          successfully

                                                          imported and which

                                                          were not.</div>
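<div>A small sketch of the two building blocks mentioned here: a file checksum usable as the primary key, and the marker file name. The choice of SHA-256 is an assumption, since the proposal does not name an algorithm, and ImportMarkerSketch is a hypothetical class name.</div>

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; the hash algorithm is an assumption.
final class ImportMarkerSketch {

    // Hex-encoded SHA-256 of the file content, usable as the table's key.
    static String checksum(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Marker file name: "<json file>.imported" or "<json file>.failed".
    static String markerName(String jsonFileName, boolean success) {
        return jsonFileName + (success ? ".imported" : ".failed");
    }
}
```

<div>Because the key is derived from the content, editing a file after it was imported yields a new checksum, so the edited file would be treated as not yet imported.</div>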

                                                          <div><br>

                                                          </div>

                                                          <div>The files

                                                          will also

                                                          allow

                                                          resolving

                                                          system

                                                          properties and

                                                          environment

                                                          variables. For

                                                          example:</div>

                                                          <div><br>

                                                          </div>

                                                          <div>{</div>

                                                          <div>   

                                                          &quot;secret&quot;:

                                                          &quot;${env.MYCLIENT_SECRET}&quot;</div>

                                                          <div>}</div>
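<div>How the resolution of such references might work, as a sketch only. It handles just the ${env.NAME} form shown in the example and takes the environment as a map so it can be tried without touching the real process environment. PlaceholderSketch is a hypothetical name, and failing on an undefined variable is an assumption about the desired behaviour.</div>

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical resolver for ${env.NAME} references in import files.
final class PlaceholderSketch {

    private static final Pattern ENV_REF =
            Pattern.compile("\\$\\{env\\.([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Replaces every ${env.NAME} in text with the value of NAME from env.
    // Throws if a referenced variable is not defined (an assumed policy).
    static String resolve(String text, Map<String, String> env) {
        Matcher matcher = ENV_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = env.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Undefined environment variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

<div>In a running server this could be called as PlaceholderSketch.resolve(json, System.getenv()). The sketch does not JSON-escape the substituted value, which a real implementation would need to consider.</div>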

                                                          <div><br>

                                                          </div>
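                                                          <div>A minimal sketch of how such placeholder resolution could work; this is an illustration, not the actual Keycloak implementation. It assumes that ${env.NAME} is looked up as an environment variable and that any other ${name} is looked up as a system property. The class PlaceholderResolver, its resolve method and the myclient.secret property are hypothetical names, and leaving unresolved placeholders untouched is likewise an assumption:</div>
<pre>
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: resolves ${...} placeholders in the text of an import file.
public class PlaceholderResolver {

    // Matches ${...}; group 1 is the text between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    public static String resolve(String text) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            // Assumption: an "env." prefix selects environment variables,
            // anything else is treated as a system property name.
            String value = key.startsWith("env.")
                    ? System.getenv(key.substring("env.".length()))
                    : System.getProperty(key);
            // Assumption: a placeholder that cannot be resolved is left as-is.
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as regex replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.setProperty("myclient.secret", "from-system-property");
        System.out.println(resolve("{ \"secret\": \"${myclient.secret}\" }"));
        System.out.println(resolve("{ \"secret\": \"${env.MYCLIENT_SECRET}\" }"));
    }
}
```
</pre>
                                                          <div>With Docker, the second call would pick up a MYCLIENT_SECRET value passed to the container as an environment variable. A real implementation would also need to escape the resolved value before placing it inside a JSON string.</div>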

                                                          <div>This will be very convenient, for example with Docker, as it

                                                          would be very

                                                          easy to create

                                                          a Docker image

                                                          that extends

                                                          ours to add a

                                                          few clients

                                                          and users.</div>

                                                          <div><br>

                                                          </div>

                                                          <div>It will

                                                          also be

                                                          convenient for

                                                          examples as it

                                                          will make it

                                                          possible to

                                                          add the

                                                          required

                                                          clients and

                                                          users to an

                                                          existing

                                                          realm.</div>

                                                          <div><br>

                                                          </div>

                                                          <div><br>

                                                          </div>

                                                        </div>

                                                        <br>

                                                      </div>

                                                    </div>

_______________________________________________<br>

                                                    keycloak-dev mailing

                                                    list<br>

                                                    <a href="mailto:keycloak-dev@lists.jboss.org" target="_blank">keycloak-dev@lists.jboss.org</a><br>

                                                    <a href="https://lists.jboss.org/mailman/listinfo/keycloak-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/keycloak-dev</a><br>

                                                  </blockquote>

                                                </div>

                                                <br>

                                              </div>

                                            </blockquote>

                                          </div>

                                          <br>

                                        </div>

                                      </div>

                                      <br>


                                    </blockquote>

                                    <br>

                                  </div>

                                </div>

                              </div>

                            </blockquote>

                          </div>

                          <br>

                        </div>

                      </div>

                    </blockquote>

                    <br>

                  </div>

                </div>

              </div>

            </blockquote>

          </div>

          <br>

        </div>

      </div>

    </blockquote>

    <br>

  </div></div></div>

</blockquote></div><br></div></div>