<div dir="ltr">What if we remove files as they are being imported?<br><div><br></div><div>Something like:</div><div><br></div><div>* When we detect a new file on a node, we send the name + checksum to the coordinator</div><div>* The coordinator then checks if this file has already been imported</div><div>* If it&#39;s already imported, it sends out a message stating that the file with that name + checksum has been imported, and all nodes delete the file</div><div>* If it&#39;s not imported, it picks one node that is responsible for importing the file (that would be the first node that sends the message about the file+checksum). This node will rename the file to .importing</div><div>* Once the file has been imported it&#39;s renamed to .imported or .failed</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 11 November 2015 at 15:53, Marek Posolda <span dir="ltr">&lt;<a href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF"><div><div class="h5">
    <div>On 11/11/15 15:51, Marek Posolda wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div>On 11/11/15 15:36, Stian Thorgersen
        wrote:<br>
      </div>
      <blockquote type="cite">
        <div dir="ltr"><br>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">On 11 November 2015 at 15:23, Marek
              Posolda <span dir="ltr">&lt;<a href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF"><span>
                    <div>On 11/11/15 09:01, Stian Thorgersen wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="ltr"><br>
                        <div class="gmail_extra"><br>
                          <div class="gmail_quote">On 10 November 2015
                            at 16:11, Marek Posolda <span dir="ltr">&lt;<a href="mailto:mposolda@redhat.com" target="_blank"></a><a href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span>
                            wrote:<br>
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"><span>
                                  <div>On 09/11/15 14:09, Stian
                                    Thorgersen wrote:<br>
                                  </div>
                                  <blockquote type="cite">
                                    <div dir="ltr"><br>
                                      <div class="gmail_extra"><br>
                                        <div class="gmail_quote">On 9
                                          November 2015 at 13:35,
                                          Sebastien Blanc <span dir="ltr">&lt;<a href="mailto:sblanc@redhat.com" target="_blank"></a><a href="mailto:sblanc@redhat.com" target="_blank">sblanc@redhat.com</a>&gt;</span> wrote:<br>
                                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                            <div dir="ltr">
                                              <div>That would be really
                                                nice indeed ! <br>
                                              </div>
                                              But are the marker files
                                              not enough, instead of
                                              also having a table in the
                                              DB?<br>
                                            </div>
                                          </blockquote>
                                          <div><br>
                                          </div>
                                          <div>We need a way to prevent
                                            multiple nodes in a cluster
                                            from importing the same file. For
                                            example on Kerberos you end
                                            up spinning up multiple
                                            instances of the same Docker
                                            image. <br>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                </span> I bet you meant &#39;Kubernetes&#39; <span><span>
                                    :-) </span></span></div>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>Yup</div>
                            <div> </div>
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"><span><span>
                                  </span></span><br>
                                <br>
                                +1 for the improvements. Besides those, I
                                think that sooner or later we will
                                need to solve long-running export+import
                                where you want to import 100.000 users.
                                <br>
                              </div>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>+1</div>
                            <div> </div>
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"> <br>
                                As I mentioned in another mail a few weeks
                                ago, we can have:<br>
                                <br>
                                1) Table with the progress (51.000 users
                                already imported, around 49.000
                                remaining etc.)<br>
                              </div>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>We would still need to split into
                              multiple files in either case. Having a
                              single json file with 100K users is
                              probably not going to perform very well.
                              So what I proposed would actually work for
                              long-running import as well. If each file
                              has a manageable amount of users (say ~5
                              min to import) then each file will be
                              marked as imported or failed. At least for
                              now I don&#39;t think we should do smaller
                              batches than one file. As long as one file
                              is imported within the same TX then it&#39;s
                              an all or nothing import.</div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div> </div>
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"> 2)
                                Concurrency and dividing the work among
                                cluster nodes (Node1 will import 50.000
                                users and node2 another 50.000 users)<br>
                              </div>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>This would be solved as well. Each node
                              picks up a file that&#39;s not processed yet.
                              Marks it in the DB and then gets to
                              process it.</div>
                            <div> </div>
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"> 3)
                                Failover (the import won&#39;t be completely
                                broken if a cluster node crashes after
                                importing 90.000 users, but can continue on other
                                cluster nodes)<br>
                                <br>
                                I think the stuff I did recently for
                                pre-loading offline sessions at startup
                                could be reused for this too, and
                                it can handle (2) and (3). It can also
                                handle parallel imports triggered from
                                multiple cluster nodes. <br>
                                <br>
                                For example: currently if you start
                                Kubernetes with 2 cluster nodes, both
                                nodes will start to import the same file at
                                the same time, because the import triggered
                                by node1 has not finished before
                                node2 is started, so there is not yet
                                a DB record stating that the file is already
                                imported. With the stuff I did, just the
                                coordinator (node1) will start the
                                import. Node2 will wait until the import
                                triggered by node1 is finished, but at
                                the same time it can &quot;help&quot; to import
                                some users (pages) if the coordinator asks
                                it to do so. This impl is based on the
                                Infinispan distributed executor service
                                <a href="http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework" target="_blank">http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework</a>
                                .</div>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>The DB record needs to be created
                              before a node tries to import it,
                              including a timestamp when it started the
                              import. It should then be updated once the
                              import is completed, with the result.
                              Using the distributed execution framework
                              sounds like a good idea though. How do you
                              prevent scheduling the same job multiple
                              times? For example, if all nodes on startup
                              scan the import folder and simply import
                              everything they find, then the same job
                              will be scheduled multiple times. Not really a big
                              deal, as the first thing the job should do
                              is check if there&#39;s a record in the DB
                              already.</div>
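                            <div><br>
                            </div>
                            <div>A minimal sketch of the &quot;record first, then import&quot; idea described above, assuming a shared table keyed by file checksum. The table name import_log, its columns and the helper names are hypothetical, and SQLite only stands in for the real database here:</div>
<pre>
```python
import sqlite3
import time


def open_log():
    # In-memory stand-in for the shared database (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE import_log ("
        " checksum TEXT PRIMARY KEY,"
        " file_name TEXT, node TEXT,"
        " started_at REAL, result TEXT)"
    )
    return conn


def try_claim(conn, checksum, file_name, node):
    """Create the record before importing; only the first caller wins."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO import_log VALUES (?, ?, ?, ?, NULL)",
                (checksum, file_name, node, time.time()),
            )
        return True
    except sqlite3.IntegrityError:
        # Another job already created a record for this checksum.
        return False


def finish(conn, checksum, result):
    """Update the record with the outcome once the import completes."""
    with conn:
        conn.execute(
            "UPDATE import_log SET result = ? WHERE checksum = ?",
            (result, checksum),
        )
```
</pre>
                            <div>A duplicate job would call try_claim, get False back and exit without importing. The started_at column is what a timeout check could use to detect a record left behind by a crashed node.</div>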
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </span> With the distributed executor, it&#39;s the cluster
                  coordinator that decides which node imports
                  what. It will send messages to cluster nodes like
                  &quot;Hey, please import the file testrealm-users-3.json
                  with timestamp abcd123&quot;. <br>
                  <br>
                  After a node finishes the job, it notifies the coordinator,
                  and the coordinator inserts the DB record and marks it as
                  finished. So no DB record is inserted before a
                  node starts the import, because the whole coordination is
                  handled by the coordinator. Also, the same file will never be
                  imported multiple times by different cluster
                  nodes. <br>
                  <br>
                  The only exception would be if a cluster node crashes before
                  the import is finished. Then the file needs to be reimported by
                  another cluster node, but that&#39;s the case with DB locks
                  as well.<br>
                  <br>
                  IMO the DB locks approach doesn&#39;t handle the crash of
                  a cluster node well. For example, when node2 crashes
                  unexpectedly while it&#39;s importing the file
                  testrealm-users-3.json, the DB lock is still held by this
                  node, so other cluster nodes can&#39;t start importing
                  the file (until a timeout occurs).<br>
                  <br>
                  On the other hand, the distributed executor approach may
                  have issues if the content of the
                  standalone/import directory is inconsistent among cluster nodes.
                  However, this can be solved: each node sends
                  the checksums of the files it has, and the coordinator
                  ensures that the file with checksum &quot;abcd123&quot;
                  is assigned only to a node which has this file.</div>
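                <div><br>
                </div>
                <div>The checksum-based assignment could look roughly like the sketch below. It is only an illustration: the function name, the shape of the reported data and the &quot;least-loaded node&quot; tie-break are assumptions, not something the distributed executor provides out of the box.</div>
<pre>
```python
def assign_imports(reported, already_imported=frozenset()):
    """Map each not-yet-imported checksum to exactly one node that has the file.

    reported: dict of node name -> set of checksums that node sent.
    """
    assignments = {}
    load = {node: 0 for node in reported}
    all_checksums = sorted(set().union(*reported.values())) if reported else []
    for checksum in all_checksums:
        if checksum in already_imported:
            continue
        # Only nodes that reported this checksum are candidates.
        holders = [n for n in sorted(reported) if checksum in reported[n]]
        # Assumption: pick the candidate with the fewest files assigned so far.
        chosen = min(holders, key=lambda n: load[n])
        assignments[checksum] = chosen
        load[chosen] += 1
    return assignments
```
</pre>
                <div>For example, if node1 reports checksums abc and def while node2 reports abc and xyz, then xyz can only go to node2 and def only to node1, and abc goes to exactly one of them.</div>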
              </blockquote>
              <div><br>
              </div>
              <div>With Docker/Kubernetes all nodes would have the same
                files. At least initially. Would be nice if we could
                come up with a solution where you can just drop an
                additional file onto any node and have it imported.</div>
            </div>
          </div>
        </div>
      </blockquote>
      Exactly, was thinking about Docker too. Here we don&#39;t have any
      issue at all.<br>
      <br>
      The main question here is, do we want to support the scenario where
      various cluster nodes have different content? As I mentioned, the
      distributed executor approach can handle it: each cluster node
      sends the checksums of the files it has, and the coordinator
      only assigns a node files whose checksums that node reported.<br>
      <br>
      However, regardless of whether we use the distributed executor approach or the DB locks
      approach, there may still be issues. For example:<br>
      1) The file testrealm.json with checksum &quot;abc&quot; is triggered for
      import on node1<br>
      2) At the same time, an admin makes a minor change to this file
      on node2 and saves it. This means that the checksum of the file on
      node2 changes to &quot;def&quot;<br>
      3) Node2 triggers the import of that file. So we have both node1
      and node2 importing the same file concurrently, because the previously
      acquired lock was for the &quot;abc&quot; checksum, but now the checksum is &quot;def&quot; <br>
      <br>
      This problem exists with both the DB lock and DistributedExecutor
      approaches though...<br>
    </blockquote></div></div>
    A possible solution for this issue is that when an import is already in
    progress, newly added or changed checksums are ignored. The
    checksums are always checked just at the start of the import.<span class="HOEnZb"><font color="#888888"><br>
    <br>
    Marek</font></span><div><div class="h5"><br>
    <blockquote type="cite"> <br>
      Marek<br>
      <blockquote type="cite">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF"><span><font color="#888888"><br>
                      <br>
                      Marek</font></span>
                  <div>
                    <div><br>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div class="gmail_extra">
                            <div class="gmail_quote">
                              <div> </div>
                              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                <div text="#000000" bgcolor="#FFFFFF"><span><font color="#888888"><br>
                                      <br>
                                      Marek</font></span>
                                  <div>
                                    <div><br>
                                      <br>
                                      <blockquote type="cite">
                                        <div dir="ltr">
                                          <div class="gmail_extra">
                                            <div class="gmail_quote">
                                              <div> </div>
                                              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                <div dir="ltr"> <br>
                                                </div>
                                                <div class="gmail_extra"><br>
                                                  <div class="gmail_quote">
                                                    <div>
                                                      <div>On Mon, Nov
                                                        9, 2015 at 1:20
                                                        PM, Stian
                                                        Thorgersen <span dir="ltr">&lt;<a href="mailto:sthorger@redhat.com" target="_blank"></a><a href="mailto:sthorger@redhat.com" target="_blank">sthorger@redhat.com</a>&gt;</span>
                                                        wrote:<br>
                                                      </div>
                                                    </div>
                                                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                      <div>
                                                        <div>
                                                          <div dir="ltr">Currently
                                                          we support
                                                          importing a
                                                          complete realm
                                                          definition
                                                          using the
                                                          import/export
                                                          feature.
                                                          The issues with
                                                          the current
                                                          approach are:
                                                          <div><br>
                                                          </div>
                                                          <div>* Only
                                                          complete realm
                                                          - not possible
                                                          to add to an
                                                          existing realm</div>
                                                          <div>* No good
                                                          feedback if
                                                          import was
                                                          successful or
                                                          not</div>
                                                          <div>* Use of
                                                          system
                                                          properties to
                                                          initiate the
                                                          import is not
                                                          very user
                                                          friendly</div>
                                                          <div>* Not
                                                          very elegant
                                                          for
                                                          provisioning.
                                                          For example, a
                                                          Docker image
                                                          that wants to
                                                          bundle some
                                                          initial setup
                                                          ends up always
                                                          running the
                                                          import of a
                                                          realm, which
                                                          is skipped if
                                                          the realm exists</div>
                                                          <div><br>
                                                          </div>
                                                          <div>To solve
                                                          this I&#39;ve come
                                                          up with the
                                                          following
                                                          proposal:</div>
                                                          <div><br>
                                                          </div>
                                                          <div>Allow
                                                          dropping
                                                          representations
                                                          to be imported
                                                          into
                                                          &#39;standalone/import&#39;.
                                                          This should
                                                          support
                                                          creating a new
                                                          realm as well
                                                          as importing
                                                          into an
                                                          existing
                                                          realm. When
                                                          importing into
                                                          an existing
                                                          realm we will
                                                          have an import
                                                          strategy that
                                                          is used to
                                                          configure what
                                                          happens if a
                                                          resource
                                                          exists (user,
                                                          role, identity
                                                          provider, user
                                                          federation
                                                          provider). The
                                                          import
                                                          strategies
                                                          are:</div>
                                                          <div><br>
                                                          </div>
                                                          <div>* Skip -
                                                          existing
                                                          resources are
                                                          skipped</div>
                                                          <div>* Fail -
                                                          if any
                                                          resource
                                                          exists nothing
                                                          is imported</div>
                                                          <div>*
                                                          Overwrite -
                                                          any existing
                                                          resources are
                                                          deleted.</div>
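                                                          <div><br>
                                                          </div>
                                                          <div>As a rough illustration of how the three strategies differ, here is a small sketch. It assumes resources are plain dictionaries keyed by a resource id, and that Overwrite replaces the deleted resource with the imported one. The function and strategy names are hypothetical.</div>
<pre>
```python
def import_resources(existing, incoming, strategy):
    """Return a new dict of resources after applying the import strategy.

    existing / incoming: dict of resource id -> representation.
    """
    conflicts = set(existing).intersection(incoming)
    if strategy == "fail":
        # All or nothing: any conflict aborts the whole import.
        if conflicts:
            raise ValueError("resources already exist: %s" % sorted(conflicts))
        return {**existing, **incoming}
    if strategy == "skip":
        # Keep existing resources; add only the new ones.
        return {**incoming, **existing}
    if strategy == "overwrite":
        # Existing resources are dropped and replaced by the imported ones.
        return {**existing, **incoming}
    raise ValueError("unknown strategy: %r" % strategy)
```
</pre>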
                                                          <div><br>
                                                          </div>
                                                          <div>The
                                                          directory will
                                                          be scanned at
                                                          startup, but
                                                          there will
                                                          also be an
                                                          option to
                                                          monitor this
                                                          directory at
                                                          runtime.</div>
                                                          <div><br>
                                                          </div>
                                                          <div>To
                                                          prevent a file
                                                          from being imported
                                                          multiple times
                                                          (and to make
                                                          sure only one
                                                          node in a
                                                          cluster
                                                          imports it) we
                                                          will have a
                                                          table in the
                                                          database that
                                                          records which
                                                          files were
                                                          imported, from
                                                          which node,
                                                          the date and
                                                          the result
                                                          (including a
                                                          list of which
                                                          resources
                                                          were
                                                          imported,
                                                          which were not,
                                                          and a stack
                                                          trace if
                                                          applicable).
                                                          The primary
                                                          key will be
                                                          the checksum
                                                          of the file.
                                                          We will also
                                                          add marker
                                                          files
                                                          (&lt;json
                                                          file&gt;.imported
                                                          or &lt;json
                                                          file&gt;.failed).
                                                          The contents
                                                          of the marker
                                                          files will be
                                                          a json object
                                                          with the date
                                                          imported, the
                                                          outcome
                                                          (including a
                                                          stack trace if
                                                          applicable) as
                                                          well as a
                                                          complete list
                                                          of which
                                                          resources were
                                                          successfully
                                                          imported and which
                                                          were not.</div>
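                                                          <div><br>
                                                          </div>
                                                          <div>A small sketch of the checksum and marker file part, to make the idea concrete. The hash algorithm (SHA-256 here) and the JSON field names are assumptions for illustration only; nothing in this proposal fixes them yet.</div>
<pre>
```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_checksum(path):
    # SHA-256 is an assumption; the proposal does not name an algorithm.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_marker(path, imported, not_imported, error=None):
    """Write a .imported or .failed marker file next to the source file."""
    suffix = ".failed" if error else ".imported"
    marker = Path(str(path) + suffix)
    marker.write_text(json.dumps({
        "date": datetime.now(timezone.utc).isoformat(),
        "outcome": "failed" if error else "imported",
        "stackTrace": error,          # None when the import succeeded
        "imported": imported,         # resources that were imported
        "notImported": not_imported,  # resources that were not
    }, indent=2))
    return marker
```
</pre>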
                                                          <div><br>
                                                          </div>
                                                          <div>The files
                                                          will also
                                                          allow
                                                          resolving
                                                          system
                                                          properties and
                                                          environment
                                                          variables. For
                                                          example:</div>
                                                          <div><br>
                                                          </div>
                                                          <div>{</div>
                                                          <div>   
                                                          &quot;secret&quot;:
                                                          &quot;${env.MYCLIENT_SECRET}&quot;</div>
                                                          <div>}</div>
                                                          <div><br>
                                                          </div>
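                                                          <div>A rough sketch of how the placeholder resolution could work. Only the ${env.NAME} form comes from the example above. Treating any other ${key} as a system property, leaving unresolved placeholders untouched, and the class and method names are assumptions.</div>

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an existing Keycloak class.
public class PlaceholderResolver {

    // Matches ${...} and captures the key between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Lookups are passed in so the logic can be exercised without touching
    // the real process environment.
    static String resolve(String text,
                          Function<String, String> env,
                          Function<String, String> sysProps) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            // "env." prefix selects an environment variable (from the example);
            // anything else is looked up as a system property (assumption).
            String value = key.startsWith("env.")
                    ? env.apply(key.substring("env.".length()))
                    : sysProps.apply(key);
            // Keep the placeholder as-is when nothing is defined for it.
            String replacement = value != null ? value : m.group();
            // quoteReplacement stops '$' and '\' in values being read as group refs.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Convenience overload backed by the real environment and system properties.
    static String resolve(String text) {
        return resolve(text, System::getenv, System::getProperty);
    }
}
```

                                                          <div>This substitutes on the raw text, so a value containing a double quote would break the JSON. Resolving string values after parsing the file would avoid that.</div>
                                                          <div><br>
                                                          </div>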
                                                          <div>This will
                                                          be very
                                                          convenient with
                                                          Docker, for
                                                          example, as it
                                                          makes it easy
                                                          to create a
                                                          Docker image
                                                          that extends
                                                          ours and adds
                                                          a few clients
                                                          and users.</div>
                                                          <div><br>
                                                          </div>
                                                          <div>It will
                                                          also be
                                                          convenient for
                                                          examples as it
                                                          will make it
                                                          possible to
                                                          add the
                                                          required
                                                          clients and
                                                          users to an
                                                          existing
                                                          realm.</div>
                                                          <div><br>
                                                          </div>
                                                          <div><br>
                                                          </div>
                                                          </div>
                                                          <br>
                                                        </div>
                                                      </div>
_______________________________________________<br>
                                                      keycloak-dev
                                                      mailing list<br>
                                                      <a href="mailto:keycloak-dev@lists.jboss.org" target="_blank">keycloak-dev@lists.jboss.org</a><br>
                                                      <a href="https://lists.jboss.org/mailman/listinfo/keycloak-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/keycloak-dev</a><br>
                                                    </blockquote>
                                                  </div>
                                                  <br>
                                                </div>
                                              </blockquote>
                                            </div>
                                            <br>
                                          </div>
                                        </div>
                                        <br>
                                        <fieldset></fieldset>
                                        <br>
                                      </blockquote>
                                      <br>
                                    </div>
                                  </div>
                                </div>
                              </blockquote>
                            </div>
                            <br>
                          </div>
                        </div>
                      </blockquote>
                      <br>
                    </div>
                  </div>
                </div>
              </blockquote>
            </div>
            <br>
          </div>
        </div>
      </blockquote>
      <br>
      <br>
      <fieldset></fieldset>
      <br>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div>