On 11/11/15 09:01, Stian Thorgersen wrote:
On 10 November 2015 at 16:11, Marek Posolda <mposolda@redhat.com> wrote:
On 09/11/15 14:09, Stian Thorgersen wrote:
>
>
> On 9 November 2015 at 13:35, Sebastien Blanc <sblanc@redhat.com> wrote:
>
> That would be really nice indeed !
> But are the markers files not enough, instead of also having
> a table in the DB ?
>
>
> We need a way to prevent multiple nodes in a cluster from importing
> the same file. For example on Kerberos you end up spinning up
> multiple instances of the same Docker image.
I bet you meant 'Kubernetes' :-)
Yup
+1 for the improvements. Besides those I think that sooner or
later we will need to solve long-running export+import where you
want to import 100.000 users.
+1
As I mentioned in another mail a few weeks ago, we can have:
1) A table with the progress (51.000 users already imported, around
49.000 remaining, etc.)
We would still need to split into multiple files in either case.
Having a single JSON file with 100K users is probably not going to
perform very well. So what I proposed would actually work for
long-running import as well. If each file has a manageable number of
users (say ~5 min to import) then each file will be marked as imported
or failed. At least for now I don't think we should do smaller batches
than one file. As long as one file is imported within the same TX then
it's an all-or-nothing import.
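
To make the all-or-nothing part concrete, here's a rough sketch of the
per-file loop (importFile and markAsFailed are hypothetical helpers,
and a JTA UserTransaction is assumed to be available):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import javax.transaction.UserTransaction;

    // One transaction per file: either every user in the file is
    // imported, or none of them are.
    void importAll(Iterable<Path> pendingFiles, UserTransaction tx) {
        for (Path file : pendingFiles) {
            try {
                tx.begin();
                importFile(file);                        // hypothetical helper
                tx.commit();
                Files.move(file, Paths.get(file + ".imported"));
            } catch (Exception e) {
                try { tx.rollback(); } catch (Exception ignore) { }
                markAsFailed(file, e);                   // hypothetical helper
            }
        }
    }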
2) Concurrency and dividing the work among cluster nodes (Node1
will import 50.000 users and node2 another 50.000 users)
This would be solved as well. Each node picks up a file that's not
processed yet, marks it in the DB, and then gets to process it.
3) Failover (The import won't be completely broken if a cluster node
crashes after importing 90.000 users, but can continue on other
cluster nodes)
I think the stuff I did recently for pre-loading offline sessions
at startup could be reused here too; it can handle (2) and (3).
It can also handle parallel import triggered from multiple cluster
nodes.
For example: currently if you start 2 cluster nodes on Kubernetes,
both nodes will start to import the same file at the same time,
because the import triggered by node1 is not yet finished when node2
starts, so there is no DB record yet saying that the file is
already imported. With the stuff I did, just the coordinator
(node1) will start the import. Node2 will wait until the import
triggered by node1 is finished, but at the same time it can "help"
to import some users (pages) if the coordinator asks it to.
This implementation is based on the Infinispan distributed executor
service:
http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_...
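
For illustration, a minimal sketch of how a per-file import task could
be submitted through the distributed executor (ImportFileTask and its
body are assumptions, not existing code; "cache" is assumed to be a
clustered Infinispan cache):

    import java.io.Serializable;
    import java.util.concurrent.Callable;
    import java.util.concurrent.Future;
    import org.infinispan.Cache;
    import org.infinispan.distexec.DefaultExecutorService;
    import org.infinispan.distexec.DistributedExecutorService;

    // Hypothetical task: imports one file, all-or-nothing in one TX.
    // Must be Serializable so it can be shipped to other cluster nodes.
    class ImportFileTask implements Callable<Boolean>, Serializable {
        private final String fileName;
        ImportFileTask(String fileName) { this.fileName = fileName; }
        public Boolean call() throws Exception {
            // ... import the file, return true on success ...
            return true;
        }
    }

    // On the coordinator: one task per file, distributed to the nodes.
    DistributedExecutorService distExec = new DefaultExecutorService(cache);
    Future<Boolean> result = distExec.submit(new ImportFileTask("testrealm-users-3.json"));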
The DB record needs to be created before a node tries to import it,
including a timestamp when it started the import. It should then be
updated once the import is completed, with the result. Using the
distributed execution framework sounds like a good idea though. How do
you prevent scheduling the same job multiple times? For example if all
nodes on startup scan the import folder and simply import everything
they find, there will be multiple copies of the same job. Not really a
big deal, as the first thing the job should do is check if there's a
record in the DB already.
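
A rough sketch of that claim step (the table and column names are just
assumptions for illustration):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Try to claim a file by inserting a row keyed on the file checksum.
    // If the insert fails, another node has already claimed the file.
    boolean claimFile(Connection con, String checksum, String nodeId) {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO IMPORTED_FILE (CHECKSUM, NODE, STARTED, STATUS) "
                + "VALUES (?, ?, ?, 'IN_PROGRESS')")) {
            ps.setString(1, checksum);
            ps.setString(2, nodeId);
            ps.setLong(3, System.currentTimeMillis());
            ps.executeUpdate();
            return true;  // this node won the claim
        } catch (SQLException e) {
            return false; // most likely a primary-key violation
        }
    }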
With the distributed executor, it's the cluster coordinator that
decides which node imports what. It will send messages to cluster
nodes like "Hey, please import the file testrealm-users-3.json
with timestamp abcd123".
After a node finishes the job, it notifies the coordinator, and the
coordinator inserts the DB record and marks it as finished. So no DB
record is inserted before a node starts the import, because the whole
coordination is handled by the coordinator. Also, the same file will
never be imported multiple times by different cluster nodes.
The only exception would be if a cluster node crashes before the
import is finished. Then the file needs to be reimported by another
cluster node, but that's the case with DB locks as well.
IMO the DB locks approach doesn't handle the crash of a cluster node
well. For example, when node2 crashes unexpectedly while it's importing
the file testrealm-users-3.json, the DB lock is held by that node, so
other cluster nodes can't start importing the file (until a timeout
occurs).
On the other hand, the distributed executor approach may have issues if
the content of the standalone/import directory is inconsistent among
cluster nodes. However, that can be solved: each node would send
checksums of the files it has, and the coordinator would ensure that
the file with checksum "abcd123" is assigned only to a node which has
that file.
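
For example, the checksum each node reports could simply be a digest of
the file contents (a sketch using plain JDK APIs; SHA-256 is just one
possible choice):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;

    // Hex-encoded digest of the whole file, used as the file's identity.
    static String checksum(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }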
Marek
>
>
> On Mon, Nov 9, 2015 at 1:20 PM, Stian Thorgersen
> <sthorger@redhat.com> wrote:
>
> Currently we support importing a complete realm
> definition using the import/export feature. Issues with
> the current approach are:
>
> * Only complete realm - not possible to add to an
> existing realm
> * No good feedback if import was successful or not
> * Use of system properties to initiate the import is not
> very user friendly
> * Not very elegant for provisioning. For example a Docker
> image that wants to bundle some initial setup ends up
> always running the import of a realm, which is skipped if
> the realm exists
>
> To solve this I've come up with the following proposal:
>
> Allow dropping representations to be imported into
> 'standalone/import'. This should support creating a new
> realm as well as importing into an existing realm. When
> importing into an existing realm we will have an import
> strategy that is used to configure what happens if a
> resource exists (user, role, identity provider, user
> federation provider). The import strategies are (a rough
> code sketch follows the list):
>
> * Skip - existing resources are skipped
> * Fail - if any resource exists nothing is imported
> * Overwrite - any existing resources are deleted.
>
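> For illustration, the strategies could map onto a simple enum plus a
> check before each resource is created (all names here are
> illustrative, not an agreed API):
>
> public enum ImportStrategy {
>     SKIP,      // leave the existing resource untouched
>     FAIL,      // abort the whole import if any resource exists
>     OVERWRITE  // delete the existing resource before importing
> }
>
> for (Representation rep : fileContents) {      // hypothetical types
>     if (resourceExists(rep)) {                 // hypothetical helper
>         if (strategy == ImportStrategy.SKIP) continue;
>         if (strategy == ImportStrategy.FAIL)
>             throw new RuntimeException("Resource exists"); // rolls back the file's TX
>         deleteResource(rep);                   // OVERWRITE case
>     }
>     createResource(rep);                       // hypothetical helper
> }
>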
> The directory will be scanned at startup, but there will
> also be an option to monitor this directory at runtime.
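>
> For illustration, the runtime monitoring could use the JDK
> WatchService (a minimal sketch; error handling and the actual
> import call are left out):
>
> import java.nio.file.*;
>
> WatchService watcher = FileSystems.getDefault().newWatchService();
> Path importDir = Paths.get("standalone/import");
> importDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
> while (true) {
>     WatchKey key = watcher.take(); // blocks until something changes
>     for (WatchEvent<?> event : key.pollEvents()) {
>         Path newFile = importDir.resolve((Path) event.context());
>         // queue newFile for import (checked against the DB first)
>     }
>     key.reset();
> }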
>
> To prevent a file being imported multiple times (and
> also to make sure only one node in a cluster imports it)
> we will have a table in the database that records which
> files were imported, from which node, the date and the
> result (including a list of which resources were imported,
> which were not, and a stack trace if applicable). The
> primary key will be the checksum of the file. We will also
> add marker files (<json file>.imported or <json
> file>.failed). The contents of the marker files will be a
> JSON object with the date imported, the outcome (including
> a stack trace if applicable), as well as a complete list
> of which resources were successfully imported and which
> were not.
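>
> For illustration, a <json file>.imported marker could look
> something like this (the exact field names are not decided
> yet, just an example):
>
> {
>     "dateImported": "2015-11-11T09:01:00Z",
>     "outcome": "SUCCESS",
>     "importedResources": ["user alice", "client my-client"],
>     "skippedResources": ["user admin"]
> }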
>
> The files will also allow resolving system properties and
> environment variables. For example:
>
> {
> "secret": "${env.MYCLIENT_SECRET}"
> }
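>
> A minimal sketch of how such placeholders could be resolved
> (the regex-based approach is an assumption, not a decided
> implementation):
>
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> static String resolveEnv(String text) {
>     Pattern p = Pattern.compile("\\$\\{env\\.([A-Za-z0-9_]+)\\}");
>     Matcher m = p.matcher(text);
>     StringBuffer sb = new StringBuffer();
>     while (m.find()) {
>         String value = System.getenv(m.group(1));
>         // leave the placeholder as-is if the variable is not set
>         m.appendReplacement(sb,
>             Matcher.quoteReplacement(value != null ? value : m.group(0)));
>     }
>     m.appendTail(sb);
>     return sb.toString();
> }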
>
> This will be very convenient for example with Docker as
> it would be very easy to create a Docker image that
> extends ours to add a few clients and users.
>
> It will also be convenient for examples as it will make
> it possible to add the required clients and users to an
> existing realm.