[keycloak-dev] Import proposal

Stian Thorgersen sthorger at redhat.com
Wed Nov 11 09:36:09 EST 2015


On 11 November 2015 at 15:23, Marek Posolda <mposolda at redhat.com> wrote:

> On 11/11/15 09:01, Stian Thorgersen wrote:
>
>
>
> On 10 November 2015 at 16:11, Marek Posolda <mposolda at redhat.com> wrote:
>
>> On 09/11/15 14:09, Stian Thorgersen wrote:
>>
>>
>>
>> On 9 November 2015 at 13:35, Sebastien Blanc <sblanc at redhat.com> wrote:
>>
>>> That would be really nice indeed!
>>> But are the marker files not enough, instead of also having a table in
>>> the DB?
>>>
>>
>> We need a way to prevent multiple nodes in a cluster from importing the
>> same file. For example on Kerberos you end up spinning up multiple
>> instances of the same Docker image.
>>
>> I bet you meant 'Kubernetes' :-)
>>
>
> Yup
>
>
>>
>>
>> +1 for the improvements. Besides those, I think that sooner or later we
>> will need to solve long-running export+import, where you want to import
>> 100.000 users.
>>
>
> +1
>
>
>>
>> As I mentioned in another mail a few weeks ago, we can have:
>>
>> 1) Table with the progress (51.000 users already imported, around 49.000
>> remaining etc.)
>>
>
> We would still need to split into multiple files in either case. Having a
> single json file with 100K users is probably not going to perform very
> well. So what I proposed would actually work for long-running imports as
> well. If each file has a manageable number of users (say ~5 min to import)
> then each file will be marked as imported or failed. At least for now I
> don't think we should do smaller batches than one file. As long as one file
> is imported within the same TX, it's an all-or-nothing import.
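>
> A rough sketch of the splitting, just to illustrate (the helper is
> hypothetical, not existing code):
>
> import java.util.ArrayList;
> import java.util.List;
>
> class ExportSplitter {
>     // Split a large list of users into chunks, one chunk per import file
>     static <T> List<List<T>> partition(List<T> items, int chunkSize) {
>         List<List<T>> chunks = new ArrayList<>();
>         for (int i = 0; i < items.size(); i += chunkSize) {
>             chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));
>         }
>         return chunks;
>     }
> }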
>
>
>
>> 2) Concurrency and dividing the work among cluster nodes (Node1 will
>> import 50.000 users and node2 another 50.000 users)
>>
>
> This would be solved as well. Each node picks up a file that hasn't been
> processed yet, marks it in the DB, and then processes it.
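>
> A minimal sketch of the claiming, assuming a table keyed by the file
> checksum (table and column names are purely illustrative):
>
> import java.sql.Connection;
> import java.sql.PreparedStatement;
> import java.sql.SQLException;
> import java.sql.SQLIntegrityConstraintViolationException;
> import java.sql.Timestamp;
>
> class ImportClaim {
>     // Try to claim the file by inserting a row; the primary key on CHECKSUM
>     // guarantees only one node in the cluster wins the claim
>     static boolean claimFile(Connection con, String checksum, String node) throws SQLException {
>         try (PreparedStatement ps = con.prepareStatement(
>                 "INSERT INTO IMPORTED_FILES (CHECKSUM, NODE, STARTED) VALUES (?, ?, ?)")) {
>             ps.setString(1, checksum);
>             ps.setString(2, node);
>             ps.setTimestamp(3, new Timestamp(System.currentTimeMillis()));
>             ps.executeUpdate();
>             return true;
>         } catch (SQLIntegrityConstraintViolationException e) {
>             return false; // another node already claimed this file
>         }
>     }
> }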
>
>
>> 3) Failover (Import won't be completely broken if a cluster node crashes
>> after importing 90.000 users, but can continue on other cluster nodes)
>>
>> I think the stuff I did recently for pre-loading offline sessions at
>> startup could be reused for this too, and it can handle (2) and (3).
>> It can also handle parallel imports triggered from multiple cluster nodes.
>>
>> For example: currently if you start kubernetes with 2 cluster nodes, both
>> nodes will start to import the same file at the same time, because the
>> import triggered by node1 is not yet finished before node2 is started, so
>> there is no DB record yet saying that the file is already imported. With
>> the stuff I did, just the coordinator (node1) will start the import. Node2
>> will wait until the import triggered by node1 is finished, but at the same
>> time it can "help" to import some users (pages) if the coordinator asks it
>> to do so. This impl is based on the infinispan distributed executor
>> service:
>> http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework
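>>
>> Roughly like this with the Infinispan API ("cache" is assumed to be a
>> clustered cache, and the task body is just a placeholder):
>>
>> import java.io.Serializable;
>> import java.util.concurrent.Callable;
>> import java.util.concurrent.Future;
>> import org.infinispan.Cache;
>> import org.infinispan.distexec.DefaultExecutorService;
>> import org.infinispan.distexec.DistributedExecutorService;
>>
>> public class ImportFileTask implements Callable<Boolean>, Serializable {
>>
>>     private final String fileName;
>>
>>     public ImportFileTask(String fileName) {
>>         this.fileName = fileName;
>>     }
>>
>>     @Override
>>     public Boolean call() throws Exception {
>>         // parse and import the users from fileName here
>>         return true;
>>     }
>>
>>     // Called on the coordinator: one task per file, executed on some node
>>     static Future<Boolean> submitImport(Cache<String, String> cache, String fileName) {
>>         DistributedExecutorService executor = new DefaultExecutorService(cache);
>>         return executor.submit(new ImportFileTask(fileName));
>>     }
>> }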
>>
>
> The DB record needs to be created before a node tries to import a file,
> including a timestamp for when it started the import. It should then be
> updated once the import is completed, with the result. Using the
> distributed execution framework sounds like a good idea though. How do you
> prevent scheduling the same job multiple times? For example, if all nodes
> scan the import folder on startup and simply import everything they find,
> then there will be multiple copies of the same job. Not really a big deal,
> as the first thing the job should do is check whether there's already a
> record in the DB.
>
> With the distributed executor, it's the cluster coordinator which
> coordinates which node imports what. It will send messages to cluster nodes
> like "Hey, please import the file testrealm-users-3.json with checksum
> abcd123".
>
> After a node finishes the job, it notifies the coordinator and the
> coordinator will insert the DB record and mark it as finished. So there is
> no DB record inserted before a node starts the import, because the whole
> coordination is handled by the coordinator. Also, the same file will never
> be imported multiple times by different cluster nodes.
>
> The only exception would be if a cluster node crashes before the import is
> finished. Then it needs to be re-imported by another cluster node, but
> that's the case with DB locks as well.
>
> IMO the DB locks approach doesn't handle the crash of a cluster node very
> well. For example, when node2 crashes unexpectedly while it's importing the
> file testrealm-users-3.json, the DB lock is held by this node, so other
> cluster nodes can't start importing the file (until a timeout occurs).
>
> On the other hand, the distributed executor approach may have issues if
> the content of the standalone/import directory is inconsistent among
> cluster nodes. However, this can be solved: each node will need to send
> checksums of the files it has, and the coordinator will need to ensure that
> the file with checksum "abcd123" is assigned only to a node which has this
> file.
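>
> The checksum itself is simple enough; for example (SHA-256 is just one
> possible choice, not a decision):
>
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.security.MessageDigest;
>
> class FileChecksum {
>     // Hex-encoded SHA-256 digest of the file contents
>     static String checksum(Path file) throws Exception {
>         byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
>         StringBuilder sb = new StringBuilder();
>         for (byte b : digest) {
>             sb.append(String.format("%02x", b));
>         }
>         return sb.toString();
>     }
> }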
>

With Docker/Kubernetes all nodes would have the same files, at least
initially. It would be nice if we could come up with a solution where you
can just drop an additional file onto any node and have it imported.


>
>
> Marek
>
>
>
>>
>>
>> Marek
>>
>>
>>
>>
>>>
>>>
>>> On Mon, Nov 9, 2015 at 1:20 PM, Stian Thorgersen <sthorger at redhat.com>
>>> wrote:
>>>
>>>> Currently we support importing a complete realm definition using the
>>>> import/export feature. Issues with the current approach are:
>>>>
>>>> * Only complete realms - not possible to add to an existing realm
>>>> * No good feedback on whether the import was successful or not
>>>> * Use of system properties to initiate the import is not very user
>>>> friendly
>>>> * Not very elegant for provisioning. For example, a Docker image that
>>>> wants to bundle some initial setup ends up always running the import of a
>>>> realm, which is skipped if the realm exists
>>>>
>>>> To solve this I've come up with the following proposal:
>>>>
>>>> Allow dropping representations to be imported into 'standalone/import'.
>>>> This should support creating a new realm as well as importing into an
>>>> existing realm. When importing into an existing realm we will have an
>>>> import strategy that is used to configure what happens if a resource
>>>> already exists (user, role, identity provider, user federation provider).
>>>> The import strategies are (sketched as an enum below):
>>>>
>>>> * Skip - existing resources are skipped
>>>> * Fail - if any resource exists, nothing is imported
>>>> * Overwrite - any existing resources are deleted
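>>>>
>>>> As a simple sketch (the enum name is illustrative, not final):
>>>>
>>>> public enum ImportStrategy {
>>>>     SKIP,      // existing resources are skipped
>>>>     FAIL,      // nothing is imported if any resource already exists
>>>>     OVERWRITE  // existing resources are deleted and replaced
>>>> }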
>>>>
>>>> The directory will be scanned at startup, but there will also be an
>>>> option to monitor this directory at runtime.
>>>>
>>>> To prevent a file being imported multiple times (and also to make sure
>>>> only one node in a cluster imports it) we will have a table in the
>>>> database that records which files were imported, from which node, the
>>>> date and the result (including a list of which resources were imported,
>>>> which were not, and a stack trace if applicable). The primary key will be
>>>> the checksum of the file. We will also add marker files (<json
>>>> file>.imported or <json file>.failed). The contents of the marker files
>>>> will be a json object with the date imported, the outcome (including a
>>>> stack trace if applicable), as well as a complete list of which resources
>>>> were successfully imported and which were not.
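>>>>
>>>> For example, a marker file could look something like this (the exact
>>>> field names are illustrative, not a final format):
>>>>
>>>> {
>>>>     "date": "2015-11-09T12:20:00Z",
>>>>     "outcome": "FAILED",
>>>>     "stackTrace": "...",
>>>>     "imported": ["client/my-client", "user/alice"],
>>>>     "notImported": ["user/bob"]
>>>> }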
>>>>
>>>> The files will also allow resolving system properties and environment
>>>> variables. For example:
>>>>
>>>> {
>>>>     "secret": "${env.MYCLIENT_SECRET}"
>>>> }
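>>>>
>>>> Resolving the placeholders is straightforward; a rough sketch (the regex
>>>> and the fallback behaviour are assumptions, not a spec):
>>>>
>>>> import java.util.regex.Matcher;
>>>> import java.util.regex.Pattern;
>>>>
>>>> class EnvResolver {
>>>>     private static final Pattern ENV = Pattern.compile("\\$\\{env\\.([A-Za-z0-9_]+)\\}");
>>>>
>>>>     // Replace ${env.NAME} with the value of the environment variable,
>>>>     // leaving the placeholder untouched if the variable is not set
>>>>     static String resolve(String value) {
>>>>         Matcher m = ENV.matcher(value);
>>>>         StringBuffer sb = new StringBuffer();
>>>>         while (m.find()) {
>>>>             String env = System.getenv(m.group(1));
>>>>             m.appendReplacement(sb, Matcher.quoteReplacement(env != null ? env : m.group()));
>>>>         }
>>>>         m.appendTail(sb);
>>>>         return sb.toString();
>>>>     }
>>>> }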
>>>>
>>>> This will be very convenient, for example with Docker, as it would make
>>>> it very easy to create a Docker image that extends ours to add a few
>>>> clients and users.
>>>>
>>>> It will also be convenient for the examples, as it will make it possible
>>>> to add the required clients and users to an existing realm.
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> keycloak-dev mailing list
>>>> keycloak-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/keycloak-dev
>>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>