<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 11/11/15 09:01, Stian Thorgersen
wrote:<br>
</div>
<blockquote
cite="mid:CAJgngAfoKGxKMYh65SAM1VQ3ZWjY-mUCy_vr9xeMh-=a0nV4zg@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 10 November 2015 at 16:11, Marek
Posolda <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span class="">
<div>On 09/11/15 14:09, Stian Thorgersen wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 9 November 2015 at
13:35, Sebastien Blanc <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:sblanc@redhat.com"
target="_blank">sblanc@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div dir="ltr">
<div>That would be really nice indeed ! <br>
</div>
But are the markers files not enough,
instead of also having a table in the DB ?<br>
</div>
</blockquote>
<div><br>
</div>
<div>We need a way to prevent multiple nodes
in a cluster from importing the same file. For
example on Kerberos you end up spinning up
multiple instances of the same Docker image.
<br>
</div>
</div>
</div>
</div>
</blockquote>
</span> I bet you meant 'Kubernetes' <span><span> :-)
</span></span></div>
</blockquote>
<div><br>
</div>
<div>Yup</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span><span> </span></span><br>
<br>
+1 for the improvements. Besides those I think that
sooner or later we will need to solve long-running
export+import where you want to import 100,000 users. <br>
</div>
</blockquote>
<div><br>
</div>
<div>+1</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> <br>
As I mentioned in another mail a few weeks ago, we can
have:<br>
<br>
1) A table with the progress (51,000 users already
imported, around 49,000 remaining, etc.)<br>
</div>
</blockquote>
<div><br>
</div>
<div>We would still need to split into multiple files in
either case. Having a single JSON file with 100K users is
probably not going to perform very well. So what I
proposed would actually work for long-running imports as
well. If each file has a manageable number of users (say
~5 min to import), then each file will be marked as
imported or failed. At least for now I don't think we
should do smaller batches than one file. As long as one
file is imported within the same TX, it's an all or
nothing import.</div>
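<br>
The splitting idea above can be sketched in a few lines. This is only an illustration of the batching, with hypothetical names and sizes; it is not Keycloak code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a 100,000-user export is cut into fixed-size
// batches, one batch per JSON file, so each file can be imported in a
// single transaction and marked imported/failed on its own.
public class ExportSplitter {

    // Split the full user list into batches of at most batchSize entries.
    static <T> List<List<T>> partition(List<T> users, int batchSize) {
        List<List<T>> files = new ArrayList<>();
        for (int i = 0; i < users.size(); i += batchSize) {
            files.add(users.subList(i, Math.min(i + batchSize, users.size())));
        }
        return files;
    }

    public static void main(String[] args) {
        List<Integer> users = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) users.add(i);
        // e.g. testrealm-users-0.json ... testrealm-users-19.json
        System.out.println(partition(users, 5_000).size());
    }
}
```

With 5,000 users per file, a 100,000-user export becomes 20 independent files, each small enough for one transaction.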
</div>
</div>
</div>
</blockquote>
<blockquote
cite="mid:CAJgngAfoKGxKMYh65SAM1VQ3ZWjY-mUCy_vr9xeMh-=a0nV4zg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> 2) Concurrency and
dividing the work among cluster nodes (node1 will import
50,000 users and node2 another 50,000 users)<br>
</div>
</blockquote>
<div><br>
</div>
<div>This would be solved as well. Each node picks up a file
that's not processed yet, marks it in the DB, and then
processes it.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> 3) Failover (the import
won't be completely broken if a cluster node crashes after
importing 90,000 users, but can continue on other cluster nodes)<br>
<br>
I think the stuff I did recently for pre-loading offline
sessions at startup could be reused for this too,
and it can handle (2) and (3). It can also handle
parallel imports triggered from more cluster nodes. <br>
<br>
For example: currently if you trigger Kubernetes with 2
cluster nodes, both nodes will start to import the same file
at the same time, because the import triggered by node1 is
not yet finished before node2 is started, so there is
no DB record yet saying the file is already
imported. With the stuff I did, just the coordinator
(node1) will start the import. Node2 will wait until
the import triggered by node1 is finished, but at the same
time it can "help" to import some users (pages) if
the coordinator asks it to do so. This impl is based on the
Infinispan distributed executor service: <a
moz-do-not-send="true"
href="http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework"
target="_blank">http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework</a>
.</div>
</blockquote>
<div><br>
</div>
<div>The DB record needs to be created before a node tries
to import a file, including a timestamp for when it started
the import. It should then be updated once the import is
completed, with the result. Using the distributed
execution framework sounds like a good idea though. How do
you prevent scheduling the same job multiple times? For
example, if all nodes on startup scan the import folder and
simply import everything they find, there will be
multiple copies of the same job. Not really a big deal, as the
first thing the job should do is check if there's already a record
in the DB.</div>
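<br>
<div>The "claim before import" guard being discussed can be sketched with a putIfAbsent-style operation; in the real system the DB table (with the checksum as primary key) plays the role of this map, and all names here are hypothetical:</div>

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: the first node to insert a record for a checksum wins the
// claim; every duplicate job sees the existing record and skips the file.
public class ImportClaim {

    // checksum -> node that claimed the file (stand-in for the DB table)
    static final ConcurrentMap<String, String> CLAIMS = new ConcurrentHashMap<>();

    // Returns true if this node won the claim and should run the import.
    static boolean tryClaim(String checksum, String nodeId) {
        return CLAIMS.putIfAbsent(checksum, nodeId) == null;
    }

    public static void main(String[] args) {
        System.out.println(tryClaim("abcd123", "node1")); // first claim wins
        System.out.println(tryClaim("abcd123", "node2")); // duplicate job backs off
    }
}
```

In SQL terms this is an INSERT that relies on the primary-key constraint: whichever node's insert succeeds runs the import, the others move on.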
</div>
</div>
</div>
</blockquote>
With the distributed executor, it's the cluster coordinator that
decides which node imports what. It will send messages to
cluster nodes like "Hey, please import the file
testrealm-users-3.json with checksum abcd123". <br>
<br>
After a node finishes the job, it notifies the coordinator, and the coordinator
inserts a DB record marking it as finished. So there is no DB
record inserted before a node starts the import, because the whole
coordination is handled by the coordinator. Also, the same file will never be
imported more than once by different cluster nodes. <br>
<br>
The only exception is if a cluster node crashes before the import is
finished. Then the file needs to be reimported by another cluster node, but
that's the case with DB locks as well.<br>
<br>
IMO the DB locks approach doesn't handle a cluster node
crash well. For example, when node2 crashes unexpectedly while it's
importing the file testrealm-users-3.json, the DB lock is held by
that node, so other cluster nodes can't start importing the file
(until a timeout occurs).<br>
<br>
On the other hand, the distributed executor approach may have issues if
the content of the standalone/import directory is inconsistent
among cluster nodes. However, that can be solved: each node
sends the checksums of the files it has, and the coordinator
ensures that the file with checksum "abcd123" is assigned only to
a node which has this file.<br>
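<br>
That checksum-aware assignment can be sketched roughly like this; everything below is a hypothetical illustration of the idea, not Keycloak or Infinispan API:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.Arrays;
import java.util.Collections;

// Sketch: each node reports the checksums of the files in its
// standalone/import directory; the coordinator assigns each file (by
// checksum) only to a node that actually holds it.
public class ChecksumAssigner {

    // reported: nodeId -> set of checksums that node has on disk.
    // Returns checksum -> nodeId chosen to import that file.
    static Map<String, String> assign(Map<String, Set<String>> reported) {
        Map<String, String> assignment = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : reported.entrySet()) {
            for (String checksum : e.getValue()) {
                // first node seen that has the file gets the job
                assignment.putIfAbsent(checksum, e.getKey());
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> reported = new TreeMap<>();
        reported.put("node1", new TreeSet<>(Arrays.asList("abcd123")));
        reported.put("node2", new TreeSet<>(Arrays.asList("abcd123", "ef567")));
        // "ef567" can only go to node2, since node1 never reported it
        System.out.println(assign(reported));
    }
}
```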
<br>
Marek<br>
<blockquote
cite="mid:CAJgngAfoKGxKMYh65SAM1VQ3ZWjY-mUCy_vr9xeMh-=a0nV4zg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span class="HOEnZb"><font
color="#888888"><br>
<br>
Marek</font></span>
<div>
<div class="h5"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div dir="ltr"> <br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">
<div>
<div>On Mon, Nov 9, 2015 at 1:20 PM,
Stian Thorgersen <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:sthorger@redhat.com"
target="_blank">sthorger@redhat.com</a>></span>
wrote:<br>
</div>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div>
<div>
<div dir="ltr">Currently we
support importing a complete
realm definition using the
import/export feature. Issues
with the current approach are:
<div><br>
</div>
<div>* Only a complete realm -
not possible to add to an
existing realm</div>
<div>* No good feedback on whether the
import was successful or not</div>
<div>* Use of system
properties to initiate the
import is not very user
friendly</div>
<div>* Not very elegant for
provisioning. For example, a
Docker image that wants to
bundle some initial setup
ends up always running the
import of a realm, which is
skipped if the realm exists</div>
<div><br>
</div>
<div>To solve this I've come
up with the following
proposal:</div>
<div><br>
</div>
<div>Allow dropping
representations to be
imported into
'standalone/import'. This
should support creating a
new realm as well as
importing into an existing
realm. When importing into
an existing realm we will
have an import strategy that
is used to configure what
happens if a resource exists
(user, role, identity
provider, user federation
provider). The import
strategies are:</div>
<div><br>
</div>
<div>* Skip - existing
resources are skipped</div>
<div>* Fail - if any resource
exists nothing is imported</div>
<div>* Overwrite - any
existing resources are
deleted</div>
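<div><br>
</div>
<div>The three strategies could be captured as a small enum with the decision in one place. A hypothetical sketch, not the actual Keycloak API:</div>

```java
// Sketch of the proposed import strategies. The strategy only matters
// when the resource already exists; otherwise we always import.
public class ImportStrategyDemo {

    enum ImportStrategy { SKIP, FAIL, OVERWRITE }

    // What to do with one resource, given whether it already exists.
    static String apply(ImportStrategy strategy, boolean resourceExists) {
        if (!resourceExists) return "import";
        switch (strategy) {
            case SKIP:      return "skip";
            case FAIL:      throw new IllegalStateException("resource exists, aborting import");
            case OVERWRITE: return "delete-then-import";
            default:        throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(ImportStrategy.SKIP, true));      // skip
        System.out.println(apply(ImportStrategy.OVERWRITE, true)); // delete-then-import
    }
}
```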
<div><br>
</div>
<div>The directory will be
scanned at startup, but
there will also be an option
to monitor this directory at
runtime.</div>
<div><br>
</div>
<div>To prevent a file being
imported multiple times
(also to make sure only one
node in a cluster imports)
we will have a table in the
database that records which
files were imported, from
what node, the date and the result
(including a list of which
resources were imported,
which were not, and a stack
trace if applicable). The
primary key will be the
checksum of the file. We
will also add marker files
(<json file>.imported
or <json
file>.failed). The
contents of the marker files
will be a JSON object with the
date imported, the outcome
(including a stack trace if
applicable) as well as a
complete list of which
resources were successfully
imported and which were not.</div>
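<div><br>
</div>
<div>An .imported marker file might look something like this (field names are purely illustrative, not a fixed format):</div>

```json
{
  "checksum": "abcd123",
  "node": "node1",
  "date": "2015-11-11T09:01:00Z",
  "outcome": "SUCCESS",
  "imported": ["client/my-client", "user/alice"],
  "skipped": ["user/bob"]
}
```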
<div><br>
</div>
<div>The files will also allow
resolving system properties
and environment variables.
For example:</div>
<div><br>
</div>
<div>{</div>
<div> "secret":
"${env.MYCLIENT_SECRET}"</div>
<div>}</div>
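<div><br>
</div>
<div>The placeholder resolution could be a simple regex pass over the file before it is parsed. A minimal sketch; a real implementation would read System.getenv(), but a map stands in here so the example is deterministic, and all names are hypothetical:</div>

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: replace ${env.NAME} placeholders in the JSON text with values
// from the environment before handing the result to the JSON parser.
public class PlaceholderResolver {

    static final Pattern ENV = Pattern.compile("\\$\\{env\\.([A-Za-z0-9_]+)\\}");

    static String resolve(String json, Map<String, String> env) {
        Matcher m = ENV.matcher(json);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String json = "{ \"secret\": \"${env.MYCLIENT_SECRET}\" }";
        System.out.println(resolve(json, Map.of("MYCLIENT_SECRET", "s3cr3t")));
    }
}
```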
<div><br>
</div>
<div>This will be very
convenient for example with
Docker as it would be very
easy to create a Docker
image that extends ours to
add a few clients and users.</div>
<div><br>
</div>
<div>It will also be
convenient for examples as
it will make it possible to
add the required clients and
users to an existing realm.</div>
<div><br>
</div>
<div><br>
</div>
</div>
<br>
</div>
</div>
_______________________________________________<br>
keycloak-dev mailing list<br>
<a moz-do-not-send="true"
href="mailto:keycloak-dev@lists.jboss.org"
target="_blank">keycloak-dev@lists.jboss.org</a><br>
<a moz-do-not-send="true"
href="https://lists.jboss.org/mailman/listinfo/keycloak-dev"
rel="noreferrer" target="_blank">https://lists.jboss.org/mailman/listinfo/keycloak-dev</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>