<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 11/11/15 15:36, Stian Thorgersen
wrote:<br>
</div>
<blockquote
cite="mid:CAJgngAeLDWnyE=OZ5k+0uZT9gV+jNvs=r_2WA4acXQhCEMxF1g@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 11 November 2015 at 15:23, Marek
Posolda <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span class="">
<div>On 11/11/15 09:01, Stian Thorgersen wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 10 November 2015 at
16:11, Marek Posolda <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:mposolda@redhat.com"
target="_blank">mposolda@redhat.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span>
<div>On 09/11/15 14:09, Stian Thorgersen
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 9
November 2015 at 13:35,
Sebastien Blanc <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:sblanc@redhat.com"
target="_blank">sblanc@redhat.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div dir="ltr">
<div>That would be really nice indeed!<br>
</div>
But aren't the marker files enough, without also having a table in the DB?<br>
</div>
</blockquote>
<div><br>
</div>
<div>We need a way to prevent multiple nodes in a cluster from
importing the same file. For example, on Kerberos you end up
spinning up multiple instances of the same Docker image.<br>
</div>
</div>
</div>
</div>
</blockquote>
</span> I bet you meant 'Kubernetes' <span><span>
:-) </span></span></div>
</blockquote>
<div><br>
</div>
<div>Yup</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span><span>
</span></span><br>
<br>
+1 for the improvements. Besides those, I think that sooner
or later we will need to solve long-running export+import,
where you want to import 100.000 users.<br>
</div>
</blockquote>
<div><br>
</div>
<div>+1</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> <br>
As I mentioned in another mail a few weeks ago, we can have:<br>
<br>
1) A table with the progress (51.000 users already imported,
around 49.000 remaining, etc.)<br>
</div>
</blockquote>
<div><br>
</div>
<div>We would still need to split into multiple files in
either case. Having a single JSON file with 100K users is
probably not going to perform very well, so what I proposed
would actually work for long-running imports as well. If each
file has a manageable number of users (say ~5 min to import),
then each file will be marked as imported or failed. At least
for now, I don't think we should use batches smaller than one
file. As long as each file is imported within a single TX,
it's an all-or-nothing import.</div>
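<div>A minimal sketch of the per-file, all-or-nothing idea, assuming Python's built-in sqlite3 as a stand-in for the Keycloak database. The users table, split_into_files and import_file are hypothetical names for illustration, not Keycloak code:</div>

```python
import sqlite3


def split_into_files(users, per_file):
    """Split a large user list into chunks; each chunk would become one import file."""
    return [users[i:i + per_file] for i in range(0, len(users), per_file)]


def import_file(conn, users):
    """Import every user from one file inside a single transaction.

    Returns "imported" on success or "failed" if any insert violates a constraint.
    On failure the transaction is rolled back, so nothing from the file is kept.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for username in users:
                conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return "imported"
    except sqlite3.IntegrityError:
        return "failed"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")
    for chunk in split_into_files([f"user{i}" for i in range(10)], per_file=4):
        print(len(chunk), import_file(conn, chunk))
```

<div>Because each file is one transaction, a single bad user rolls back every user from that file, so the file can be marked failed and retried as a unit.</div>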
</div>
</div>
</div>
</blockquote>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> 2)
Concurrency and dividing the work among
cluster nodes (Node1 will import 50.000
users and node2 another 50.000 users)<br>
</div>
</blockquote>
<div><br>
</div>
<div>This would be solved as well. Each node picks up a file
that hasn't been processed yet, marks it in the DB and then
processes it.</div>
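<div>One way to make "pick a file that hasn't been processed yet and mark it in the DB" race-free is to let the primary key decide: whichever node inserts the checksum row first wins. A hedged sketch with sqlite3 standing in for the real database; the import_record table and the function names are assumptions, and SHA-256 is an arbitrary choice of checksum:</div>

```python
import hashlib
import sqlite3
import time

SCHEMA = """CREATE TABLE IF NOT EXISTS import_record (
    checksum   TEXT PRIMARY KEY,
    node       TEXT NOT NULL,
    started_at REAL NOT NULL
)"""


def checksum(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def try_claim(conn, file_checksum, node):
    """Mark a file as taken by `node`; False if another node inserted the row first."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO import_record (checksum, node, started_at) VALUES (?, ?, ?)",
                (file_checksum, node, time.time()),
            )
        return True
    except sqlite3.IntegrityError:  # primary key already present: someone else has it
        return False


def pick_next_file(conn, files, node):
    """Claim the first not-yet-processed file from `files` (name -> bytes), or None."""
    for name, content in sorted(files.items()):
        if try_claim(conn, checksum(content), node):
            return name
    return None
```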
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> 3)
Failover (the import won't be completely
broken if a cluster node crashes after
importing 90.000 users, but can continue on
other cluster nodes)<br>
<br>
I think the work I did recently for
pre-loading offline sessions at startup
could be reused here too, and it can handle
(2) and (3). It can also handle parallel
imports triggered from multiple cluster
nodes.<br>
<br>
For example: currently, if you start
Kubernetes with 2 cluster nodes, both
nodes start importing the same file at the
same time, because the import triggered by
node1 is not yet finished when node2 starts,
so there is not yet a DB record saying that
the file is already imported. With the stuff
I did, only the coordinator (node1) starts
the import. Node2 waits until the import
triggered by node1 is finished, but at the
same time it can "help" import some users
(pages) if the coordinator asks it to. This
implementation is based on the Infinispan
distributed executor service <a
moz-do-not-send="true"
href="http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework"
target="_blank">http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework</a>.</div>
</blockquote>
<div><br>
</div>
<div>The DB record needs to be created before a node starts
importing the file, including a timestamp of when the import
started. It should then be updated with the result once the
import is completed. Using the distributed execution framework
sounds like a good idea though. How do you prevent scheduling
the same job multiple times? For example, if all nodes scan
the import folder on startup and simply import everything they
find, there will be multiple copies of the same job. Not really
a big deal, as the first thing the job should do is check
whether there's already a record in the DB.</div>
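<div>The record lifecycle described above (a row with a start timestamp before the import, updated with the result afterwards, duplicate jobs exiting early) could look roughly like this. Again sqlite3 is only a stand-in, and run_import_job and the do_import callback are hypothetical:</div>

```python
import sqlite3
import time

SCHEMA = """CREATE TABLE IF NOT EXISTS import_record (
    checksum    TEXT PRIMARY KEY,
    node        TEXT NOT NULL,
    started_at  REAL NOT NULL,
    finished_at REAL,
    result      TEXT
)"""


def run_import_job(conn, file_checksum, node, do_import):
    """Run one import job; a duplicate job for the same checksum does nothing.

    do_import is a hypothetical callable that performs the import and returns a
    result string. Returns that result, or None if a record already existed.
    """
    # Step 1: create the record with the start timestamp before importing anything.
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO import_record (checksum, node, started_at) "
            "VALUES (?, ?, ?)",
            (file_checksum, node, time.time()),
        )
    if cur.rowcount == 0:
        return None  # another (possibly duplicate) job already owns this file
    # Step 2: do the import, recording a failure instead of leaving the row open.
    try:
        result = do_import()
    except Exception as exc:
        result = f"failed: {exc}"
    # Step 3: update the record with the outcome.
    with conn:
        conn.execute(
            "UPDATE import_record SET finished_at = ?, result = ? WHERE checksum = ?",
            (time.time(), result, file_checksum),
        )
    return result
```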
</div>
</div>
</div>
</blockquote>
</span> With the distributed executor, it's the cluster
coordinator that decides which node imports what. It sends
messages to cluster nodes like "Hey, please import the file
testrealm-users-3.json with checksum abcd123".<br>
<br>
After a node finishes the job, it notifies the coordinator,
and the coordinator inserts the DB record and marks it as
finished. So no DB record is inserted before a node starts
the import, because the whole coordination is handled by the
coordinator. Also, the same file will never be imported more
than once by different cluster nodes.<br>
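<div>To make the flow concrete, here is an in-process simulation of the coordinator's role: it assigns each file to exactly one node and writes the record only after the node reports back. It does not use Infinispan's distributed executor API; assign_files, coordinate and import_on_node are hypothetical stand-ins, and round-robin is just one possible assignment policy:</div>

```python
from itertools import cycle


def assign_files(checksums, nodes):
    """Coordinator side: give each distinct file checksum to exactly one node, round-robin."""
    assignment = {node: [] for node in nodes}
    for file_checksum, node in zip(sorted(set(checksums)), cycle(nodes)):
        assignment[node].append(file_checksum)
    return assignment


def coordinate(checksums, nodes, import_on_node):
    """Dispatch assignments and record results only after each node finishes.

    import_on_node(node, checksum) stands in for a remote call to a cluster node.
    The returned dict plays the role of the DB records the coordinator inserts.
    """
    records = {}
    for node, assigned in assign_files(checksums, nodes).items():
        for file_checksum in assigned:
            result = import_on_node(node, file_checksum)
            records[file_checksum] = (node, result)  # written after completion
    return records
```

<div>The simulation runs the nodes one after another; in a real cluster the assignments would execute concurrently, but the invariant is the same: one file, one node, one record.</div>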
<br>
The only exception would be if a cluster node crashes before
the import is finished. Then the file needs to be reimported
by another cluster node, but that's the case with DB locks as
well.<br>
<br>
IMO the DB locks approach doesn't handle the crash of a
cluster node well. For example, when node2 crashes
unexpectedly while it's importing the file
testrealm-users-3.json, the DB lock is still held by that
node, so other cluster nodes can't start importing the file
(until a timeout occurs).<br>
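<div>The timeout behaviour can be sketched as follows: a lock row without a result is treated as abandoned only once it is older than timeout_s, so until then no other node can take over the file. This assumes sqlite3 as the database and a hypothetical import_lock table; passing now explicitly just makes the example deterministic:</div>

```python
import sqlite3
import time

SCHEMA = """CREATE TABLE IF NOT EXISTS import_lock (
    checksum   TEXT PRIMARY KEY,
    node       TEXT NOT NULL,
    started_at REAL NOT NULL,
    result     TEXT
)"""


def try_acquire(conn, file_checksum, node, timeout_s, now=None):
    """Acquire the import lock for a file, taking over a lock that looks abandoned.

    A lock counts as abandoned when it has no result yet and was taken more than
    timeout_s seconds ago (e.g. its node crashed mid-import).
    """
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO import_lock (checksum, node, started_at) VALUES (?, ?, ?)",
            (file_checksum, node, now),
        )
        if cur.rowcount == 1:
            return True  # nobody held the lock
        # The row exists: take it over only if it is unfinished and older than the timeout.
        cur = conn.execute(
            "UPDATE import_lock SET node = ?, started_at = ? "
            "WHERE checksum = ? AND result IS NULL AND started_at < ?",
            (node, now, file_checksum, now - timeout_s),
        )
        return cur.rowcount == 1
```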
<br>
On the other hand, the distributed executor approach may
have issues if the content of the standalone/import directory
is inconsistent among cluster nodes. However, this can be
solved: each node sends the checksums of the files it has,
and the coordinator ensures that the file with checksum
"abcd123" is assigned only to a node that has this file.</div>
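<div>A sketch of the checksum-aware assignment, assuming each node reports the set of checksums found in its own standalone/import directory. The function name and the least-loaded tie-breaking rule are assumptions, not part of the proposal:</div>

```python
def assign_by_availability(node_checksums):
    """Assign every known checksum to exactly one node that actually has that file.

    node_checksums maps node name -> set of checksums reported by that node.
    Among eligible nodes, the one with the fewest assignments so far wins
    (ties broken by node name) to spread the work.
    """
    assignment = {node: [] for node in node_checksums}
    for file_checksum in sorted(set().union(*node_checksums.values())):
        eligible = [n for n, sums in node_checksums.items() if file_checksum in sums]
        target = min(eligible, key=lambda n: (len(assignment[n]), n))
        assignment[target].append(file_checksum)
    return assignment
```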
</blockquote>
<div><br>
</div>
<div>With Docker/Kubernetes, all nodes would have the same
files, at least initially. It would be nice if we could come
up with a solution where you can just drop an additional
file onto any node and have it imported.</div>
</div>
</div>
</div>
</blockquote>
Exactly, I was thinking about Docker too. There we don't have any
issue at all.<br>
<br>
The main question here is: do we want to support the scenario where
various cluster nodes have different content? As I mentioned, the
distributed coordinator can handle it: each cluster node sends the
checksums of the files it has, and the coordinator always assigns a
node only the checksums that node has.<br>
<br>
However, regardless of whether we use the distributed executor
approach or the DB locks approach, there may still be issues. For
example:<br>
1) The file testrealm.json with checksum "abc" is triggered for
import on node1.<br>
2) At the same time, an admin makes a minor change to this file on
node2 and saves it. This means the checksum of the file on node2
changes to "def".<br>
3) Node2 triggers import of that file. So we have both node1 and
node2 importing the same file concurrently, because the previously
acquired lock was for checksum "abc", but now the checksum is
"def".<br>
<br>
This problem exists with both the DB lock and DistributedExecutor
approaches though...<br>
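<div>The race can be reproduced with a tiny simulation in which a plain Python set stands in for either the DB lock rows or the coordinator's bookkeeping, keyed only by file checksum as proposed; the file contents are made up:</div>

```python
import hashlib

held_locks = set()  # hypothetical stand-in for DB lock rows / coordinator state


def checksum(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def try_lock(content: bytes) -> bool:
    """Lock a file by the checksum of its content; False if that checksum is already locked."""
    key = checksum(content)
    if key in held_locks:
        return False
    held_locks.add(key)
    return True


original = b'{"realm": "testrealm"}'                 # testrealm.json as node1 reads it
edited = b'{"realm": "testrealm", "enabled": true}'  # same file after the admin's edit on node2

node1_locked = try_lock(original)  # step 1: node1 starts the import
node2_locked = try_lock(edited)    # step 3: node2 starts importing the edited file
print(node1_locked, node2_locked)  # True True: the checksums differ, so both proceed
```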
<br>
Marek<br>
<blockquote
cite="mid:CAJgngAeLDWnyE=OZ5k+0uZT9gV+jNvs=r_2WA4acXQhCEMxF1g@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span class="HOEnZb"><font
color="#888888"><br>
<br>
Marek</font></span>
<div>
<div class="h5"><br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span><font
color="#888888"><br>
<br>
Marek</font></span>
<div>
<div><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div dir="ltr"> <br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">
<div>
<div>On Mon, Nov 9,
2015 at 1:20 PM,
Stian Thorgersen <span
dir="ltr">&lt;<a
moz-do-not-send="true" href="mailto:sthorger@redhat.com" target="_blank">sthorger@redhat.com</a>&gt;</span>
wrote:<br>
</div>
</div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div>
<div>
<div dir="ltr">Currently
we support importing a complete realm definition using the
import/export feature. Issues with the current approach are:
<div><br>
</div>
<div>* Only complete realms - it's not possible to add to an
existing realm</div>
<div>* No good feedback on whether the import was successful
or not</div>
<div>* Use of system properties to initiate the import is not
very user friendly</div>
<div>* Not very elegant for provisioning. For example, a
Docker image that wants to bundle some initial setup ends up
always running the import of a realm, which is skipped if the
realm exists</div>
<div><br>
</div>
<div>To solve this, I've come up with the following
proposal:</div>
<div><br>
</div>
<div>Allow dropping representations to be imported into
'standalone/import'. This should support creating a new realm
as well as importing into an existing realm. When importing
into an existing realm, we will have an import strategy that
configures what happens if a resource already exists (user,
role, identity provider, user federation provider). The import
strategies are:</div>
<div><br>
</div>
<div>* Skip - existing resources are skipped</div>
<div>* Fail - if any resource exists, nothing is
imported</div>
<div>* Overwrite - any existing resources are deleted and
replaced with the imported ones</div>
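<div>A minimal sketch of how the three strategies could be applied, modelling the resources of one type as a dict from name to representation. ImportStrategy, ResourceExistsError and apply_import are illustrative names, and treating Overwrite as "replace with the imported representation" is an interpretation:</div>

```python
from enum import Enum


class ImportStrategy(Enum):
    SKIP = "skip"
    FAIL = "fail"
    OVERWRITE = "overwrite"


class ResourceExistsError(Exception):
    """Raised by the FAIL strategy when any incoming resource already exists."""


def apply_import(existing, incoming, strategy):
    """Merge incoming resources (name -> representation) into existing ones.

    Returns a new dict and never mutates `existing`, so a FAIL leaves the realm untouched.
    """
    conflicts = sorted(existing.keys() & incoming.keys())
    if strategy is ImportStrategy.FAIL and conflicts:
        raise ResourceExistsError(f"already exist: {conflicts}")
    result = dict(existing)
    for name, representation in incoming.items():
        if name in existing and strategy is ImportStrategy.SKIP:
            continue  # SKIP: keep the existing resource as-is
        result[name] = representation  # new resource, or OVERWRITE replacing the old one
    return result
```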
<div><br>
</div>
<div>The
directory will
be scanned at
startup, but
there will
also be an
option to
monitor this
directory at
runtime.</div>
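<div>The startup scan plus optional runtime monitoring could be as simple as polling, assuming the marker files from this proposal are used to recognise already processed files. pending, scan_and_monitor and the handle callback are hypothetical, and a real implementation might use a filesystem watch service instead of polling:</div>

```python
import os
import time

MARKER_SUFFIXES = (".imported", ".failed")


def pending(names):
    """From a directory listing, pick the .json files that have no marker file yet."""
    present = set(names)
    return sorted(
        n for n in present
        if n.endswith(".json") and not any(n + s in present for s in MARKER_SUFFIXES)
    )


def scan_and_monitor(directory, handle, monitor=False, interval_s=5.0):
    """Scan `directory` once at startup; if monitor is True, keep polling at runtime.

    `handle` is a hypothetical callback that imports the file and writes its marker,
    so the next poll no longer sees the file as pending.
    """
    while True:
        for name in pending(os.listdir(directory)):
            handle(os.path.join(directory, name))
        if not monitor:
            return
        time.sleep(interval_s)
```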
<div><br>
</div>
<div>To prevent a file from being imported multiple times (and
also to make sure only one node in a cluster imports it), we
will have a table in the database that records which files
were imported, from which node, the date and the result
(including a list of which resources were imported, which were
not, and a stack trace if applicable). The primary key will be
the checksum of the file. We will also add marker files
(&lt;json file&gt;.imported or &lt;json file&gt;.failed). The
contents of the marker files will be a JSON object with the
date imported and the outcome (including a stack trace if
applicable), as well as a complete list of which resources
were successfully imported and which were not.</div>
<div><br>
</div>
<div>The files
will also
allow
resolving
system
properties and
environment
variables. For
example:</div>
<div><br>
</div>
<div>{</div>
<div>
"secret":
"${env.MYCLIENT_SECRET}"</div>
<div>}</div>
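<div>A sketch of the placeholder resolution, assuming ${env.NAME} reads an environment variable (as in the example above) and ${name} reads a system property. Python has no JVM system properties, so they are passed in as a dict here, and leaving unknown placeholders untouched is an assumption. Resolving after parsing the JSON keeps a secret that contains quotes from breaking the document:</div>

```python
import json
import os
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(text, system_properties, environ=None):
    """Replace ${env.NAME} with an environment variable and ${name} with a system property."""
    environ = os.environ if environ is None else environ

    def lookup(match):
        key = match.group(1)
        if key.startswith("env."):
            value = environ.get(key[len("env."):])
        else:
            value = system_properties.get(key)
        return match.group(0) if value is None else value  # keep unknown placeholders

    return PLACEHOLDER.sub(lookup, text)


def resolve_tree(node, system_properties, environ=None):
    """Apply resolve() to every string inside a parsed JSON value."""
    if isinstance(node, str):
        return resolve(node, system_properties, environ)
    if isinstance(node, list):
        return [resolve_tree(item, system_properties, environ) for item in node]
    if isinstance(node, dict):
        return {k: resolve_tree(v, system_properties, environ) for k, v in node.items()}
    return node


def load_representation(text, system_properties, environ=None):
    """Parse a representation file, then resolve placeholders in its string values."""
    return resolve_tree(json.loads(text), system_properties, environ)
```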
<div><br>
</div>
<div>This will be very convenient, for example with Docker, as
it would be very easy to create a Docker image that extends
ours to add a few clients and users.</div>
<div><br>
</div>
<div>It will also be convenient for examples, as it will make
it possible to add the required clients and users to an
existing realm.</div>
<div><br>
</div>
<div><br>
</div>
</div>
<br>
</div>
</div>
_______________________________________________<br>
keycloak-dev mailing
list<br>
<a
moz-do-not-send="true"
href="mailto:keycloak-dev@lists.jboss.org" target="_blank">keycloak-dev@lists.jboss.org</a><br>
<a
moz-do-not-send="true"
href="https://lists.jboss.org/mailman/listinfo/keycloak-dev"
rel="noreferrer"
target="_blank">https://lists.jboss.org/mailman/listinfo/keycloak-dev</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>