<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Deleting the files should be good.<br>
<br>
One tricky thing is the dependencies among files. We need to
make sure that the file "testrealm.json" with the realm definition is
imported before the users file "testrealm-users-X.json". Also the master
realm should always be imported first. We can solve this by dividing the
files into groups:<br>
1) File with the master realm (if it exists)<br>
2) Other files with realm definitions<br>
3) All user files<br>
We need to ensure that the import of each group starts only after the
previous group is fully finished. But this is solvable too.<br>
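The grouping above could be sketched roughly like this (a sketch only; the file-name patterns and the class name are my assumptions, not existing Keycloak code):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

public class ImportGroups {

    // Lower group number is imported earlier. The naming conventions
    // (master-realm.json, <realm>-users-<n>.json) are assumptions.
    public static int groupOf(String fileName) {
        if (fileName.equals("master-realm.json")) {
            return 1;                                   // group1: master realm first
        } else if (fileName.matches(".*-users-\\d+\\.json")) {
            return 3;                                   // group3: user files last
        } else {
            return 2;                                   // group2: other realm definitions
        }
    }

    // Order files so each group can be imported after the previous one finished.
    public static List<String> ordered(Collection<String> files) {
        List<String> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingInt(ImportGroups::groupOf));
        return sorted;
    }
}
```

The coordinator would then hand out files group by group, waiting for each group to finish before starting the next.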
<br>
<br>
So far this is my favourite workflow:<br>
<br>
a) When an import is triggered, the coordinator will ask all cluster
nodes for the checksums of the files they have. Each node will check the
DB and delete the files that are already in the DB (a file with that
checksum was already imported in a previous import iteration). Then the
node sends back the remaining checksums (just those whose files were not
deleted) and the group each checksum belongs to (master, realms, users)<br>
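Step (a) could look roughly like this (a sketch; the class name and the `alreadyImported` set standing in for the DB lookup are assumptions):

```java
import java.security.MessageDigest;
import java.util.LinkedHashSet;
import java.util.Set;

public class ChecksumStep {

    // SHA-256 checksum of a file's content, hex-encoded.
    public static String checksum(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    // Drop checksums already recorded in the DB as imported; only the
    // remaining ones are reported to the coordinator. 'alreadyImported'
    // stands in for the DB lookup (an assumption).
    public static Set<String> remaining(Set<String> localChecksums,
                                        Set<String> alreadyImported) {
        Set<String> result = new LinkedHashSet<>(localChecksums);
        result.removeAll(alreadyImported);
        return result;
    }
}
```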
<br>
b) Import of group1 starts. For each checksum, the coordinator will elect
a node that has the file with that checksum available. For example, if
checksum "abcd" is available on node1 and node3, either of those 2
nodes can be elected.<br>
<br>
c) Once the import of file "abcd" is successful, the node will save the
record "abcd - IMPORT SUCCESS" to the DB and send a message to the
coordinator about the finished import.<br>
<br>
d) If node1 crashes, the import of the "abcd" file is immediately
re-triggered by the coordinator on node3. Similarly, if the file with
checksum "abcd" is not available on node1 (this can happen if an
admin edited the file on node1 and saved it; the checksum of the
file will have changed in this case), the import is re-triggered on
node3 too.<br>
<br>
e) Once group1 (master realm) is finished, go back to step (b) and
start with group2 (other realms), then again with group3 (user
files)<br>
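Steps (b)-(d) boil down to: for each checksum, pick any live node that has the file, and pick again if that node crashes or no longer has the file. A rough sketch (the data structures and names are my assumptions):

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class ImportCoordinator {

    // checksum -> nodes that reported having a file with that checksum
    private final Map<String, Set<String>> owners;

    public ImportCoordinator(Map<String, Set<String>> owners) {
        this.owners = owners;
    }

    // Step (b): elect any live node that holds the file.
    // Step (d): calling this again after a crash naturally fails over,
    // because the dead node is no longer in 'liveNodes'.
    public Optional<String> elect(String checksum, Set<String> liveNodes) {
        return owners.getOrDefault(checksum, Set.of()).stream()
                .filter(liveNodes::contains)
                .findFirst();
    }
}
```

An empty result would mean no surviving node has the file, so that checksum is skipped for this iteration.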
<br>
f) The checksums are collected just once, at startup (step a). If
someone changes a file on any node during the import iteration, it
will have a new checksum, and that one will be ignored until the
current import iteration is finished. <br>
<br>
g) If a new node joins the cluster, it can start helping with the import
as long as it has files with checksums collected at step (a).<br>
<br>
At the end of the import iteration, there are DB records for all
successfully imported checksums. When a new import iteration is
triggered, the nodes will delete the files already imported (we
could also do that at the end of the import iteration)<br>
<br>
Maybe it sounds a bit complicated, but I see this as the most
performant workflow, and it is resilient to crashes of any cluster node.
It's also resilient to the situation when an admin changes some file
while the import is in progress.<br>
<br>
Marek<br>
<br>
<br>
On 11/11/15 16:01, Stian Thorgersen wrote:<br>
</div>
<blockquote
cite="mid:CAJgngAfN66Lt1h=OSPfeYsg=zDhxGXMb8wS_PTA_FVxv9QFJYA@mail.gmail.com"
type="cite">
<div dir="ltr">What about if we remove files as they are being
imported?<br>
<div><br>
</div>
<div>Something like:</div>
<div><br>
</div>
      <div>* When we detect a new file on a node, we send the name +
        checksum to the coordinator</div>
      <div>* The coordinator then checks if this file has already been
        imported</div>
      <div>* If it's imported it sends out a message stating that the file
        with name + checksum is already imported and all nodes delete
        this file</div>
      <div>* If it's not imported it picks one node that is
        responsible for importing the file (that would be the first node
        that sends the message about the file+checksum). This node
        will rename the file to .importing</div>
<div>* Once the file has been imported it's renamed to .imported
or .failed</div>
</div>
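The rename-based life cycle described above could be sketched like this (a sketch only; the `State` enum and helper names are my assumptions):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MarkerFiles {

    public enum State { NEW, IMPORTING, IMPORTED, FAILED }

    // Compute the file name for a given state, e.g.
    // testrealm.json -> testrealm.json.importing -> testrealm.json.imported
    public static String nameFor(String baseName, State state) {
        switch (state) {
            case IMPORTING: return baseName + ".importing";
            case IMPORTED:  return baseName + ".imported";
            case FAILED:    return baseName + ".failed";
            default:        return baseName;
        }
    }

    // Rename on disk: the claiming node moves the file to ".importing"
    // first, then to ".imported" or ".failed" when done.
    public static Path rename(Path file, State state) throws IOException {
        Path target = file.resolveSibling(
                nameFor(file.getFileName().toString(), state));
        return Files.move(file, target);
    }
}
```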
<div class="gmail_extra"><br>
<div class="gmail_quote">On 11 November 2015 at 15:53, Marek
Posolda <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:mposolda@redhat.com" target="_blank">mposolda@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>
<div class="h5">
<div>On 11/11/15 15:51, Marek Posolda wrote:<br>
</div>
<blockquote type="cite">
<div>On 11/11/15 15:36, Stian Thorgersen wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 11 November 2015
at 15:23, Marek Posolda <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:mposolda@redhat.com"
target="_blank">mposolda@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span>
<div>On 11/11/15 09:01, Stian
Thorgersen wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 10
November 2015 at 16:11, Marek
Posolda <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:mposolda@redhat.com"
target="_blank">mposolda@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF"><span>
<div>On 09/11/15 14:09,
Stian Thorgersen
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
9 November 2015
at 13:35,
Sebastien Blanc
<span dir="ltr"><<a
moz-do-not-send="true" href="mailto:sblanc@redhat.com" target="_blank">sblanc@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div dir="ltr">
<div>That
would be
really nice
indeed ! <br>
</div>
But are the
markers files
not enough,
instead of
also having a
table in the
DB ?<br>
</div>
</blockquote>
<div><br>
</div>
                                <div>We need a way to prevent multiple
                                  nodes in a cluster from importing the
                                  same file. For example on Kerberos you
                                  end up spinning up multiple instances
                                  of the same Docker image. <br>
                                </div>
</div>
</div>
</div>
</blockquote>
</span> I bet you meant
'Kubernetes' <span><span>
:-) </span></span></div>
</blockquote>
<div><br>
</div>
<div>Yup</div>
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF"><span><span>
</span></span><br>
<br>
                            +1 for the improvements. Besides those I
                            think that sooner or later we will need to
                            solve long-running export+import where you
                            want to import 100.000 users. <br>
</div>
</blockquote>
<div><br>
</div>
<div>+1</div>
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF"> <br>
As I mentioned in another
mail few weeks ago, we can
have:<br>
<br>
1) Table with the progress
(51.000 users already
imported, around 49.000
remaining etc.)<br>
</div>
</blockquote>
<div><br>
</div>
<div>We would still need to
split into multiple files in
either case. Having a single
json file with 100K users is
probably not going to
perform very well. So what I
proposed would actually work
for long-running import as
well. If each file has a
manageable amount of users
(say ~5 min to import) then
each file will be marked as
imported or failed. At least
for now I don't think we
should do smaller batches
than one file. As long as
one file is imported within
the same TX then it's an all
or nothing import.</div>
</div>
</div>
</div>
</blockquote>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF"> 2)
Concurrency and dividing
the work among cluster
nodes (Node1 will import
50.000 users and node2
another 50.000 users)<br>
</div>
</blockquote>
<div><br>
</div>
<div>This would be solved as
well. Each node picks up a
file that's not processed
yet. Marks it in the DB and
then gets to process it.</div>
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
                                  bgcolor="#FFFFFF"> 3)
                                  Failover (Import won't be completely
                                  broken if a cluster node crashes after
                                  importing 90.000, but can continue on
                                  other cluster nodes)<br>
<br>
                                  I think the stuff I did recently for
                                  pre-loading offline sessions at
                                  startup could be reused for this too,
                                  and it can handle (2) and (3). It can
                                  also handle parallel imports triggered
                                  from multiple cluster nodes. <br>
<br>
                                  For example: currently if you trigger
                                  Kubernetes with 2 cluster nodes, both
                                  nodes will start to import the same
                                  file at the same time, because the
                                  import triggered by node1 is not yet
                                  finished before node2 is started, so
                                  there is no DB record yet that the
                                  file is already imported. With the
                                  stuff I did, just the coordinator
                                  (node1) will start the import. Node2
                                  will wait until the import triggered
                                  by node1 is finished, but at the same
                                  time it can "help" to import some
                                  users (pages) if the coordinator asks
                                  it to do so. This impl is based on the
                                  Infinispan distributed
                                  executor service <a
executor service <a
moz-do-not-send="true"
href="http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework"
target="_blank">http://infinispan.org/docs/5.3.x/user_guide/user_guide.html#_distributed_execution_framework</a>
.</div>
</blockquote>
<div><br>
</div>
<div>The DB record needs to be
created before a node tries
to import it, including a
timestamp when it started
the import. It should then
be updated once the import
is completed, with the
result. Using the
distributed execution
framework sounds like a good
idea though. How do you
prevent scheduling the same
job multiple times? For
example if all nodes on
startup scan the import
folder and simply import
everything they find, then
there will be multiple of
the same job. Not really a
big deal as the first thing
the job should do is check
if there's a record in the
DB already.</div>
</div>
</div>
</div>
</blockquote>
                            </span> With the distributed executor, it's
                            the cluster coordinator which coordinates
                            which node will import what. It will send
                            messages to cluster nodes like "Hey, please
                            import the file testrealm-users-3.json with
                            timestamp abcd123". <br>
<br>
                            After a node finishes the job, it notifies
                            the coordinator, and the coordinator will
                            insert the DB record and mark it as
                            finished. So there is no DB record inserted
                            before the node starts the import, because
                            the whole coordination is handled by the
                            coordinator. Also, the same file will never
                            be imported multiple times by different
                            cluster nodes. <br>
<br>
                            The only exception would be if a cluster
                            node crashes before the import is finished.
                            Then the file needs to be reimported by
                            another cluster node, but that's the case
                            with DB locks as well.<br>
                            <br>
                            IMO the DB locks approach doesn't handle the
                            crash of a cluster node well. For example,
                            when node2 crashes unexpectedly while it's
                            importing the file testrealm-users-3.json,
                            the DB lock is held by this node, so other
                            cluster nodes can't start importing the file
                            (until a timeout occurs).<br>
<br>
                            On the other hand, the distributed executor
                            approach may have issues if the content of
                            the standalone/import directory is
                            inconsistent among cluster nodes. However,
                            it can be solved: each node will need to
                            send the checksums of the files it has, and
                            the coordinator will need to ensure that the
                            file with checksum "abcd123" is assigned
                            just to a node which has this file.</div>
</blockquote>
<div><br>
</div>
<div>With Docker/Kubernetes all nodes would
have the same files. At least initially.
Would be nice if we could come up with a
solution where you can just drop an
additional file onto any node and have it
imported.</div>
</div>
</div>
</div>
</blockquote>
                Exactly, I was thinking about Docker too. In that case
                we don't have any issue at all.<br>
<br>
                The main question here is: do we want to support the
                scenario where various cluster nodes have different
                content? As I mentioned, the distributed coordinator can
                handle it: each cluster node will send the checksums of
                the files it has, and the coordinator will always assign
                to a node just the checksums which it has.<br>
<br>
                However, regardless of the distributed executor approach
                or the DB locks approach, there may still be issues. For
                example:<br>
                1) The file testrealm.json with checksum "abc" is
                triggered for import on node1<br>
                2) At the same time, an admin makes some minor change to
                this file on node2 and saves it. This means that the
                checksum of the file on node2 changes to "def"<br>
                3) Node2 will trigger an import of that file. So we have
                both node1 and node2 importing the same file
                concurrently, because the previously retrieved lock was
                for the "abc" checksum, but now the checksum is "def"<br>
<br>
                This problem exists with both the DB lock and
                DistributedExecutor approaches though...<br>
</blockquote>
</div>
</div>
        A possible solution for this issue is that when an import is
        already in progress, newly added or changed checksums will be
        ignored. The checksums are always collected just at the start of
        the import.<span class="HOEnZb"><font
color="#888888"><br>
<br>
Marek</font></span>
<div>
<div class="h5"><br>
<blockquote type="cite"> <br>
Marek<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span><font
color="#888888"><br>
<br>
Marek</font></span>
<div>
<div><br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF"><span><font
color="#888888"><br>
<br>
Marek</font></span>
<div>
<div><br>
<br>
<blockquote
type="cite">
<div dir="ltr">
<div
class="gmail_extra">
<div
class="gmail_quote">
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div dir="ltr"> <br>
</div>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">
<div>
<div>On Mon,
Nov 9, 2015 at
1:20 PM, Stian
Thorgersen <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:sthorger@redhat.com" target="_blank">sthorger@redhat.com</a>></span>
wrote:<br>
</div>
</div>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div>
<div>
<div dir="ltr">Currently
we support
importing a
complete realm
definition
using the
import/export
feature.
Issues with
the current
approach are:
<div><br>
</div>
<div>* Only
complete realm
- not possible
to add to an
existing realm</div>
<div>* No good
feedback if
import was
successful or
not</div>
<div>* Use of
system
properties to
initiate the
import is not
very user
friendly</div>
                                          <div>* Not very elegant for
                                            provisioning. For example a
                                            Docker image that wants to
                                            bundle some initial setup
                                            ends up always running the
                                            import of a realm, which is
                                            skipped if the realm exists</div>
<div><br>
</div>
<div>To solve
this I've come
up with the
following
proposal:</div>
<div><br>
</div>
                                          <div>Allow dropping
                                            representations to be
                                            imported into
                                            'standalone/import'. This
                                            should support creating a
                                            new realm as well as
                                            importing into an existing
                                            realm. When importing into
                                            an existing realm we will
                                            have an import strategy that
                                            is used to configure what
                                            happens if a resource exists
                                            (user, role, identity
                                            provider, user federation
                                            provider). The import
                                            strategies are:</div>
<div><br>
</div>
                                          <div>* Skip - existing
                                            resources are skipped.</div>
                                          <div>* Fail - if any resource
                                            exists nothing is imported.</div>
                                          <div>* Overwrite - any
                                            existing resources are
                                            deleted.</div>
<div><br>
</div>
<div>The
directory will
be scanned at
startup, but
there will
also be an
option to
monitor this
directory at
runtime.</div>
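The startup scan could be a simple filter over the directory listing, skipping files that already have an .imported or .failed marker next to them (a sketch; the class and method names are assumptions):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ImportScan {

    // Given the file names in standalone/import, return the .json files
    // that still need importing: those without a marker file.
    public static List<String> pending(Collection<String> dirListing) {
        Set<String> all = new HashSet<>(dirListing);
        return dirListing.stream()
                .filter(n -> n.endsWith(".json"))
                .filter(n -> !all.contains(n + ".imported"))
                .filter(n -> !all.contains(n + ".failed"))
                .sorted()
                .collect(Collectors.toList());
    }
}
```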
<div><br>
</div>
                                          <div>To prevent a file being
                                            imported multiple times (and
                                            also to make sure only one
                                            node in a cluster imports
                                            it) we will have a table in
                                            the database that contains
                                            which files were imported,
                                            from what node, the date and
                                            the result (including a list
                                            of which resources were
                                            imported, which were not,
                                            and a stack trace if
                                            applicable). The primary key
                                            will be the checksum of the
                                            file. We will also add
                                            marker files (&lt;json
                                            file&gt;.imported or
                                            &lt;json file&gt;.failed).
                                            The contents of the marker
                                            files will be a json object
                                            with the date imported, the
                                            outcome (including a stack
                                            trace if applicable) as well
                                            as a complete list of which
                                            resources were successfully
                                            imported and which were not.</div>
<div><br>
</div>
<div>The files
will also
allow
resolving
system
properties and
environment
variables. For
example:</div>
<div><br>
</div>
<div>{</div>
<div>
"secret":
"${env.MYCLIENT_SECRET}"</div>
<div>}</div>
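Resolving the ${env.X} placeholders could be done with a simple regex pass before parsing the JSON (a sketch; the real implementation would read System.getenv() and also handle system properties, while here a lookup map is passed in for illustration):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Placeholders {

    private static final Pattern ENV =
            Pattern.compile("\\$\\{env\\.([A-Za-z0-9_]+)\\}");

    // Replace ${env.NAME} occurrences using the given lookup map.
    // Unknown names are left as-is.
    public static String resolve(String text, Map<String, String> env) {
        Matcher m = ENV.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(
                    value != null ? value : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```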
<div><br>
</div>
<div>This will
be very
convenient for
example with
Docker as it
would be very
easy to create
a Docker image
that extends
ours to add a
few clients
and users.</div>
<div><br>
</div>
<div>It will
also be
convenient for
examples as it
will make it
possible to
add the
required
clients and
users to an
existing
realm.</div>
<div><br>
</div>
<div><br>
</div>
</div>
<br>
</div>
</div>
_______________________________________________<br>
keycloak-dev
mailing list<br>
<a
moz-do-not-send="true"
href="mailto:keycloak-dev@lists.jboss.org" target="_blank">keycloak-dev@lists.jboss.org</a><br>
<a
moz-do-not-send="true"
href="https://lists.jboss.org/mailman/listinfo/keycloak-dev"
target="_blank">https://lists.jboss.org/mailman/listinfo/keycloak-dev</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>