[keycloak-dev] Clustering configuration

Thu Sep 13 08:50:15 EDT 2018

Sorry, something went wrong and I forgot to include the dev mailing list.

We've just had a discussion on this topic with @Stian Thorgersen
<stian at redhat.com>. Here's what we agreed upon:
- We want to have an extensible mechanism for adding new discovery
protocols for different environments. We plan to implement something
similar to what we have for DBs.
- We will be able to switch between different configuration options using
environment variables.
- For the OpenShift use case, we'll focus on DNS_PING.
- Later on, we'll ask for community help especially around JDBC_PING.
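
To make the environment variable idea a bit more concrete, here is a rough
sketch of how a startup script might pick a discovery CLI file. Everything in
it is an assumption for illustration only: the variable name
JGROUPS_DISCOVERY_PROTOCOL, the cli/discovery directory and the
one-file-per-protocol layout are hypothetical, loosely modelled on the
per-database CLI files, and the actual proposal may look different.

```python
import os

# Hypothetical variable name; nothing like this has been agreed upon yet.
ENV_VAR = "JGROUPS_DISCOVERY_PROTOCOL"


def select_discovery_script(env, available_scripts, cli_root="cli/discovery"):
    """Return the path of the CLI script for the requested discovery protocol.

    Returns None when the variable is unset or empty, meaning the stack that
    ships in the default configuration is left untouched.
    """
    protocol = env.get(ENV_VAR, "").strip()
    if not protocol:
        return None
    script = protocol + ".cli"
    if script not in available_scripts:
        # Adding a protocol only requires dropping in a new <PROTOCOL>.cli
        # file, so an unknown name is reported rather than silently ignored.
        raise ValueError(
            f"No discovery script for protocol {protocol!r}; "
            f"available: {sorted(available_scripts)}"
        )
    return f"{cli_root}/{script}"


if __name__ == "__main__":
    # Assumed set of shipped scripts; only DNS_PING is in scope initially.
    print(select_discovery_script(os.environ, {"DNS_PING.cli"}))
```

With this layout, supporting a new environment would only mean contributing
one more CLI file, which is the kind of extensibility we are after.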

Let me look at the code and implement a proposal. I'll ask you guys for a
review once I have a working PR.

Thanks again to all of you for the very useful info!

On Thu, Sep 13, 2018 at 10:21 AM Stian Thorgersen <sthorger at redhat.com>
wrote:

>
>
> On Thu, 13 Sep 2018 at 09:38, Sebastian Laskawiec <slaskawi at redhat.com>
> wrote:
>
>> Answering several emails in a row. Please forgive me for not responding to
>> each one individually.
>>
>> Keycloak is actually a web app (rather than a separate app like Infinispan,
>> which is more a fork of Wildfly than a layer on top of it). We already use a
>> database that needs to be somewhere there (either embedded H2, MySQL or
>> whatever). Moreover, we also define our data source - KeycloakDS. I also
>> expect Keycloak clusters to be rather small, just 2-3 servers for HA.
>>
>> Having all this in mind, using JDBC_PING might not be a bad idea, I
>> think. I guess we are lucky since we can assume that we already have an SQL
>> database running. @Bela Ban <bban at redhat.com> I've also seen you fixed
>> the crashed members issue mentioned in [1] in 4.0.10. We're on 4.0.11, so
>> we should also be good to go.
>> However, there are a few limitations that are worth mentioning and I would
>> like to request some comments from @Marek Posolda <mposolda at redhat.com>, @Stian
>> Thorgersen <stian at redhat.com> and @Hynek Mlnarik <hmlnarik at redhat.com>:
>> - This makes our dependency on the database even stronger. If we ever
>> decide to change the storage, this issue will come back to us like a
>> boomerang.
>> - When considering cross-site replication (with database replication
>> turned on), I guess we will need a separate, non-replicated table for
>> discovery per site. Otherwise, sites might overwrite each other by
>> writing into the same row in the same table.
>> - We have a few database implementations that we can work with (H2,
>> MySQL, PostgreSQL and others, but those require additional drivers).
>> JDBC_PING uses SQL queries, so we would need to write them in such a way
>> that they always work regardless of the DB implementation. Since this is at
>> the configuration level, we cannot use JPA. Those will need to be
>> hand-crafted queries. But I guess it shouldn't be a big deal.
>>
>
> I'm not keen on JDBC_PING. I know it's simple and works, but testing it on
> all supported databases will be difficult. Then if we do decide to get rid
> of the DB as a strong requirement it would come back to haunt us, but I
> think that's something we could address if/when.
>
> While we're talking about testing: we won't be able to do exhaustive
> testing here on Docker, OpenShift, Kubernetes, GCE, EC2, ... So we would
> have to rely on the community to test and provide us with feedback.
>
>
>>
>> The MULTI_PING also seems very interesting. Thanks a lot for all the
>> useful info! @Paul Ferraro <paul.ferraro at redhat.com>, could you please
>> tell me when you plan to implement the multi discovery protocol behavior
>> you mentioned in your email? In which Wildfly version will this feature be
>> available?
>>
>> Thanks,
>> Sebastian
>>
>> PS - If you guys spot any problems with JDBC_PING and the current version of
>> Keycloak, please create a JIRA for us -
>> https://issues.jboss.org/browse/KEYCLOAK
>>
>> [1] https://issues.jboss.org/browse/JGRP-2245
>>
>> On Wed, Sep 12, 2018 at 5:10 PM Bela Ban <bban at redhat.com> wrote:
>>
>>>
>>>
>>> On 12/09/18 14:46, Paul Ferraro wrote:
>>> > Hi Sebastian,
>>> >
>>> > We've planned to support MULTI_PING in WF by adding it to the stack
>>> > automatically in the event that a stack contains multiple discovery
>>> > protocols [1].  We want to keep this hidden from users, since JGroups
>>> > plans to support multiple discovery protocols transparently [2].
>>>
>>> As I said before, I don't like to add hidden stuff, or else
>>> configuration != runtime. However, in this specific case, I agree that
>>> this may be a good interim solution until we have multiple discovery
>>> protocols without the need for MULTI_PING.
>>>
>>> > Since we haven't had time to do this yet, we have not yet performed
>>> > any testing of MULTI_PING, nor do I know of any users already using
>>> > this.
>>>
>>> Correct. I haven't heard of any users, either.
>>>
>>> Next steps wrt discovery are me thinking about multiple discovery
>>> protocols without MULTI_PING and perhaps a more reactive design
>>> (FIND_MBRS_ASYNC) involving CompletableFutures, as suggested by Dan.
>>>
>>> > If you don't have the bandwidth to toy with MULTI_PING, I would
>>> > strongly encourage you to use option #3.  Configuration XML is
>>> > permitted to change dramatically between schema versions, and can be a
>>> > nightmare for XML manipulation scripts, while the EAP/WF management
>>> > CLI is effectively a stable API, where backwards compatibility (for a
>>> > specific version range of the model) is a requirement.  Rado recently
>>> > refactored our testsuite to use CLI scripts for modifying
>>> > configuration (instead of XML manipulation) and it has considerably
>>> > reduced the complexity of our testsuite.  I can't imagine ever going
>>> > back.
>>> >
>>> > CLI scripts for WF/EAP's jgroups subsystem would look considerably
>>> > simpler than the scripts you posted above (for the datagrid-jgroups
>>> > subsystem).  Specifically:
>>> > 1. Rather than remove and redefine an entire protocol stack, you can
>>> > remove a specific protocol and insert a new one in the stack at a
>>> > specific index.
>>> > 2. pbcast.NAKACK2's use_mcast_xmit attribute is auto-disabled when the
>>> > transport does not support multicast, so this property never needs to
>>> > be set explicitly.
>>> > 3. All clustering operations support the
>>> > {allow-resource-service-restart=true} operation header to avoid a full
>>> > server reload by restarting only affected services.  Batching
>>> > (especially when containing remove/add operation pairs) is encouraged
>>> > to eliminate redundant service restarts.
>>> >
>>> > [1] https://issues.jboss.org/browse/WFLY-9723
>>> > [2] https://issues.jboss.org/browse/JGRP-2230
>>> > On Wed, Sep 12, 2018 at 4:00 AM Sebastian Laskawiec <
>>> slaskawi at redhat.com> wrote:
>>> >>
>>> >> Hey guys,
>>> >>
>>> >> During our weekly sync meeting, Stian asked me to look into different
>>> options for clustering in the Keycloak server. This topic has become quite hot in
>>> the context of our Docker image (see the proposed community contributions
>>> [1][2][3]). Since we are based on WF 13, which uses JGroups 4.0.11 and has
>>> KUBE_PING in its modules, we have a couple of options how to do it.
>>> >>
>>> >> Before discussing different implementations, let me quickly go
>>> through the requirements:
>>> >> - We need a configuration stack that works for on-prem and cloud
>>> deployments with OpenShift as our primary target.
>>> >> - The configuration should be automatic (if it's possible). E.g. if
>>> we discover that Keycloak is running in a container, we should use the proper
>>> discovery protocol.
>>> >> - There needs to be a way to override the discovery protocol manually.
>>> >>
>>> >> With those requirements in mind, we have a couple of implementation
>>> options on the table:
>>> >> 1. Add more stacks to the configuration, e.g. openshift, azure or
>>> gcp. Then we use the standard `-Djboss.default.jgroups.stack=<stack>`
>>> configuration switch.
>>> >> 2. Provide more standalone-*.xml configuration files, e.g.
>>> standalone-ha.xml (for on-prem) or standalone-cloud.xml.
>>> >> 3. Add protocols dynamically using CLI. A similar approach to what we
>>> did for the Data Grid Cache Service [4].
>>> >> 4. Use the MULTI_PING protocol [5][6], with multiple discovery protocols
>>> on the same stack. This will include MPING (for multicasting), KUBE_PING
>>> (if we can access Kubernetes API), DNS_PING (if Pods are governed by a
>>> Service).
>>> >>
>>> >> Options #1 and #2 are somewhat similar to what we did for Infinispan
>>> [7]. It works quite well but the configuration grows quite quickly and most
>>> of the protocols (apart from discovery) are duplicated. On the other hand,
>>> having separate configuration pieces for each use case is very flexible.
>>> Keeping in mind that AWS cuts TCP connections, using FD_SOCK might lead to
>>> false suspicions, but on GCP, for instance, FD_SOCK works quite nicely.
>>> The CLI option (#3) is also very flexible and probably should be
>>> implemented only in our Docker image. This somewhat follows the convention
>>> we already started with different CLI files for different DBs [8]. Option
>>> #4 is brand new (implemented in JGroups 4.0.8; we have 4.0.11 as you
>>> probably recall). It has been specifically designed for this kind of use
>>> case, where we want to gather discovery data from multiple places. This
>>> way, we should end up with two stacks in the standalone-ha.xml file: UDP
>>> and TCP.
>>> >>
>>> >> I honestly have to say that my heart goes for option #4. However,
>>> as far as I know it hasn't been battle tested and we might get some
>>> surprises. All other options are not as elegant as option #4 but they are
>>> used somewhere in other projects. They are much safer options but they will
>>> add some maintenance burden on our shoulders.
>>> >>
>>> >> What would you suggest, guys? What do you think about all this? @Rado,
>>> @Paul, @Tristan - Do you have any plans regarding this piece in Wildfly or
>>> Infinispan?
>>> >>
>>> >> Thanks,
>>> >> Sebastian
>>> >>
>>> >> [1] https://github.com/jboss-dockerfiles/keycloak/pull/96
>>> >> [2] https://github.com/jboss-dockerfiles/keycloak/pull/100
>>> >> [3] https://github.com/jboss-dockerfiles/keycloak/pull/116
>>> >> [4]
>>> https://github.com/jboss-container-images/datagrid-7-image/blob/datagrid-services-dev/modules/os-datagrid-online-services-configuration/src/main/bash/profiles/caching-service.cli#L37
>>> >> [5] http://www.jgroups.org/manual4/index.html#_multi_ping
>>> >> [6] https://issues.jboss.org/browse/JGRP-2224
>>> >> [7]
>>> https://github.com/infinispan/infinispan/tree/master/server/integration/jgroups/src/main/resources/subsystem-templates
>>> >> [8]
>>> https://github.com/jboss-dockerfiles/keycloak/tree/master/server/tools/cli/databases
>>>
>>> --
>>> Bela Ban | http://www.jgroups.org
>>>
>>>