[keycloak-user] Keycloak Domain TCP Clustering of Sessions in AWS not auto-failing over to node

Sebastian Laskawiec slaskawi at redhat.com
Wed Aug 28 10:16:12 EDT 2019


Let me add @Radoslav Husar <rhusar at redhat.com> and @Paul Ferraro
<pferraro at redhat.com> to the loop. Perhaps they might give you some more
accurate hints than I did.

On Wed, Aug 28, 2019 at 4:13 PM JTK <jonesy at sydow.org> wrote:

> OK,
>
> I originally had just the one channel, with the same result. I added the
> other channel while reading Red Hat articles on JBoss, e.g. "Channel
> definition for two or more cache-container at JBoss EAP clustered
> environment":
> https://access.redhat.com/solutions/3880301
>
> I've also used <protocol type="TCPPING"> with the same results
>
> I've looked at that URL previously:
> http://www.mastertheboss.com/jboss-server/jboss-cluster/how-to-configure-jboss-eap-and-wildfly-to-use-tcpping
> Near the beginning it references a different approach than most articles
> (<socket-discovery-protocol>), and it shows host names rather than IPs. I'm
> not sure whether that's the new way or whether IPs work too, and it uses
> the standalone configuration, so there's no need to look externally for
> other nodes on the network. Most articles out there use the legacy method,
> which that page covers later and which is what I'm using.
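> If I understood the article correctly, the newer style looks roughly like
> this (just a sketch; the binding names here are made up, and each one maps
> to an outbound-socket-binding rather than a raw IP):
>
>     <!-- "host-a"/"host-b" are example names -->
>     <socket-discovery-protocol type="TCPPING" socket-bindings="host-a host-b"/>
>
> with matching entries in the socket-binding-group:
>
>     <outbound-socket-binding name="host-a">
>         <remote-destination host="10.10.10.77" port="7600"/>
>     </outbound-socket-binding>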
> Even if I get it working standalone, it's not much help when there are
> multiple .xml files which must be configured across the different
> instances, i.e. domain.xml, host.xml, host-master.xml and host-slave.xml.
>
> I'm going to do more troubleshooting, but the only cluster-related message
> in my logs is:
> [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 50)
> dev-master: no members discovered after 3004 ms: creating cluster as first
> member
> which doesn't give me a good place to start.
> Does anyone know if this is the correct response when running this command?
> /profile=full-ha/subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcpping)
> {
>     "outcome" => "success",
>     "result" => undefined,
>     "server-groups" => {"auth-server-group" => {"host" => {
>         "dev-master" => {"dev-master" => {"response" => {"outcome" =>
> "success"}}},
>         "dev-slave1" => {"dev-slave1" => {"response" => {
>             "outcome" => "success",
>             "result" => undefined
>         }}}
>     }}}
> }
>
> No response from dev-slave1... is that normal?
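> (I assume I could read the attribute back with something like
> /profile=full-ha/subsystem=jgroups/channel=ee:read-attribute(name=stack)
> to confirm the change took, but I'm not sure that proves the servers
> actually picked it up.)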
>
> Thanks for your help, this has been frustrating. I think we are going to
> buy JBoss support and move to Red Hat SSO, but for now it's a simple proof
> of concept and I'm just missing the full clustering aspect of it.
>
> On Wed, Aug 28, 2019 at 8:50 AM Sebastian Laskawiec <slaskawi at redhat.com>
> wrote:
>
>> I would strongly advise getting clustering to work on the standalone
>> configuration first (./standalone.sh -c standalone-ha.xml). This way you
>> will get rid of all the noise from the domain configuration. Once that
>> part works fine, you may try to get it running with the domain
>> configuration.
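>> If you keep your custom stack, switching the channel over in standalone
>> mode should be a one-liner in jboss-cli (a sketch, assuming the channel is
>> still named "ee"):
>>
>>     /subsystem=jgroups/channel=ee:write-attribute(name=stack, value=tcpping)
>>     reload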
>>
>> I briefly looked at this guide:
>> http://www.mastertheboss.com/jboss-server/jboss-cluster/how-to-configure-jboss-eap-and-wildfly-to-use-tcpping
>> and it should be enough to give you some hints about the configuration.
>>
>> More comments and hints inlined.
>>
>> On Tue, Aug 27, 2019 at 11:58 PM JTK <jonesy at sydow.org> wrote:
>>
>>> Thanks,
>>>
>>> I've read numerous articles, including the ones you listed, and I've
>>> still been unable to determine why they are not clustering.
>>>
>>> When I run this command I get the following output:
>>>
>>> /profile=full-ha/subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcpping)
>>> {
>>>     "outcome" => "success",
>>>     "result" => undefined,
>>>     "server-groups" => {"auth-server-group" => {"host" => {
>>>         "dev-master" => {"dev-master" => {"response" => {"outcome" =>
>>> "success"}}},
>>>         "dev-slave1" => {"dev-slave1" => {"response" => {
>>>             "outcome" => "success",
>>>             "result" => undefined
>>>         }}}
>>>     }}}
>>> }
>>>
>>> Is that the normal output for the slave?
>>>
>>> Here is some other information from the logs:
>>> From the Master Node
>>> [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 50)
>>> dev-master: no members discovered after 3004 ms: creating cluster as first
>>> member
>>> This was with the Slave1 node up and running:
>>> [Host Controller] 21:35:53,349 INFO  [org.jboss.as.host.controller]
>>> (Host Controller Service Threads - 2) WFLYHC0148: Connected to master host
>>> controller at remote://10.10.10.77:9999
>>>
>>> On Master:
>>> [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread
>>> 1-2) ISPN000128: Infinispan version: Infinispan 'Infinity Minus ONE +2'
>>> 9.4.8.Final
>>> [Server:dev-master] 21:36:02,081 INFO
>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>> thread 1-4) ISPN000078: Starting JGroups channel ee
>>> [Server:dev-master] 21:36:02,086 INFO
>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>> thread 1-2) ISPN000078: Starting JGroups channel ee
>>> [Server:dev-master] 21:36:02,087 INFO
>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>> thread 1-1) ISPN000078: Starting JGroups channel ee
>>> [Server:dev-master] 21:36:02,089 INFO  [org.infinispan.CLUSTER] (MSC
>>> service thread 1-4) ISPN000094: Received new cluster view for channel ee:
>>> [dev-master|0] (1) [dev-master]
>>> [Server:dev-master] 21:36:02,090 INFO
>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>> thread 1-3) ISPN000078: Starting JGroups channel ee
>>> [Server:dev-master] 21:36:02,091 INFO  [org.infinispan.CLUSTER] (MSC
>>> service thread 1-1) ISPN000094: Received new cluster view for channel ee:
>>> [dev-master|0] (1) [dev-master]
>>> [Server:dev-master] 21:36:02,091 INFO  [org.infinispan.CLUSTER] (MSC
>>> service thread 1-2) ISPN000094: Received new cluster view for channel ee:
>>> [dev-master|0] (1) [dev-master]
>>> [Server:dev-master] 21:36:02,091 INFO  [org.infinispan.CLUSTER] (MSC
>>> service thread 1-3) ISPN000094: Received new cluster view for channel ee:
>>> [dev-master|0] (1) [dev-master]
>>> [Server:dev-master] 21:36:02,104 INFO
>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>> thread 1-2) ISPN000079: Channel ee local address is dev-master, physical
>>> addresses are [10.10.10.77:7600]
>>> [Server:dev-master] 21:36:02,129 INFO
>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>> thread 1-1) ISPN000079: Channel ee local address is dev-master, physical
>>> addresses are [10.10.10.77:7600]
>>> [Server:dev-master] 21:36:02,149 INFO
>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>> thread 1-4) ISPN000079: Channel ee local address is dev-master, physical
>>> addresses are [10.10.10.77:7600]
>>> [Server:dev-master] 21:36:02,151 INFO
>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>> thread 1-3) ISPN000079: Channel ee local address is dev-master, physical
>>> addresses are [10.10.10.77:7600]
>>> [Server:dev-master] 21:36:02,296 INFO
>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>> thread 1-2) ISPN000078: Starting JGroups channel ee
>>> [Server:dev-master] 21:36:02,297 INFO  [org.infinispan.CLUSTER] (MSC
>>> service thread 1-2) ISPN000094: Received new cluster view for channel ee:
>>> [dev-master|0] (1) [dev-master]
>>> [Server:dev-master] 21:36:02,325 INFO
>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>> thread 1-2) ISPN000079: Channel ee local address is dev-master, physical
>>> addresses are [10.10.10.77:7600]
>>>
>>> This is the configuration from the domain.xml on the master
>>>                 <channels default="ee">
>>>                     <channel name="ee" stack="tcp"/>
>>>
>>
>> I believe this doesn't look right. It seems you have two channels with the
>> same name, which is problematic. There should be only one.
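>> Keeping only the tcpping one, it would look like this:
>>
>>     <channels default="ee">
>>         <channel name="ee" stack="tcpping"/>
>>     </channels>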
>>
>>
>>>                     <channel name="ee" stack="tcpping"/>
>>>                 </channels>
>>>                 <stacks>
>>>                     <stack name="tcp">
>>>                         <transport type="TCP"
>>> socket-binding="jgroups-tcp"/>
>>>                         <protocol type="MERGE3"/>
>>>                         <protocol type="FD_SOCK"/>
>>>                         <protocol type="FD_ALL"/>
>>>                         <protocol type="VERIFY_SUSPECT"/>
>>>                         <protocol type="pbcast.NAKACK2"/>
>>>                         <protocol type="UNICAST3"/>
>>>                         <protocol type="pbcast.STABLE"/>
>>>                         <protocol type="pbcast.GMS"/>
>>>                         <protocol type="MFC"/>
>>>                         <protocol type="FRAG3"/>
>>>                     </stack>
>>>                     <stack name="tcpping">
>>>                         <transport type="TCP"
>>> socket-binding="jgroups-tcp"/>
>>>                         <protocol type="org.jgroups.protocols.TCPPING">
>>>
>>
>> This can be simplified to <protocol type="TCPPING">
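>> In other words, the whole block should be equivalent to:
>>
>>     <protocol type="TCPPING">
>>         <property name="initial_hosts">${jboss.cluster.tcp.initial_hosts}</property>
>>         <property name="port_range">0</property>
>>     </protocol>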
>>
>>
>>>                             <property name="initial_hosts">
>>>                                 ${jboss.cluster.tcp.initial_hosts}
>>>
>>
>> Once you're trying it out on the standalone configuration, you may set
>> this property by booting the cluster with `./standalone.sh -c
>> standalone-ha.xml
>> -Djboss.cluster.tcp.initial_hosts="10.10.10.77[7600],10.10.11.27[7600]"`. I
>> think that will be much faster than manipulating the XML configuration.
>>
>>
>>>                             </property>
>>>                             <property name="port_range">
>>>                                 0
>>>                             </property>
>>>                         </protocol>
>>>                         <protocol type="MERGE3"/>
>>>                         <protocol type="FD_SOCK"/>
>>>                         <protocol type="FD_ALL"/>
>>>                         <protocol type="VERIFY_SUSPECT"/>
>>>                         <protocol type="pbcast.NAKACK2"/>
>>>                         <protocol type="UNICAST3"/>
>>>                         <protocol type="pbcast.STABLE"/>
>>>                         <protocol type="pbcast.GMS"/>
>>>                         <protocol type="MFC"/>
>>>                         <protocol type="FRAG3"/>
>>>                     </stack>
>>>
>>>     <server-groups>
>>>         <server-group name="auth-server-group" profile="full-ha">
>>>             <jvm name="default">
>>>                 <heap size="64m" max-size="512m"/>
>>>             </jvm>
>>>             <socket-binding-group ref="ha-sockets"/>
>>>             <system-properties>
>>>                 <property name="jboss.cluster.tcp.initial_hosts"
>>> value="10.10.10.77[7600],10.10.11.27[7600]"/>
>>>             </system-properties>
>>>         </server-group>
>>>     </server-groups>
>>>
>>> This is a question on which I find conflicting information when reading
>>> the Red Hat/JBoss/WildFly docs.
>>> From the Master, this is the host.xml file. I've had both the master and
>>> the slave listed here. Which one do I need, or do I need both? The same
>>> question goes for configuring the Slave's host.xml file:
>>>
>>>     <servers>
>>>         <server name="dev-slave1" group="auth-server-group"
>>> auto-start="true">
>>>             <socket-bindings port-offset="0"/>
>>>         </server>
>>>     </servers>
>>>
>>> The same question applies to host-master.xml on the Master and
>>> host-slave.xml on the Slave. This is from host-master.xml on the Master:
>>>     <servers>
>>>         <server name="dev-sentinel-master" group="auth-server-group"
>>> auto-start="true">
>>>             <socket-bindings port-offset="0"/>
>>>         </server>
>>>     </servers>
>>>
>>> I've included a screenshot showing the WildFly Management Console with
>>> both servers up and green in the auth-server-group of the full-ha profile:
>>> https://i.imgur.com/8g124Ss.png
>>>
>>> I know I'm just missing something small, and I'm not getting any errors
>>> in the logs. Is there any way to get more TRACE or DEBUG output with
>>> regard to clustering?
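>>> (I'm guessing something along the lines of
>>> /profile=full-ha/subsystem=logging/logger=org.jgroups:add(level=TRACE)
>>> would do it, but I haven't confirmed that.)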
>>>
>>> Thanks!
>>>
>>>
>>> On Tue, Aug 27, 2019 at 5:09 AM Sebastian Laskawiec <slaskawi at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Aug 26, 2019 at 5:32 PM JTK <jonesy at sydow.org> wrote:
>>>>
>>>>> I have two nodes set up in a cluster using TCP port 7600, and I see
>>>>> them join the cluster in the logs.
>>>>> On Master: [Host Controller] 15:07:18,293 INFO
>>>>>  [org.jboss.as.domain.controller] (Host Controller Service Threads - 7)
>>>>> WFLYHC0019: Registered remote slave host "dev-slave1", JBoss Keycloak
>>>>> 6.0.1
>>>>> (WildFly 8.0.0.Final)
>>>>> On Slave: [Host Controller] 15:03:12,603 INFO
>>>>>  [org.jboss.as.host.controller] (Host Controller Service Threads - 3)
>>>>> WFLYHC0148: Connected to master host controller at remote://
>>>>> 10.10.10.77:9999
>>>>>
>>>>> In the WildFly admin panel I see the server group auth-server-group,
>>>>> which is ha, and I see both servers in the group; they are both green.
>>>>>
>>>>> I've set the distributed-cache owners to 2 in domain.xml, so it should
>>>>> be sharing session information:
>>>>>                     <distributed-cache name="sessions" owners="2"/>
>>>>>                     <distributed-cache name="authenticationSessions"
>>>>> owners="2"/>
>>>>>                     <distributed-cache name="offlineSessions"
>>>>> owners="2"/>
>>>>>                     <distributed-cache name="clientSessions"
>>>>> owners="2"/>
>>>>>                     <distributed-cache name="offlineClientSessions"
>>>>> owners="2"/>
>>>>>                     <distributed-cache name="loginFailures"
>>>>> owners="2"/>
>>>>>                     <distributed-cache name="actionTokens" owners="2">
>>>>>
>>>>> Here are the logs on the master showing that a new cluster view has
>>>>> been received:
>>>>> 2019-08-26 15:03:19,776 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>> thread
>>>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>>>> [dev-master|0]
>>>>> (1) [dev-master]
>>>>> 2019-08-26 15:03:19,779 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>> thread
>>>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>>>> [dev-master|0]
>>>>> (1) [dev-master]
>>>>> 2019-08-26 15:03:19,780 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>> thread
>>>>> 1-2) ISPN000094: Received new cluster view for channel ejb:
>>>>> [dev-master|0]
>>>>> (1) [dev-master]
>>>>> 2019-08-26 15:03:19,780 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>> thread
>>>>> 1-4) ISPN000094: Received new cluster view for channel ejb:
>>>>> [dev-master|0]
>>>>> (1) [dev-master]
>>>>> 2019-08-26 15:03:19,875 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>> thread
>>>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>>>> [dev-master|0]
>>>>> (1) [dev-master]
>>>>>
>>>>> And on the slave:
>>>>> 2019-08-26 15:07:29,567 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>> thread
>>>>> 1-2) ISPN000094: Received new cluster view for channel ejb:
>>>>> [dev-slave1|0]
>>>>> (1) [dev-slave1]
>>>>> 2019-08-26 15:07:29,572 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>> thread
>>>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>>>> [dev-slave1|0]
>>>>> (1) [dev-slave1]
>>>>> 2019-08-26 15:07:29,572 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>> thread
>>>>> 1-4) ISPN000094: Received new cluster view for channel ejb:
>>>>> [dev-slave1|0]
>>>>> (1) [dev-slave1]
>>>>> 2019-08-26 15:07:29,574 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>> thread
>>>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>>>> [dev-slave1|0]
>>>>> (1) [dev-slave1]
>>>>> 2019-08-26 15:07:29,635 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>> thread
>>>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>>>> [dev-slave1|0]
>>>>> (1) [dev-slave1]
>>>>>
>>>>
>>>> This definitely doesn't look right. The view id (which increases
>>>> monotonically) is 0, which means this is an initial view and no new
>>>> members have joined. Clearly, the discovery protocol is not configured
>>>> properly and both nodes are in separate (singleton) clusters.
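>>>> For comparison, once discovery works you should see the view id increase
>>>> and both members listed, roughly like:
>>>>
>>>> ISPN000094: Received new cluster view for channel ejb: [dev-master|1]
>>>> (2) [dev-master, dev-slave1]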
>>>>
>>>>
>>>>> I believe I read somewhere that I was supposed to see the master and
>>>>> slave together in the logs, and not just the master or the slave. Maybe
>>>>> this is my issue, but I don't know how to resolve it.
>>>>>
>>>>> I can't use multicast, as it's disabled in AWS and by almost all cloud
>>>>> providers.
>>>>>
>>>>
>>>> The easiest option is to use TCPPING. However, it requires you to put
>>>> all the nodes' IPs in its configuration [1]. There are other options as
>>>> well, e.g. S3 Ping [2] and its rewritten (and much better) version,
>>>> Native S3 Ping [3].
>>>>
>>>> You may also be interested in using JDBC_PING. Please have a look at
>>>> our blogs [4][5]; a minimal sketch follows the links below.
>>>>
>>>> [1] http://jgroups.org/manual4/index.html#TCPPING_Prot
>>>> [2] http://jgroups.org/manual4/index.html#_s3_ping
>>>> [3] https://github.com/jgroups-extras/native-s3-ping
>>>> [4] https://www.keycloak.org/2019/05/keycloak-cluster-setup.html
>>>> [5] https://www.keycloak.org/2019/08/keycloak-jdbc-ping.html
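>>>> For reference, the JDBC_PING variant boils down to replacing TCPPING in
>>>> the stack with something like this (a sketch; it assumes the standard
>>>> KeycloakDS datasource, see [5] for the full setup):
>>>>
>>>> <!-- datasource JNDI name assumed; adjust to your setup -->
>>>> <protocol type="JDBC_PING">
>>>>     <property name="datasource_jndi_name">java:jboss/datasources/KeycloakDS</property>
>>>> </protocol>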
>>>>
>>>>
>>>>>
>>>>> When I launch the master, let it come up, and then launch the slave, I
>>>>> can see all the session traffic on the master. As soon as I stop the
>>>>> master, the slave starts looking for the master, but when I click on
>>>>> the website it just hangs waiting for a connection, eventually logs me
>>>>> out, and I end up logging back in, now on the slave node. The shared
>>>>> sessions are not happening. Is there something else I need to do or
>>>>> set?
>>>>>
>>>>
>>>> It looks like a consequence of the JGroups discovery issue. Please try
>>>> to fix the clustering problem and then see if this one appears again.
>>>>
>>>>
>>>>>
>>>>> I have this setup in my domain.xml configuration as well:
>>>>>         <server-group name="auth-server-group" profile="ha">
>>>>>             <jvm name="default">
>>>>>                 <heap size="64m" max-size="512m"/>
>>>>>             </jvm>
>>>>>             <socket-binding-group ref="ha-sockets"/>
>>>>>             <system-properties>
>>>>>                 <property name="jboss.cluster.tcp.initial_hosts"
>>>>> value="10.10.10.77[7600],10.10.10.27[7600]"/>
>>>>>             </system-properties>
>>>>>         </server-group>
>>>>>
>>>>> In my host.xml on the slave I have this setup to reach back to the
>>>>> master
>>>>> as the domain controller
>>>>>     <domain-controller>
>>>>>         <remote protocol="remote" host="${jboss.domain.master.address}"
>>>>> port="${jboss.domain.master.port:9999}"
>>>>> security-realm="ManagementRealm"/>
>>>>>    </domain-controller>
>>>>>
>>>>> Any help would be appreciated
>>>>> _______________________________________________
>>>>> keycloak-user mailing list
>>>>> keycloak-user at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/keycloak-user
>>>>>
>>>>

