[keycloak-user] Keycloak Domain TCP Clustering of Sessions in AWS not auto-failing over to node
JTK
jonesy at sydow.org
Wed Aug 28 10:12:19 EDT 2019
OK,
I originally had just the one channel and got the same result. I added the
other channel while I was reading articles on JBoss through Red Hat:
Channel definition for two or more cache-container at JBoss EAP clustered
environment
https://access.redhat.com/solutions/3880301
I've also used <protocol type="TCPPING"> with the same results.
I've looked at that URL previously:
http://www.mastertheboss.com/jboss-server/jboss-cluster/how-to-configure-jboss-eap-and-wildfly-to-use-tcpping
At the beginning it references a different approach than most articles
(<socket-discovery-protocol>), and it shows names rather than IPs. I'm not
sure if that's the new way or whether you can also use IPs, and it uses the
standalone configuration, so there's no need to look externally for other
nodes on the network. Most of the articles out there use the legacy method,
which that page addresses later and which is what I'm using.
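If I'm reading it right, the newer schema replaces the initial_hosts property
with socket bindings, roughly like this (my reconstruction from the article,
so treat it as a sketch; the binding names are placeholders, and I assume
remote-destination can take either a hostname or an IP):

    <stack name="tcpping">
        <transport type="TCP" socket-binding="jgroups-tcp"/>
        <!-- each name refers to an outbound-socket-binding for one node -->
        <socket-discovery-protocol type="TCPPING"
                socket-bindings="jgroups-host-a jgroups-host-b"/>
        <!-- remaining protocols as in the legacy stack -->
    </stack>

with the targets defined in the socket-binding-group:

    <outbound-socket-binding name="jgroups-host-a">
        <remote-destination host="10.10.10.77" port="7600"/>
    </outbound-socket-binding>
    <outbound-socket-binding name="jgroups-host-b">
        <remote-destination host="10.10.11.27" port="7600"/>
    </outbound-socket-binding>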
Even if I get it working standalone, it's not much help when there are
multiple .xml files which must be configured across the different instances,
i.e. domain.xml, host.xml, host-master.xml and host-slave.xml.
I'm going to do more troubleshooting, but the only message in my logs
pertaining to the cluster is:
[org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 50)
dev-master: no members discovered after 3004 ms: creating cluster as first
member
That doesn't give me a good place to troubleshoot.
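One thing I still plan to try is turning up the clustering log levels; I
assume something like this from the CLI would do it (untested on my side):

    /profile=full-ha/subsystem=logging/logger=org.jgroups:add(level=TRACE)
    /profile=full-ha/subsystem=logging/logger=org.infinispan:add(level=DEBUG)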
Does anyone know if this is the correct response when running this command?
/profile=full-ha/subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcpping)
{
    "outcome" => "success",
    "result" => undefined,
    "server-groups" => {"auth-server-group" => {"host" => {
        "dev-master" => {"dev-master" => {"response" => {"outcome" => "success"}}},
        "dev-slave1" => {"dev-slave1" => {"response" => {
            "outcome" => "success",
            "result" => undefined
        }}}
    }}}
}
No response from dev-slave1... is that normal?
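As a sanity check, I assume the attribute can be read back per host with
something like:

    /profile=full-ha/subsystem=jgroups/channel=ee:read-attribute(name=stack)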
Thanks for your help; this has been frustrating. I think we are going to buy
JBoss support and move to Red Hat SSO, but for now this is a simple Proof of
Concept and I'm just missing the full clustering aspect of it.
On Wed, Aug 28, 2019 at 8:50 AM Sebastian Laskawiec <slaskawi at redhat.com>
wrote:
> I would strongly advise getting clustering to work on the standalone
> configuration (./standalone.sh -c standalone-ha.xml). This way you will get
> rid of all the noise from the domain configuration. Once that part works
> fine, you may then try to get this working with the domain configuration.
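> For example, something along these lines for your two nodes (node names are
> placeholders; the addresses and ports are taken from your config):
>
> # on 10.10.10.77
> ./standalone.sh -c standalone-ha.xml -Djboss.node.name=node1 \
>     -Djboss.bind.address=10.10.10.77 \
>     -Djboss.cluster.tcp.initial_hosts="10.10.10.77[7600],10.10.11.27[7600]"
> # on 10.10.11.27: same command with -Djboss.node.name=node2 and
> # -Djboss.bind.address=10.10.11.27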
>
> I briefly looked at this guide:
> http://www.mastertheboss.com/jboss-server/jboss-cluster/how-to-configure-jboss-eap-and-wildfly-to-use-tcpping
> and it should be enough to give you some hints about the configuration.
>
> More comments and hints inlined.
>
> On Tue, Aug 27, 2019 at 11:58 PM JTK <jonesy at sydow.org> wrote:
>
>> Thanks,
>>
>> I've read over numerous articles, including the ones you listed, and I've
>> still been unable to locate why they are not clustering.
>>
>> When I run this command I get the following output:
>>
>> /profile=full-ha/subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcpping)
>> {
>>     "outcome" => "success",
>>     "result" => undefined,
>>     "server-groups" => {"auth-server-group" => {"host" => {
>>         "dev-master" => {"dev-master" => {"response" => {"outcome" => "success"}}},
>>         "dev-slave1" => {"dev-slave1" => {"response" => {
>>             "outcome" => "success",
>>             "result" => undefined
>>         }}}
>>     }}}
>> }
>>
>> Is that the normal output for the slave?
>>
>> Here is some other information from the logs:
>> From the Master Node
>> [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 50)
>> dev-master: no members discovered after 3004 ms: creating cluster as first
>> member
>> This was with the Slave1 node up and running:
>> [Host Controller] 21:35:53,349 INFO [org.jboss.as.host.controller] (Host
>> Controller Service Threads - 2) WFLYHC0148: Connected to master host
>> controller at remote://10.10.10.77:9999
>>
>> On Master:
>> [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread
>> 1-2) ISPN000128: Infinispan version: Infinispan 'Infinity Minus ONE +2'
>> 9.4.8.Final
>> [Server:dev-master] 21:36:02,081 INFO
>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>> thread 1-4) ISPN000078: Starting JGroups channel ee
>> [Server:dev-master] 21:36:02,086 INFO
>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>> thread 1-2) ISPN000078: Starting JGroups channel ee
>> [Server:dev-master] 21:36:02,087 INFO
>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>> thread 1-1) ISPN000078: Starting JGroups channel ee
>> [Server:dev-master] 21:36:02,089 INFO [org.infinispan.CLUSTER] (MSC
>> service thread 1-4) ISPN000094: Received new cluster view for channel ee:
>> [dev-master|0] (1) [dev-master]
>> [Server:dev-master] 21:36:02,090 INFO
>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>> thread 1-3) ISPN000078: Starting JGroups channel ee
>> [Server:dev-master] 21:36:02,091 INFO [org.infinispan.CLUSTER] (MSC
>> service thread 1-1) ISPN000094: Received new cluster view for channel ee:
>> [dev-master|0] (1) [dev-master]
>> [Server:dev-master] 21:36:02,091 INFO [org.infinispan.CLUSTER] (MSC
>> service thread 1-2) ISPN000094: Received new cluster view for channel ee:
>> [dev-master|0] (1) [dev-master]
>> [Server:dev-master] 21:36:02,091 INFO [org.infinispan.CLUSTER] (MSC
>> service thread 1-3) ISPN000094: Received new cluster view for channel ee:
>> [dev-master|0] (1) [dev-master]
>> [Server:dev-master] 21:36:02,104 INFO
>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>> thread 1-2) ISPN000079: Channel ee local address is dev-master, physical
>> addresses are [10.10.10.77:7600]
>> [Server:dev-master] 21:36:02,129 INFO
>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>> thread 1-1) ISPN000079: Channel ee local address is dev-master, physical
>> addresses are [10.10.10.77:7600]
>> [Server:dev-master] 21:36:02,149 INFO
>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>> thread 1-4) ISPN000079: Channel ee local address is dev-master, physical
>> addresses are [10.10.10.77:7600]
>> [Server:dev-master] 21:36:02,151 INFO
>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>> thread 1-3) ISPN000079: Channel ee local address is dev-master, physical
>> addresses are [10.10.10.77:7600]
>> [Server:dev-master] 21:36:02,296 INFO
>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>> thread 1-2) ISPN000078: Starting JGroups channel ee
>> [Server:dev-master] 21:36:02,297 INFO [org.infinispan.CLUSTER] (MSC
>> service thread 1-2) ISPN000094: Received new cluster view for channel ee:
>> [dev-master|0] (1) [dev-master]
>> [Server:dev-master] 21:36:02,325 INFO
>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>> thread 1-2) ISPN000079: Channel ee local address is dev-master, physical
>> addresses are [10.10.10.77:7600]
>>
>> This is the configuration from the domain.xml on the master
>> <channels default="ee">
>> <channel name="ee" stack="tcp"/>
>>
>
> I believe this doesn't look right. It seems you have two channels with the
> same name "ee", which seems problematic. There should be only one.
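> Something like this instead, keeping your tcpping stack (a sketch):
>
>     <channels default="ee">
>         <channel name="ee" stack="tcpping"/>
>     </channels>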
>
>
>> <channel name="ee" stack="tcpping"/>
>> </channels>
>> <stacks>
>> <stack name="tcp">
>> <transport type="TCP"
>> socket-binding="jgroups-tcp"/>
>> <protocol type="MERGE3"/>
>> <protocol type="FD_SOCK"/>
>> <protocol type="FD_ALL"/>
>> <protocol type="VERIFY_SUSPECT"/>
>> <protocol type="pbcast.NAKACK2"/>
>> <protocol type="UNICAST3"/>
>> <protocol type="pbcast.STABLE"/>
>> <protocol type="pbcast.GMS"/>
>> <protocol type="MFC"/>
>> <protocol type="FRAG3"/>
>> </stack>
>> <stack name="tcpping">
>> <transport type="TCP"
>> socket-binding="jgroups-tcp"/>
>> <protocol type="org.jgroups.protocols.TCPPING">
>>
>
> This can be simplified to <protocol type="TCPPING">
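> i.e. roughly this, with the same properties you already have:
>
>     <protocol type="TCPPING">
>         <property name="initial_hosts">${jboss.cluster.tcp.initial_hosts}</property>
>         <property name="port_range">0</property>
>     </protocol>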
>
>
>> <property name="initial_hosts">
>> ${jboss.cluster.tcp.initial_hosts}
>>
>
> Once you try it out on the standalone configuration, you may manipulate this
> property by booting the cluster up with `./standalone.sh -c standalone-ha.xml
> -Djboss.cluster.tcp.initial_hosts="10.10.10.77[7600],10.10.11.27[7600]"`. I
> think that will be much faster than manipulating the xml configuration.
>
>
>> </property>
>> <property name="port_range">
>> 0
>> </property>
>> </protocol>
>> <protocol type="MERGE3"/>
>> <protocol type="FD_SOCK"/>
>> <protocol type="FD_ALL"/>
>> <protocol type="VERIFY_SUSPECT"/>
>> <protocol type="pbcast.NAKACK2"/>
>> <protocol type="UNICAST3"/>
>> <protocol type="pbcast.STABLE"/>
>> <protocol type="pbcast.GMS"/>
>> <protocol type="MFC"/>
>> <protocol type="FRAG3"/>
>> </stack>
>>
>> <server-groups>
>> <server-group name="auth-server-group" profile="full-ha">
>> <jvm name="default">
>> <heap size="64m" max-size="512m"/>
>> </jvm>
>> <socket-binding-group ref="ha-sockets"/>
>> <system-properties>
>> <property name="jboss.cluster.tcp.initial_hosts"
>> value="10.10.10.77[7600],10.10.11.27[7600]"/>
>> </system-properties>
>> </server-group>
>> </server-groups>
>>
>> This is a question on which I find conflicting information when reading the
>> RedHat/JBoss/WildFly docs. From the Master, this is the host.xml file. I've
>> had both the master and the slave listed here. Which do I need, or do I need
>> both? And the same question goes for configuring the Slave's host.xml file.
>>
>> <servers>
>> <server name="dev-slave1" group="auth-server-group"
>> auto-start="true">
>> <socket-bindings port-offset="0"/>
>> </server>
>> </servers>
>>
>> The same question applies to the host-master.xml on the Master and the
>> host-slave.xml on the Slave. This is from the host-master.xml on the Master:
>> <servers>
>> <server name="dev-sentinel-master" group="auth-server-group"
>> auto-start="true">
>> <socket-bindings port-offset="0"/>
>> </server>
>> </servers>
>>
>> Included screenshot showing WildFly Management Console with both servers
>> up and green in the auth-server-group of the full-ha profile
>> https://i.imgur.com/8g124Ss.png
>>
>> I know I'm just missing something small, and I'm not getting any errors in
>> the logs. Is there any way to get more TRACE or DEBUG output with regard to
>> clustering?
>>
>> Thanks!
>>
>>
>> On Tue, Aug 27, 2019 at 5:09 AM Sebastian Laskawiec <slaskawi at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, Aug 26, 2019 at 5:32 PM JTK <jonesy at sydow.org> wrote:
>>>
>>>> I have two nodes set up in a cluster using TCP port 7600, and I see them
>>>> join the cluster in the logs.
>>>> On Master: [Host Controller] 15:07:18,293 INFO
>>>> [org.jboss.as.domain.controller] (Host Controller Service Threads - 7)
>>>> WFLYHC0019: Registered remote slave host "dev-slave1", JBoss Keycloak
>>>> 6.0.1
>>>> (WildFly 8.0.0.Final)
>>>> On Slave: [Host Controller] 15:03:12,603 INFO
>>>> [org.jboss.as.host.controller] (Host Controller Service Threads - 3)
>>>> WFLYHC0148: Connected to master host controller at remote://
>>>> 10.10.10.77:9999
>>>>
>>>> In the WildFly admin panel I see the server group: auth-server-group
>>>> which
>>>> is ha and then I see both servers in the group and they are both green.
>>>>
>>>> I've set the distributed-cache owners to 2 in domain.xml, so it should be
>>>> sharing session information:
>>>> <distributed-cache name="sessions" owners="2"/>
>>>> <distributed-cache name="authenticationSessions"
>>>> owners="2"/>
>>>> <distributed-cache name="offlineSessions"
>>>> owners="2"/>
>>>> <distributed-cache name="clientSessions"
>>>> owners="2"/>
>>>> <distributed-cache name="offlineClientSessions"
>>>> owners="2"/>
>>>> <distributed-cache name="loginFailures" owners="2"/>
>>>> <distributed-cache name="actionTokens" owners="2">
>>>>
>>>> Here are the logs on the master showing that a new cluster view has been
>>>> received:
>>>> 2019-08-26 15:03:19,776 INFO [org.infinispan.CLUSTER] (MSC service
>>>> thread
>>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>>> [dev-master|0]
>>>> (1) [dev-master]
>>>> 2019-08-26 15:03:19,779 INFO [org.infinispan.CLUSTER] (MSC service
>>>> thread
>>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>>> [dev-master|0]
>>>> (1) [dev-master]
>>>> 2019-08-26 15:03:19,780 INFO [org.infinispan.CLUSTER] (MSC service
>>>> thread
>>>> 1-2) ISPN000094: Received new cluster view for channel ejb:
>>>> [dev-master|0]
>>>> (1) [dev-master]
>>>> 2019-08-26 15:03:19,780 INFO [org.infinispan.CLUSTER] (MSC service
>>>> thread
>>>> 1-4) ISPN000094: Received new cluster view for channel ejb:
>>>> [dev-master|0]
>>>> (1) [dev-master]
>>>> 2019-08-26 15:03:19,875 INFO [org.infinispan.CLUSTER] (MSC service
>>>> thread
>>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>>> [dev-master|0]
>>>> (1) [dev-master]
>>>>
>>>> And on the slave:
>>>> 2019-08-26 15:07:29,567 INFO [org.infinispan.CLUSTER] (MSC service
>>>> thread
>>>> 1-2) ISPN000094: Received new cluster view for channel ejb:
>>>> [dev-slave1|0]
>>>> (1) [dev-slave1]
>>>> 2019-08-26 15:07:29,572 INFO [org.infinispan.CLUSTER] (MSC service
>>>> thread
>>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>>> [dev-slave1|0]
>>>> (1) [dev-slave1]
>>>> 2019-08-26 15:07:29,572 INFO [org.infinispan.CLUSTER] (MSC service
>>>> thread
>>>> 1-4) ISPN000094: Received new cluster view for channel ejb:
>>>> [dev-slave1|0]
>>>> (1) [dev-slave1]
>>>> 2019-08-26 15:07:29,574 INFO [org.infinispan.CLUSTER] (MSC service
>>>> thread
>>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>>> [dev-slave1|0]
>>>> (1) [dev-slave1]
>>>> 2019-08-26 15:07:29,635 INFO [org.infinispan.CLUSTER] (MSC service
>>>> thread
>>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>>> [dev-slave1|0]
>>>> (1) [dev-slave1]
>>>>
>>>
>>> This definitely doesn't look right. The view id (which increases
>>> monotonically) is 0, which means this is an initial view and none of the
>>> new members have joined. Clearly, the discovery protocol is not configured
>>> properly and both nodes are in separate (singleton) clusters.
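>>> Once discovery works, you should see the view id bump and both members
>>> listed, along these lines (illustrative, using your node names):
>>>
>>> ISPN000094: Received new cluster view for channel ejb: [dev-master|1] (2)
>>> [dev-master, dev-slave1]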
>>>
>>>
>>>> I believe I read somewhere that I was supposed to see the master and slave
>>>> together in the logs, and not just master or slave. Maybe this is my
>>>> issue, but I don't know how to resolve it.
>>>>
>>>> I can't use multi-cast as it's disabled in AWS and almost all cloud
>>>> providers.
>>>>
>>>
>>> The easiest option is to use TCPPING. However, it requires you to put all
>>> nodes' IPs in its configuration [1]. There are other options as well, e.g.
>>> S3 Ping [2] and its rewritten (and much better) version, Native S3 Ping [3].
>>>
>>> You may also be interested in using JDBC_PING. Please have a look at our
>>> blogs [4][5].
>>>
>>> [1] http://jgroups.org/manual4/index.html#TCPPING_Prot
>>> [2] http://jgroups.org/manual4/index.html#_s3_ping
>>> [3] https://github.com/jgroups-extras/native-s3-ping
>>> [4] https://www.keycloak.org/2019/05/keycloak-cluster-setup.html
>>> [5] https://www.keycloak.org/2019/08/keycloak-jdbc-ping.html
>>>
>>>
>>>>
>>>> When I launch the master and let it come up, then launch the slave, I can
>>>> see all the traffic for the session on the master. As soon as I stop the
>>>> master, the slave starts looking for the master, but when I click on the
>>>> website it just hangs waiting for a connection, eventually logs me out,
>>>> and I end up logging back in, now on the slave node. The shared sessions
>>>> are not happening. Is there something else I need to do or set?
>>>>
>>>
>>> It looks like a consequence of the JGroups discovery issue. Please try
>>> to fix the clustering problem and then see if this one appears again.
>>>
>>>
>>>>
>>>> I have this setup in my domain.xml configuration as well:
>>>> <server-group name="auth-server-group" profile="ha">
>>>> <jvm name="default">
>>>> <heap size="64m" max-size="512m"/>
>>>> </jvm>
>>>> <socket-binding-group ref="ha-sockets"/>
>>>> <system-properties>
>>>> <property name="jboss.cluster.tcp.initial_hosts"
>>>> value="10.10.10.77[7600],10.10.10.27[7600]"/>
>>>> </system-properties>
>>>> </server-group>
>>>>
>>>> In my host.xml on the slave I have this setup to reach back to the
>>>> master
>>>> as the domain controller
>>>> <domain-controller>
>>>> <remote protocol="remote" host="${jboss.domain.master.address}"
>>>> port="${jboss.domain.master.port:9999}"
>>>> security-realm="ManagementRealm"/>
>>>> </domain-controller>
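>>>>
>>>> The slave then gets the master address as a system property at startup,
>>>> something like (a sketch of the invocation):
>>>> ./domain.sh --host-config=host-slave.xml -Djboss.domain.master.address=10.10.10.77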
>>>>
>>>> Any help would be appreciated
>>>> _______________________________________________
>>>> keycloak-user mailing list
>>>> keycloak-user at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/keycloak-user
>>>>
>>>