[keycloak-user] Keycloak node cannot join cluster, initial state transfer timed out

Marek Posolda mposolda at redhat.com
Tue Sep 5 02:20:09 EDT 2017


On 05/09/17 01:48, Matt Evans wrote:
> Yes I've been digging into the infinispan docs :) You're right, from what I gather, the default timeout for the initial state transfer is 4 minutes, I would have thought that would have to be a lot of sessions to transfer for it to take longer than 4 mins. Now looking at how to view statistics on the caches to monitor this stuff.
Some statistics are available through JMX. You can connect with jconsole 
and view them. Statistics may need to be enabled for the infinispan 
caches first (again, see the docs for details). There may be other ways 
to monitor this, but this one is likely the easiest to start with.
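
For reading the same MBeans from code rather than from jconsole, here is a minimal Java sketch (not from the original thread). It uses only the standard javax.management API. The class name CacheMBeanLister and the default pattern are illustrative assumptions: the JMX domain under which the infinispan caches are registered depends on the server version, so the pattern is left as a parameter. The sketch queries the local platform MBean server; against a running Keycloak node you would first obtain a remote MBeanServerConnection (for example through JMXConnectorFactory, with a service URL that depends on your server setup).

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class CacheMBeanLister {

    // Returns the sorted canonical names of all MBeans matching an ObjectName pattern.
    static List<String> listMBeans(MBeanServerConnection conn, String pattern) throws Exception {
        List<String> result = new ArrayList<>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            result.add(name.getCanonicalName());
        }
        Collections.sort(result);
        return result;
    }

    // Prints every readable attribute of one MBean; unreadable values are skipped gracefully.
    static void dumpAttributes(MBeanServerConnection conn, String objectName) throws Exception {
        ObjectName name = new ObjectName(objectName);
        for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
            if (!attr.isReadable()) {
                continue;
            }
            try {
                System.out.println("  " + attr.getName() + " = " + conn.getAttribute(name, attr.getName()));
            } catch (Exception e) {
                System.out.println("  " + attr.getName() + " = <unavailable>");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the local platform MBean server stands in for a remote connection.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        // Assumption: pass the pattern for your cache MBeans as the first argument;
        // the default below only demonstrates the mechanics on a JVM built-in MBean.
        String pattern = args.length > 0 ? args[0] : "java.lang:type=Memory";
        for (String name : listMBeans(conn, pattern)) {
            System.out.println(name);
            dumpAttributes(conn, name);
        }
    }
}
```

Browsing the MBean tree in jconsole first is probably the quickest way to find the exact pattern to pass in.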
>
> I was wondering why the standalone-ha caches are using distributed caches and are configured with 1 owner, is this because it assumes session affinity for connections from the load balancer? Does it make more sense if the load balancers are not using session affinity for the caches to be replicated caches rather than distributed caches?
Distributed with 1 owner is there to save memory. And yes, there is some 
session affinity support in latest master. You can try to add 2 or more 
owners or use a replicated cache if you need failover (e.g. with just 1 
owner, after a node is killed or restarted, its user sessions are lost 
and users need to re-authenticate). However, state transfer will 
probably take even more time if you increase the number of owners or 
re-configure the cache to be replicated. You can try and see.
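
To make the trade-off concrete, here is a toy Java model (not from the original thread). It is only a sketch under a simplifying assumption: each key is placed on consecutive nodes by modulo, which is not how infinispan's consistent hashing actually assigns owners. The class and method names are hypothetical. It shows that with 1 owner a node failure loses that node's entries, while with 2 owners nothing is lost but the cluster stores, and therefore has to transfer, twice as many copies.

```java
import java.util.ArrayList;
import java.util.List;

public class OwnersSketch {

    // Toy placement (assumption): key k lives on `owners` consecutive nodes starting at k % nodes.
    static List<Integer> ownersOf(int key, int nodes, int owners) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < Math.min(owners, nodes); i++) {
            result.add((key + i) % nodes);
        }
        return result;
    }

    // Counts keys for which every copy was held by the failed node.
    static int lostKeys(int keys, int nodes, int owners, int failedNode) {
        int lost = 0;
        for (int k = 0; k < keys; k++) {
            boolean allOnFailed = ownersOf(k, nodes, owners).stream().allMatch(n -> n == failedNode);
            if (allOnFailed) {
                lost++;
            }
        }
        return lost;
    }

    // Total number of entry copies held across the cluster.
    static int totalCopies(int keys, int nodes, int owners) {
        return keys * Math.min(owners, nodes);
    }

    public static void main(String[] args) {
        int keys = 1000;
        int nodes = 2;
        for (int owners = 1; owners <= 2; owners++) {
            System.out.println("owners=" + owners
                    + " lost after node 0 fails: " + lostKeys(keys, nodes, owners, 0)
                    + ", copies stored: " + totalCopies(keys, nodes, owners));
        }
    }
}
```

In a 2-node cluster, 2 owners means every node holds every entry, which behaves much like a replicated cache in this model.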

Marek
>
> Matt
>
>
> -----Original Message-----
> From: Marek Posolda [mailto:mposolda at redhat.com]
> Sent: Tuesday, 5 September 2017 1:44 AM
> To: Matt Evans <mevans at aconex.com>; Meissa M'baye Sakho <msakho at redhat.com>
> Cc: keycloak-user at lists.jboss.org
> Subject: Re: [keycloak-user] Keycloak node cannot join cluster, initial state transfer timed out
>
> I think that you were right. Your cache is too big; it likely contains many user sessions, so the initial state transfer took quite a long time. Maybe during the weekend most people were logged out, hence the state transfer was able to finish in time...
>
> It's possible to increase the timeout for the state transfer (I think it's 240 seconds by default, but I'm not 100% sure). It would be good to check the infinispan documentation and the documentation for the wildfly infinispan subsystem, which should provide more details.
>
> Marek
>
> On 04/09/17 04:40, Matt Evans wrote:
>> Strangely, it seems to have fixed itself over the weekend. I came to look at it this morning and the new node successfully retrieved the initial state data. I've not made any changes to configuration etc.
>>
>> I'd still like to know why it was happening and how to prevent it though.
>>
>> Matt
>>
>>
>> -----Original Message-----
>> From: keycloak-user-bounces at lists.jboss.org
>> [mailto:keycloak-user-bounces at lists.jboss.org] On Behalf Of Matt Evans
>> Sent: Saturday, 2 September 2017 7:47 AM
>> To: Meissa M'baye Sakho <msakho at redhat.com>
>> Cc: keycloak-user at lists.jboss.org
>> Subject: Re: [keycloak-user] Keycloak node cannot join cluster,
>> initial state transfer timed out
>>
>> No, I just start up keycloak and run standalone ha. There's no mention
>> of that property in the keycloak docs about clustering
>>
>> Matt
>>
>> ________________________________
>> From: Meissa M'baye Sakho <msakho at redhat.com>
>> Sent: Saturday, September 2, 2017 12:53:35 AM
>> To: Matt Evans
>> Cc: keycloak-user at lists.jboss.org
>> Subject: Re: [keycloak-user] Keycloak node cannot join cluster,
>> initial state transfer timed out
>>
>> Matt,
>> How did you add your new node?
>> Have you defined the jboss.node.name property in your new node?
>> Meissa
>>
>> On Fri, Sep 1, 2017 at 6:31 AM, Matt Evans <mevans at aconex.com> wrote:
>> We're running keycloak clustered with standalone-ha.xml, and it's been working fine.
>>
>> We changed the 'owners' of the distributed caches for session, loginFailures etc to 2 so that it will distribute those caches across the 2 nodes in the cluster.
>>
>> Now, when I remove a node and add a new node, the new node fails to start some of the services, due to:
>>
>> org.infinispan.commons.CacheException: Initial state transfer timed
>> out for cache sessions on xxxx
>>
>> Is this because it's actually taking too long to fetch the initial cache data from the other node? Is it due to the size of the cache, or some other issue?
>>
>> What can I do to address this so that I can add the node back into the cluster?
>>
>> I'm not experienced at all in infinispan or jgroups, so any pointers on how to query the servers to see what's in the caches, and how to see what's actually happening, will be appreciated!
>>
>> Thanks
>>
>> Matt
>> _______________________________________________
>> keycloak-user mailing list
>> keycloak-user at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/keycloak-user
>>
>


