On 05/09/17 01:48, Matt Evans wrote:
> Yes, I've been digging into the Infinispan docs :) You're right: from what I gather,
> the default timeout for the initial state transfer is 4 minutes. I would have thought
> it would take a lot of sessions for the transfer to take longer than 4 minutes. I'm now
> looking at how to view statistics on the caches to monitor this.
There is something available through JMX. You can connect with jconsole
and see some statistics. Statistics may need to be enabled for the
Infinispan caches (again, see the docs for details). There may be other ways
to monitor this, but this one is likely the easiest to start with.
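
As a rough sketch of what enabling statistics might look like in the Infinispan subsystem of standalone-ha.xml: the `statistics-enabled` attribute below is how I understand the WildFly subsystem exposes this, but please verify it against the schema for your WildFly version. Other attributes and the remaining caches are omitted here.

```xml
<!-- Sketch only: fragment of the infinispan subsystem in standalone-ha.xml. -->
<!-- Assumption: statistics-enabled is supported by your subsystem schema version. -->
<cache-container name="keycloak" statistics-enabled="true">
    <!-- Enable statistics on the cache you want to inspect in jconsole -->
    <distributed-cache name="sessions" mode="SYNC" owners="1" statistics-enabled="true"/>
</cache-container>
```

After a restart, the cache statistics should then be visible among the MBeans when you connect with jconsole.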
> I was wondering why the standalone-ha caches are distributed caches configured with
> 1 owner. Is this because it assumes session affinity for connections from the load
> balancer? If the load balancers are not using session affinity, would it make more
> sense for the caches to be replicated rather than distributed?
Distributed with 1 owner is there to save memory. And yes, there is some
session affinity support in latest master. You can try to add 2 or more
owners or use a replicated cache if you need failover (e.g. with just 1 owner,
after some node is killed or restarted, its user sessions are lost and users
need to re-authenticate). However, state transfer will
probably take even more time if you increase the number of owners or
reconfigure the cache to be replicated. You can try and see.
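
To illustrate the two options, here is a minimal sketch of how the `sessions` cache might be declared in standalone-ha.xml. It assumes the default Keycloak cache names and shows only the one cache; use one variant or the other, not both, and check the exact attributes against your WildFly version.

```xml
<!-- Option A (sketch): keep the cache distributed, but with 2 owners, -->
<!-- so each entry is stored on two nodes and survives the loss of one. -->
<distributed-cache name="sessions" mode="SYNC" owners="2"/>

<!-- Option B (sketch): replicated cache, every node holds every entry. -->
<!-- This would replace the distributed-cache element above. -->
<replicated-cache name="sessions" mode="SYNC"/>
```

Either way, more data has to be copied to a joining node, which is why the state transfer is likely to take longer.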
Marek
Matt
-----Original Message-----
From: Marek Posolda [mailto:mposolda@redhat.com]
Sent: Tuesday, 5 September 2017 1:44 AM
To: Matt Evans <mevans(a)aconex.com>; Meissa M'baye Sakho
<msakho(a)redhat.com>
Cc: keycloak-user(a)lists.jboss.org
Subject: Re: [keycloak-user] Keycloak node cannot join cluster, initial state transfer
timed out
I think that you were right. Your cache is too big; it likely contains many user
sessions, so the initial state transfer took quite a long time. Maybe during the weekend most
people were logged out, hence the state transfer was able to finish in time...
It's possible to increase the timeout for the state transfer (I think it's 240
seconds by default, but I'm not 100% sure). It would be good to check the Infinispan documentation
and the documentation for the WildFly Infinispan subsystem, which should provide more details.
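
A minimal sketch of what raising the timeout might look like in standalone-ha.xml, assuming the `state-transfer` element of the WildFly Infinispan subsystem with its `timeout` attribute given in milliseconds. The value 600000 (10 minutes) is only an illustrative choice, not a recommendation.

```xml
<!-- Sketch only: raise the initial state transfer timeout for one cache. -->
<!-- Assumption: timeout is in milliseconds; 600000 is an example value. -->
<distributed-cache name="sessions" mode="SYNC" owners="2">
    <state-transfer timeout="600000"/>
</distributed-cache>
```

The same child element would need to be added to each cache whose state transfer times out.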
Marek
On 04/09/17 04:40, Matt Evans wrote:
> Strangely, it seems to have fixed itself over the weekend. I came to look at it this
> morning and the new node successfully retrieved the initial state data. I've not made
> any changes to configuration etc.
>
> I'd still like to know why it was happening and how to prevent it though.
>
> Matt
>
>
> -----Original Message-----
> From: keycloak-user-bounces(a)lists.jboss.org
> [mailto:keycloak-user-bounces@lists.jboss.org] On Behalf Of Matt Evans
> Sent: Saturday, 2 September 2017 7:47 AM
> To: Meissa M'baye Sakho <msakho(a)redhat.com>
> Cc: keycloak-user(a)lists.jboss.org
> Subject: Re: [keycloak-user] Keycloak node cannot join cluster,
> initial state transfer timed out
>
> No, I just start up Keycloak and run standalone-ha. There's no mention
> of that property in the Keycloak docs about clustering.
>
> Matt
>
> ________________________________
> From: Meissa M'baye Sakho <msakho(a)redhat.com>
> Sent: Saturday, September 2, 2017 12:53:35 AM
> To: Matt Evans
> Cc: keycloak-user(a)lists.jboss.org
> Subject: Re: [keycloak-user] Keycloak node cannot join cluster,
> initial state transfer timed out
>
> Matt,
> How did you add your new node?
> Have you defined the jboss.node.name property in your new node?
> Meissa
>
> On Fri, Sep 1, 2017 at 6:31 AM, Matt Evans <mevans@aconex.com> wrote:
> We're running keycloak clustered with standalone-ha.xml, and it's been
> working fine.
>
> We changed the 'owners' of the distributed caches for session, loginFailures
> etc to 2 so that it will distribute those caches across the 2 nodes in the cluster.
>
> Now, when I remove a node and add a new node, the new node fails to start some of the
> services, due to:
>
> org.infinispan.commons.CacheException: Initial state transfer timed
> out for cache sessions on xxxx
>
> Is this because it's actually taking too long to fetch the initial cache data
> from the other node? Is it due to the size of the cache, or some other issue?
>
> What can I do to address this so that I can add the node back into the cluster?
>
> I'm not experienced at all in infinispan or jgroups, so any pointers on how to
> query the servers to see what's in the caches, and how to see what's actually happening
> will be appreciated!
>
> Thanks
>
> Matt
> _______________________________________________
> keycloak-user mailing list
> keycloak-user@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/keycloak-user