[keycloak-user] JGroups failure: failed submitting DONT_BUNDLE message to thread pool

Hynek Mlnarik hmlnarik at redhat.com
Thu Aug 10 08:32:50 EDT 2017


You seem to be facing https://issues.jboss.org/browse/WFLY-6179. Once
keycloak updates to WF 10.1/11.x, this issue should be resolved.

On Wed, Aug 9, 2017 at 1:32 PM, Edwin de Jong <edwin.de.jong at simacan.com> wrote:
> Dear Keycloak users (and devs),
>
> This morning, we faced a production level issue on our Keycloak Cluster,
> running in a 3-node formation on EC2. Symptoms were a high failure rate
> of requests (> 20%) and high latency (> 10 seconds). We are currently
> trying to figure out what went wrong. We would appreciate it if someone
> with knowledge op JGroups / Inifinispan could chime in with a working
> hypothesis.
>
> About priority: we are currently running nominally. We have brought down
> two of the instances and brought up two new instances. The cluster is
> working again as expected.
>
> Below I'll give information about our setup, the relevant log-messages
> and links to some screenshots of our monitoring:
>
> EC2 instances are C4.Large (3x)
> Keycloak Version 3.1.0-FINAL
>
> Normal CPU usage is around 1% or less. It peaked to 16% during the incident.
> Memory usage is normal.
>
> Screenshots:
>
> - datadog statistics of our services calling keycloak: https://ibb.co/dsDTKv
> - AWS EC2 instance Cloudwatch statistics: network out rate (in bytes per
> MINUTE): https://ibb.co/j8jhCF
> - AWS EC2 instance Cloudwatch statistics: network in rate (in bytes per
> MINUTE): https://ibb.co/ggLuRa
>
> Log lines, just before failure (to help reduce clutter, I've removed the
> date and replaced the IP addresses with "IP-A", "IP-B", "IP-C"). The
> last message is repeated around 500.000 times in the span of around 1
> minute.
>
> ---------------------->%-----------------------
> 05:09:23,925 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-17,ee,ip-B) ISPN000094: Received new cluster view for channel
> server: [ip-A|3] (2) [ip-A, ip-B]
>  05:09:23,926 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-17,ee,ip-B) ISPN000094: Received new cluster view for channel
> keycloak: [ip-A|3] (2) [ip-A, ip-B]
>  05:09:23,926 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-17,ee,ip-B) ISPN000094: Received new cluster view for channel
> web: [ip-A|3] (2) [ip-A, ip-B]
>  05:09:23,926 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-17,ee,ip-B) ISPN000094: Received new cluster view for channel
> ejb: [ip-A|3] (2) [ip-A, ip-B]
>  05:09:23,928 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-17,ee,ip-B) ISPN000094: Received new cluster view for channel
> hibernate: [ip-A|3] (2) [ip-A, ip-B]
>  05:09:23,990 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-1,ee,ip-A) ISPN000094: Received new cluster view for channel
> server: [ip-A|3] (2) [ip-A, ip-B]
>  05:09:23,990 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-1,ee,ip-A) ISPN000094: Received new cluster view for channel
> keycloak: [ip-A|3] (2) [ip-A, ip-B]
>  05:09:23,991 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-1,ee,ip-A) ISPN000094: Received new cluster view for channel
> web: [ip-A|3] (2) [ip-A, ip-B]
>  05:09:23,992 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-1,ee,ip-A) ISPN000094: Received new cluster view for channel
> hibernate: [ip-A|3] (2) [ip-A, ip-B]
>  05:09:23,992 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-1,ee,ip-A) ISPN000094: Received new cluster view for channel
> ejb: [ip-A|3] (2) [ip-A, ip-B]
>  05:09:23,996 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t18)
> ISPN000310: Starting cluster-wide rebalance for cache authorization,
> topology CacheTopology{id=6, rebalanceId=3,
> currentCH=DefaultConsistentHash{ns=80, owners = (2)[ip-A: 54+26, ip-B:
> 26+54]}, pendingCH=DefaultConsistentHash{ns=80, owners = (2)[ip-A:
> 40+40, ip-B: 40+40]}, unionCH=null, actualMembers=[ip-A, ip-B]}
>  05:09:24,001 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t18)
> ISPN000310: Starting cluster-wide rebalance for cache sessions, topology
> CacheTopology{id=6, rebalanceId=3,
> currentCH=DefaultConsistentHash{ns=80, owners = (2)[ip-A: 54+26, ip-B:
> 26+54]}, pendingCH=DefaultConsistentHash{ns=80, owners = (2)[ip-A:
> 40+40, ip-B: 40+40]}, unionCH=null, actualMembers=[ip-A, ip-B]}
>  05:09:24,004 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t18)
> ISPN000310: Starting cluster-wide rebalance for cache offlineSessions,
> topology CacheTopology{id=6, rebalanceId=3,
> currentCH=DefaultConsistentHash{ns=80, owners = (2)[ip-A: 54+26, ip-B:
> 26+54]}, pendingCH=DefaultConsistentHash{ns=80, owners = (2)[ip-A:
> 40+40, ip-B: 40+40]}, unionCH=null, actualMembers=[ip-A, ip-B]}
>  05:09:24,014 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t18)
> ISPN000310: Starting cluster-wide rebalance for cache loginFailures,
> topology CacheTopology{id=6, rebalanceId=3,
> currentCH=DefaultConsistentHash{ns=80, owners = (2)[ip-A: 54+26, ip-B:
> 26+54]}, pendingCH=DefaultConsistentHash{ns=80, owners = (2)[ip-A:
> 40+40, ip-B: 40+40]}, unionCH=null, actualMembers=[ip-A, ip-B]}
>  05:09:24,027 INFO  [org.infinispan.CLUSTER] (remote-thread--p7-t130)
> ISPN000336: Finished cluster-wide rebalance for cache sessions, topology
> id = 6
>  05:09:24,028 INFO  [org.infinispan.CLUSTER] (remote-thread--p7-t130)
> ISPN000336: Finished cluster-wide rebalance for cache offlineSessions,
> topology id = 6
>  05:09:24,029 INFO  [org.infinispan.CLUSTER] (remote-thread--p7-t131)
> ISPN000336: Finished cluster-wide rebalance for cache loginFailures,
> topology id = 6
>  05:09:24,029 INFO  [org.infinispan.CLUSTER] (remote-thread--p7-t132)
> ISPN000336: Finished cluster-wide rebalance for cache authorization,
> topology id = 6
>  05:09:33,567 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-1,ee,ip-C) ISPN000093: Received new, MERGED cluster view for
> channel server: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2 subgroups:
> [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,569 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-1,ee,ip-C) ISPN000093: Received new, MERGED cluster view for
> channel keycloak: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2
> subgroups: [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,569 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-1,ee,ip-C) ISPN000093: Received new, MERGED cluster view for
> channel web: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2 subgroups:
> [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,573 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-1,ee,ip-C) ISPN000093: Received new, MERGED cluster view for
> channel ejb: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2 subgroups:
> [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,575 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-1,ee,ip-C) ISPN000093: Received new, MERGED cluster view for
> channel hibernate: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2
> subgroups: [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,521 WARN  [org.jgroups.protocols.pbcast.NAKACK]
> (Incoming-19,ee,ip-B) JGRP000011: ip-B: dropped message 54375 from
> non-member ip-C (view=[ip-A|3] (2) [ip-A, ip-B]) Warning
>  05:09:33,527 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-20,ee,ip-B) ISPN000093: Received new, MERGED cluster view for
> channel server: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2 subgroups:
> [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,529 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-20,ee,ip-B) ISPN000093: Received new, MERGED cluster view for
> channel keycloak: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2
> subgroups: [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,529 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-20,ee,ip-B) ISPN000093: Received new, MERGED cluster view for
> channel web: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2 subgroups:
> [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,530 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-20,ee,ip-B) ISPN000093: Received new, MERGED cluster view for
> channel ejb: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2 subgroups:
> [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,533 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-20,ee,ip-B) ISPN000093: Received new, MERGED cluster view for
> channel hibernate: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2
> subgroups: [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,518 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-7,ee,ip-A) ISPN000093: Received new, MERGED cluster view for
> channel server: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2 subgroups:
> [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,525 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-7,ee,ip-A) ISPN000093: Received new, MERGED cluster view for
> channel keycloak: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2
> subgroups: [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,525 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-7,ee,ip-A) ISPN000093: Received new, MERGED cluster view for
> channel web: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2 subgroups:
> [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,527 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-7,ee,ip-A) ISPN000093: Received new, MERGED cluster view for
> channel hibernate: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2
> subgroups: [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,527 INFO
> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
> (Incoming-7,ee,ip-A) ISPN000093: Received new, MERGED cluster view for
> channel ejb: MergeView::[ip-A|4] (3) [ip-A, ip-B, ip-C], 2 subgroups:
> [ip-A|2] (3) [ip-A, ip-C, ip-B], [ip-A|3] (2) [ip-A, ip-B]
>  05:09:33,577 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t12)
> ISPN000310: Starting cluster-wide rebalance for cache authorization,
> topology CacheTopology{id=10, rebalanceId=3,
> currentCH=DefaultConsistentHash{ns=80, owners = (3)[ip-A: 27+53, ip-C:
> 27+53, ip-B: 26+54]}, pendingCH=DefaultConsistentHash{ns=80, owners =
> (3)[ip-A: 27+53, ip-B: 26+54, ip-C: 27+53]}, unionCH=null,
> actualMembers=[ip-A, ip-B, ip-C]}
>  05:09:33,579 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t12)
> ISPN000310: Starting cluster-wide rebalance for cache offlineSessions,
> topology CacheTopology{id=10, rebalanceId=3,
> currentCH=DefaultConsistentHash{ns=80, owners = (3)[ip-A: 27+53, ip-C:
> 27+53, ip-B: 26+54]}, pendingCH=DefaultConsistentHash{ns=80, owners =
> (3)[ip-A: 27+53, ip-B: 26+54, ip-C: 27+53]}, unionCH=null,
> actualMembers=[ip-A, ip-B, ip-C]}
>  05:09:33,580 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t12)
> ISPN000310: Starting cluster-wide rebalance for cache loginFailures,
> topology CacheTopology{id=10, rebalanceId=3,
> currentCH=DefaultConsistentHash{ns=80, owners = (3)[ip-A: 27+53, ip-C:
> 27+53, ip-B: 26+54]}, pendingCH=DefaultConsistentHash{ns=80, owners =
> (3)[ip-A: 27+53, ip-B: 26+54, ip-C: 27+53]}, unionCH=null,
> actualMembers=[ip-A, ip-B, ip-C]}
>  05:09:33,582 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t15)
> ISPN000310: Starting cluster-wide rebalance for cache sessions, topology
> CacheTopology{id=10, rebalanceId=3,
> currentCH=DefaultConsistentHash{ns=80, owners = (3)[ip-A: 27+53, ip-C:
> 27+53, ip-B: 26+54]}, pendingCH=DefaultConsistentHash{ns=80, owners =
> (3)[ip-A: 27+53, ip-B: 26+54, ip-C: 27+53]}, unionCH=null,
> actualMembers=[ip-A, ip-B, ip-C]}
>  05:09:33,589 INFO  [org.infinispan.CLUSTER] (remote-thread--p7-t130)
> ISPN000336: Finished cluster-wide rebalance for cache loginFailures,
> topology id = 10
>  05:09:33,589 INFO  [org.infinispan.CLUSTER] (remote-thread--p7-t130)
> ISPN000336: Finished cluster-wide rebalance for cache authorization,
> topology id = 10
>  05:09:33,590 INFO  [org.infinispan.CLUSTER] (remote-thread--p7-t131)
> ISPN000336: Finished cluster-wide rebalance for cache offlineSessions,
> topology id = 10
>  05:09:33,601 INFO  [org.infinispan.CLUSTER] (remote-thread--p7-t133)
> ISPN000336: Finished cluster-wide rebalance for cache sessions, topology
> id = 10
>  05:09:34,001 ERROR [org.jgroups.protocols.TCP] (Connection.Receiver
> [IP-A:35361 - IP-C:7600],ee,ip-A) ip-A: failed submitting DONT_BUNDLE
> message to thread pool: java.util.concurrent.RejectedExecutionException:
> Task org.jgroups.protocols.TP$SingleMessageHandler at 17c86a0c rejected
> from java.util.concurrent.ThreadPoolExecutor at 52cae691[Running, pool size
> = 4, active threads = 4, queued tasks = 100, completed tasks = 1589693].
> Msg: RequestCorrelator: id=200, type=REQ, id=607, rsp_expected=true,
> FORK: ee:keycloak, NAKACK: [XMIT_RSP, seqno=206], TCP: [cluster_name=ee]
> ERROR
>  05:09:34,010 ERROR [org.jgroups.protocols.TCP] (Connection.Receiver
> [IP-A:35361 - IP-C:7600],ee,ip-A) ip-A: failed submitting DONT_BUNDLE
> message to thread pool: java.util.concurrent.RejectedExecutionException:
> Task org.jgroups.protocols.TP$SingleMessageHandler at 4fec3655 rejected
> from java.util.concurrent.ThreadPoolExecutor at 52cae691[Running, pool size
> = 4, active threads = 4, queued tasks = 100, completed tasks = 1589694].
> Msg: RequestCorrelator: id=200, type=REQ, id=609, rsp_expected=true,
> FORK: ee:keycloak, NAKACK: [XMIT_RSP, seqno=208], TCP: [cluster_name=ee]
> ERROR
>
> (last line repeated many, many times)
> ---------------------->%-----------------------
>
>
>
> Infinispan subsystem configuration in standalone-ha.xml:
>
> ---------------------->%-----------------------
>     <subsystem xmlns="urn:jboss:domain:infinispan:4.0">
>       <cache-container name="keycloak" jndi-name="infinispan/Keycloak">
>         <transport lock-timeout="60000"/>
>         <local-cache name="realms">
>           <eviction max-entries="10000" strategy="LRU"/>
>         </local-cache>
>         <local-cache name="users">
>           <eviction max-entries="10000" strategy="LRU"/>
>         </local-cache>
>         <distributed-cache name="sessions" mode="SYNC" owners="3"/>
>         <distributed-cache name="offlineSessions" mode="SYNC" owners="3"/>
>         <distributed-cache name="loginFailures" mode="SYNC" owners="3"/>
>         <distributed-cache name="authorization" mode="SYNC" owners="3"/>
>         <replicated-cache name="work" mode="SYNC"/>
>         <local-cache name="keys">
>           <eviction max-entries="1000" strategy="LRU"/>
>           <expiration max-idle="3600000"/>
>         </local-cache>
>       </cache-container>
>       <cache-container name="server" aliases="singleton cluster"
> default-cache="default" module="org.wildfly.clustering.server">
>         <transport lock-timeout="60000"/>
>         <replicated-cache name="default" mode="SYNC">
>           <transaction mode="BATCH"/>
>         </replicated-cache>
>       </cache-container>
>       <cache-container name="web" default-cache="dist"
> module="org.wildfly.clustering.web.infinispan">
>         <transport lock-timeout="60000"/>
>         <distributed-cache name="dist" mode="ASYNC" l1-lifespan="0"
> owners="2">
>           <locking isolation="REPEATABLE_READ"/>
>           <transaction mode="BATCH"/>
>           <file-store/>
>         </distributed-cache>
>       </cache-container>
>       <cache-container name="ejb" aliases="sfsb" default-cache="dist"
> module="org.wildfly.clustering.ejb.infinispan">
>         <transport lock-timeout="60000"/>
>         <distributed-cache name="dist" mode="ASYNC" l1-lifespan="0"
> owners="2">
>           <locking isolation="REPEATABLE_READ"/>
>           <transaction mode="BATCH"/>
>           <file-store/>
>         </distributed-cache>
>       </cache-container>
>       <cache-container name="hibernate" default-cache="local-query"
> module="org.hibernate.infinispan">
>         <transport lock-timeout="60000"/>
>         <local-cache name="local-query">
>           <eviction strategy="LRU" max-entries="10000"/>
>           <expiration max-idle="100000"/>
>         </local-cache>
>         <invalidation-cache name="entity" mode="SYNC">
>           <transaction mode="NON_XA"/>
>           <eviction strategy="LRU" max-entries="10000"/>
>           <expiration max-idle="100000"/>
>         </invalidation-cache>
>         <replicated-cache name="timestamps" mode="ASYNC"/>
>       </cache-container>
>     </subsystem>
> ---------------------->%-----------------------
>
> JGroups subsystem configuration:
>
> ---------------------->%-----------------------
>     <subsystem xmlns="urn:jboss:domain:jgroups:4.0" default-stack="tcp">
>       <channels default="ee">
>         <channel name="ee" stack="tcp"/>
>       </channels>
>       <stacks>
>         <stack name="tcp">
>           <transport type="TCP" socket-binding="jgroups-tcp"/>
>           <protocol type="S3_PING">
>             <property name="access_key">S3AccessKey</property>
>             <property name="secret_access_key">S3SecretAccessKey</property>
>             <property name="location">S3PingBucketName</property>
>           </protocol>
>           <protocol type="MERGE2"/>
>           <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
>           <protocol type="FD"/>
>           <protocol type="VERIFY_SUSPECT"/>
>           <protocol type="pbcast.NAKACK"/>
>           <protocol type="UNICAST2"/>
>           <protocol type="pbcast.STABLE"/>
>           <protocol type="pbcast.GMS"/>
>           <protocol type="UFC"/>
>           <protocol type="MFC"/>
>           <protocol type="FRAG2"/>
>           <protocol type="RSVP"/>
>         </stack>
>         <stack name="udp">
>           <transport type="UDP" socket-binding="jgroups-udp"/>
>           <protocol type="PING"/>
>           <protocol type="MERGE3"/>
>           <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
>           <protocol type="FD_ALL"/>
>           <protocol type="VERIFY_SUSPECT"/>
>           <protocol type="pbcast.NAKACK2"/>
>           <protocol type="UNICAST3"/>
>           <protocol type="pbcast.STABLE"/>
>           <protocol type="pbcast.GMS"/>
>           <protocol type="UFC"/>
>           <protocol type="MFC"/>
>           <protocol type="FRAG2"/>
>         </stack>
>       </stacks>
>     </subsystem>
> ---------------------->%-----------------------
>
> with kind regards,
>
> Edwin de Jong
>
> -- Simacan B.V. Data Engineer
> _______________________________________________
> keycloak-user mailing list
> keycloak-user at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/keycloak-user



-- 

--Hynek


More information about the keycloak-user mailing list