[keycloak-user] OOM at startup with 1 million sessions

William Burns wburns at redhat.com
Tue Nov 20 10:33:02 EST 2018



----- Original Message -----
> From: "Nicolas Ocquidant" <nocquidant at gmail.com>
> To: mposolda at redhat.com
> Cc: slaskawi at redhat.com, wburns at redhat.com, keycloak-user at lists.jboss.org
> Sent: Tuesday, November 20, 2018 8:41:58 AM
> Subject: Re: [keycloak-user] OOM at startup with 1 million sessions
> 
> Actually, the OOM does not come right after remoteCache#size, and maybe it
> consumes up to 3GB because I have enough heap space (my laptop has 16GB
> RAM)...
> 
> A HeapDumpOnOutOfMemoryError dump from the ISPN process shows that the heap
> is full of byte[3][] objects (4 threads):
> https://gist.github.com/nocquidant/4e84eb8d704b56221a9865cb63951734

Okay, I did some digging and it looks like the fetch size is ignored when running the select against Postgres. The cause is that Postgres doesn't do paging unless auto-commit is disabled on the connection [1], so all the entries end up in memory at once. This will be fixed in [2].
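To illustrate [1] concretely, here is a minimal, self-contained JDBC sketch (the connection details, table and column names are made up, not the actual Infinispan store schema) of the condition the Postgres driver needs before it will honour the fetch size:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class CursorReadSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details - not the real store configuration.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/ispn", "user", "secret")) {
                // Postgres only streams results through a cursor when auto-commit is off;
                // with auto-commit on, the whole result set is materialized at once.
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT id, datum FROM ispn_entries")) {
                    ps.setFetchSize(100); // silently ignored if auto-commit is still enabled
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            // rows now arrive ~100 at a time instead of the whole table at once
                        }
                    }
                }
            }
        }
    }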

Also, a side note for Keycloak: I would recommend adding some backpressure support at [3] - if the puts are too slow you could end up with a large number of entries in memory at the same time. A rough sketch of the idea follows the links below.

[1] https://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor
[2] https://issues.jboss.org/browse/ISPN-9752
[3] https://github.com/keycloak/keycloak/blob/4.6.0.Final/model/infinispan/src/main/java/org/keycloak/models/sessions/infinispan/remotestore/RemoteCacheSessionsLoader.java#L111
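To make the backpressure suggestion a bit more concrete, here is one possible way to do it (the AsyncCache interface and the limit are placeholders I made up, not the actual RemoteCacheSessionsLoader API): cap the number of in-flight async puts with a semaphore so slow puts cannot pile entries up in memory.

    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Semaphore;

    public class BoundedPuts {
        private static final int MAX_IN_FLIGHT = 256; // assumed limit, tune as needed
        private final Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

        /** Placeholder for an async cache API, e.g. a remote cache's putAsync. */
        interface AsyncCache {
            CompletableFuture<Void> putAsync(String key, byte[] value);
        }

        public void putAll(Map<String, byte[]> entries, AsyncCache cache) throws InterruptedException {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                inFlight.acquire(); // block the loader while too many puts are pending
                cache.putAsync(e.getKey(), e.getValue())
                     .whenComplete((v, t) -> inFlight.release()); // free a slot once the put completes
            }
        }
    }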

> 
> In my view, the OOM occurs in the following method (KC side), when getting
> the result from the future:
> 
>     protected void startLoadingImpl(InitializerState state, SessionLoader.LoaderContext ctx) {
>         // Assume each worker has same processor's count
>         int processors = Runtime.getRuntime().availableProcessors();
> 
>         ExecutorService localExecutor = Executors.newCachedThreadPool();
>         Transport transport = workCache.getCacheManager().getTransport();
>         boolean distributed = transport != null;
>         ExecutorService executorService = distributed ? new DefaultExecutorService(workCache, localExecutor) : localExecutor;
> 
>         int errors = 0;
> 
>         try {
>             while (!state.isFinished()) {
>                 int nodesCount = transport==null ? 1 : transport.getMembers().size();
>                 int distributedWorkersCount = processors * nodesCount;
> 
>                 log.debugf("Starting next iteration with %d workers", distributedWorkersCount);
> 
>                 List<Integer> segments = state.getUnfinishedSegments(distributedWorkersCount);
> 
>                 if (log.isTraceEnabled()) {
>                     log.trace("unfinished segments for this iteration: " + segments);
>                 }
> 
>                 List<Future<WorkerResult>> futures = new LinkedList<>();
>                 for (Integer segment : segments) {
>                     SessionInitializerWorker worker = new SessionInitializerWorker();
>                     worker.setWorkerEnvironment(segment, ctx, sessionLoader);
>                     if (!distributed) {
>                         worker.setEnvironment(workCache, null);
>                     }
> 
>                     Future<WorkerResult> future = executorService.submit(worker);
>                     futures.add(future);
>                 }
> 
>                 for (Future<WorkerResult> future : futures) {
>                     try {
>                         WorkerResult result = future.get(); <--- HERE
> 
>                         if (result.getSuccess()) {
>                             int computedSegment = result.getSegment();
>                             state.markSegmentFinished(computedSegment);
>                         } else {
>                             if (log.isTraceEnabled()) {
>                                 log.tracef("Segment %d failed to compute", result.getSegment());
>                             }
>                         }
>                     } catch (InterruptedException ie) {
>                         errors++;
>                         log.error("Interruped exception when computed future. Errors: " + errors, ie);
>                     } catch (ExecutionException ee) {
>                         errors++;
>                         log.error("ExecutionException when computed future. Errors: " + errors, ee);
>                     }
>                 }
> 
>                 if (errors >= maxErrors) {
>                     throw new RuntimeException("Maximum count of worker errors occured. Limit was " + maxErrors + ". See server.log for details");
>                 }
> 
>                 saveStateToCache(state);
> 
>                 log.debugf("New initializer state pushed. The state is: %s", state);
>             }
> 
>             // Loader callback after the task is finished
>             this.sessionLoader.afterAllSessionsLoaded(this);
> 
>         } finally {
>             if (distributed) {
>                 executorService.shutdown();
>             }
>             localExecutor.shutdown();
>         }
>     }
> 
> 
> 
> 
> On Tue, Nov 20, 2018 at 2:11 PM, Marek Posolda <mposolda at redhat.com> wrote:
> 
> > On 20/11/2018 13:42, Sebastian Laskawiec wrote:
> >
> > If I'm not mistaken (+William Burns <wburns at redhat.com> is an expert
> > here), invoking the RemoteCache#size() method requires pulling all entries
> > from the store into memory. If so, that's why it blows up.

To confirm, Infinispan iteration only pulls a batch at a time. In 9.4 we use an rxjava2 Publisher which batches 128 entries at a time. Unfortunately, setting the fetch size on the JDBC connection alone was not enough to make the JDBC driver batch, though :P
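As a toy illustration of what that buys you (not the actual Infinispan code, just a plain rxjava2 pipeline): the consumer only requests more items as it drains them, so only a bounded window of entries is in memory at any time.

    import io.reactivex.Flowable;

    public class BatchedIterationSketch {
        public static void main(String[] args) {
            Flowable.range(1, 1_000_000)   // stands in for entries streamed from the store
                    .buffer(128)           // group into batches of 128, as described above
                    .blockingSubscribe(batch ->
                            System.out.println("processing a batch of " + batch.size() + " entries"));
        }
    }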

> >
> > The only way that comes to my mind is to get rid of the RemoteCache#size()
> > calls in Keycloak (+Marek Posolda <mposolda at redhat.com> WDYT, is it
> > doable?).
> >
> > Hmm... I think that remoteCache#size doesn't need to load the entries into
> > memory? I see that the HotRod protocol has a "size" operation, and that one
> > seems to be called under the covers when you call RemoteCache#size (on the
> > client side, the Java class is
> > org.infinispan.client.hotrod.impl.operations.SizeOperation).
> >
> > I am not sure how much memory is needed to hold 1 million user sessions.
> > We plan to have some tests for this, but haven't yet tested it with
> > cross-dc enabled. Do you have both JDG and Keycloak on the same machine?
> > Maybe it will help to increase or decrease the "Xmx" value for both JDG
> > and Keycloak. AFAIK if it's too big, the garbage collector is not called
> > very often, new records keep being added to memory, and the JDK process
> > doesn't have enough space to handle it. But I'm not 100% sure that will
> > help.
> >
> > Marek
> >
> >
> > I'm just wondering if adding more memory to the ISPN process is an option
> > in your case?
> >
> > On Tue, Nov 20, 2018 at 11:12 AM Nicolas Ocquidant <nocquidant at gmail.com>
> > wrote:
> >
> >> Hi
> >>
> >> My use case is to keep one year of sessions; this is what I have done to
> >> simulate it:
> >>
> >> 1. I use JDBC to store 1 million session objects, 2.8KB (in memory) each
> >> 2. I start one Infinispan node with passivation=false and shared=true, and
> >> Xmx=8G
> >> 3. I start one Keycloak node configured with a remote-cache and Xmx=4G
> >>
> >> Note that I use a remote-store in KC as it is the only way to set
> >> passivation=false and shared=true (see
> >> http://lists.jboss.org/pipermail/keycloak-user/2018-November/016180.html
> >> ).
> >>
> >> No problem in step 2: the ISPN process uses less than 300MB of memory. But
> >> after step 3, the ISPN process grows to around 6GB.
> >>
> >> See below for the traces, but basically I get OOM.
> >>
> >> Using the debugger, I can see that getting the size of the cache is a
> >> really slow operation, and it bumps memory to 3GB in the ISPN process.
> >>
> >> From
> >> org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer:
> >>
> >> RemoteCacheSessionsLoader.computeLoaderContext(KeycloakSession session) {
> >>     // ...
> >>     int sessionsTotal = remoteCache.size(); <--- HERE
> >>     // ...
> >> }
> >>
> >> And then (OOM here):
> >>
> >> InfinispanCacheInitializer.startLoadingImpl(InitializerState state, SessionLoader.LoaderContext ctx) {
> >>     // ...
> >>     for (Future<WorkerResult> future : futures) {
> >>         // Called 4X (i.e. the number of segments), but the 1st one does not terminate: OOM
> >>         // -> org.infinispan.distexec.DefaultExecutorService$LocalDistributedTaskPart
> >>         WorkerResult result = future.get(); <--- Very slow and bumps mem to 8GB
> >>         // ...
> >> }
> >>
> >> So, with 1M sessions in the store, I cannot get KC/ISPN to start. And I am
> >> far from my goal, which is to keep one year of sessions (estimated at
> >> ~52M sessions)...
> >>
> >> Is it something I can't achieve with KC/ISPN?
> >> Any help appreciated.
> >>
> >> Thanks
> >> nick
> >>
> >> Note, versions I used are:
> >>
> >> * ISPN 9.4.1 Wildfly
> >> * KC 4.6.0 Wildfly
> >>
> >> -- ISPN process
> >>
> >> 10:42:22,969 ERROR [stderr] (Periodic Recovery) Exception in thread
> >> "Periodic Recovery" java.lang.OutOfMemoryError: Java heap space
> >> 10:42:22,977 ERROR [stderr] (Periodic Recovery)         at
> >> sun.text.resources.fr.FormatData_fr.getContents(FormatData_fr.java:86)
> >> 10:42:22,975 ERROR
> >>
> >> [org.infinispan.persistence.jdbc.connectionfactory.ManagedConnectionFactory]
> >> (pool-9-thread-1) ISPN008018: Sql failure retrieving connection from
> >> datasource: java.sql.SQLException: javax.resource.ResourceException:
> >> IJ000456: Unchecked throwable in ManagedConnection.getConnection()
> >>
> >> cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener at 3e3890ff
> >> [state=NORMAL
> >> managed
> >>
> >> connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection at 3f033728
> >> connection handles=0 lastReturned=1542706942971
> >> lastValidated=1542706738387
> >> lastCheckedOut=1542706916019 trackByTx=false
> >> pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool at 2772dcff
> >> mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool at 5e82b87d
> >> [pool=InfinispanDS]
> >> xaResource=LocalXAResourceImpl at 308d2c98[connectionListener=3e3890ff
> >> connectionManager=1691f7d6 warned=false currentXid=null
> >> productName=PostgreSQL productVersion=10.5
> >> jndiName=java:jboss/datasources/InfinispanDS] txSync=null]
> >>         at
> >>
> >> org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:146)
> >>         at
> >>
> >> org.jboss.as.connector.subsystems.datasources.WildFlyDataSource.getConnection(WildFlyDataSource.java:64)
> >>         at
> >>
> >> org.infinispan.persistence.jdbc.connectionfactory.ManagedConnectionFactory.getConnection(ManagedConnectionFactory.java:83)
> >>         at
> >>
> >> org.infinispan.persistence.jdbc.stringbased.JdbcStringBasedStore.purge(JdbcStringBasedStore.java:461)
> >>         at
> >>
> >> org.infinispan.persistence.manager.PersistenceManagerImpl.lambda$purgeExpired$6(PersistenceManagerImpl.java:459)
> >>         at java.util.ArrayList.forEach(ArrayList.java:1257)
> >>         at
> >>
> >> org.infinispan.persistence.manager.PersistenceManagerImpl.purgeExpired(PersistenceManagerImpl.java:462)
> >>         at
> >>
> >> org.infinispan.expiration.impl.ClusterExpirationManager.processExpiration(ClusterExpirationManager.java:119)
> >>         at
> >>
> >> org.infinispan.expiration.impl.ExpirationManagerImpl$ScheduledTask.run(ExpirationManagerImpl.java:245)
> >>         at
> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >>         at
> >> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >>         at
> >>
> >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >>         at
> >>
> >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >>         at
> >>
> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>         at
> >>
> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>         at java.lang.Thread.run(Thread.java:748)
> >> Caused by: javax.resource.ResourceException: IJ000456: Unchecked throwable
> >> in ManagedConnection.getConnection()
> >>
> >> cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener at 3e3890ff
> >> [state=NORMAL
> >> managed
> >>
> >> connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection at 3f033728
> >> connection handles=0 lastReturned=1542706942971
> >> lastValidated=1542706738387
> >> lastCheckedOut=1542706916019 trackByTx=false
> >> pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool at 2772dcff
> >> mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool at 5e82b87d
> >> [pool=InfinispanDS]
> >> xaResource=LocalXAResourceImpl at 308d2c98[connectionListener=3e3890ff
> >> connectionManager=1691f7d6 warned=false currentXid=null
> >> productName=PostgreSQL productVersion=10.5
> >> jndiName=java:jboss/datasources/InfinispanDS] txSync=null]
> >>         at
> >>
> >> org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:811)
> >>         at
> >>
> >> org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:138)
> >>         ... 15 more
> >> Caused by: java.lang.OutOfMemoryError: Java heap space
> >>
> >>
> >> -- KC process
> >>
> >> 10:42:23,216 WARN  [org.infinispan.client.hotrod.impl.protocol.Codec21]
> >> (Thread-0) ISPN004005: Error received from the server:
> >> java.lang.RuntimeException: java.sql.SQLException: Error
> >>
> >> java.sql.SQLException: Error
> >> java.lang.OutOfMemoryError: Java heap space
> >> 10:42:23,359 WARN
> >>
> >> [org.keycloak.models.sessions.infinispan.remotestore.RemoteCacheSessionsLoader]
> >> (pool-16-thread-4) Error loading sessions from remote cache 'sessions' for
> >> segment '3':
> >> org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for
> >> messageId=55 returned server error (status=0x85):
> >> java.lang.RuntimeException: java.sql.SQLException: Error
> >> java.sql.SQLException: Error
> >> java.lang.OutOfMemoryError: Java heap space
> >>         at
> >>
> >> org.infinispan.client.hotrod.impl.protocol.Codec20.checkForErrorsInResponseStatus(Codec20.java:333)
> >>         at
> >>
> >> org.infinispan.client.hotrod.impl.protocol.Codec20.readHeader(Codec20.java:179)
> >>         at
> >>
> >> org.infinispan.client.hotrod.impl.transport.netty.HeaderDecoder.decode(HeaderDecoder.java:138)
> >>         at
> >>
> >> org.infinispan.client.hotrod.impl.transport.netty.HintedReplayingDecoder.callDecode(HintedReplayingDecoder.java:98)
> >>         at
> >>
> >> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> >>         at
> >>
> >> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> >>         at
> >>
> >> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> >>         at
> >>
> >> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> >>         at
> >>
> >> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> >>         at
> >>
> >> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> >>         at
> >>
> >> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> >>         at
> >>
> >> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> >>         at
> >>
> >> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
> >>         at
> >>
> >> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> >>         at
> >>
> >> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> >>         at
> >>
> >> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
> >>         at
> >>
> >> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
> >>         at
> >>
> >> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:647)
> >>         at
> >>
> >> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:582)
> >>         at
> >>
> >> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
> >>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:461)
> >>         at
> >>
> >> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
> >>         at
> >>
> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>         at
> >>
> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>         at java.lang.Thread.run(Thread.java:748)
> >>
> >
> >
> 


