Actually, the OOM does not come right after remoteCache#size; maybe it only
consumes up to 3GB at that point because I still have enough heap space (my
laptop has 16GB of RAM)...
A heap dump (via HeapDumpOnOutOfMemoryError) from the ISPN process shows that
the heap is full of byte[3][] objects (4 threads):
https://gist.github.com/nocquidant/4e84eb8d704b56221a9865cb63951734
As far as I can tell, the OOM occurs in the following method (KC side), when
getting the result from the future:
protected void startLoadingImpl(InitializerState state, SessionLoader.LoaderContext ctx) {
    // Assume each worker has same processor's count
    int processors = Runtime.getRuntime().availableProcessors();

    ExecutorService localExecutor = Executors.newCachedThreadPool();
    Transport transport = workCache.getCacheManager().getTransport();
    boolean distributed = transport != null;
    ExecutorService executorService = distributed
            ? new DefaultExecutorService(workCache, localExecutor)
            : localExecutor;

    int errors = 0;

    try {
        while (!state.isFinished()) {
            int nodesCount = transport == null ? 1 : transport.getMembers().size();
            int distributedWorkersCount = processors * nodesCount;

            log.debugf("Starting next iteration with %d workers", distributedWorkersCount);

            List<Integer> segments = state.getUnfinishedSegments(distributedWorkersCount);

            if (log.isTraceEnabled()) {
                log.trace("unfinished segments for this iteration: " + segments);
            }

            List<Future<WorkerResult>> futures = new LinkedList<>();
            for (Integer segment : segments) {
                SessionInitializerWorker worker = new SessionInitializerWorker();
                worker.setWorkerEnvironment(segment, ctx, sessionLoader);
                if (!distributed) {
                    worker.setEnvironment(workCache, null);
                }

                Future<WorkerResult> future = executorService.submit(worker);
                futures.add(future);
            }

            for (Future<WorkerResult> future : futures) {
                try {
                    WorkerResult result = future.get(); // <--- HERE
                    if (result.getSuccess()) {
                        int computedSegment = result.getSegment();
                        state.markSegmentFinished(computedSegment);
                    } else {
                        if (log.isTraceEnabled()) {
                            log.tracef("Segment %d failed to compute", result.getSegment());
                        }
                    }
                } catch (InterruptedException ie) {
                    errors++;
                    log.error("Interruped exception when computed future. Errors: " + errors, ie);
                } catch (ExecutionException ee) {
                    errors++;
                    log.error("ExecutionException when computed future. Errors: " + errors, ee);
                }
            }

            if (errors >= maxErrors) {
                throw new RuntimeException("Maximum count of worker errors occured. Limit was " + maxErrors + ". See server.log for details");
            }

            saveStateToCache(state);
            log.debugf("New initializer state pushed. The state is: %s", state);
        }

        // Loader callback after the task is finished
        this.sessionLoader.afterAllSessionsLoaded(this);

    } finally {
        if (distributed) {
            executorService.shutdown();
        }
        localExecutor.shutdown();
    }
}
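To make the memory-relevant part explicit: the loop submits every segment up
front and only then calls future.get() on each result, so all WorkerResults
stay referenced through the futures list until the whole iteration finishes.
A minimal standalone sketch of that submit-then-get pattern (names and payload
sizes below are purely illustrative, nothing Keycloak-specific):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitThenGetSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        List<Future<byte[]>> futures = new ArrayList<>();

        // One task per "segment"; each result is a large payload.
        for (int segment = 0; segment < 4; segment++) {
            Callable<byte[]> worker = () -> new byte[64 * 1024 * 1024]; // pretend segment data
            futures.add(pool.submit(worker));
        }

        // Each get() blocks until its worker finishes; every payload remains
        // reachable through the futures list until the loop has consumed them all.
        long total = 0;
        for (Future<byte[]> future : futures) {
            total += future.get().length;
        }
        System.out.println("loaded " + total + " bytes across all segments");
        pool.shutdown();
    }
}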
On Tue, Nov 20, 2018 at 14:11, Marek Posolda <mposolda(a)redhat.com> wrote:
On 20/11/2018 13:42, Sebastian Laskawiec wrote:
If I'm not mistaken (+William Burns <wburns(a)redhat.com> is an expert
here), invoking the RemoteCache#size() method requires pulling all entries
from the store into memory. If so, that's why it blows up.
The only way that comes to my mind is to get rid of the RemoteCache#size()
calls in Keycloak (+Marek Posolda <mposolda(a)redhat.com> WDYT, is it
doable?).
Hmm... I think that remoteCache#size doesn't need to load the entries into
memory? I see that the HotRod protocol has a "size" operation, and that one
seems to be called under the covers when you call RemoteCache#size (on the
client side, the Java class is
org.infinispan.client.hotrod.impl.operations.SizeOperation).
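(For reference, a minimal HotRod client sketch of that call; the host, port
and cache name below are placeholders, not taken from this setup:)

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class RemoteSizeSketch {
    public static void main(String[] args) {
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.addServer().host("127.0.0.1").port(11222); // placeholder JDG/ISPN server
        RemoteCacheManager rcm = new RemoteCacheManager(cb.build());
        try {
            RemoteCache<String, Object> sessions = rcm.getCache("sessions");
            // Triggers the HotRod "size" operation on the server; the entries
            // themselves should not be transferred to the client for this call.
            System.out.println("sessions cache size = " + sessions.size());
        } finally {
            rcm.stop();
        }
    }
}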
I am not sure how much memory is needed to hold 1 million user sessions. We
plan to have some tests for this, but we haven't tested it yet with cross-dc
enabled. Do you have both JDG and Keycloak on the same machine? Maybe it will
help to increase or decrease the value of "Xmx" for both JDG and Keycloak.
AFAIK, if it's too big, the garbage collector is not called very often, new
records keep being added to memory, and the JDK process doesn't have enough
space to handle them. But I'm not 100% sure that can help.
Marek
I'm just wondering if adding more memory to the ISPN process is an option
in your case?
On Tue, Nov 20, 2018 at 11:12 AM Nicolas Ocquidant <nocquidant(a)gmail.com>
wrote:
> Hi
>
> My UC is to keep one year of sessions; this is what I have done to
> simulate it:
>
> 1. I use JDBC to store 1 million session objects, 2.8KB (in memory) each
> 2. I start one Infinispan node with passivation=false and shared=true,
>    and Xmx=8G
> 3. I start one Keycloak node configured with a remote-cache and Xmx=4G
>
> Note that I use a remote-store in KC as it is the only way to set
> passivation=false and shared=true (see
> http://lists.jboss.org/pipermail/keycloak-user/2018-November/016180.html ).
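> (For illustration only: a rough programmatic equivalent of those store
> flags, assuming Infinispan's embedded ConfigurationBuilder API; the real
> setup here is done in the Wildfly/Keycloak XML, and the cache name, host
> and port below are placeholders:)
>
> import org.infinispan.configuration.cache.Configuration;
> import org.infinispan.configuration.cache.ConfigurationBuilder;
> import org.infinispan.persistence.remote.configuration.RemoteStoreConfigurationBuilder;
>
> public class RemoteStoreConfigSketch {
>     public static Configuration sessionsCacheConfig() {
>         ConfigurationBuilder builder = new ConfigurationBuilder();
>         builder.persistence()
>                .passivation(false)                      // passivation=false
>                .addStore(RemoteStoreConfigurationBuilder.class)
>                    .remoteCacheName("sessions")         // placeholder cache name
>                    .shared(true)                        // shared=true
>                    .addServer()
>                        .host("127.0.0.1")               // placeholder host/port
>                        .port(11222);
>         return builder.build();
>     }
> }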
>
> No problem in step 2: the ISPN process uses less than 300MB of memory. But
> after step 3, the ISPN process goes up to around 6GB.
>
> See below for the traces, but basically I get an OOM.
>
> Using the debugger, I can see that getting the size of the cache is a
> really slow operation, and it bumps memory to 3GB in the ISPN process.
>
> From
> org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer:
>
> RemoteCacheSessionsLoader.computeLoaderContext(KeycloakSession session) {
>     // ...
>     int sessionsTotal = remoteCache.size(); // <--- HERE
>     // ...
> }
>
> And then (OOM here):
>
> InfinispanCacheInitializer.startLoadingImpl(InitializerState state,
>                                             SessionLoader.LoaderContext ctx) {
>     // ...
>     for (Future<WorkerResult> future : futures) {
>         // Called 4X (i.e. the number of segments), but the 1st one does not terminate: OOM
>         // -> org.infinispan.distexec.DefaultExecutorService$LocalDistributedTaskPart
>         WorkerResult result = future.get(); // <--- Very slow and bumps mem to 8GB
>         // ...
>     }
>
> So, with 1M sessions in the store, I cannot get KC/ISPN to start. And I am
> far from my goal, which is to keep one year of sessions (estimated at
> ~52M sessions)...
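> (Rough arithmetic, assuming the ~2.8KB in-memory size per session mentioned
> above: 1M sessions is about 1,000,000 x 2.8KB ≈ 2.8GB of raw session data,
> and 52M sessions would be about 52,000,000 x 2.8KB ≈ 145GB, before any
> serialization or bookkeeping overhead.)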
>
> Is this something I can't achieve with KC/ISPN?
> Any help appreciated.
>
> Thanks
> nick
>
> Note: the versions I used are:
>
> * ISPN 9.4.1 Wildfly
> * KC 4.6.0 Wildfly
>
> -- ISPN process
>
> 10:42:22,969 ERROR [stderr] (Periodic Recovery) Exception in thread
> "Periodic Recovery" java.lang.OutOfMemoryError: Java heap space
> 10:42:22,977 ERROR [stderr] (Periodic Recovery) at
> sun.text.resources.fr.FormatData_fr.getContents(FormatData_fr.java:86)
> 10:42:22,975 ERROR [org.infinispan.persistence.jdbc.connectionfactory.ManagedConnectionFactory]
> (pool-9-thread-1) ISPN008018: Sql failure retrieving connection from datasource:
> java.sql.SQLException: javax.resource.ResourceException: IJ000456: Unchecked
> throwable in ManagedConnection.getConnection()
> cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3e3890ff[state=NORMAL
> managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@3f033728
> connection handles=0 lastReturned=1542706942971 lastValidated=1542706738387
> lastCheckedOut=1542706916019 trackByTx=false
> pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@2772dcff
> mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@5e82b87d[pool=InfinispanDS]
> xaResource=LocalXAResourceImpl@308d2c98[connectionListener=3e3890ff
> connectionManager=1691f7d6 warned=false currentXid=null productName=PostgreSQL
> productVersion=10.5 jndiName=java:jboss/datasources/InfinispanDS] txSync=null]
>     at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:146)
>     at org.jboss.as.connector.subsystems.datasources.WildFlyDataSource.getConnection(WildFlyDataSource.java:64)
>     at org.infinispan.persistence.jdbc.connectionfactory.ManagedConnectionFactory.getConnection(ManagedConnectionFactory.java:83)
>     at org.infinispan.persistence.jdbc.stringbased.JdbcStringBasedStore.purge(JdbcStringBasedStore.java:461)
>     at org.infinispan.persistence.manager.PersistenceManagerImpl.lambda$purgeExpired$6(PersistenceManagerImpl.java:459)
>     at java.util.ArrayList.forEach(ArrayList.java:1257)
>     at org.infinispan.persistence.manager.PersistenceManagerImpl.purgeExpired(PersistenceManagerImpl.java:462)
>     at org.infinispan.expiration.impl.ClusterExpirationManager.processExpiration(ClusterExpirationManager.java:119)
>     at org.infinispan.expiration.impl.ExpirationManagerImpl$ScheduledTask.run(ExpirationManagerImpl.java:245)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.resource.ResourceException: IJ000456: Unchecked throwable
> in ManagedConnection.getConnection()
> cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3e3890ff[state=NORMAL
> managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@3f033728
> connection handles=0 lastReturned=1542706942971 lastValidated=1542706738387
> lastCheckedOut=1542706916019 trackByTx=false
> pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@2772dcff
> mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@5e82b87d[pool=InfinispanDS]
> xaResource=LocalXAResourceImpl@308d2c98[connectionListener=3e3890ff
> connectionManager=1691f7d6 warned=false currentXid=null productName=PostgreSQL
> productVersion=10.5 jndiName=java:jboss/datasources/InfinispanDS] txSync=null]
>     at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:811)
>     at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:138)
>     ... 15 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
>
>
> -- KC process
>
> 10:42:23,216 WARN [org.infinispan.client.hotrod.impl.protocol.Codec21]
> (Thread-0) ISPN004005: Error received from the server:
> java.lang.RuntimeException: java.sql.SQLException: Error
> java.sql.SQLException: Error
> java.lang.OutOfMemoryError: Java heap space
>
> 10:42:23,359 WARN [org.keycloak.models.sessions.infinispan.remotestore.RemoteCacheSessionsLoader]
> (pool-16-thread-4) Error loading sessions from remote cache 'sessions' for
> segment '3': org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for
> messageId=55 returned server error (status=0x85): java.lang.RuntimeException:
> java.sql.SQLException: Error
> java.sql.SQLException: Error
> java.lang.OutOfMemoryError: Java heap space
>     at org.infinispan.client.hotrod.impl.protocol.Codec20.checkForErrorsInResponseStatus(Codec20.java:333)
>     at org.infinispan.client.hotrod.impl.protocol.Codec20.readHeader(Codec20.java:179)
>     at org.infinispan.client.hotrod.impl.transport.netty.HeaderDecoder.decode(HeaderDecoder.java:138)
>     at org.infinispan.client.hotrod.impl.transport.netty.HintedReplayingDecoder.callDecode(HintedReplayingDecoder.java:98)
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:647)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:582)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:461)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> _______________________________________________
> keycloak-user mailing list
> keycloak-user(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/keycloak-user