[keycloak-user] OOM at startup with 1 million sessions

Tue Nov 20 08:41:58 EST 2018

Actually, the OOM does not come right after remoteCache#size, and maybe it
consumes up to 3GB because I have enough heap space (my laptop has 16GB
RAM)...

HeapDumpOnOutOfMemory from the ISPN process shows that the heap is full of
byte[3][] objects (4 threads):
https://gist.github.com/nocquidant/4e84eb8d704b56221a9865cb63951734

According to me, OOM occurs in the following method (KC side), when getting
from the future:

    protected void startLoadingImpl(InitializerState state,
SessionLoader.LoaderContext ctx) {
        // Assume each worker has same processor's count
        int processors = Runtime.getRuntime().availableProcessors();

        ExecutorService localExecutor = Executors.newCachedThreadPool();
        Transport transport = workCache.getCacheManager().getTransport();
        boolean distributed = transport != null;
        ExecutorService executorService = distributed ? new
DefaultExecutorService(workCache, localExecutor) : localExecutor;

        int errors = 0;

        try {
            while (!state.isFinished()) {
                int nodesCount = transport==null ? 1 :
transport.getMembers().size();
                int distributedWorkersCount = processors * nodesCount;

                log.debugf("Starting next iteration with %d workers",
distributedWorkersCount);

                List<Integer> segments =
state.getUnfinishedSegments(distributedWorkersCount);

                if (log.isTraceEnabled()) {
                    log.trace("unfinished segments for this iteration: " +
segments);
                }

                List<Future<WorkerResult>> futures = new LinkedList<>();
                for (Integer segment : segments) {
                    SessionInitializerWorker worker = new
SessionInitializerWorker();
                    worker.setWorkerEnvironment(segment, ctx,
sessionLoader);
                    if (!distributed) {
                        worker.setEnvironment(workCache, null);
                    }

                    Future<WorkerResult> future =
executorService.submit(worker);
                    futures.add(future);
                }

                for (Future<WorkerResult> future : futures) {
                    try {
                        WorkerResult result = future.get(); <--- HERE

                        if (result.getSuccess()) {
                            int computedSegment = result.getSegment();
                            state.markSegmentFinished(computedSegment);
                        } else {
                            if (log.isTraceEnabled()) {
                                log.tracef("Segment %d failed to compute",
result.getSegment());
                            }
                        }
                    } catch (InterruptedException ie) {
                        errors++;
                        log.error("Interruped exception when computed
future. Errors: " + errors, ie);
                    } catch (ExecutionException ee) {
                        errors++;
                        log.error("ExecutionException when computed future.
Errors: " + errors, ee);
                    }
                }

                if (errors >= maxErrors) {
                    throw new RuntimeException("Maximum count of worker
errors occured. Limit was " + maxErrors + ". See server.log for details");
                }

                saveStateToCache(state);

                log.debugf("New initializer state pushed. The state is:
%s", state);
            }

            // Loader callback after the task is finished
            this.sessionLoader.afterAllSessionsLoaded(this);

        } finally {
            if (distributed) {
                executorService.shutdown();
            }
            localExecutor.shutdown();
        }
    }

Le mar. 20 nov. 2018 à 14:11, Marek Posolda <mposolda at redhat.com> a écrit :

> On 20/11/2018 13:42, Sebastian Laskawiec wrote:
>
> If I'm not mistaken (+William Burns <wburns at redhat.com> is an expert
> here), invoking RemoteCache#size() method requires to pull all entries from
> the store into the memory. If so, that's why it blows up.
>
> The only way that comes into my mind is to get rid of RemoteCache#size()
> calls from Keycloak (+Marek Posolda <mposolda at redhat.com> WDYT, is it
> doable?).
>
> Hmm... I think that remoteCache#size doesn't need to load the entries to
> the memory? I see that HotRod protocol has "size" operation and that one
> seem to be called under the covers when you call RemoteCache#size (on
> client side, the Java class is
> org.infinispan.client.hotrod.impl.operations.SizeOperation ).
>
> I am not sure how much memory is needed to have 1 million of user
> sessions. We plan to have some tests for this, but didn't test yet this
> with the cross-dc enabled. Do you have both JDG and Keycloak on same
> machine? Maybe it will help to increase or decrease the value of "Xmx" for
> both JDG and Keycloak. AFAIK if it's too big, it means that garbage
> collectors are not called very often and new records are added to the
> memory and JDK process doesn't have enough space to handle it. But not 100%
> sure if that can help.
>
> Marek
>
>
> I'm just wondering if adding more memory to the ISPN process is an option
> in your case?
>
> On Tue, Nov 20, 2018 at 11:12 AM Nicolas Ocquidant <nocquidant at gmail.com>
> wrote:
>
>> Hi
>>
>> My UC is to keep one year of sessions, this is what I have done to
>> simulate
>> it:
>>
>> 1. I use JDBC to store 1 million of session objects, 2.8KB (in memory)
>> each
>> 2. I start one Infinispan node with passivation=false and shared=true, and
>> Xmx=8G
>> 3. I start one Keycloak node configured with a remote-cache and Xmx=4G
>>
>> Note that I use a remote-store in KC as it is the only way to set
>> passivation=false and shared=true (see
>> http://lists.jboss.org/pipermail/keycloak-user/2018-November/016180.html
>> ).
>>
>> No problem in step 2, ISPN process is less than 300MB large in memory. But
>> after 3, ISPN process goes up to around 6GB.
>>
>> See below for the traces, but basically I get OOM.
>>
>> Using the debugger, I can see that getting the size of the cache is a
>> really slow operation, and bump memory to 3GB in ISPN process.
>>
>> From
>>
>> org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer:
>>
>> RemoteCacheSessionsLoader.computeLoaderContext(KeycloakSession session)
>>    // ...
>>     int sessionsTotal = remoteCache.size(); <--- HERE
>>    //...
>> }
>>
>> And then (OOM here):
>>
>> InfinispanCacheInitializer.startLoadingImpl(InitializerState state,
>> SessionLoader.LoaderContext ctx) {
>>   // ...
>>    for (Future<WorkerResult> future : futures) {
>>      // Called 4X (ie the number of segments), but 1st one does not
>> terminate: OOM
>>      // ->
>> org.infinispan.distexec.DefaultExecutorService$LocalDistributedTaskPart
>>     WorkerResult result = future.get(); <--- Very slow and bump mem to 8GB
>>      // ...
>> }
>>
>> So, with 1M of sessions in store, I cannot get KC/ISPN to start. And I am
>> far from my goal which is to keep one year of sessions (which has been
>> estimated to ~52M of sessions)...
>>
>> Is it something I can't achieve with KC/ISPN?
>> Any help appreciated.
>>
>> Thanks
>> nick
>>
>> Note, versions I used are:
>>
>> * ISPN 9.4.1 Wildfly
>> * KC 4.6.0 Wildfly
>>
>> -- ISPN process
>>
>> 10:42:22,969 ERROR [stderr] (Periodic Recovery) Exception in thread
>> "Periodic Recovery" java.lang.OutOfMemoryError: Java heap space
>> 10:42:22,977 ERROR [stderr] (Periodic Recovery)         at
>> sun.text.resources.fr.FormatData_fr.getContents(FormatData_fr.java:86)
>> 10:42:22,975 ERROR
>>
>> [org.infinispan.persistence.jdbc.connectionfactory.ManagedConnectionFactory]
>> (pool-9-thread-1) ISPN008018: Sql failure retrieving connection from
>> datasource: java.sql.SQLException: javax.resource.ResourceException:
>> IJ000456: Unchecked throwable in ManagedConnection.getConnection()
>>
>> cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener at 3e3890ff
>> [state=NORMAL
>> managed
>>
>> connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection at 3f033728
>> connection handles=0 lastReturned=1542706942971
>> lastValidated=1542706738387
>> lastCheckedOut=1542706916019 trackByTx=false
>> pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool at 2772dcff
>> mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool at 5e82b87d
>> [pool=InfinispanDS]
>> xaResource=LocalXAResourceImpl at 308d2c98[connectionListener=3e3890ff
>> connectionManager=1691f7d6 warned=false currentXid=null
>> productName=PostgreSQL productVersion=10.5
>> jndiName=java:jboss/datasources/InfinispanDS] txSync=null]
>>         at
>>
>> org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:146)
>>         at
>>
>> org.jboss.as.connector.subsystems.datasources.WildFlyDataSource.getConnection(WildFlyDataSource.java:64)
>>         at
>>
>> org.infinispan.persistence.jdbc.connectionfactory.ManagedConnectionFactory.getConnection(ManagedConnectionFactory.java:83)
>>         at
>>
>> org.infinispan.persistence.jdbc.stringbased.JdbcStringBasedStore.purge(JdbcStringBasedStore.java:461)
>>         at
>>
>> org.infinispan.persistence.manager.PersistenceManagerImpl.lambda$purgeExpired$6(PersistenceManagerImpl.java:459)
>>         at java.util.ArrayList.forEach(ArrayList.java:1257)
>>         at
>>
>> org.infinispan.persistence.manager.PersistenceManagerImpl.purgeExpired(PersistenceManagerImpl.java:462)
>>         at
>>
>> org.infinispan.expiration.impl.ClusterExpirationManager.processExpiration(ClusterExpirationManager.java:119)
>>         at
>>
>> org.infinispan.expiration.impl.ExpirationManagerImpl$ScheduledTask.run(ExpirationManagerImpl.java:245)
>>         at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>         at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>         at
>>
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>         at
>>
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>         at
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>         at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>         at java.lang.Thread.run(Thread.java:748)
>> Caused by: javax.resource.ResourceException: IJ000456: Unchecked throwable
>> in ManagedConnection.getConnection()
>>
>> cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener at 3e3890ff
>> [state=NORMAL
>> managed
>>
>> connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection at 3f033728
>> connection handles=0 lastReturned=1542706942971
>> lastValidated=1542706738387
>> lastCheckedOut=1542706916019 trackByTx=false
>> pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool at 2772dcff
>> mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool at 5e82b87d
>> [pool=InfinispanDS]
>> xaResource=LocalXAResourceImpl at 308d2c98[connectionListener=3e3890ff
>> connectionManager=1691f7d6 warned=false currentXid=null
>> productName=PostgreSQL productVersion=10.5
>> jndiName=java:jboss/datasources/InfinispanDS] txSync=null]
>>         at
>>
>> org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:811)
>>         at
>>
>> org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:138)
>>         ... 15 more
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>
>>
>> -- KC process
>>
>> 10:42:23,216 WARN  [org.infinispan.client.hotrod.impl.protocol.Codec21]
>> (Thread-0) ISPN004005: Error received from the server:
>> java.lang.RuntimeException: java.sql.SQLException: Error
>>
>> java.sql.SQLException: Error
>> java.lang.OutOfMemoryError: Java heap space
>> 10:42:23,359 WARN
>>
>> [org.keycloak.models.sessions.infinispan.remotestore.RemoteCacheSessionsLoader]
>> (pool-16-thread-4) Error loading sessions from remote cache 'sessions' for
>> segment '3':
>> org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for
>> messageId=55 returned server error (status=0x85):
>> java.lang.RuntimeException: java.sql.SQLException: Error
>> java.sql.SQLException: Error
>> java.lang.OutOfMemoryError: Java heap space
>>         at
>>
>> org.infinispan.client.hotrod.impl.protocol.Codec20.checkForErrorsInResponseStatus(Codec20.java:333)
>>         at
>>
>> org.infinispan.client.hotrod.impl.protocol.Codec20.readHeader(Codec20.java:179)
>>         at
>>
>> org.infinispan.client.hotrod.impl.transport.netty.HeaderDecoder.decode(HeaderDecoder.java:138)
>>         at
>>
>> org.infinispan.client.hotrod.impl.transport.netty.HintedReplayingDecoder.callDecode(HintedReplayingDecoder.java:98)
>>         at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
>>         at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>>         at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>>         at
>>
>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>>         at
>>
>> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
>>         at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>>         at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>>         at
>>
>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>>         at
>>
>> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
>>         at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>>         at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>>         at
>>
>> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
>>         at
>>
>> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
>>         at
>>
>> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:647)
>>         at
>>
>> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:582)
>>         at
>>
>> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
>>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:461)
>>         at
>>
>> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>>         at
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>         at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>         at java.lang.Thread.run(Thread.java:748)
>> _______________________________________________
>> keycloak-user mailing list
>> keycloak-user at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/keycloak-user
>>
>
>