----- Original Message -----
From: "Nicolas Ocquidant" <nocquidant(a)gmail.com>
To: mposolda(a)redhat.com
Cc: slaskawi(a)redhat.com, wburns(a)redhat.com, keycloak-user(a)lists.jboss.org
Sent: Tuesday, November 20, 2018 8:41:58 AM
Subject: Re: [keycloak-user] OOM at startup with 1 million sessions
Actually, the OOM does not come right after remoteCache#size, and it may well
consume up to 3GB simply because I have enough heap space (my laptop has 16GB
RAM)...
A heap dump on OOM (HeapDumpOnOutOfMemoryError) from the ISPN process shows that
the heap is full of byte[3][] objects (4 threads):
https://gist.github.com/nocquidant/4e84eb8d704b56221a9865cb63951734
Okay, I did some digging and it looks like the fetch size is ignored when running the
select against Postgres. The cause is that Postgres doesn't do paging unless auto-commit is
disabled on the connection [1], so all the entries end up in memory at once (a minimal
sketch follows the links below). This will be fixed in [2].
Also, a side note for Keycloak: I would recommend adding some backpressure support at [3] -
if the puts are too slow, you could end up with a large number of entries in memory at the
same time.
[1] https://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor
[2] https://issues.jboss.org/browse/ISPN-9752
[3] https://github.com/keycloak/keycloak/blob/4.6.0.Final/model/infinispan/sr...
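
For reference, the pgjdbc driver only uses a cursor (and thus honors the fetch
size) when auto-commit is off; a minimal standalone sketch, where the connection
URL, credentials and the "sessions" table are hypothetical placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorFetchExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical URL/credentials, for illustration only
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/infinispan", "ispn", "secret")) {

            // Without this, pgjdbc materializes the whole ResultSet in memory,
            // no matter what fetch size is set
            conn.setAutoCommit(false);

            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT id, data FROM sessions")) {   // hypothetical table
                ps.setFetchSize(100);                     // now honored: rows arrive in batches of 100
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process one row at a time; only ~100 rows are held in memory
                    }
                }
            }
            // read-only work, so closing the connection (implicit rollback) is fine here
        }
    }
}
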
As far as I can tell, the OOM occurs in the following method (KC side), when getting
the result from the future:
protected void startLoadingImpl(InitializerState state, SessionLoader.LoaderContext ctx) {
    // Assume each worker has same processor's count
    int processors = Runtime.getRuntime().availableProcessors();

    ExecutorService localExecutor = Executors.newCachedThreadPool();
    Transport transport = workCache.getCacheManager().getTransport();
    boolean distributed = transport != null;
    ExecutorService executorService = distributed ? new DefaultExecutorService(workCache, localExecutor) : localExecutor;

    int errors = 0;

    try {
        while (!state.isFinished()) {
            int nodesCount = transport == null ? 1 : transport.getMembers().size();
            int distributedWorkersCount = processors * nodesCount;

            log.debugf("Starting next iteration with %d workers", distributedWorkersCount);

            List<Integer> segments = state.getUnfinishedSegments(distributedWorkersCount);

            if (log.isTraceEnabled()) {
                log.trace("unfinished segments for this iteration: " + segments);
            }

            List<Future<WorkerResult>> futures = new LinkedList<>();
            for (Integer segment : segments) {
                SessionInitializerWorker worker = new SessionInitializerWorker();
                worker.setWorkerEnvironment(segment, ctx, sessionLoader);
                if (!distributed) {
                    worker.setEnvironment(workCache, null);
                }

                Future<WorkerResult> future = executorService.submit(worker);
                futures.add(future);
            }

            for (Future<WorkerResult> future : futures) {
                try {
                    WorkerResult result = future.get();   <--- HERE
                    if (result.getSuccess()) {
                        int computedSegment = result.getSegment();
                        state.markSegmentFinished(computedSegment);
                    } else {
                        if (log.isTraceEnabled()) {
                            log.tracef("Segment %d failed to compute", result.getSegment());
                        }
                    }
                } catch (InterruptedException ie) {
                    errors++;
                    log.error("Interruped exception when computed future. Errors: " + errors, ie);
                } catch (ExecutionException ee) {
                    errors++;
                    log.error("ExecutionException when computed future. Errors: " + errors, ee);
                }
            }

            if (errors >= maxErrors) {
                throw new RuntimeException("Maximum count of worker errors occured. Limit was " + maxErrors + ". See server.log for details");
            }

            saveStateToCache(state);

            log.debugf("New initializer state pushed. The state is: %s", state);
        }

        // Loader callback after the task is finished
        this.sessionLoader.afterAllSessionsLoaded(this);

    } finally {
        if (distributed) {
            executorService.shutdown();
        }
        localExecutor.shutdown();
    }
}
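
Not Keycloak's actual fix, just a rough sketch of what the backpressure idea
from [3] could look like around that loop: cap the number of in-flight segment
loads so slow puts can't pile up an unbounded number of loaded entries
(MAX_IN_FLIGHT and loadSegment(...) are made-up placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

class BoundedSegmentLoader {
    private static final int MAX_IN_FLIGHT = 4;   // hypothetical limit

    List<Integer> loadAll(List<Integer> segments, ExecutorService executor) throws InterruptedException {
        Semaphore permits = new Semaphore(MAX_IN_FLIGHT);
        List<Future<Integer>> futures = new ArrayList<>();

        for (Integer segment : segments) {
            permits.acquire();                    // block before submitting more work
            futures.add(executor.submit(() -> {
                try {
                    return loadSegment(segment);  // placeholder: load one segment into the cache
                } finally {
                    permits.release();            // free a slot once this segment is done
                }
            }));
        }

        List<Integer> finished = new ArrayList<>();
        for (Future<Integer> f : futures) {
            try {
                finished.add(f.get());
            } catch (ExecutionException e) {
                // count/log the error, as the real loop above does
            }
        }
        return finished;
    }

    private Integer loadSegment(Integer segment) {
        return segment;                           // placeholder for the real per-segment load
    }
}

The trade-off is only that startup parallelism is capped; memory stays bounded
even when the store iteration outpaces the cache puts.
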
On Tue, Nov 20, 2018 at 2:11 PM Marek Posolda <mposolda(a)redhat.com> wrote:
> On 20/11/2018 13:42, Sebastian Laskawiec wrote:
>
> If I'm not mistaken (+William Burns <wburns(a)redhat.com> is an expert here),
> invoking the RemoteCache#size() method requires pulling all entries from the
> store into memory. If so, that's why it blows up.
To confirm, Infinispan iteration only pulls a batch at a time. In 9.4 we are using an
rxjava2 Publisher which batches 128 entries at a time. Unfortunately, the fetch size on
the JDBC connection alone was not enough to make the JDBC driver batch, though :P
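
For context, that batched iteration is also exposed directly on the HotRod
client; a small sketch, assuming a reachable server and a cache named
"sessions" (the cache name and batch size are arbitrary):

import java.util.Map;
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.commons.util.CloseableIterator;

public class IterateSessions {
    public static void main(String[] args) {
        // No-arg constructor picks up hotrod-client.properties / defaults
        RemoteCacheManager rcm = new RemoteCacheManager();
        RemoteCache<Object, Object> cache = rcm.getCache("sessions");

        // retrieveEntries streams entries in batches (128 per round trip here)
        // instead of pulling the whole cache into memory at once
        try (CloseableIterator<Map.Entry<Object, Object>> it = cache.retrieveEntries(null, 128)) {
            while (it.hasNext()) {
                Map.Entry<Object, Object> entry = it.next();
                // process entry...
            }
        }
        rcm.stop();
    }
}
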
>
> The only way that comes to my mind is to get rid of the RemoteCache#size()
> calls from Keycloak (+Marek Posolda <mposolda(a)redhat.com> WDYT, is it
> doable?).
>
> Hmm... I think that remoteCache#size doesn't need to load the entries into
> memory? I see that the HotRod protocol has a "size" operation, and that one
> seems to be called under the covers when you call RemoteCache#size (on the
> client side, the Java class is
> org.infinispan.client.hotrod.impl.operations.SizeOperation).
>
> I am not sure how much memory is needed to hold 1 million user sessions. We
> plan to have some tests for this, but haven't tested it yet with cross-dc
> enabled. Do you have both JDG and Keycloak on the same machine? Maybe it
> will help to increase or decrease the value of "Xmx" for both JDG and
> Keycloak. AFAIK if it's too big, garbage collection doesn't run very often,
> new records keep being added to memory, and the JDK process doesn't have
> enough space to handle it. But I'm not 100% sure that will help.
>
> Marek
>
>
> I'm just wondering if adding more memory to the ISPN process is an option
> in your case?
>
> On Tue, Nov 20, 2018 at 11:12 AM Nicolas Ocquidant <nocquidant(a)gmail.com>
> wrote:
>
>> Hi
>>
>> My use case is to keep one year of sessions; this is what I have done to
>> simulate it:
>>
>> 1. I use JDBC to store 1 million session objects, 2.8KB (in memory) each
>> 2. I start one Infinispan node with passivation=false and shared=true
>>    (sketched below), and Xmx=8G
>> 3. I start one Keycloak node configured with a remote-cache and Xmx=4G
>>
>> Note that I use a remote-store in KC, as it is the only way to set
>> passivation=false and shared=true (see
>> http://lists.jboss.org/pipermail/keycloak-user/2018-November/016180.html).
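>>
>> For completeness, a rough programmatic equivalent of the store used in steps
>> 1-2 (the real setup lives in the server XML; table/column names and the
>> datasource JNDI name below are just illustrative):
>>
>> import org.infinispan.configuration.cache.Configuration;
>> import org.infinispan.configuration.cache.ConfigurationBuilder;
>> import org.infinispan.persistence.jdbc.configuration.JdbcStringBasedStoreConfigurationBuilder;
>>
>> public class SessionsCacheConfig {
>>     static Configuration build() {
>>         ConfigurationBuilder builder = new ConfigurationBuilder();
>>         JdbcStringBasedStoreConfigurationBuilder store = builder.persistence()
>>                 .passivation(false)   // entries stay in the store, not only in memory
>>                 .addStore(JdbcStringBasedStoreConfigurationBuilder.class)
>>                 .shared(true);        // one shared DB for the whole cluster
>>         store.table()
>>                 .tableNamePrefix("sessions")                     // illustrative names
>>                 .idColumnName("ID").idColumnType("VARCHAR(255)")
>>                 .dataColumnName("DATA").dataColumnType("BYTEA")
>>                 .timestampColumnName("TS").timestampColumnType("BIGINT");
>>         store.dataSource().jndiUrl("java:jboss/datasources/InfinispanDS");
>>         return builder.build();
>>     }
>> }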
>>
>> No problem in step 2: the ISPN process uses less than 300MB of memory. But
>> after step 3, the ISPN process grows to around 6GB.
>>
>> See below for the traces, but basically I get OOM.
>>
>> Using the debugger, I can see that getting the size of the cache is a
>> really slow operation, and it bumps memory to 3GB in the ISPN process.
>>
>> From org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer:
>>
>> RemoteCacheSessionsLoader.computeLoaderContext(KeycloakSession session) {
>>     // ...
>>     int sessionsTotal = remoteCache.size();   <--- HERE
>>     // ...
>> }
>>
>> And then (OOM here):
>>
>> InfinispanCacheInitializer.startLoadingImpl(InitializerState state, SessionLoader.LoaderContext ctx) {
>>     // ...
>>     for (Future<WorkerResult> future : futures) {
>>         // Called 4X (i.e. the number of segments), but the 1st one does not terminate: OOM
>>         // -> org.infinispan.distexec.DefaultExecutorService$LocalDistributedTaskPart
>>         WorkerResult result = future.get();   <--- Very slow and bumps mem to 8GB
>>         // ...
>>     }
>> }
>>
>> So, with 1M sessions in the store, I cannot get KC/ISPN to start. And I am
>> far from my goal, which is to keep one year of sessions (estimated at ~52M
>> sessions)...
>>
>> Is this something I can't achieve with KC/ISPN?
>> Any help appreciated.
>>
>> Thanks
>> nick
>>
>> Note: the versions I used are:
>>
>> * ISPN 9.4.1 WildFly
>> * KC 4.6.0 WildFly
>>
>> -- ISPN process
>>
>> 10:42:22,969 ERROR [stderr] (Periodic Recovery) Exception in thread "Periodic Recovery" java.lang.OutOfMemoryError: Java heap space
>> 10:42:22,977 ERROR [stderr] (Periodic Recovery) at sun.text.resources.fr.FormatData_fr.getContents(FormatData_fr.java:86)
>> 10:42:22,975 ERROR [org.infinispan.persistence.jdbc.connectionfactory.ManagedConnectionFactory] (pool-9-thread-1) ISPN008018: Sql failure retrieving connection from datasource: java.sql.SQLException: javax.resource.ResourceException: IJ000456: Unchecked throwable in ManagedConnection.getConnection() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3e3890ff[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@3f033728 connection handles=0 lastReturned=1542706942971 lastValidated=1542706738387 lastCheckedOut=1542706916019 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@2772dcff mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@5e82b87d[pool=InfinispanDS] xaResource=LocalXAResourceImpl@308d2c98[connectionListener=3e3890ff connectionManager=1691f7d6 warned=false currentXid=null productName=PostgreSQL productVersion=10.5 jndiName=java:jboss/datasources/InfinispanDS] txSync=null]
>>     at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:146)
>>     at org.jboss.as.connector.subsystems.datasources.WildFlyDataSource.getConnection(WildFlyDataSource.java:64)
>>     at org.infinispan.persistence.jdbc.connectionfactory.ManagedConnectionFactory.getConnection(ManagedConnectionFactory.java:83)
>>     at org.infinispan.persistence.jdbc.stringbased.JdbcStringBasedStore.purge(JdbcStringBasedStore.java:461)
>>     at org.infinispan.persistence.manager.PersistenceManagerImpl.lambda$purgeExpired$6(PersistenceManagerImpl.java:459)
>>     at java.util.ArrayList.forEach(ArrayList.java:1257)
>>     at org.infinispan.persistence.manager.PersistenceManagerImpl.purgeExpired(PersistenceManagerImpl.java:462)
>>     at org.infinispan.expiration.impl.ClusterExpirationManager.processExpiration(ClusterExpirationManager.java:119)
>>     at org.infinispan.expiration.impl.ExpirationManagerImpl$ScheduledTask.run(ExpirationManagerImpl.java:245)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: javax.resource.ResourceException: IJ000456: Unchecked throwable in ManagedConnection.getConnection() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3e3890ff[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@3f033728 connection handles=0 lastReturned=1542706942971 lastValidated=1542706738387 lastCheckedOut=1542706916019 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@2772dcff mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@5e82b87d[pool=InfinispanDS] xaResource=LocalXAResourceImpl@308d2c98[connectionListener=3e3890ff connectionManager=1691f7d6 warned=false currentXid=null productName=PostgreSQL productVersion=10.5 jndiName=java:jboss/datasources/InfinispanDS] txSync=null]
>>     at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:811)
>>     at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:138)
>>     ... 15 more
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>
>>
>> -- KC process
>>
>> 10:42:23,216 WARN [org.infinispan.client.hotrod.impl.protocol.Codec21] (Thread-0) ISPN004005: Error received from the server: java.lang.RuntimeException: java.sql.SQLException: Error
>> java.sql.SQLException: Error
>> java.lang.OutOfMemoryError: Java heap space
>> 10:42:23,359 WARN [org.keycloak.models.sessions.infinispan.remotestore.RemoteCacheSessionsLoader] (pool-16-thread-4) Error loading sessions from remote cache 'sessions' for segment '3': org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for messageId=55 returned server error (status=0x85): java.lang.RuntimeException: java.sql.SQLException: Error
>> java.sql.SQLException: Error
>> java.lang.OutOfMemoryError: Java heap space
>>     at org.infinispan.client.hotrod.impl.protocol.Codec20.checkForErrorsInResponseStatus(Codec20.java:333)
>>     at org.infinispan.client.hotrod.impl.protocol.Codec20.readHeader(Codec20.java:179)
>>     at org.infinispan.client.hotrod.impl.transport.netty.HeaderDecoder.decode(HeaderDecoder.java:138)
>>     at org.infinispan.client.hotrod.impl.transport.netty.HintedReplayingDecoder.callDecode(HintedReplayingDecoder.java:98)
>>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
>>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>>     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
>>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>>     at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
>>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
>>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:647)
>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:582)
>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:461)
>>     at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>> _______________________________________________
>> keycloak-user mailing list
>> keycloak-user(a)lists.jboss.org
>>
>> https://lists.jboss.org/mailman/listinfo/keycloak-user
>>
>
>