[infinispan-issues] [JBoss JIRA] (ISPN-9817) OOM Error on ExposedByteArrayOutputStream

Rakesh Vende (Jira) issues at jboss.org
Fri Dec 14 01:08:00 EST 2018


    [ https://issues.jboss.org/browse/ISPN-9817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674616#comment-13674616 ] 

Rakesh Vende commented on ISPN-9817:
------------------------------------

[~NadirX], 

We cannot upgrade to 9.4 right now, because we don't know what new challenges it would bring, even if it solves the old issue.
I have studied this part of the Infinispan source code and have identified the code fixes. I have pulled the 7.2.* branch. Is there a way to have the fix reviewed, so that I can contribute it for this issue?

About the issue and resolution - 

I have given a detailed analysis of the issue in the defect description and will elaborate briefly here. Please see below the comment for the ReplicationQueue interface, taken from the Infinispan source code.

/**
 * Periodically (or when certain size is exceeded) takes elements and replicates them.
 *
 * @author <a href="mailto:bela at jboss.org">Bela Ban</a>
 * @author Mircea.Markus at jboss.com
 * @since 4.0
 */

_when certain size is exceeded_ - When this size is exceeded, we have no control over the maximum size that is going to be replicated. Also, for the async replication queue, this size does not matter: the queue is flushed on its interval, irrespective of how many elements have to be drained from the queue for replication. 
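
To illustrate the kind of control that is missing, here is a minimal sketch that drains a queue in bounded batches rather than all at once. It is not the Infinispan code and not the actual fix: the class `BoundedFlushSketch`, the method `flushInBatches` and the cap `MAX_BATCH` are hypothetical names used only for illustration. The only library call it relies on is the standard `BlockingQueue.drainTo(collection, maxElements)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedFlushSketch {
    // Hypothetical cap on how many queued commands go into one replication batch.
    static final int MAX_BATCH = 500;

    /** Drains the queue in batches of at most maxBatch elements; returns the size of each batch. */
    static <T> List<Integer> flushInBatches(BlockingQueue<T> queue, int maxBatch) {
        List<Integer> batchSizes = new ArrayList<>();
        List<T> batch = new ArrayList<>(maxBatch);
        // drainTo(collection, maxElements) moves at most maxElements and never blocks.
        while (queue.drainTo(batch, maxBatch) > 0) {
            // A real implementation would marshal and replicate 'batch' here,
            // so the serialized buffer is bounded by the batch, not by the whole queue.
            batchSizes.add(batch.size());
            batch.clear();
        }
        return batchSizes;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 1200; i++) {
            queue.add("cmd-" + i);
        }
        System.out.println(flushInBatches(queue, MAX_BATCH)); // prints [500, 500, 200]
    }
}
```

With a cap like this, the memory needed for one flush depends on the batch size instead of on how many elements happen to be in the queue when the interval fires.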


> OOM Error on ExposedByteArrayOutputStream
> -----------------------------------------
>
>                 Key: ISPN-9817
>                 URL: https://issues.jboss.org/browse/ISPN-9817
>             Project: Infinispan
>          Issue Type: Bug
>    Affects Versions: 7.2.4.Final
>            Reporter: Rakesh Vende
>            Priority: Critical
>             Fix For: 7.2.4.Final
>
>         Attachments: 11.jpg
>
>
> Title - OOM Error on ExposedByteArrayOutputStream 
> Data - 
> 1. Replication Mode is Async
> 2. queue-size="500" 
> 3. queue-flush-interval="10000"
> Details - 
> 1.  Application threads that frequently call the put method on a replicated cache end up calling the flush method of ReplicationQueueImpl.java 
> 2. This causes application threads to wait, on every 500th put call, for the replication of the queued entries to complete
> 3. This becomes a kind of sync replication, which blocks application threads.
> 4. To avoid this situation, we can make the queue size large enough. This apparently has no side effect, as the queue is a linked blocking queue and application threads only get blocked when the queue becomes full. 
> 5.  However, this puts pressure on the async queue, which has to replicate the entire queue at once. 
> _replicationQueue-thread--p4-t1 tid=119 [RUNNABLE] [DAEMON] <--- OutOfMemoryError happened in this thread
> java.lang.OutOfMemoryError.<init>() OutOfMemoryError.java:48
> org.infinispan.commons.io.ExposedByteArrayOutputStream.write(byte[], int, int) ExposedByteArrayOutputStream.java:71
> _
> 6. This out-of-memory error happens when the JVM fails to allocate a contiguous chunk of memory in the form of an array of 1 or 2 GB
> Summary - If we set the queue size to a normal or low level, application threads end up calling flush, which turns into sync replication and blocks other application threads. And if I increase the queue size enough to avoid the sync flush, the replication queue throws an OOM error.
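
The trade-off in the summary above can be reproduced with a simplified, single-threaded model. This is only a sketch of the behaviour described in the issue, not the ReplicationQueueImpl source: the class `CallerFlushModel` and its members are hypothetical, and the explicit `flush()` call at the end of `run` stands in for the queue-flush-interval timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CallerFlushModel {
    private final BlockingQueue<String> elements = new LinkedBlockingQueue<>();
    private final int queueSize;                              // models queue-size
    final List<Integer> flushedCounts = new ArrayList<>();    // elements replicated per flush
    int callerFlushes = 0;                                    // flushes paid for by the calling thread

    CallerFlushModel(int queueSize) {
        this.queueSize = queueSize;
    }

    /** Models a put on the cache: enqueue, then flush on the caller once the threshold is reached. */
    void put(String command) {
        elements.add(command);
        if (elements.size() >= queueSize) {
            callerFlushes++;
            flush();
        }
    }

    /** Drains everything currently queued in one go, as described in the issue. */
    void flush() {
        List<String> drained = new ArrayList<>();
        elements.drainTo(drained);
        if (!drained.isEmpty()) {
            flushedCounts.add(drained.size());
        }
    }

    static CallerFlushModel run(int queueSize, int puts) {
        CallerFlushModel model = new CallerFlushModel(queueSize);
        for (int i = 0; i < puts; i++) {
            model.put("cmd-" + i);
        }
        model.flush(); // stands in for the periodic flush on the interval
        return model;
    }

    public static void main(String[] args) {
        CallerFlushModel small = run(500, 1200);
        CallerFlushModel large = run(1_000_000, 1200);
        System.out.println(small.callerFlushes + " " + small.flushedCounts); // prints 2 [500, 500, 200]
        System.out.println(large.callerFlushes + " " + large.flushedCounts); // prints 0 [1200]
    }
}
```

With the small queue size the calling thread pays for two flushes; with the large one it pays for none, but the periodic flush has to handle everything that accumulated in a single batch.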



--
This message was sent by Atlassian Jira
(v7.12.1#712002)

