[
https://issues.jboss.org/browse/ISPN-9817?page=com.atlassian.jira.plugin....
]
Rakesh Vende commented on ISPN-9817:
------------------------------------
[~NadirX],
We cannot upgrade to 9.4 right now, because we don't know what new challenges it
would bring, even if it solves the old issue.
I have studied this part of the Infinispan source code and have identified the code fixes.
I have pulled the 7.2.* branch. Is there a way we can review the fix, so that I can
contribute it to this issue?
About the Issue and Resolution -
I have given a detailed analysis of the issue in the defect description and will elaborate
on it here in brief. Please see below the comment on the ReplicationQueue interface from the
Infinispan source code.
/**
* Periodically (or when certain size is exceeded) takes elements and replicates them.
*
* @author <a href="mailto:bela@jboss.org">Bela Ban</a>
* @author Mircea.Markus(a)jboss.com
* @since 4.0
*/
_when certain size is exceeded_ - When the size is exceeded, we have no control over the
maximum number of elements that gets replicated at once. Also, for the async replication
queue this size does not matter: the queue is flushed on its interval, irrespective of how
many elements have to be drained from it for replication.
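To make the point about the uncontrolled flush size concrete, here is a minimal sketch in plain Java. It is not Infinispan's ReplicationQueueImpl and not the proposed patch. It assumes the pending commands sit in a BlockingQueue and uses the JDK's drainTo(Collection, int) to cap how many elements go into one replication batch. The names BoundedDrainSketch, drainInBatches and maxBatch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; not Infinispan code.
final class BoundedDrainSketch {

    // Drains what is currently queued, but never puts more than maxBatch
    // elements into a single batch, so each batch could be serialized and
    // replicated on its own.
    static <T> List<List<T>> drainInBatches(BlockingQueue<T> queue, int maxBatch) {
        if (maxBatch <= 0) {
            throw new IllegalArgumentException("maxBatch must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        while (true) {
            List<T> batch = new ArrayList<>();
            // drainTo(Collection, int) removes at most maxBatch elements.
            int drained = queue.drainTo(batch, maxBatch);
            if (drained == 0) {
                break; // nothing left to drain right now
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 1200; i++) {
            queue.add(i);
        }
        for (List<Integer> batch : drainInBatches(queue, 500)) {
            System.out.println("batch of " + batch.size());
        }
    }
}
```

With a cap like this, the size of one replication payload depends on maxBatch rather than on how many elements happened to accumulate during the flush interval.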
OOM Error on ExposedByteArrayOutputStream
-----------------------------------------
Key: ISPN-9817
URL: https://issues.jboss.org/browse/ISPN-9817
Project: Infinispan
Issue Type: Bug
Affects Versions: 7.2.4.Final
Reporter: Rakesh Vende
Priority: Critical
Fix For: 7.2.4.Final
Attachments: 11.jpg
Title - OOM Error on ExposedByteArrayOutputStream
Data -
1. Replication Mode is Async
2. queue-size="500"
3. queue-flush-interval="10000"
Details -
1. Application threads that frequently call the put method on the replicated cache end up
calling the flush method of ReplicationQueueImpl.java.
2. This causes the application thread making every 500th put call to wait until the cache
replication from the queue completes.
3. This effectively becomes sync replication, which blocks application threads.
4. To avoid this situation, we can make the queue size large enough. This apparently has
no side effect, because the queue is a linked blocking queue and application threads
only get blocked when the queue becomes full.
5. However, this puts pressure on the async queue, which has to replicate the entire queue
at once.
_replicationQueue-thread--p4-t1 tid=119 [RUNNABLE] [DAEMON] <--- OutOfMemoryError happened in this thread
java.lang.OutOfMemoryError.<init>() OutOfMemoryError.java:48
org.infinispan.commons.io.ExposedByteArrayOutputStream.write(byte[], int, int) ExposedByteArrayOutputStream.java:71
_
6. This out-of-memory error happens when the JVM fails to allocate a contiguous chunk of
memory for an array of 1 or 2 GB.
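Why a larger queue leads to a larger single allocation can be sketched with the JDK's ByteArrayOutputStream, which keeps its whole content in one byte[]. This is an illustration under assumptions, not the Infinispan marshalling path: the class BatchBufferSketch, the method bufferedBytes and the fixed per-entry payload size are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not Infinispan code.
final class BatchBufferSketch {

    // Writes `entries` records of `payloadBytes` bytes each into one
    // in-memory buffer and returns how many bytes the buffer holds.
    static int bufferedBytes(int entries, int payloadBytes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] payload = new byte[payloadBytes];
        for (int i = 0; i < entries; i++) {
            // All records of the batch end up in the same backing array.
            buffer.write(payload, 0, payload.length);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println("500 entries:  " + bufferedBytes(500, 1024) + " bytes");
        System.out.println("5000 entries: " + bufferedBytes(5000, 1024) + " bytes");
    }
}
```

The buffered size grows linearly with the number of entries drained in one flush, and the whole batch has to fit into a single array. A Java array cannot hold more than about 2^31 - 1 elements, so a batch approaching 2 GB cannot be buffered this way regardless of the heap size.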
Summary - If we set the queue size to a normal or low value, application threads end up
calling flush, which turns into sync replication and blocks other application
threads. If I instead increase the queue size enough to avoid the sync flush,
the replication queue throws an OOM error.
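The first half of this trade-off can be modelled in a few lines of plain Java. This is a simplified, single-threaded model of the behaviour described in this report, not the actual ReplicationQueueImpl: the class SizeTriggeredFlushSketch and its methods are hypothetical, and the flush only drains the queue where real code would replicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, single-threaded model; not Infinispan code.
final class SizeTriggeredFlushSketch<T> {
    private final LinkedBlockingQueue<T> elements = new LinkedBlockingQueue<>();
    private final int maxElements;
    private int callerFlushes; // flushes paid for by the calling thread

    SizeTriggeredFlushSketch(int maxElements) {
        this.maxElements = maxElements;
    }

    // Called on the application thread, like a cache put would be.
    void add(T job) {
        elements.add(job);
        if (elements.size() >= maxElements) {
            // The caller itself performs the flush and has to wait for it.
            flush();
            callerFlushes++;
        }
    }

    // Drains everything queued; a real queue would replicate the batch here.
    List<T> flush() {
        List<T> drained = new ArrayList<>();
        elements.drainTo(drained);
        return drained;
    }

    int callerFlushes() {
        return callerFlushes;
    }

    int pending() {
        return elements.size();
    }
}
```

With maxElements set to 500, every 500th add pays for a full flush on the caller's thread. Raising maxElements removes those caller-side flushes, but then the periodic flush has to drain everything that accumulated during the interval in one go.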
--
This message was sent by Atlassian Jira
(v7.12.1#712002)