[jboss-jira] [JBoss JIRA] (JGRP-2135) OOM with JGroups 3.6.11.
Zoltan Farkas (JIRA)
issues at jboss.org
Thu Dec 22 12:01:00 EST 2016
[ https://issues.jboss.org/browse/JGRP-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13341674#comment-13341674 ]
Zoltan Farkas commented on JGRP-2135:
-------------------------------------
Hi Bela, I patched the library like this:
{code}
// Upper bound for a single message, overridable via -Djgroups.maxMessageSize (default 1 MiB)
private static final int MAX_MESSAGE_SIZE = Integer.getInteger("jgroups.maxMessageSize", 1048576);

public void run() {
    Throwable t=null;
    while(canRun()) {
        try {
            int len=in.readInt();
            // Reject negative or oversized length prefixes before allocating the buffer
            if (len > MAX_MESSAGE_SIZE || len < 0) {
                t = new IOException("Received message size invalid " + len);
                in.close();
                break;
            }
            if(buffer == null || buffer.length < len)
                buffer=new byte[len];
            in.readFully(buffer, 0, len);
            updateLastAccessed();
            server.receive(peer_addr, buffer, 0, len);
        } catch(IOException io_ex) {
            t=io_ex;
            break;
        }
    }
    server.notifyConnectionClosed(TcpConnection.this, String.format(Locale.US, "%s: %s", getClass().getSimpleName(),
                                  t != null? t.toString() : "n/a"));
}
{code}
I am not sure where the invalid message sizes were coming from, but this stopped my processes from crashing.
I find this a better solution than catching OOM.
Reliably recovering from OOM is impossible, which is why we run our processes with "kill on OOM".
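For anyone who wants to try the check outside JGroups, here is a minimal, self-contained sketch of the same idea. The class name BoundedFrameReader, the readFrame helper and the sample bytes are made up for illustration and are not part of JGroups; only the property name and the 1048576 default come from the patch above. The stray "GET " bytes are just one hypothetical way a bogus length prefix could arrive; the actual source of the invalid sizes is unknown.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone illustration; not JGroups code.
final class BoundedFrameReader {
    // Same property name and default as in the patch above.
    static final int MAX_MESSAGE_SIZE = Integer.getInteger("jgroups.maxMessageSize", 1048576);

    // Reads one length-prefixed frame, validating the length before allocating.
    static byte[] readFrame(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0 || len > MAX_MESSAGE_SIZE) {
            throw new IOException("Received message size invalid " + len);
        }
        byte[] buf = new byte[len];
        in.readFully(buf, 0, len);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // A well-formed frame: 4-byte big-endian length (5) followed by 5 payload bytes.
        byte[] valid = {0, 0, 0, 5, 'h', 'e', 'l', 'l', 'o'};
        byte[] payload = readFrame(new DataInputStream(new ByteArrayInputStream(valid)));
        System.out.println(new String(payload, StandardCharsets.US_ASCII)); // hello

        // Assumed example of stray traffic: the bytes 'G','E','T',' ' decode to 0x47455420.
        byte[] stray = "GET / HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII);
        try {
            readFrame(new DataInputStream(new ByteArrayInputStream(stray)));
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Received message size invalid 1195725856
        }
    }
}
```

Without the check, the second input would ask for a byte array of over 1.1 GB, which is more than a 1 GB heap can hold. Since Integer.getInteger reads a system property, the limit can be changed at startup with -Djgroups.maxMessageSize=<bytes>; legitimate messages larger than the limit would be rejected as well, so the value needs to fit the application.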
> OOM with JGroups 3.6.11.
> ------------------------
>
> Key: JGRP-2135
> URL: https://issues.jboss.org/browse/JGRP-2135
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.11
> Reporter: Zoltan Farkas
> Assignee: Bela Ban
> Fix For: 3.6.12
>
>
> We are running our JVMs with: -XX:OnOutOfMemoryError="kill -9 %p"
> We have been experiencing OOMs fairly often, and they happen at:
> {code}
> Object / Stack Frame |Name | Shallow Heap | Retained Heap |Context Class Loader |Is Daemon
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> java.lang.Thread @ 0x81bdf838 |Connection.Receiver [144.77.77.53:50363 - 144.77.77.53:50363],sis-cluster.service,prodpmwsv5-6461| 120 | 456 |sun.misc.Launcher$AppClassLoader @ 0x800175a8|false
> |- at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48) | | | | |
> |- at org.jgroups.blocks.cs.TcpConnection$Receiver.run()V (TcpConnection.java:310)| | | | |
> |- at java.lang.Thread.run()V (Thread.java:745) | | | | |
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> {code}
> The code where it happens is in TcpConnection.java:
> {code}
> while(canRun()) {
>     try {
>         int len=in.readInt();
>         if(buffer == null || buffer.length < len)
>             buffer=new byte[len];
>         in.readFully(buffer, 0, len);
>         updateLastAccessed();
>         server.receive(peer_addr, buffer, 0, len);
>     }
>     catch(OutOfMemoryError mem_ex) {
>         t=mem_ex;
>         break; // continue;
>     }
>     catch(IOException io_ex) {
>         t=io_ex;
>         break;
>     }
>     catch(Throwable e) {
>     }
> }
> {code}
> The OOM occurs when allocating: buffer=new byte[len];
> It looks to me as if an invalid, very large length value is received and the process OOMs when allocating a huge byte array.
> Running JVMs without kill on OOM would make this issue "disappear", in the sense that it is swallowed by:
> {code}
> catch(OutOfMemoryError mem_ex) {
> t=mem_ex;
> break; // continue;
> }
> {code}
> Handling OutOfMemoryError is a strange implementation choice.
> Instead, a size limit should be employed to protect against receiving invalid sizes.
> My heap limit is 1 GB and my heap dumps are 50 MB, so the attempted allocation size is huge.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)