[jboss-jira] [JBoss JIRA] (JGRP-2135) OOM with JGroups 3.6.11.

Bela Ban (JIRA) issues at jboss.org
Tue Dec 6 06:36:00 EST 2016


    [ https://issues.jboss.org/browse/JGRP-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13334484#comment-13334484 ] 

Bela Ban commented on JGRP-2135:
--------------------------------

Hi Zoltan,

I did remove the unused assignment and the null check.

However, I don't see what's wrong with catching the OOME and falling out of the loop. If I didn't do that, or re-threw the exception, notifyConnectionClosed() would not be called...
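A minimal sketch of the trade-off. The names are hypothetical (ReceiverSketch, run(), and a counter that stands in for notifyConnectionClosed()), and this is not the actual TcpConnection code. One common way to make a cleanup callback run whether an error is swallowed or rethrown is a finally block around the loop:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical sketch, not JGroups code: shows that a finally block runs the
// close notification even when the exception propagates to the caller.
class ReceiverSketch {
    int closedNotifications = 0;

    // Stand-in for the real notifyConnectionClosed() callback.
    void notifyConnectionClosed() {
        closedNotifications++;
    }

    void run(DataInputStream in) throws IOException {
        try {
            while (true) {
                int len = in.readInt();            // length prefix
                byte[] buffer = new byte[len];     // illustrative only: no size check here
                in.readFully(buffer, 0, len);
            }
        } finally {
            notifyConnectionClosed();              // runs on any exit from the try block
        }
    }
}
```

Here an EOFException at the end of the stream propagates out of run(), yet the close notification still fires exactly once.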

I tried to reproduce this with IspnPerfTest but wasn't able to. Can you reproduce it? If so, it would be nice if you could capture the bytes read by the receiver thread that triggered the OOME.
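One way to capture those bytes, sketched under the assumption that the receiver's DataInputStream can be built on top of a custom stream: a hypothetical RecordingInputStream that keeps the last N bytes read in a ring buffer, plus a hex helper for logging them. Neither class exists in JGroups.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of JGroups: remembers the last `capacity` bytes
// that passed through read() and read(byte[], int, int). skip() is not recorded.
class RecordingInputStream extends FilterInputStream {
    private final byte[] ring;
    private long total; // number of bytes recorded so far

    RecordingInputStream(InputStream in, int capacity) {
        super(in);
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be > 0");
        this.ring = new byte[capacity];
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) record((byte) b);
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        for (int i = 0; i < n; i++) record(b[off + i]); // n is -1 at EOF, so nothing is recorded
        return n;
    }

    private void record(byte b) {
        ring[(int) (total % ring.length)] = b;
        total++;
    }

    /** Returns the most recently read bytes, oldest first. */
    byte[] lastBytes() {
        int n = (int) Math.min(total, ring.length);
        byte[] out = new byte[n];
        long start = total - n;
        for (int i = 0; i < n; i++) out[i] = ring[(int) ((start + i) % ring.length)];
        return out;
    }

    /** Formats bytes as space-separated lowercase hex, e.g. for a log line. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) sb.append(String.format("%02x ", x & 0xff));
        return sb.toString().trim();
    }
}
```

In the receiver, the DataInputStream would wrap the recorder, e.g. new DataInputStream(new RecordingInputStream(socketIn, 64)) where socketIn is the (hypothetical) socket stream, and toHex(lastBytes()) would be logged in the error handler.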

In my experience, the cause is usually a different version of JGroups sending messages to this member. A random process can be ruled out, because if the first 4 bytes after connection establishment are not {'b', 'e', 'l', 'a'}, the connection is closed.
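A minimal sketch of such a cookie check, assuming the cookie is simply the first 4 bytes read from the stream. The class and method names are hypothetical, and any other handshake fields JGroups may exchange are not modeled.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch of a 4-byte connection cookie check; not the actual
// JGroups handshake code.
class CookieCheck {
    static final byte[] COOKIE = {'b', 'e', 'l', 'a'};

    /** Reads the first 4 bytes; a caller would close the connection if this returns false. */
    static boolean cookieMatches(DataInputStream in) throws IOException {
        byte[] received = new byte[COOKIE.length];
        in.readFully(received);
        return Arrays.equals(received, COOKIE);
    }
}
```

Bytes from an unrelated client, such as an HTTP request starting with "GET ", fail this check, so such a connection would be closed before any length prefix is read.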

> OOM with JGroups 3.6.11.
> ------------------------
>
>                 Key: JGRP-2135
>                 URL: https://issues.jboss.org/browse/JGRP-2135
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.6.11
>            Reporter: Zoltan Farkas
>            Assignee: Bela Ban
>             Fix For: 3.6.12, 4.0
>
>
> We are running our JVMs with -XX:OnOutOfMemoryError="kill -9 %p".
> We have been experiencing OOMs fairly often, and they happen at:
> {code}
> Object / Stack Frame                                                              |Name                                                                                             | Shallow Heap | Retained Heap |Context Class Loader                         |Is Daemon
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> java.lang.Thread @ 0x81bdf838                                                     |Connection.Receiver [144.77.77.53:50363 - 144.77.77.53:50363],sis-cluster.service,prodpmwsv5-6461|          120 |           456 |sun.misc.Launcher$AppClassLoader @ 0x800175a8|false
> |- at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)             |                                                                                                 |              |               |                                             |
> |- at org.jgroups.blocks.cs.TcpConnection$Receiver.run()V (TcpConnection.java:310)|                                                                                                 |              |               |                                             |
> |- at java.lang.Thread.run()V (Thread.java:745)                                   |                                                                                                 |              |               |                                             |
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> {code}
> The code where it happens is in TcpConnection.java:
> {code}
> while(canRun()) {
>     try {
>         int len=in.readInt();
>         if(buffer == null || buffer.length < len)
>             buffer=new byte[len];
>         in.readFully(buffer, 0, len);
>         updateLastAccessed();
>         server.receive(peer_addr, buffer, 0, len);
>     }
>     catch(OutOfMemoryError mem_ex) {
>         t=mem_ex;
>         break; // continue;
>     }
>     catch(IOException io_ex) {
>         t=io_ex;
>         break;
>     }
>     catch(Throwable e) {
>     }
> }
> {code}
> It happens when allocating: buffer=new byte[len];
> It looks to me as if some invalid, very large length value is received, and the process runs out of memory when allocating a huge byte array.
> Running the JVMs without kill on OOM would make this issue "disappear", in the sense that it is swallowed by:
> {code}
> catch(OutOfMemoryError mem_ex) {
>     t=mem_ex;
>     break; // continue;
> }
> {code}
> Handling OutOfMemoryError is a strange implementation choice;
> instead, a size limit should be enforced to protect against receiving invalid sizes.
> My heap limit is 1GB and my heap dumps are 50MB, so the attempted allocation size is huge.
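A minimal sketch of the size limit suggested in the report, using a hypothetical BoundedFrameReader and an assumed MAX_FRAME_SIZE of 10 MB (neither is a JGroups API or default). The length prefix is validated before any allocation happens:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical sketch: MAX_FRAME_SIZE and readFrame() are illustrative, not JGroups APIs.
class BoundedFrameReader {
    static final int MAX_FRAME_SIZE = 10 * 1024 * 1024; // assumed limit: 10 MB

    static byte[] readFrame(DataInputStream in) throws IOException {
        int len = in.readInt(); // 4-byte big-endian length prefix
        // Reject negative or oversized lengths before allocating anything.
        if (len < 0 || len > MAX_FRAME_SIZE)
            throw new IOException("invalid frame length " + len + " (limit " + MAX_FRAME_SIZE + ")");
        byte[] buffer = new byte[len];
        in.readFully(buffer, 0, len);
        return buffer;
    }
}
```

For illustration only, not a claim about what this member actually received: if the four bytes at the length position were the ASCII text "GET ", readInt() would return 0x47455420 = 1,195,725,856, which exceeds the 1GB heap mentioned in the report. With the check, that read fails with an IOException instead of attempting a ~1.1 GB allocation, and in the loop quoted above an IOException already ends the loop.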



--
This message was sent by Atlassian JIRA
(v7.2.3#72005)

