[ https://issues.redhat.com/browse/JGRP-2135?page=com.atlassian.jira.plugin... ]
Bela Ban commented on JGRP-2135:
--------------------------------
OK, I fixed this.
The root cause was *not* a spurious connection by a non-JGroups process, but the catching
of {{Throwable}} in {{TcpConnection.run()}}: when an exception occurred, the connection
was not closed; instead we restarted at the top of the loop and tried to read the next
message. (Note that this could happen, for example, when the peer thread was interrupted
while trying to send a message, without closing its end of the connection.)
If there was still stale data in the TCP pipe at that point, we'd read the first 4 bytes
and interpret them as the length of the next message.
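To see why this blows up, consider what happens if arbitrary payload bytes get read as the
4-byte length prefix (the numbers below are illustrative only, not taken from the actual heap dump):
{code}
// Illustrative sketch only: four arbitrary payload bytes misread as a length prefix.
// For example, the ASCII bytes "abcd" interpreted as a big-endian int:
int len = ('a' << 24) | ('b' << 16) | ('c' << 8) | 'd';   // 0x61626364 = 1633837924
// new byte[len] would then try to allocate ~1.5 GB in one shot -> OutOfMemoryError
{code}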
The reason the root cause was most likely not a non-JGroups process is that such a
process would have to send the correct cookie, version, and peer address at
connection establishment time. Doable by a malicious process, but highly unlikely during
regular processing.
The fix is that the loop in {{TcpConnection.run()}} now terminates on an exception, so
that the next time we send data to the peer (or receive data from it), a new connection
will have to be created.
This also implies we don't need a special {{max_size}} attribute.
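For illustration, here is a minimal sketch of what the revised receive loop might look like.
It only shows the shape of the fix described above; names such as {{closeConnection()}} are
assumptions, not the actual 3.6.12 code:
{code}
// Sketch only -- not the actual JGroups 3.6.12 code. closeConnection() is an
// assumed helper; the other names come from the snippet quoted below.
public void run() {
    Throwable t=null;
    while(canRun()) {
        try {
            int len=in.readInt();                      // 4-byte length prefix from the peer
            if(buffer == null || buffer.length < len)
                buffer=new byte[len];
            in.readFully(buffer, 0, len);
            updateLastAccessed();
            server.receive(peer_addr, buffer, 0, len);
        }
        catch(Throwable ex) {
            t=ex;
            break;                                     // any exception terminates the loop
        }
    }
    closeConnection(t);  // the next send/receive to this peer creates a new connection
}
{code}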
OOM with JGroups 3.6.11.
------------------------
Key: JGRP-2135
URL: https://issues.redhat.com/browse/JGRP-2135
Project: JGroups
Issue Type: Bug
Affects Versions: 3.6.11
Reporter: Zoltan Farkas
Assignee: Bela Ban
Priority: Major
Fix For: 3.6.12
We are running our JVMs with: -XX:OnOutOfMemoryError="kill -9 %p"
We have been experiencing OOMs fairly often, and the OOMs happen at:
{code}
Thread: Connection.Receiver [144.77.77.53:50363 - 144.77.77.53:50363],sis-cluster.service,prodpmwsv5-6461
        (java.lang.Thread @ 0x81bdf838)
Shallow Heap: 120 | Retained Heap: 456 | Context Class Loader: sun.misc.Launcher$AppClassLoader @ 0x800175a8 | Is Daemon: false
  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
  at org.jgroups.blocks.cs.TcpConnection$Receiver.run()V (TcpConnection.java:310)
  at java.lang.Thread.run()V (Thread.java:745)
{code}
The code where this happens is in TcpConnection.java:
{code}
while(canRun()) {
    try {
        int len=in.readInt();                    // length read straight off the wire, no upper bound
        if(buffer == null || buffer.length < len)
            buffer=new byte[len];                // OOMs here when len is a huge garbage value
        in.readFully(buffer, 0, len);
        updateLastAccessed();
        server.receive(peer_addr, buffer, 0, len);
    }
    catch(OutOfMemoryError mem_ex) {
        t=mem_ex;
        break; // continue;
    }
    catch(IOException io_ex) {
        t=io_ex;
        break;
    }
    catch(Throwable e) {                         // swallowed: the loop keeps running on any other error
    }
}
{code}
When allocating {{buffer=new byte[len]}}, it looks to me like some invalid, very large
value is received and the process OOMs when allocating a huge byte array.
Running the JVMs without kill-on-OOM would make this issue "disappear", in the sense
that the error is swallowed by:
{code}
catch(OutOfMemoryError mem_ex) {
    t=mem_ex;
    break; // continue;
}
{code}
Handling OutOfMemoryError is a strange implementation choice; instead, a size limit
should be employed to protect against receiving invalid sizes.
My heap limit is 1 GB and my heap dumps are 50 MB, so the attempted allocation size is
huge.
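A rough sketch of the kind of guard suggested above; {{max_size}} here is a hypothetical
attribute used for illustration, not existing JGroups 3.6.11 configuration:
{code}
// Sketch of the suggested guard; max_size is a hypothetical attribute.
int len=in.readInt();
if(len < 0 || len > max_size)
    throw new IOException("invalid message length " + len + " (max_size=" + max_size + ")");
if(buffer == null || buffer.length < len)
    buffer=new byte[len];
in.readFully(buffer, 0, len);
{code}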