[jboss-jira] [JBoss JIRA] (JGRP-2382) JGroups version 4.0.13.Final.jar is causing memory leaks

Bela Ban (Jira) issues at jboss.org
Thu Sep 19 10:56:00 EDT 2019


    [ https://issues.jboss.org/browse/JGRP-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786854#comment-13786854 ] 

Bela Ban commented on JGRP-2382:
--------------------------------

The heap dump shows that UNICAST3 is retaining messages _sent_ to another member. For some reason, that member doesn't ack any of the sent messages, so they are retained until an ACK is received or that member dies or leaves the cluster.

Apparently, that member is _not_ suspected (and therefore not removed from the cluster), so the table of messages sent to it cannot be cleared.

To further diagnose this issue, logs would be useful, at TRACE level for UNICAST3 and GMS.
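
As a rough illustration of the TRACE-level logging mentioned above, here is a minimal sketch that assumes Log4j 2 is the logging backend in use (the ticket does not say which one is). The logger names are the fully qualified class names of the two protocols; the appender name, pattern, and root level are placeholders, not values taken from the reporter's setup.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes Log4j 2; adapt appender and levels to the actual setup -->
<Configuration status="warn">
  <Appenders>
    <!-- Placeholder console appender; a file appender may suit a long-running test better -->
    <Console name="STDOUT" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{HH:mm:ss,SSS} %-5p [%t] %c{1}: %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <!-- TRACE for the two protocols named in the comment -->
    <Logger name="org.jgroups.protocols.UNICAST3" level="trace"/>
    <Logger name="org.jgroups.protocols.pbcast.GMS" level="trace"/>
    <!-- Everything else stays quiet so the trace output remains readable -->
    <Root level="warn">
      <AppenderRef ref="STDOUT"/>
    </Root>
  </Loggers>
</Configuration>
```

TRACE output from UNICAST3 can be voluminous, so it is probably best enabled only for the period in which the memory growth is being reproduced.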

If they have a running system, with the issue present, they could also use probe.sh to look into the system.

> JGroups version 4.0.13.Final.jar is causing memory leaks
> --------------------------------------------------------
>
>                 Key: JGRP-2382
>                 URL: https://issues.jboss.org/browse/JGRP-2382
>             Project: JGroups
>          Issue Type: Feature Request
>    Affects Versions: 4.0.13
>         Environment: AIX machine 7.1 with JDK 1.8
>            Reporter: Rashmi Acharya
>            Assignee: Bela Ban
>            Priority: Major
>         Attachments: dumps_TEST_node1_20190918_after_3_hours.zip, dumps_TEST_node1_20190918_right_after_restart.zip, dumps_TEST_node2_20190918_after_3_hours.zip, dumps_TEST_node2_20190918_right_after_restart.zip
>
>
> We are observing constant memory growth and a leak with JGroups version 4.0.13.
> One of our customers has a two-node cluster environment, and on one node we are observing org.jgroups.Message objects which contain org.jgroups.Header and org.jgroups.stack.IpAddress objects. These are not getting cleared from memory.
> We don't see any JGroups-related exceptions in the logs, but it is causing gradual memory growth and an OOM.
> Here is the JGroups cluster configuration we have:
> dynamic.cluster.property_string    
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups"   xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <TCP bind_addr="&HOST_ADDR;" bind_port="&MULTICAST_NODE_PORT2;"/>
> <TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true"/>
> <MERGE3 min_interval="3000" max_interval="5000" />
> <FD_ALL timeout="20000" interval="15000"/>
> <FD_SOCK/>
> <FD timeout="5000" max_tries="48" />
> <VERIFY_SUSPECT timeout="1500"/>
> <BARRIER/>
> <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
> <UNICAST3/>
> <pbcast.STABLE desired_avg_gossip="20000"  max_bytes="0" stability_delay="1000"/>
> <pbcast.GMS print_local_addr="true" join_timeout="15000" />
> </config>
> =================================
> dynamic.cluster.distribution_property_string    
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <TCP bind_port="&MULTICAST_NODE_PORT1;" />
> <TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true"/>
> <MERGE3 min_interval="3000" max_interval="5000"/>
> <FD_SOCK/>
> <FD timeout="5000" max_tries="48"/>
> <VERIFY_SUSPECT timeout="1500"/>
> <BARRIER/>
> <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
> <UNICAST3/>
> <pbcast.STABLE desired_avg_gossip="20000" max_bytes="0" stability_delay="1000" />
> <pbcast.GMS print_local_addr="true" join_timeout="5000"/>
> </config>    
> ================================
> dynamic.cluster.lock.protocolStack    
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <TCP bind_addr="&HOST_ADDR;" bind_port="&MULTICAST_NODE_PORT3;"/>
> <TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true"/>
> <MERGE3  min_interval="3000"  max_interval="5000"/>
> <FD_ALL timeout="20000" interval="5000"/>
> <FD timeout="5000" max_tries="48"/>
> <VERIFY_SUSPECT timeout="1500"/>
> <BARRIER/>
> <pbcast.NAKACK2 use_mcast_xmit="false"  discard_delivered_msgs="true"/>
> <UNICAST3/>
> <pbcast.STABLE desired_avg_gossip="20000"/>
> <pbcast.GMS print_local_addr="true" join_timeout="5000"/>
> <FRAG2 frag_size="8096"/>
> <CENTRAL_LOCK2/>
> </config>    



--
This message was sent by Atlassian Jira
(v7.13.5#713005)

