[jboss-jira] [JBoss JIRA] (JGRP-2382) JGroups version 4.0.13.Final.jar is causing memory leaks

Fri Sep 20 11:07:00 EDT 2019

    [ https://issues.jboss.org/browse/JGRP-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787433#comment-13787433 ] 

Rashmi Acharya commented on JGRP-2382:
--------------------------------------

Hi Bela,

For now, we made further changes to the property and saw that cluster load distribution is working fine.

Support is going to share the new changes with the customer and wait until tomorrow to see whether they solve the memory leak.

Here are the further changes we made:

1: Removed use_mcast_xmit="false" from the NAKACK2 protocol
2: Removed UNICAST3, since the heap memory growth was due to these messages. It is not actually required for Workflow load distribution
3: Added the FD_ALL parameter
4: Removed BARRIER

New Change:
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
<TCP bind_port="&MULTICAST_NODE_PORT1;" />
<TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true" />
<MERGE3 min_interval="3000" max_interval="5000"/>
<FD_ALL timeout="20000" interval="15000" />
<FD_SOCK/>
<FD timeout="5000" max_tries="48" />
<VERIFY_SUSPECT timeout="1500" />
<pbcast.NAKACK2 discard_delivered_msgs="true"/>
<pbcast.STABLE desired_avg_gossip="20000" max_bytes="0" stability_delay="1000" />
<pbcast.GMS print_local_addr="true" join_timeout="15000"/>
</config>
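
To double-check which protocols a stack like the one above actually contains (for example, that UNICAST3 and BARRIER are really gone), a minimal sketch using only the JDK's XML parser is shown here. It does not use JGroups itself. The class name StackCheck and the literal port and host values are hypothetical; they stand in for the &MULTICAST_NODE_PORT1; and &CLUSTER_INITIAL_HOSTS; placeholders, which a plain XML parser cannot resolve.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class StackCheck {

    /** Returns the names of the direct child elements of the root element, in document order. */
    public static List<String> protocolNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes and comments; keep only protocol elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse stack configuration", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical literal values replace the placeholders used in the real property.
        String xml = "<config xmlns=\"urn:org:jgroups\">"
                + "<TCP bind_port=\"7800\"/>"
                + "<TCPPING async_discovery=\"true\" initial_hosts=\"hostA[7800],hostB[7800]\" port_range=\"0\"/>"
                + "<MERGE3 min_interval=\"3000\" max_interval=\"5000\"/>"
                + "<FD_ALL timeout=\"20000\" interval=\"15000\"/>"
                + "<FD_SOCK/>"
                + "<FD timeout=\"5000\" max_tries=\"48\"/>"
                + "<VERIFY_SUSPECT timeout=\"1500\"/>"
                + "<pbcast.NAKACK2 discard_delivered_msgs=\"true\"/>"
                + "<pbcast.STABLE desired_avg_gossip=\"20000\" max_bytes=\"0\" stability_delay=\"1000\"/>"
                + "<pbcast.GMS print_local_addr=\"true\" join_timeout=\"15000\"/>"
                + "</config>";
        List<String> names = protocolNames(xml);
        System.out.println("Protocols: " + names);
        System.out.println("UNICAST3 present: " + names.contains("UNICAST3"));
        System.out.println("BARRIER present: " + names.contains("BARRIER"));
    }
}
```

The same check can be run against the other property strings after substituting their placeholders, to compare the stacks side by side.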

We need to understand:
1: Why did we see issues with load distribution when we had use_mcast_xmit in NAKACK2?
2: What is the NAKACK2 equivalent of the NAKACK parameter retransmit_timeout="5000"?
3: What is the use of the FD_ALL parameter?
4: Why were UNICAST3 objects lingering in memory, and is there any way they can be GCed with timeout settings?

> JGroups version 4.0.13.Final.jar is causing memory leaks
> --------------------------------------------------------
>
>                 Key: JGRP-2382
>                 URL: https://issues.jboss.org/browse/JGRP-2382
>             Project: JGroups
>          Issue Type: Feature Request
>    Affects Versions: 4.0.13
>         Environment: AIX machine 7.1 with JDK 1.8
>            Reporter: Rashmi Acharya
>            Assignee: Bela Ban
>            Priority: Major
>         Attachments: dumps_TEST_node1_20190918_after_3_hours.zip, dumps_TEST_node1_20190918_right_after_restart.zip, dumps_TEST_node2_20190918_after_3_hours.zip, dumps_TEST_node2_20190918_right_after_restart.zip
>
>
> We are observing constant memory growth and a leak with JGroups version 4.0.13.
> One of our customers has a two-node cluster environment, and on one node we are observing org.jgroups.Message objects which contain org.jgroups.Header and org.jgroups.stack.IpAddress objects. These are not getting cleared from memory.
> We don't see any JGroups-related exceptions in the logs, but it is causing gradual memory growth and OOM.
> Here is the JGroups cluster configuration we have:
> dynamic.cluster.property_string    
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups"   xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <TCP bind_addr="&HOST_ADDR;" bind_port="&MULTICAST_NODE_PORT2;"/>
> <TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true"/>
> <MERGE3 min_interval="3000" max_interval="5000" />
> <FD_ALL timeout="20000" interval="15000"/>
> <FD_SOCK/>
> <FD timeout="5000" max_tries="48" />
> <VERIFY_SUSPECT timeout="1500"/>
> <BARRIER/>
> <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
> <UNICAST3/>
> <pbcast.STABLE desired_avg_gossip="20000"  max_bytes="0" stability_delay="1000"/>
> <pbcast.GMS print_local_addr="true" join_timeout="15000" />
> </config>
> =================================
> dynamic.cluster.distribution_property_string    
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <TCP bind_port="&MULTICAST_NODE_PORT1;" />
> <TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true"/>
> <MERGE3 min_interval="3000" max_interval="5000"/>
> <FD_SOCK/>
> <FD timeout="5000" max_tries="48"/>
> <VERIFY_SUSPECT timeout="1500"/>
> <BARRIER/>
> <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
> <UNICAST3/>
> <pbcast.STABLE desired_avg_gossip="20000" max_bytes="0" stability_delay="1000" />
> <pbcast.GMS print_local_addr="true" join_timeout="5000"/>
> </config>    
> ================================
> dynamic.cluster.lock.protocolStack    
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <TCP bind_addr="&HOST_ADDR;" bind_port="&MULTICAST_NODE_PORT3;"/>
> <TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true"/>
> <MERGE3  min_interval="3000"  max_interval="5000"/>
> <FD_ALL timeout="20000" interval="5000"/>
> <FD timeout="5000" max_tries="48"/>
> <VERIFY_SUSPECT timeout="1500"/>
> <BARRIER/>
> <pbcast.NAKACK2 use_mcast_xmit="false"  discard_delivered_msgs="true"/>
> <UNICAST3 />
> <pbcast.STABLE desired_avg_gossip="20000" />
> <pbcast.GMS print_local_addr="true" join_timeout="5000"/>
> <FRAG2 frag_size="8096"/>
> <CENTRAL_LOCK2/>
> </config>    

--
This message was sent by Atlassian Jira
(v7.13.8#713008)