[jboss-jira] [JBoss JIRA] (JGRP-2382) JGroups version 4.0.13.Final.jar is causing memory leaks

Fri Sep 20 01:41:00 EDT 2019

    [ https://issues.jboss.org/browse/JGRP-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787058#comment-13787058 ] 

Rashmi Acharya commented on JGRP-2382:
--------------------------------------

With JGroups 3.4.0 we never had a UNICAST protocol in the stack, and it worked fine, with load balancing and no memory leaks, even if unicast acknowledgements were processed by the other node in the system.

Old property:
distribution_property_string=TCP(bind_port=60961;thread_pool_rejection_policy=run):TCPPING(initial_hosts=harp[60961];port_range=0;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48;level=ERROR):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
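
For easier comparison with the 4.0.13 XML stacks quoted below, the old string can be written out in XML form. This is only a mechanical transcription of the 3.4.0 property above (same protocols, same attributes, same order), with the namespace attributes left off the `<config>` element for brevity. It is not offered as a configuration that 4.0.13 would accept as-is: the quoted 4.0.13 stacks use MERGE3 and pbcast.NAKACK2 where this one has MERGE2 and pbcast.NAKACK. Written this way, it is easier to see that the old stack has no UNICAST-family protocol and no BARRIER.

```xml
<!-- Transcription of the old 3.4.0 property string; no values added or changed. -->
<config>
  <TCP bind_port="60961" thread_pool_rejection_policy="run"/>
  <TCPPING initial_hosts="harp[60961]" port_range="0" timeout="5000" num_initial_members="2"/>
  <MERGE2 min_interval="3000" max_interval="5000"/>
  <FD_SOCK/>
  <FD timeout="5000" max_tries="48" level="ERROR"/>
  <VERIFY_SUSPECT timeout="1500"/>
  <!-- No BARRIER here, unlike the 4.0.13 stacks -->
  <pbcast.NAKACK retransmit_timeout="3000" discard_delivered_msgs="true"/>
  <!-- No UNICAST, UNICAST2 or UNICAST3 here, unlike the 4.0.13 stacks -->
  <pbcast.STABLE stability_delay="1000" desired_avg_gossip="20000" max_bytes="0"/>
  <pbcast.GMS join_timeout="5000" print_local_addr="true"/>
</config>
```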

Are you now suggesting that we add UNICAST3 back? What about the UNICAST protocol?
Does <FD timeout="5000" max_tries="48"/> need to be changed to FD_ALL?
We never had BARRIER in the system.

Do you think adding a timeout to UNICAST3 makes any difference with respect to clearing those objects from memory? If yes, we can reintroduce UNICAST3 with timeout=5000.
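
To make the questions above concrete, here is a sketch of what the distribution stack could look like with a single heartbeat failure detector and UNICAST3 kept in place. It is an illustration under stated assumptions, not the maintainer's recommendation. Apart from two changes, the values are copied from the quoted `dynamic.cluster.distribution_property_string` stack. First, `FD` is replaced by `FD_ALL`, with timeout and interval borrowed from the quoted lock stack. Second, `UNICAST3` carries `xmit_interval` and `conn_expiry_timeout` values, which are illustrative placeholders only. As far as the 4.0.x protocol documentation shows, UNICAST3 is tuned through attributes like these rather than a `timeout` attribute, but that should be verified against the version in use. Whether any of these settings changes how long the Message objects are retained is not established in this thread.

```xml
<!-- Sketch only: values copied from the quoted stacks unless marked otherwise. -->
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
  <TCP bind_port="&MULTICAST_NODE_PORT1;"/>
  <TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true"/>
  <MERGE3 min_interval="3000" max_interval="5000"/>
  <FD_SOCK/>
  <!-- FD_ALL in place of FD; timeout and interval taken from the quoted lock stack -->
  <FD_ALL timeout="20000" interval="5000"/>
  <VERIFY_SUSPECT timeout="1500"/>
  <BARRIER/>
  <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
  <!-- Illustrative values (assumptions), not recommended settings -->
  <UNICAST3 xmit_interval="500" conn_expiry_timeout="120000"/>
  <pbcast.STABLE desired_avg_gossip="20000" max_bytes="0" stability_delay="1000"/>
  <pbcast.GMS print_local_addr="true" join_timeout="5000"/>
</config>
```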

> JGroups version 4.0.13.Final.jar is causing memory leaks
> --------------------------------------------------------
>
>                 Key: JGRP-2382
>                 URL: https://issues.jboss.org/browse/JGRP-2382
>             Project: JGroups
>          Issue Type: Feature Request
>    Affects Versions: 4.0.13
>         Environment: AIX machine 7.1 with JDK 1.8
>            Reporter: Rashmi Acharya
>            Assignee: Bela Ban
>            Priority: Major
>         Attachments: dumps_TEST_node1_20190918_after_3_hours.zip, dumps_TEST_node1_20190918_right_after_restart.zip, dumps_TEST_node2_20190918_after_3_hours.zip, dumps_TEST_node2_20190918_right_after_restart.zip
>
>
> We are observing constant memory growth and a leak with JGroups version 4.0.13.
> One of our customers has a two-node cluster environment, and on one node we observe org.jgroups.Message objects, which contain org.jgroups.Header and org.jgroups.stack.IpAddress objects, that are not being cleared from memory.
> We don't see any JGroups-related exceptions in the logs, but it is causing gradual memory growth and eventually an OOM.
> Here is the JGroups cluster configuration we have:
> dynamic.cluster.property_string    
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups"   xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <TCP bind_addr="&HOST_ADDR;" bind_port="&MULTICAST_NODE_PORT2;"/>
> <TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true"/>
> <MERGE3 min_interval="3000" max_interval="5000" />
> <FD_ALL timeout="20000" interval="15000"/>
> <FD_SOCK/>
> <FD timeout="5000" max_tries="48" />
> <VERIFY_SUSPECT timeout="1500"/>
> <BARRIER/>
> <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
> <UNICAST3/>
> <pbcast.STABLE desired_avg_gossip="20000"  max_bytes="0" stability_delay="1000"/>
> <pbcast.GMS print_local_addr="true" join_timeout="15000" />
> </config>
> =================================
> dynamic.cluster.distribution_property_string    
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <TCP bind_port="&MULTICAST_NODE_PORT1;" />
> <TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true"/>
> <MERGE3 min_interval="3000" max_interval="5000"/>
> <FD_SOCK/>
> <FD timeout="5000" max_tries="48"/>
> <VERIFY_SUSPECT timeout="1500"/>
> <BARRIER/>
> <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
> <UNICAST3/>
> <pbcast.STABLE desired_avg_gossip="20000" max_bytes="0" stability_delay="1000" />
> <pbcast.GMS print_local_addr="true" join_timeout="5000"/>
> </config>    
> ================================
> dynamic.cluster.lock.protocolStack    
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <TCP bind_addr="&HOST_ADDR;" bind_port="&MULTICAST_NODE_PORT3;"/>
> <TCPPING async_discovery="true" initial_hosts="&CLUSTER_INITIAL_HOSTS;" port_range="0" send_cache_on_join="true"/>
> <MERGE3  min_interval="3000"  max_interval="5000"/>
> <FD_ALL timeout="20000" interval="5000"/>
> <FD timeout="5000" max_tries="48"/>
> <VERIFY_SUSPECT timeout="1500"/>
> <BARRIER/>
> <pbcast.NAKACK2 use_mcast_xmit="false"  discard_delivered_msgs="true"/>
> <UNICAST3/>
> <pbcast.STABLE desired_avg_gossip="20000"/>
> <pbcast.GMS print_local_addr="true" join_timeout="5000"/>
> <FRAG2 frag_size="8096"/>
> <CENTRAL_LOCK2/>
> </config>    

--
This message was sent by Atlassian Jira
(v7.13.5#713005)