[JBoss JIRA] Created: (JGRP-1053) UNICAST: set retransmission timeout based on actual retransmission times
by Bela Ban (JIRA)
UNICAST: set retransmission timeout based on actual retransmission times
------------------------------------------------------------------------
Key: JGRP-1053
URL: https://jira.jboss.org/jira/browse/JGRP-1053
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 2.9
UNICAST needs to compute a rolling average of retransmission times, per sender (AckSenderWindow).
The retransmission timeout per sender can then be set based on the actual average retransmission times. The advantage is that we throttle retransmission when we have a lot of message loss, and speed it up again when there are no message drops.
The function to set the timeout should always compute the new timeout value based on (1) the old value times a decay factor and (2) a new value.
The average should go up relatively quickly if the actual retransmission values go up, but come down slowly when the actual values go down.
A potential function is shown below:
static final double SLOW_DECAY_FACTOR=0.9, FAST_DECAY_FACTOR=0.7;
static final double FAST_UP=1 / FAST_DECAY_FACTOR, SLOW_UP=1 / SLOW_DECAY_FACTOR;
static final double SAFETY_BUFFER=0.3;
static double avg=200;

public static void main(String[] args) {
    final long[] times={200,200,400,400,500,500,500,500,500,100,100,100,100,100,100,100,100,100,100,100,100,100};
    // final long[] times={200,200,200,200,200,200,200,200,200,200,200};
    for(long val: times) {
        double result=avg(val);
        System.out.println(val + ": " + result);
    }
}

// Folds val into the rolling average and returns the new timeout
// (the updated average plus the safety buffer)
private static double avg(long val) {
    double decay, up;
    if(val > avg) {   // value went up: decay the old average fast, weight the new value heavily
        decay=FAST_DECAY_FACTOR;
        up=FAST_UP;
    }
    else {            // value went down or stayed: adjust slowly
        decay=SLOW_DECAY_FACTOR;
        up=SLOW_UP;
    }
    double old_val=avg * decay;
    double result=(old_val + val * up) / 2;
    avg=result;
    return result * (1 + SAFETY_BUFFER);
}
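With the sample input above, the average climbs quickly (from about 201 to 356 on the first 400 sample, approaching 505 during the run of 500s), but takes about five 100 samples to drop back below 110, which matches the fast-up/slow-down goal. The value returned to the caller, the new average plus the 30% safety buffer, is what would then be used as the retransmission timeout for that sender.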
[JBoss JIRA] Created: (JGRP-364) When using TCP_NIO, starting two nodes at the same time causes one of the nodes not to join group
by Matthew Todd (JIRA)
When using TCP_NIO, starting two nodes at the same time causes one of the nodes not to join group
-------------------------------------------------------------------------------------------------
Key: JGRP-364
URL: http://jira.jboss.com/jira/browse/JGRP-364
Project: JGroups
Issue Type: Bug
Affects Versions: 2.4
Environment: linux 2.6 kernel x86_64 running java 1.5.0_06
Reporter: Matthew Todd
Assigned To: Bela Ban
I am testing a JGroups TCP_NIO configuration using the Draw demo. If I start up my 3 nodes one by one, everything works fine. However, if I start node 1 and then attempt to start nodes 2 and 3 in parallel, only node 2 will work. Node 3 will be isolated, will not see the other nodes, and logs the following message:
org.jgroups.protocols.pbcast.ClientGmsImpl join
WARNING: join(192.158.70.200:7802) sent to 192.158.70.200:7800 timed out, retrying
I am starting the Draw demo like this:
java -cp jgroups-all.jar:commons-logging.jar:concurrent.jar:jmxri.jar org.jgroups.demos.Draw -props test.xml
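For what it's worth, the race can also be reproduced programmatically. Below is a minimal sketch of mine (not part of the demo) that connects two channels in parallel against an already-running first node, using the plain JChannel API. The props file names node2.xml/node3.xml (each with its own bind_addr and start_port, as in my setup) and the group name "DrawGroupDemo" are assumptions here:

import org.jgroups.JChannel;

// Hypothetical repro: start node 1 first, then run this to mimic
// nodes 2 and 3 joining at the same time.
public class ParallelJoinTest {
    static Runnable joiner(final String props) {
        return new Runnable() {
            public void run() {
                try {
                    JChannel ch=new JChannel(props);  // per-node props file (assumed name)
                    ch.connect("DrawGroupDemo");      // assumed group name
                    System.out.println(props + ": joined, view=" + ch.getView());
                }
                catch(Exception e) {
                    e.printStackTrace();
                }
            }
        };
    }

    public static void main(String[] args) {
        // node 1 is assumed to be running already
        new Thread(joiner("node2.xml")).start();
        new Thread(joiner("node3.xml")).start();
    }
}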
Here is the configuration for one of my nodes:
<config>
<TCP_NIO
bind_addr="192.158.70.200"
recv_buf_size="20000000"
send_buf_size="640000"
loopback="false"
discard_incompatible_packets="true"
max_bundle_size="64000"
max_bundle_timeout="30"
use_incoming_packet_handler="true"
use_outgoing_packet_handler="true"
down_thread="false" up_thread="false"
enable_bundling="true"
start_port="7800"
end_port="7800"
use_send_queues="false"
sock_conn_timeout="300" skip_suspected_members="true"
/>
<MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
bind_addr="192.158.70.200" down_thread="false" up_thread="false"/>
<MERGE2 max_interval="100000"
down_thread="false" up_thread="false" min_interval="20000"/>
<FD_SOCK down_thread="false" up_thread="false"/>
<VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
<pbcast.NAKACK max_xmit_size="60000"
use_mcast_xmit="false" gc_lag="0"
retransmit_timeout="300,600,1200,2400,4800"
down_thread="true" up_thread="true"
discard_delivered_msgs="true"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
down_thread="false" up_thread="false"
max_bytes="400000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
down_thread="true" up_thread="true"
join_retry_timeout="2000" shun="true"
view_bundling="true"/>
<!-- <FC max_credits="2000000" down_thread="false" up_thread="false"
min_threshold="0.10"/>
<FRAG2 frag_size="60000" down_thread="false" up_thread="false"/> -->
<pbcast.STATE_TRANSFER/>
<!-- <pbcast.FLUSH down_thread="false" up_thread="false"/>-->
</config>
Nodes 2 and 3 have the same configuration, except that the port they bind to has been changed.
[JBoss JIRA] Created: (JGRP-356) TCP_NIO: failure starting correctly with bundling enabled
by Bela Ban (JIRA)
TCP_NIO: failure starting correctly with bundling enabled
---------------------------------------------------------
Key: JGRP-356
URL: http://jira.jboss.com/jira/browse/JGRP-356
Project: JGroups
Issue Type: Bug
Affects Versions: 2.4
Reporter: Bela Ban
Assigned To: Vladimir Blagojevic
Fix For: 2.5
The stack is the default tcp-nio.xml with bundling enabled; the error is:
$ jg Draw -props ./tcp-nio.xml
-------------------------------------------------------
GMS: address is 127.0.0.1:7800
-------------------------------------------------------
** View=[127.0.0.1:7800|0] [127.0.0.1:7800]
0 [WARN] [TimeScheduler.Thread] TimeScheduler._run(): task org.jgroups.protocols.TP$Bundler$BundlingTimer@1c65216 took 20921ms to execute, please check why it is taking so long. It is delaying other tasks
<config>
<TCP_NIO
recv_buf_size="20000000"
send_buf_size="640000"
loopback="false"
discard_incompatible_packets="true"
max_bundle_size="64000"
max_bundle_timeout="30"
use_incoming_packet_handler="true"
use_outgoing_packet_handler="false"
down_thread="false" up_thread="false"
enable_bundling="true"
start_port="7800"
use_send_queues="false"
sock_conn_timeout="300" skip_suspected_members="true"
reader_threads="8"
writer_threads="8"
processor_threads="8"
processor_minThreads="8"
processor_maxThreads="8"
processor_queueSize="100"
processor_keepAliveTime="-1"/>
<TCPPING timeout="3000"
down_thread="false" up_thread="false"
initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801]}"
port_range="1"
num_initial_members="3"/>
<MERGE2 max_interval="100000"
down_thread="false" up_thread="false" min_interval="20000"/>
<FD_SOCK down_thread="false" up_thread="false"/>
<FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
<VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
<pbcast.NAKACK max_xmit_size="60000"
use_mcast_xmit="false" gc_lag="0"
retransmit_timeout="300,600,1200,2400,4800"
down_thread="false" up_thread="false"
discard_delivered_msgs="true"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
down_thread="false" up_thread="false"
max_bytes="400000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
down_thread="false" up_thread="false"
join_retry_timeout="2000" shun="true"
view_bundling="true"/>
<FC max_credits="2000000" down_thread="false" up_thread="false"
min_threshold="0.10"/>
<FRAG2 frag_size="60000" down_thread="false" up_thread="false"/>
<pbcast.STREAMING_STATE_TRANSFER down_thread="false" up_thread="false"
use_flush="true" use_reading_thread="true"/>
<!-- pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/ -->
<pbcast.FLUSH down_thread="false" up_thread="false"/>
</config>
[JBoss JIRA] Created: (JGRP-346) Connection objects are removed from the ConnectionTable, but remain active on the system and eventually consume available system resources.
by Stuart Jensen (JIRA)
Connection objects are removed from the ConnectionTable, but remain active on the system and eventually consume available system resources.
-------------------------------------------------------------------------------------------------------------------------------------------
Key: JGRP-346
URL: http://jira.jboss.com/jira/browse/JGRP-346
Project: JGroups
Issue Type: Bug
Affects Versions: 2.3 SP1
Environment: SUSE Linux 9
Reporter: Stuart Jensen
Assigned To: Bela Ban
To duplicate the issue:
1) Create a four-member cluster using the following configuration (two-member clusters exhibit the problem as well, just less exaggerated):
TCP(start_port=7801):
TCPPING(initial_hosts=<ip addresses go here>;port_range=3;timeout=3500;num_initial_members=3;up_thread=true;down_thread=true):
MERGE2(min_interval=5000;max_interval=10000):
FD(shun=true;timeout=2500;max_tries=5;up_thread=true;down_thread=true):
VERIFY_SUSPECT(timeout=2000;down_thread=false;up_thread=false):
pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):
pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):
pbcast.GMS(join_timeout=5000;join_retry_timeout=3500;shun=true;print_local_addr=true;down_thread=true;up_thread=true)
2) I was running JGroups in a Tomcat servlet application. Start up the cluster. To determine the number of threads on Linux, I executed the following commands:
ps -ef | grep tomcat
echo "" > catalina.out
kill -QUIT <pid from ps command above>
grep ".Sender \[" catalina.out | wc -l
You get the process id of Tomcat using the ps command, then clear the content of the catalina.out file. The kill -QUIT command causes the thread dump to be printed into catalina.out. The grep then finds and counts all of the "ConnectionTable.Connection.Sender" threads currently active on the system.
3) Pick one of the cluster member boxes and pull the network cable out of the box such that all communication with the other three members is terminated.
4) After one or two minutes, replace the network cable.
5) Repeat the steps to determine the number of threads currently active on the system.
6) Repeat steps 3 through 5, each time watching the number of threads. Each iteration will cause more and more threads to be orphaned on the system. It seems to grow exponentially, after about 4 iterations we have around 300-400 Sender threads. The Receiver threads will be orphaned also in similar numbers.
After investigating the issue, I came up with the following "fix" which cleared the problem up.
In the file ConnectionTable.java there is a method called retainAll(). It appears that this method is called by the TCP protocol when a view change occurs. It removes Connections from the "Connection Pool" (member variable conns) but does not destroy them. We initially thought the reaper thread might clean them up, but since the Connection objects are actually removed from the Connection Pool, the reaper does not help. As we watched our connections, we noticed that the Connections orphaned by this routine were the ones filling up the system's set of threads.
So, we added code to call destroy() on all of the Connection objects that retainAll() removes from the Connection Pool. The "diff" is provided below. Note that we made our change in the JGroups 2.3 SP1 file ConnectionTable.java; Scott Marlow produced this diff for me, applying the same change to BasicConnectionTable from the 2.4 source set.
Index: BasicConnectionTable.java
===================================================================
RCS file: /cvsroot/javagroups/JGroups/src/org/jgroups/blocks/BasicConnectionTable.java,v
retrieving revision 1.8
diff -r1.8 BasicConnectionTable.java
22,26c22
< import java.util.Map;
< import java.util.Iterator;
< import java.util.HashMap;
< import java.util.Vector;
< import java.util.Collection;
---
> import java.util.*;
263c259,289
< conns.keySet().retainAll(c);
---
> // conns.keySet().retainAll(c);
> ArrayList alConnsToDestroy = new ArrayList();
> synchronized(conns)
> {
>     HashMap copy=new HashMap(conns);
>     conns.keySet().retainAll(c);
>     Set ks = copy.keySet();
>     Iterator iter = ks.iterator();
>     while (iter.hasNext())
>     {
>         Object oKey = iter.next();
>         if (null == conns.get(oKey))
>         { // This connection NOT in the resultant connection set
>             Connection conn = (Connection)copy.get(oKey);
>             if (null != conn)
>             { // Destroy this connection
>                 alConnsToDestroy.add(conn);
>             }
>         }
>     }
> }
> // All of the connections that were not retained must be destroyed
> // so that their resources are cleaned up.
> for (int a=0; a<alConnsToDestroy.size(); a++)
> {
>     Connection conn = (Connection)alConnsToDestroy.get(a);
>     if(log.isTraceEnabled())
>         log.trace("Destroy this orphaned connection: " + conn);
>     conn.destroy();
> }
>
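For readers who find the diff hard to follow, the patched method reads roughly as below. This is just a restatement of the diff in plain form, assuming the conns map, Connection class and log field of BasicConnectionTable shown above (the exact method signature is an assumption):

void retainAll(Collection c) {
    ArrayList connsToDestroy=new ArrayList();
    synchronized(conns) {
        HashMap copy=new HashMap(conns);
        conns.keySet().retainAll(c);          // keep only the members in c
        for(Iterator it=copy.keySet().iterator(); it.hasNext();) {
            Object key=it.next();
            if(conns.get(key) == null) {      // this connection was not retained
                Connection conn=(Connection)copy.get(key);
                if(conn != null)
                    connsToDestroy.add(conn);
            }
        }
    }
    // Destroy the orphaned connections so their Sender/Receiver threads exit
    for(int i=0; i < connsToDestroy.size(); i++) {
        Connection conn=(Connection)connsToDestroy.get(i);
        if(log.isTraceEnabled())
            log.trace("destroying orphaned connection: " + conn);
        conn.destroy();
    }
}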
[JBoss JIRA] Created: (JBCLUSTER-186) Implementations of Invoker should implement equals as an equality check rather than relying on Object.equals, this is important for cluster fail-over support
by Scott Marlow (JIRA)
Implementations of Invoker should implement equals as an equality check rather than relying on Object.equals, this is important for cluster fail-over support
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Key: JBCLUSTER-186
URL: http://jira.jboss.com/jira/browse/JBCLUSTER-186
Project: JBoss Clustering
Issue Type: Bug
Security Level: Public (Everyone can see)
Environment: JBoss as 4.0.4, although this seems to happen in 4.5 + 5.x as well.
Reporter: Scott Marlow
Assigned To: Scott Marlow
Priority: Minor
Part of how JRMPInvokerProxyHA handles fail-over includes removing the reference to the node that left the cluster. However, the dead node is not removed, because certain Invoker implementations do not implement an equality check.
The relevant code in JRMPInvokerProxyHA is:
protected void removeDeadTarget(Object target)
{
    if (this.familyClusterInfo != null)
        this.familyClusterInfo.removeDeadTarget(target);
}
The code in familyClusterInfo is:
public ArrayList removeDeadTarget(Object target)
{
    synchronized (this)
    {
        ArrayList tmp = (ArrayList) targets.clone();
        tmp.remove(target);
        this.targets = tmp;
        this.isViewMembersInSyncWithViewId = false;
    }
    return this.targets;
}
Since we didn't include an equals test in many of the different Invoker implementations, the above "tmp.remove(target)" operation fails. It fails because the "targets" ArrayList changes on every invocation (to reflect the current cluster server membership list): a new "targets" list is created each time, so the instance being removed is never the same reference as the one in the list, and without an overridden equals the comparison falls back to reference identity.
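The underlying mechanics are easy to demonstrate in isolation: ArrayList.remove(Object) relies on equals(), so without an override, a logically identical but distinct instance is never removed. A minimal standalone illustration (Target is a stand-in, not an actual Invoker class):

import java.util.ArrayList;

public class RemoveDemo {
    // Stand-in for an Invoker that inherits Object.equals (reference equality)
    static class Target {
        final String address;
        Target(String address) { this.address=address; }
    }

    public static void main(String[] args) {
        ArrayList targets=new ArrayList();
        targets.add(new Target("node1:4444"));
        // A new, logically identical instance: remove() returns false,
        // because Object.equals compares references, not the address
        boolean removed=targets.remove(new Target("node1:4444"));
        System.out.println("removed=" + removed + ", size=" + targets.size()); // removed=false, size=1
    }
}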
A similar problem occurs with the EJB2 load balancers after a cluster membership changes.
I think these issues can be solved by implementing a proper equals test in the different invokers.
PooledInvokerProxy should implement equals based on ServerAddress.
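A sketch of what that could look like; the address field name and exact class shape here are illustrative, not the actual JBoss source:

// Inside PooledInvokerProxy, assuming a ServerAddress field named 'address'
public boolean equals(Object obj) {
    if(this == obj)
        return true;
    if(!(obj instanceof PooledInvokerProxy))
        return false;
    PooledInvokerProxy other=(PooledInvokerProxy)obj;
    return address.equals(other.address);
}

public int hashCode() {
    return address.hashCode(); // keep hashCode consistent with equals
}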