[JBoss JIRA] Created: (JBRULES-1009) Can't execute rules twice with a second classloader
by aaron dixon (JIRA)
Can't execute rules twice with a second classloader
---------------------------------------------------
Key: JBRULES-1009
URL: http://jira.jboss.com/jira/browse/JBRULES-1009
Project: JBoss Rules
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: Reteoo
Affects Versions: 4.0.0.MR3
Reporter: aaron dixon
Assigned To: Mark Proctor
Priority: Minor
Attachments: drools-test.zip
I am trying to sequentially perform two rules executions using two
different classloaders in the same VM. I create completely new state
for each execution, yet I still fail on the second execution (the first execution succeeds). I
believe this has to do with some internal static cached state that
JBossRules is maintaining. An integration test is attached as an archived Eclipse project (zip).
Exception thrown:
Exception in thread "main" java.lang.ClassCastException: test.drools.classloader.FooShadowProxy
at org.drools.base.test.drools.classloader.Foo$getName.getValue(Unknown Source)
at org.drools.base.ClassFieldExtractor.getValue(ClassFieldExtractor.java:94)
at org.drools.base.evaluators.StringFactory$StringEqualEvaluator.evaluate(StringFactory.java:85)
at org.drools.rule.LiteralRestriction.isAllowed(LiteralRestriction.java:61)
at org.drools.rule.LiteralConstraint.isAllowed(LiteralConstraint.java:82)
at org.drools.reteoo.AlphaNode.assertObject(AlphaNode.java:121)
at org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(SingleObjectSinkAdapter.java:20)
at org.drools.reteoo.ObjectTypeNode.assertObject(ObjectTypeNode.java:159)
at org.drools.reteoo.Rete.assertObject(Rete.java:175)
at org.drools.reteoo.ReteooRuleBase.assertObject(ReteooRuleBase.java:190)
at org.drools.reteoo.ReteooWorkingMemory.doInsert(ReteooWorkingMemory.java:70)
at org.drools.common.AbstractWorkingMemory.insert(AbstractWorkingMemory.java:772)
at org.drools.common.AbstractWorkingMemory.insert(AbstractWorkingMemory.java:584)
at org.drools.reteoo.ReteooStatelessSession.execute(ReteooStatelessSession.java:63)
at test.drools.classloader.Driver.testRules(Driver.java:49)
at test.drools.classloader.Driver.go(Driver.java:35)
at test.drools.classloader.Driver.main(Driver.java:19)
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 2 months
[JBoss JIRA] Created: (JBRULES-1471) Accumulate should not call getResult() more then necessary
by Geoffrey De Smet (JIRA)
Accumulate should not call getResult() more then necessary
----------------------------------------------------------
Key: JBRULES-1471
URL: http://jira.jboss.com/jira/browse/JBRULES-1471
Project: JBoss Drools
Issue Type: Task
Security Level: Public (Everyone can see)
Components: Reteoo
Reporter: Geoffrey De Smet
Assigned To: Edson Tirelli
Fix For: 5.0.0-M1
For this rule:
rule "roomCapacityTooSmall" // TODO it takes 50-90% of the performance of drools-solver
when
$period : Period();
$room : Room($capacity : capacity);
$totalStudentSize : Number(intValue > $capacity) from accumulate(
Exam(period == $period, room == $room, $studentSize : topicStudentSize),
sum($studentSize)
);
...
end
Outside drl, I update 2 out of 700 Exam objects and call fireAllRulesAndCalculateDecisionScore.
The drl never changes an Exam. Period and Room never change.
That results into this log:
Breakpoint reached at org.drools.solver.core.evaluation.EvaluationHandler.fireAllRulesAndCalculateDecisionScore(EvaluationHandler.java:90)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.reverse(SumAccumulateFunction.java:62)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.getResult(SumAccumulateFunction.java:70)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.accumulate(SumAccumulateFunction.java:53)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.getResult(SumAccumulateFunction.java:70)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.reverse(SumAccumulateFunction.java:62)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.getResult(SumAccumulateFunction.java:70)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.accumulate(SumAccumulateFunction.java:53)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.getResult(SumAccumulateFunction.java:70)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.reverse(SumAccumulateFunction.java:62)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.getResult(SumAccumulateFunction.java:70)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.accumulate(SumAccumulateFunction.java:53)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.getResult(SumAccumulateFunction.java:70)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.reverse(SumAccumulateFunction.java:62)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.getResult(SumAccumulateFunction.java:70)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.accumulate(SumAccumulateFunction.java:53)
Breakpoint reached at org.drools.base.accumulators.SumAccumulateFunction.getResult(SumAccumulateFunction.java:70)
There are 4 accumulate calls and 4 reverse calls. That might be accurate.
There are 8 getResult calls, that should only be 1.
Also, the getResult method creates a new Double instance each time:
return new Double( data.total );
It might be better to do
return Double.valueOf(data.total);
to allow the JVM to hot spot optimize it.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 2 months
[JBoss JIRA] Created: (JBRULES-946) RuleBase serialization fails with StackOverflowError
by Alexandre Gattiker (JIRA)
RuleBase serialization fails with StackOverflowError
----------------------------------------------------
Key: JBRULES-946
URL: http://jira.jboss.com/jira/browse/JBRULES-946
Project: JBoss Rules
Issue Type: Bug
Security Level: Public (Everyone can see)
Affects Versions: 4.0.0.MR2
Reporter: Alexandre Gattiker
Assigned To: Mark Proctor
I created 920 rules, all in this format, with different values in the string literals.
rule "Rule_1"
when
$g:Parent()
$c:Child(parent==$g,code=="yyy1")
Item(parentObject==$c,name=="xxx1", value == "1")
Item(parentObject==$c,name=="xxx2",value == "1")
Item(parentObject==$g,name=="xxx3",value == "200")
then
System.out.println("2");
end
With the following code, I create a RuleBase that I serialize.
RuleBase ruleBase = RuleBaseFactory.newRuleBase();
PackageBuilderConfiguration conf = new PackageBuilderConfiguration();
conf.setCompiler(PackageBuilderConfiguration.JANINO);
Reader source = new InputStreamReader(PolicyServiceImpl.class.getResourceAsStream(fileName));
PackageBuilder builder = new PackageBuilder(conf);
builder.addPackageFromDrl(source);
Package pkg = builder.getPackage();
ruleBase.addPackage(pkg);
//serialize RuleBase
File outFile = new File(f.getPath() + ".tmp");
FileOutputStream fout = new FileOutputStream(outFile);
ObjectOutputStream oos = new ObjectOutputStream(fout);
oos.writeObject(ruleBase);
oos.close();
The serialization fails with the following exception. Otherwise the RuleBase is perfectly fine and working.
java.lang.StackOverflowError
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java(Compiled Code))
at java.util.ArrayList.writeObject(ArrayList.java(Compiled Code))
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java(Compiled Code))
at java.lang.reflect.Method.invoke(Method.java(Compiled Code))
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java(Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java(Inlined Compiled Code))
at java.util.ArrayList.writeObject(ArrayList.java(Compiled Code))
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java(Compiled Code))
at java.lang.reflect.Method.invoke(Method.java(Compiled Code))
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java(Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java(Inlined Compiled Code))
at java.util.ArrayList.writeObject(ArrayList.java(Compiled Code))
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java(Compiled Code))
at java.lang.reflect.Method.invoke(Method.java(Compiled Code))
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java(Compiled Code))
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java(Inlined Compiled Code))
(last four lines repeated over 1000 times)
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 2 months
[JBoss JIRA] Created: (JGRP-364) When using TCP_NIO, starting two nodes at the same time causes one of the nodes not to join group
by Matthew Todd (JIRA)
When using TCP_NIO, starting two nodes at the same time causes one of the nodes not to join group
-------------------------------------------------------------------------------------------------
Key: JGRP-364
URL: http://jira.jboss.com/jira/browse/JGRP-364
Project: JGroups
Issue Type: Bug
Affects Versions: 2.4
Environment: linux 2.6 kernel x86_64 running java 1.5.0_06
Reporter: Matthew Todd
Assigned To: Bela Ban
I am testing a jgroups tcp_nio configuration using the draw demo.If I start up my 3 nodes one by one then everything works fine. However if I start up node 1, then attempt to start node 2 and 3 in parallel then only node 2 will work. Node 3 will be isolated and not see the other nodes and logs the following message:
org.jgroups.protocols.pbcast.ClientGmsImpl join
WARNING: join(192.158.70.200:7802) sent to 192.158.70.200:7800 timed out, retrying
I am starting the draw demo like this;
java -cp jgroups-all.jar:commons-logging.jar:concurrent.jar:jmxri.jar org.jgroups.demos.Draw -props test.xml
Here is the configuration for one of my nodes:
<config>
<TCP_NIO
bind_addr="192.158.70.200"
recv_buf_size="20000000"
send_buf_size="640000"
loopback="false"
discard_incompatible_packets="true"
max_bundle_size="64000"
max_bundle_timeout="30"
use_incoming_packet_handler="true"
use_outgoing_packet_handler="true"
down_thread="false" up_thread="false"
enable_bundling="true"
start_port="7800"
end_port="7800"
use_send_queues="false"
sock_conn_timeout="300" skip_suspected_members="true"
/>
<MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
bind_addr="192.158.70.200" down_thread="false" up_thread="false"/>
<MERGE2 max_interval="100000"
down_thread="false" up_thread="false" min_interval="20000"/>
<FD_SOCK down_thread="false" up_thread="false"/>
<VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
<pbcast.NAKACK max_xmit_size="60000"
use_mcast_xmit="false" gc_lag="0"
retransmit_timeout="300,600,1200,2400,4800"
down_thread="true" up_thread="true"
discard_delivered_msgs="true"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
down_thread="false" up_thread="false"
max_bytes="400000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
down_thread="true" up_thread="true"
join_retry_timeout="2000" shun="true"
view_bundling="true"/>
<!-- <FC max_credits="2000000" down_thread="false" up_thread="false"
min_threshold="0.10"/>
<FRAG2 frag_size="60000" down_thread="false" up_thread="false"/> -->
<pbcast.STATE_TRANSFER/>
<!-- <pbcast.FLUSH down_thread="false" up_thread="false"/>-->
</config>
Node 2 and 3 have the same configuration except the port they bind to has been changed
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 5 months
[JBoss JIRA] Created: (JGRP-356) TCP_NIO: failure starting correctly with bundling enabled
by Bela Ban (JIRA)
TCP_NIO: failure starting correctly with bundling enabled
---------------------------------------------------------
Key: JGRP-356
URL: http://jira.jboss.com/jira/browse/JGRP-356
Project: JGroups
Issue Type: Bug
Affects Versions: 2.4
Reporter: Bela Ban
Assigned To: Vladimir Blagojevic
Fix For: 2.5
Stack is (default tcp-nio.xml with bundling enabled), the error is:
$ jg Draw -props ./tcp-nio.xml
-------------------------------------------------------
GMS: address is 127.0.0.1:7800
-------------------------------------------------------
** View=[127.0.0.1:7800|0] [127.0.0.1:7800]
0 [WARN] [TimeScheduler.Thread] TimeScheduler._run(): task org.jgroups.protocols.TP$Bundler$BundlingTimer@1c65216 took 20921ms to execute, please check why it is taking so long. It is delaying other tasks
<config>
<TCP_NIO
recv_buf_size="20000000"
send_buf_size="640000"
loopback="false"
discard_incompatible_packets="true"
max_bundle_size="64000"
max_bundle_timeout="30"
use_incoming_packet_handler="true"
use_outgoing_packet_handler="false"
down_thread="false" up_thread="false"
enable_bundling="true"
start_port="7800"
use_send_queues="false"
sock_conn_timeout="300" skip_suspected_members="true"
reader_threads="8"
writer_threads="8"
processor_threads="8"
processor_minThreads="8"
processor_maxThreads="8"
processor_queueSize="100"
processor_keepAliveTime="-1"/>
<TCPPING timeout="3000"
down_thread="false" up_thread="false"
initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801]}"
port_range="1"
num_initial_members="3"/>
<MERGE2 max_interval="100000"
down_thread="false" up_thread="false" min_interval="20000"/>
<FD_SOCK down_thread="false" up_thread="false"/>
<FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
<VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
<pbcast.NAKACK max_xmit_size="60000"
use_mcast_xmit="false" gc_lag="0"
retransmit_timeout="300,600,1200,2400,4800"
down_thread="false" up_thread="false"
discard_delivered_msgs="true"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
down_thread="false" up_thread="false"
max_bytes="400000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
down_thread="false" up_thread="false"
join_retry_timeout="2000" shun="true"
view_bundling="true"/>
<FC max_credits="2000000" down_thread="false" up_thread="false"
min_threshold="0.10"/>
<FRAG2 frag_size="60000" down_thread="false" up_thread="false"/>
<pbcast.STREAMING_STATE_TRANSFER down_thread="false" up_thread="false"
use_flush="true" use_reading_thread="true"/>
<!-- pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/ -->
<pbcast.FLUSH down_thread="false" up_thread="false"/>
</config>
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 5 months
[JBoss JIRA] Created: (JGRP-346) Connection objects are removed from the ConnectionTable, but remain active on the system and eventually consume available system resources.
by Stuart Jensen (JIRA)
Connection objects are removed from the ConnectionTable, but remain active on the system and eventually consume available system resources.
-------------------------------------------------------------------------------------------------------------------------------------------
Key: JGRP-346
URL: http://jira.jboss.com/jira/browse/JGRP-346
Project: JGroups
Issue Type: Bug
Affects Versions: 2.3 SP1
Environment: SUSE Linux 9
Reporter: Stuart Jensen
Assigned To: Bela Ban
To duplicate the issue:
1) Create a four member cluster using the following configuration: (two member clusters exhibit the problem as well, just not as exaggerated)
TCP(start_port=7801):
TCPPING(initial_hosts=<ip addresses go here>;port_range=3;timeout=3500;num_initial_members=3;up_thread=true;down_thread=true):
MERGE2(min_interval=5000;max_interval=10000):
FD(shun=true;timeout=2500;max_tries=5;up_thread=true;down_thread=true):
VERIFY_SUSPECT(timeout=2000;down_thread=false;up_thread=false):
pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):
pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):
pbcast.GMS(join_timeout=5000;join_retry_timeout=3500;shun=true;print_local_addr=true;down_thread=true;up_thread=true)
2) I was running JGroups in a Tomcat servlet application. Start up the cluster. To determine the number of threads on Linux I executed the following commands:
ps -ef | grep tomcat
echo "" > catalina.out
kill -QUIT <pid from ps command above>
grep ".Sender \[" catalina.out | wc -l
You get the process id of Tomcat using the ps command. Then clear the content of the catalina.out file. The kill command causes the threads to be printed into the catalina.out file. Then the grep searches for and counts all of the "ConnectionTable.Connnection.Sender" threads that are currently active on the system.
3) Pick one of the cluster member boxes and pull the network cable out of the box such that all communication with the other three members is terminated.
4) After one or two minutes, replace the network cable.
5) Repeat the steps to determine the number of threads currently active on the system.
6) Repeat steps 3 through 5, each time watching the number of threads. Each iteration will cause more and more threads to be orphaned on the system. It seems to grow exponentially, after about 4 iterations we have around 300-400 Sender threads. The Receiver threads will be orphaned also in similar numbers.
After investigating the issue, I came up with the following "fix" which cleared the problem up.
In the file ConnectionTable.java there is a method called retainAll(). It appears that this method is called by the TCP protocol when a view change occurs. This method removes Connnections from the "Connection Pool" (member variable conns) but does not destroy them. We initially thought the reaper thread may clean them up, but since the Connection objects are actually removed from the Connection Pool, the reaper does not help the situation. As we watched our connections we noticed that the Connections orphaned by this routine were the ones filling up the system's set of threads. So, we added code to call destroy() on all of the Connection objects that retainAll() removes from the Connection Pool. The "diff" is provided below. Note that we did our change in the JGroups 2.3 SP1 file ConnectionTable.java. Scott Marlow did this diff for me, the same change, but applied to the BasicConnectionTable from the 2.4 source set.
Index: BasicConnectionTable.java
===================================================================
RCS file: /cvsroot/javagroups/JGroups/src/org/jgroups/blocks/BasicConnectionTable.java,v
retrieving revision 1.8
diff -r1.8 BasicConnectionTable.java
22,26c22
< import java.util.Map;
< import java.util.Iterator;
< import java.util.HashMap;
< import java.util.Vector;
< import java.util.Collection;
---
> import java.util.*;
263c259,289
< conns.keySet().retainAll(c);
---
> // conns.keySet().retainAll(c);
> ArrayList alConnsToDestroy = new ArrayList();
> synchronized(conns)
> {
> HashMap copy=new HashMap(conns);
> conns.keySet().retainAll(c);
> Set ks = copy.keySet();
> Iterator iter = ks.iterator();
> while (iter.hasNext())
> {
> Object oKey = iter.next();
> if (null == conns.get(oKey))
> { // This connection NOT in the resultant connection set
> Connection conn = (Connection)copy.get(oKey);
> if (null != conn)
> { // Destroy this connection
> alConnsToDestroy.add(conn);
> }
> }
> }
> }
> // All of the connections that were not retained must be destroyed
> // so that their resources are cleaned up.
> for (int a=0; a<alConnsToDestroy.size(); a++)
> {
> Connection conn = (Connection)alConnsToDestroy.get(a);
> if(log.isTraceEnabled())
> log.trace("Destroy this orphaned connection: " + conn);
> conn.destroy();
> }
>
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 5 months