[jboss-jira] [JBoss JIRA] (DROOLS-766) RuleNetworkEvaluator infinite loop when doUpdatesReorderLeftMemory is called

Dmitry Toptygin (JIRA) issues at jboss.org
Tue May 19 20:55:19 EDT 2015


    [ https://issues.jboss.org/browse/DROOLS-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069429#comment-13069429 ] 

Dmitry Toptygin commented on DROOLS-766:
----------------------------------------

I just hit the same issue. It happens unpredictably: sometimes several hours pass without a problem (with a high volume of events flowing through the rule engine), and then it suddenly occurs.
Here are the stack traces:

"ruleEngineHolderThread-customer-13-1432069918078" daemon prio=10 tid=0x00007fd22927c000 nid=0x26b7 runnable [0x00007fd215be6000]
   java.lang.Thread.State: RUNNABLE
        at org.drools.core.util.index.LeftTupleList.remove(LeftTupleList.java:144)
        at org.drools.core.phreak.RuleNetworkEvaluator.doUpdatesReorderLeftMemory(RuleNetworkEvaluator.java:793)
        at org.drools.core.phreak.PhreakJoinNode.doNode(PhreakJoinNode.java:38)
        at org.drools.core.phreak.RuleNetworkEvaluator.switchOnDoBetaNode(RuleNetworkEvaluator.java:547)
        at org.drools.core.phreak.RuleNetworkEvaluator.evalBetaNode(RuleNetworkEvaluator.java:533)
        at org.drools.core.phreak.RuleNetworkEvaluator.innerEval(RuleNetworkEvaluator.java:334)
        at org.drools.core.phreak.RuleNetworkEvaluator.outerEval(RuleNetworkEvaluator.java:161)
        at org.drools.core.phreak.RuleNetworkEvaluator.evaluateNetwork(RuleNetworkEvaluator.java:116)
        at org.drools.core.phreak.RuleExecutor.reEvaluateNetwork(RuleExecutor.java:235)
        - locked <0x000000008f2f0f48> (a org.drools.core.phreak.RuleExecutor)
        at org.drools.core.phreak.RuleExecutor.evaluateNetworkAndFire(RuleExecutor.java:106)
        - locked <0x000000008f2f0f48> (a org.drools.core.phreak.RuleExecutor)
        at org.drools.core.common.DefaultAgenda.fireNextItem(DefaultAgenda.java:1016)
        at org.drools.core.common.DefaultAgenda.fireUntilHalt(DefaultAgenda.java:1258)
        at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireUntilHalt(StatefulKnowledgeSessionImpl.java:1333)
        at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireUntilHalt(StatefulKnowledgeSessionImpl.java:1310)
        at com.whizcontrol.ruleengine.agent.drools.DroolsAgent$1.run(DroolsAgent.java:224)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None

===
And another one:
"ruleEngineHolderThread-customer-51-1432069932073" daemon prio=10 tid=0x00007fd1dce63800 nid=0x26df runnable [0x00007fd1d3efd000]
   java.lang.Thread.State: RUNNABLE
        at org.drools.core.phreak.RuleNetworkEvaluator.doUpdatesReorderLeftMemory(RuleNetworkEvaluator.java:795)
        at org.drools.core.phreak.PhreakJoinNode.doNode(PhreakJoinNode.java:38)
        at org.drools.core.phreak.RuleNetworkEvaluator.switchOnDoBetaNode(RuleNetworkEvaluator.java:547)
        at org.drools.core.phreak.RuleNetworkEvaluator.evalBetaNode(RuleNetworkEvaluator.java:533)
        at org.drools.core.phreak.RuleNetworkEvaluator.innerEval(RuleNetworkEvaluator.java:334)
        at org.drools.core.phreak.RuleNetworkEvaluator.outerEval(RuleNetworkEvaluator.java:161)
        at org.drools.core.phreak.RuleNetworkEvaluator.evaluateNetwork(RuleNetworkEvaluator.java:116)
        at org.drools.core.phreak.RuleExecutor.reEvaluateNetwork(RuleExecutor.java:235)
        - locked <0x000000008fecd870> (a org.drools.core.phreak.RuleExecutor)
        at org.drools.core.phreak.RuleExecutor.evaluateNetworkAndFire(RuleExecutor.java:106)
        - locked <0x000000008fecd870> (a org.drools.core.phreak.RuleExecutor)
        at org.drools.core.common.DefaultAgenda.fireNextItem(DefaultAgenda.java:1016)
        at org.drools.core.common.DefaultAgenda.fireUntilHalt(DefaultAgenda.java:1258)
        at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireUntilHalt(StatefulKnowledgeSessionImpl.java:1333)
        at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireUntilHalt(StatefulKnowledgeSessionImpl.java:1310)
        at com.whizcontrol.ruleengine.agent.drools.DroolsAgent$1.run(DroolsAgent.java:224)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None

===
As for providing the DRL: there are a lot of rules, and the problem is difficult to isolate. But I think (though I cannot be 100% sure) the problem started after we introduced the following rule:

rule "update time every 5 seconds"
    timer ( int: 5s 5s )
    when
        $timeFact : TimeFact()
    then
        modify($timeFact){
            setValue($timeFact.getCurrentTime());
        };
end

Some other information about our environment:
- our rule agents are long running
- we extensively use time-based windows and collect/accumulate statements
- we introduced TimeFact to eliminate "flickering", where one rule says "turn on light because certain threshold has been breached" and another rule says "turn off light because things returned to normal". We use TimeFact in the conditions of many rules, and it is meant to be self-updating.
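
For illustration only, here is a minimal sketch of what a self-updating time fact along these lines could look like. Only the names TimeFact, getCurrentTime() and setValue() come from the rule above; the stored field, getValue(), and the injectable clock are assumptions made so the sketch can be run without a rule engine, and this is not the actual class from our system.

```java
import java.util.function.LongSupplier;

final class TimeFactDemo {

    // Hypothetical fact class: holds the last time value published to the rules.
    static final class TimeFact {
        private final LongSupplier clock; // assumption: clock is injected so it can be faked
        private long value;               // the value that rule conditions would read

        TimeFact(LongSupplier clock) {
            this.clock = clock;
            this.value = clock.getAsLong();
        }

        // Read by the timer rule: setValue($timeFact.getCurrentTime())
        public long getCurrentTime() {
            return clock.getAsLong();
        }

        public void setValue(long value) {
            this.value = value;
        }

        // Assumed accessor that other rules would constrain on.
        public long getValue() {
            return value;
        }
    }

    public static void main(String[] args) {
        long[] now = {1000L};                       // fake clock, in milliseconds
        TimeFact fact = new TimeFact(() -> now[0]);

        now[0] = 6000L;                             // five seconds pass
        System.out.println(fact.getValue());        // prints 1000: the fact is stale until updated

        fact.setValue(fact.getCurrentTime());       // what the timer rule's consequence does
        System.out.println(fact.getValue());        // prints 6000
    }
}
```

In the engine, the modify block in the rule is what performs the setValue step and notifies the session, so every rule that constrains on the fact is re-evaluated on each tick.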

When the problem happens, it completely takes over one CPU core.
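
This symptom (a RUNNABLE thread inside doUpdatesReorderLeftMemory pinning a core) is consistent with the observation in the issue description quoted below that getStagedNext() keeps returning the same tuple: a traversal that runs until it sees null would then never finish. A minimal sketch of that failure mode follows. The Node class and the walk and hasCycle helpers are hypothetical stand-ins, not the Drools LeftTuple or RuleNetworkEvaluator code, and the walk is bounded so that the demo terminates.

```java
final class StagedLoopDemo {

    // Hypothetical stand-in for a staged tuple; NOT the Drools LeftTuple class.
    static final class Node {
        Node stagedNext;
    }

    // Follows stagedNext until null, giving up after maxSteps.
    // Returns the number of nodes visited, or -1 if the bound was exceeded.
    static int walk(Node head, int maxSteps) {
        int steps = 0;
        for (Node n = head; n != null; n = n.stagedNext) {
            if (++steps > maxSteps) {
                return -1; // an unbounded loop would spin here forever
            }
        }
        return steps;
    }

    // Floyd's cycle detection: true if following stagedNext never reaches null.
    static boolean hasCycle(Node head) {
        Node slow = head;
        Node fast = head;
        while (fast != null && fast.stagedNext != null) {
            slow = slow.stagedNext;
            fast = fast.stagedNext.stagedNext;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.stagedNext = b;                         // well-formed chain: a -> b -> null
        System.out.println(walk(a, 1000));        // prints 2
        System.out.println(hasCycle(a));          // prints false

        b.stagedNext = b;                         // corrupted link: b points back to itself
        System.out.println(walk(a, 1000));        // prints -1
        System.out.println(hasCycle(a));          // prints true
    }
}
```

A check like hasCycle is only a diagnostic aid for confirming a corrupted chain in a debugger or a test; it does not explain how the link becomes self-referential in the first place.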

> RuleNetworkEvaluator infinite loop when doUpdatesReorderLeftMemory is called
> ----------------------------------------------------------------------------
>
>                 Key: DROOLS-766
>                 URL: https://issues.jboss.org/browse/DROOLS-766
>             Project: Drools
>          Issue Type: Bug
>          Components: core engine
>    Affects Versions: 6.2.0.Final
>            Reporter: Juan Carlos Garcia
>            Assignee: Mario Fusco
>
> We are migrating our system from Drools 6.0.1.Final to 6.2.0.Final, and one of our test cases, which simulates several complex scenarios, sends Drools 6.2.0.Final into an endless loop, while the same test works fine in 6.0.1.Final.
> I took a thread dump and it seems stuck:
> {code}
> 	at org.drools.core.phreak.RuleNetworkEvaluator.doUpdatesReorderLeftMemory(RuleNetworkEvaluator.java:795)
> {code}
> While debugging the source code, this is what I found:
> # LeftTupleMemory ltm = bm.getLeftTupleMemory();
> ** LeftTupleMemory is a LeftTupleList
> # leftTuple.getStagedNext();
> ** Always returns the same object instance (leftTuple is of type JoinNodeLeftTuple), hence the loop never ends.
> Unfortunately I cannot provide the DRL files and the domain classes involved, and I don't know whether I could even recreate the same test case right now without leaking internal information from the company I work for.
> Thread dump:
> {code}
> "Attach Listener" daemon prio=10 tid=0x00007fdb74001000 nid=0x88c waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "drools-worker-2" daemon prio=10 tid=0x00007fdb54003000 nid=0x803 waiting on condition [0x00007fdb9b7cc000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x0000000783f213f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> "pool-3-thread-1" prio=10 tid=0x00007fdba46de000 nid=0x764 waiting on condition [0x00007fdb9bbd6000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x0000000783eba160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> "Service Thread" daemon prio=10 tid=0x00007fdba40ad800 nid=0x754 runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread1" daemon prio=10 tid=0x00007fdba40ab000 nid=0x753 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread0" daemon prio=10 tid=0x00007fdba40a8000 nid=0x752 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "JDWP Command Reader" daemon prio=10 tid=0x00007fdb68001000 nid=0x74e runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "JDWP Event Helper Thread" daemon prio=10 tid=0x00007fdba40a6000 nid=0x74d runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "JDWP Transport Listener: dt_socket" daemon prio=10 tid=0x00007fdba40a2800 nid=0x74c runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Signal Dispatcher" daemon prio=10 tid=0x00007fdba4094800 nid=0x74b runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Finalizer" daemon prio=10 tid=0x00007fdba4074800 nid=0x74a in Object.wait() [0x00007fdba0cfb000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000007838824e0> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
> 	- locked <0x00000007838824e0> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
> 	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
> "Reference Handler" daemon prio=10 tid=0x00007fdba4072800 nid=0x749 in Object.wait() [0x00007fdba0dfc000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000007838820a0> (a java.lang.ref.Reference$Lock)
> 	at java.lang.Object.wait(Object.java:503)
> 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
> 	- locked <0x00000007838820a0> (a java.lang.ref.Reference$Lock)
> "main" prio=10 tid=0x00007fdba400e800 nid=0x743 runnable [0x00007fdbad9a8000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.drools.core.phreak.RuleNetworkEvaluator.doUpdatesReorderLeftMemory(RuleNetworkEvaluator.java:795)
> 	at org.drools.core.phreak.PhreakJoinNode.doNode(PhreakJoinNode.java:38)
> 	at org.drools.core.phreak.RuleNetworkEvaluator.switchOnDoBetaNode(RuleNetworkEvaluator.java:547)
> 	at org.drools.core.phreak.RuleNetworkEvaluator.evalBetaNode(RuleNetworkEvaluator.java:533)
> 	at org.drools.core.phreak.RuleNetworkEvaluator.innerEval(RuleNetworkEvaluator.java:334)
> 	at org.drools.core.phreak.RuleNetworkEvaluator.outerEval(RuleNetworkEvaluator.java:161)
> 	at org.drools.core.phreak.RuleNetworkEvaluator.evaluateNetwork(RuleNetworkEvaluator.java:116)
> 	at org.drools.core.phreak.RuleExecutor.reEvaluateNetwork(RuleExecutor.java:235)
> 	- locked <0x0000000783f2da90> (a org.drools.core.phreak.RuleExecutor)
> 	at org.drools.core.phreak.RuleExecutor.evaluateNetworkAndFire(RuleExecutor.java:106)
> 	- locked <0x0000000783f2da90> (a org.drools.core.phreak.RuleExecutor)
> 	at org.drools.core.common.DefaultAgenda.fireNextItem(DefaultAgenda.java:1016)
> 	at org.drools.core.common.DefaultAgenda.fireAllRules(DefaultAgenda.java:1302)
> 	at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1289)
> 	at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1262)
> 	at org.drools.core.command.runtime.rule.FireAllRulesCommand.execute(FireAllRulesCommand.java:109)
> 	at org.drools.core.command.runtime.rule.FireAllRulesCommand.execute(FireAllRulesCommand.java:34)
> 	at org.drools.core.command.runtime.BatchExecutionCommandImpl.execute(BatchExecutionCommandImpl.java:155)
> 	at org.drools.core.command.runtime.BatchExecutionCommandImpl.execute(BatchExecutionCommandImpl.java:76)
> 	at org.drools.core.impl.StatefulKnowledgeSessionImpl.execute(StatefulKnowledgeSessionImpl.java:723)
> 	at org.drools.core.impl.StatefulKnowledgeSessionImpl.execute(StatefulKnowledgeSessionImpl.java:697)
> {code}


