Hi,
I am investigating performance of a Drools 5.4 stateful
knowledge session. This session has about 200 rules, 200k
facts and takes about 1 hour to run to completion. Looking at
the profile, there is a hotspot that consumes almost 65% of the
CPU time: java.util.AbstractList.hashCode().
Here is the full stack:
com.company.rules.engine.Rule_Set_weights_08b44ce519a74b58ab3f85735b2987cbDefaultConsequenceInvoker.evaluate(KnowledgeHelper, WorkingMemory)
com.company.rules.engine.Rule_Set_weights_08b44ce519a74b58ab3f85735b2987cbDefaultConsequenceInvokerGenerated.evaluate(KnowledgeHelper, WorkingMemory)
com.company.rules.engine.Rule_Set_weights_08b44ce519a74b58ab3f85735b2987cb.defaultConsequence(KnowledgeHelper, List, FactHandle, GradingFact, FactHandle, ReportNode, FactHandle, WeightsHolder, FactHandle, Logger)
org.drools.base.DefaultKnowledgeHelper.update(FactHandle, long)
org.drools.common.NamedEntryPoint.update(FactHandle, Object, long, Activation)
org.drools.common.NamedEntryPoint.update(FactHandle, Object, long, Activation)
org.drools.common.PropagationContextImpl.evaluateActionQueue(InternalWorkingMemory)
org.drools.reteoo.ReteooWorkingMemory$EvaluateResultConstraints.execute(InternalWorkingMemory)
org.drools.reteoo.AccumulateNode.evaluateResultConstraints(AccumulateNode$ActivitySource, LeftTuple, PropagationContext, InternalWorkingMemory, AccumulateNode$AccumulateMemory, AccumulateNode$AccumulateContext, boolean)
org.drools.common.DefaultFactHandle.setObject(Object)
java.util.AbstractList.hashCode()
I believe the following clues can be extracted:
- "Rule_Set_weights" was fired and a fact was modified
(confirmed by examining the rule definition)
- The fact modification caused the pre-conditions for other
rules to be computed.
- One of these rules has an accumulate condition that
accumulates into an AbstractList.
- This list is very large. So large that looping through the
elements in the list and aggregating the hashCode of the
individual elements dominates execution time (the individual
element hashCode doesn't even show up in the profile; either
it's very fast, or maybe it's the identity hashCode, which the
profiler might filter?).
- Accumulate is either working on a large set of data or
the same accumulate is evaluated many many times.
Is my analysis correct? Are there clues that I am missing?
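To make the cost I am describing concrete, here is a minimal, self-contained Java sketch. It is not taken from my rules or from Drools; the class ListHashCost, the Element type and the method elementHashCallsFor are made-up names for illustration only. It just shows that computing the hashCode of a java.util list visits every element, so one call is linear in the list size:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class, not part of Drools or of my rule base.
final class ListHashCost {

    // Counts how often an element's hashCode() is invoked.
    static final AtomicLong ELEMENT_HASH_CALLS = new AtomicLong();

    // Hypothetical element type that records every hashCode() call.
    static final class Element {
        private final int id;

        Element(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            ELEMENT_HASH_CALLS.incrementAndGet();
            return id;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Element && ((Element) other).id == id;
        }
    }

    // Builds a list of the given size and returns how many element
    // hashCode() calls a single list.hashCode() call triggers.
    static long elementHashCallsFor(int size) {
        List<Element> list = new ArrayList<Element>();
        for (int i = 0; i < size; i++) {
            list.add(new Element(i));
        }
        ELEMENT_HASH_CALLS.set(0);
        list.hashCode(); // walks the whole list
        return ELEMENT_HASH_CALLS.get();
    }

    public static void main(String[] args) {
        // One element hashCode() call per list element.
        System.out.println(elementHashCallsFor(1000));
        System.out.println(elementHashCallsFor(200000));
    }
}
```

If something like DefaultFactHandle.setObject hashes such a list each time a result is re-evaluated, the total work would grow with (list size) x (number of evaluations), which would fit what the profile shows. That last part is my assumption, not something I have verified in the Drools source.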
I have 15 rules that use accumulate. However, none accumulate
into a List result. Most accumulate using sum() and count()
(a Number result). A few use collectSet(). A few more
aggregate into a result with a custom type.
A few other notes:
- All accumulate conditions are the last condition in the
WHEN clause.
- I use agenda groups to separate fact processing into
phases. Rules that accumulate are in a separate agenda group
from rules that modify/insert facts that are used in
accumulation. I hope this prevents the accumulate condition
from being evaluated until all the rules that modify the facts
accumulate needs are done firing. I suspect this may not be
working as I expect. I haven't put together an example to
investigate.
- When accumulating into a set, the rule condition looks
like this:
$factName : Set() from accumulate( FactMatch( $field :
field ), collectSet( $field ) )
How can I narrow this down further?
Are there any general rules to follow to optimize use of
accumulate in conditions?
Thanks,
Ryan