[rules-users] Rule performance with accumulate

Wed Jan 16 02:32:54 EST 2013

It has to be the modify frequency that causes poor performance
with the first solution: a modify is implemented as a retract
followed by an insert, and the retract should cause a sequential
search or, at least, a shift in the List due to List.remove().
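
To make the cost concrete: the profile quoted further down ends in
java.util.AbstractList.hashCode(), which visits every element of the
list each time it is called. Here is a minimal sketch of that effect in
plain Java, with no Drools involved. The class ListHashCost, the Fact
type and the counter are hypothetical names made up for illustration,
and how often the engine really asks for the hash code is an assumption
here, not something checked against the Drools source.

```java
import java.util.ArrayList;
import java.util.List;

public class ListHashCost {
    // Counts how often an element's hashCode() is invoked (illustration only).
    static long elementHashCalls = 0;

    // Hypothetical stand-in for a fact held in the collected list.
    public static final class Fact {
        @Override
        public int hashCode() {
            elementHashCalls++;
            return System.identityHashCode(this);
        }
    }

    // Builds a list of listSize facts, then hashes the list once per
    // simulated re-evaluation and returns the number of element hashCode() calls.
    public static long hashCallsFor(int listSize, int reevaluations) {
        List<Fact> collected = new ArrayList<>();
        for (int i = 0; i < listSize; i++) {
            collected.add(new Fact());
        }
        elementHashCalls = 0;
        for (int i = 0; i < reevaluations; i++) {
            collected.hashCode(); // the list hash is derived from every element
        }
        return elementHashCalls;
    }

    public static void main(String[] args) {
        System.out.println(hashCallsFor(1000, 50));
    }
}
```

With 1,000 elements and 50 re-evaluations the counter reports 50,000
element hashCode() calls, so the work grows with list size times
modify frequency.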

It would be interesting to compare this to an accumulate
that uses collectSet, provided that the A-collection can be
maintained as a set. This one should have O(1) access time.
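
A rough way to see the difference between the two collection types,
again in plain Java rather than inside the engine: count the equals()
calls needed to remove one element. RemovalCost and Fact are
hypothetical names, and the sketch only says something about
java.util.ArrayList and java.util.HashSet, not about how collect or
collectSet are implemented in Drools.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

public class RemovalCost {
    // Counts equals() invocations during a removal (illustration only).
    static int equalsCalls = 0;

    // Hypothetical fact with a cheap, well-distributed hashCode.
    public static final class Fact {
        final int id;

        Fact(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Fact && ((Fact) o).id == id;
        }

        @Override
        public int hashCode() {
            return id;
        }
    }

    // Fills the collection with n facts and removes the one added last,
    // returning how many equals() calls the removal needed.
    private static int removeLast(Collection<Fact> facts, int n) {
        Fact last = null;
        for (int i = 0; i < n; i++) {
            last = new Fact(i);
            facts.add(last);
        }
        equalsCalls = 0;
        facts.remove(last);
        return equalsCalls;
    }

    // ArrayList.remove(Object) scans from the front until it finds a match.
    public static int listRemovalEqualsCalls(int n) {
        return removeLast(new ArrayList<>(), n);
    }

    // HashSet.remove(Object) jumps to the bucket selected by hashCode().
    public static int setRemovalEqualsCalls(int n) {
        return removeLast(new HashSet<>(), n);
    }

    public static void main(String[] args) {
        System.out.println("ArrayList: " + listRemovalEqualsCalls(1000));
        System.out.println("HashSet:   " + setRemovalEqualsCalls(1000));
    }
}
```

The list needs one comparison per element in front of the match (plus
the shift of whatever follows it), while the set goes straight to the
right bucket.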

And I suspect that your new Bucket doesn't use a List!
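
For illustration only, here is what such a Bucket might look like if it
keeps one Set per category. The method names add and getAll are taken
from the rules quoted below; everything else (the generic parameters,
the Map of Sets, the LinkedHashSet) is my guess and not Ryan's actual
class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: C is the category type, A the fact type.
public class Bucket<C, A> {
    private final Map<C, Set<A>> byCategory = new HashMap<>();

    // Adding the same fact twice is a no-op, so a rule that fires again
    // after a modify does not create duplicates.
    public void add(C category, A fact) {
        byCategory.computeIfAbsent(category, k -> new LinkedHashSet<>()).add(fact);
    }

    // Returns a read-only view of the facts stored for the category,
    // or an empty set if nothing was added for it.
    public Set<A> getAll(C category) {
        Set<A> facts = byCategory.get(category);
        return facts == null
                ? Collections.<A>emptySet()
                : Collections.unmodifiableSet(facts);
    }
}
```

A Set-based bucket like this relies on the facts having a hashCode()
that does not change while they are stored, which is worth checking
for facts that are modified often.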

-W

On 15/01/2013, Ryan Crumley <crumley at gmail.com> wrote:
> The previous rule looked something like this (simplified version):
>
> rule "Poor Performing Rule"
> agenda-group "Summarize"
> when
>    $a : A( …, $category : category  )
>    $all : List() from collect( A( category == $category ) )
> then
>    modify( $a ) { updateSummary( …, $all ) }
>
> In this case "collect" is used to generate a list of all A facts which
> need to be used in "updateSummary". This performed poorly because A facts
> are inserted and modified frequently in earlier agenda-groups. I now
> appreciate agenda-groups only affect when a rule fires... Not when the rule
> is evaluated! Since facts that match $a are changed frequently the collect(
> … ) was evaluated A LOT and caused poor performance.
>
> The ruleset I am working on has a concept of grouping facts into "buckets"
> of other facts with similar qualities. So I added a new rule to keep track
> of A facts:
>
> rule "Add A Facts into Bucket by Category"
> agenda-group "GroupThatComesBeforeSummarize"
> no-loop
> when
>    $a : A( …, $category : category  )
>    $bucket : Bucket( … )
> then
>    modify( $bucket ) { add( $category, $a )  }
>
> Then I modified the original rule to use this Bucket( … ) instead of
> collect( .. ) :
>
> rule "Better Performing Rule"
> agenda-group "Summarize"
> when
>    $a : A( …, $category : category  )
>    $bucket : Bucket( … )
> then
>    modify( $a ) { updateSummary( …, $bucket.getAll( $category) ) }
>
> The two rules need to be in different agenda-groups to ensure all of the
> A(…) objects are added to the bucket before the bucket is used to generate
> summaries.
>
>
> Hope this helps someone.
>
> Ryan
>
>
> On Tue, Jan 15, 2013 at 11:03 AM, Geoffrey De Smet
> <ge0ffrey.spam at gmail.com> wrote:
>
>>  Interesting!
>>
>> Do you have a code example showing what your code looked like before and
>> after these changes?
>>
>> On 15-01-13 16:27, Ryan Crumley wrote:
>>
>> Update:
>>
>> Turns out I was looking in the wrong place. All along I had been looking
>> for a rule with accumulate based on AccumulateNode showing up in the stack
>> trace. By setting a conditional breakpoint in AccumulateNode that only
>> breaks when the result is AbstractList (I knew AbstractList.hashCode was
>> causing the performance problem) I was able to find the rule that caused
>> the problem...
>>
>> The problem rule actually used "collect" not "accumulate"! Appears that
>> "collect" is implemented using "accumulate" under the covers. This explains
>> why I had trouble narrowing down my search.
>>
>> In the problem rule "collect" was the last condition in WHEN.
>> Additionally the first condition matches a fact that is not inserted until
>> late in rule processing. It was clear from my investigation the work
>> associated with "collect" was happening much more frequently than I had
>> expected (maybe even as often as every fact inserted).
>>
>> Once I found the problem rule it was straightforward to accomplish the
>> same effect without a "collect". In the heavy workload use case (hundreds
>> of thousands of facts) this resulted in a 99% performance improvement.
>>
>> I have not had the chance to create a simple rule set to reproduce the
>> problem so I can understand WHY this rule was so detrimental to
>> performance. For now I am happy reporting to my team the performance issue
>> is fixed and to be very careful when using collect and accumulate in the
>> future.
>>
>>  Thanks for the pointers.
>>
>>
>>
>> On Wed, Jan 9, 2013 at 10:38 AM, Ryan Crumley <crumley at gmail.com> wrote:
>>
>>> Thanks Geoffrey.
>>>
>>>  I have a few rules that have two accumulates. Upgrading to 5.5
>>> shouldn't be a problem so I will give that a try and see if it helps.
>>> Thanks for the tip!
>>>
>>>
>>> On Wed, Jan 9, 2013 at 8:37 AM, Geoffrey De Smet
>>> <ge0ffrey.spam at gmail.com> wrote:
>>>
>>>> I've also seen that accumulate and stateful sessions don't always mix as
>>>> well as they could.
>>>> The new algorithm for 6.0 sounds promising to improve this (with "set
>>>> based propagation").
>>>>
>>>> Do you have any rule that has 2 accumulates in 1 rule?
>>>> That used to kill my stateful performance (3 times slower etc), but a
>>>> recent experiment with 5.5 showed that that's no longer the case IIRC.
>>>>
>>>> On 09-01-13 15:09, Ryan Crumley wrote:
>>>>
>>>>   Hi,
>>>>
>>>> I am investigating performance of a Drools 5.4 stateful knowledge
>>>> session. This session has about 200 rules, 200k facts and takes about 1
>>>> hour to run to completion. Looking at the profile there is a hotspot that
>>>> consumes almost 65% of the CPU time: java.util.AbstractList.hashCode().
>>>>
>>>>  Here is the full stack:
>>>>
>>>>    com.company.rules.engine.Rule_Set_weights_08b44ce519a74b58ab3f85735b2987cbDefaultConsequenceInvoker.evaluate(KnowledgeHelper, WorkingMemory)
>>>>    com.company.rules.engine.Rule_Set_weights_08b44ce519a74b58ab3f85735b2987cbDefaultConsequenceInvokerGenerated.evaluate(KnowledgeHelper, WorkingMemory)
>>>>    com.company.rules.engine.Rule_Set_weights_08b44ce519a74b58ab3f85735b2987cb.defaultConsequence(KnowledgeHelper, List, FactHandle, GradingFact, FactHandle, ReportNode, FactHandle, WeightsHolder, FactHandle, Logger)
>>>>    org.drools.base.DefaultKnowledgeHelper.update(FactHandle, long)
>>>>    org.drools.common.NamedEntryPoint.update(FactHandle, Object, long, Activation)
>>>>    org.drools.common.NamedEntryPoint.update(FactHandle, Object, long, Activation)
>>>>    org.drools.common.PropagationContextImpl.evaluateActionQueue(InternalWorkingMemory)
>>>>    org.drools.reteoo.ReteooWorkingMemory$EvaluateResultConstraints.execute(InternalWorkingMemory)
>>>>    org.drools.reteoo.AccumulateNode.evaluateResultConstraints(AccumulateNode$ActivitySource, LeftTuple, PropagationContext, InternalWorkingMemory, AccumulateNode$AccumulateMemory, AccumulateNode$AccumulateContext, boolean)
>>>>    org.drools.common.DefaultFactHandle.setObject(Object)
>>>>    java.util.AbstractList.hashCode()
>>>>
>>>>  I believe the following clues can be extracted:
>>>>
>>>> - "Rule_Set_weights" was fired and a fact was modified (confirmed by
>>>>   examining the rule definition).
>>>> - The fact modification caused the pre-conditions for other rules to be
>>>>   computed.
>>>> - One of these rules has an accumulate condition that accumulates into
>>>>   an AbstractList.
>>>> - This list is very, very large. So large that looping through the
>>>>   elements in the list and aggregating the hashCode of individual elements
>>>>   dominates execution time (the individual element hashCode doesn't even
>>>>   show up in the profile… either it's very fast or maybe it's the identity
>>>>   hashCode, which the profiler might filter?).
>>>> - Accumulate is either working on a large set of data or the same
>>>>   accumulate is evaluated many, many times.
>>>>
>>>>  Is my analysis correct? Are there clues that I am missing?
>>>>
>>>> I have 15 rules that use accumulate… However none accumulate with a
>>>> result of List. Most accumulate using sum() and count() (result of
>>>> Number). A few use collectSet(). A few more aggregate into a result with
>>>> a custom type.
>>>>
>>>>  A few other notes:
>>>> - All accumulate conditions are the last condition in the WHEN clause.
>>>> - I use agenda groups to separate fact processing into phases. Rules
>>>>   that accumulate are in a separate agenda group from rules that
>>>>   modify/insert facts that are used in accumulation. I hope this prevents
>>>>   the accumulate condition from being evaluated until all the rules that
>>>>   modify the facts accumulate needs are done firing. I suspect this may
>>>>   not be working as I expect. I haven't put together an example to
>>>>   investigate.
>>>> - When accumulating into a set, the rule condition looks like this:
>>>>     $factName : Set() from accumulate( FactMatch( $field : field ),
>>>>                                        collectSet( $field ) )
>>>>
>>>>  How can I narrow down this further?
>>>>
>>>>  Are there any general rules to follow to optimize use of accumulate in
>>>> conditions?
>>>>
>>>>  Thanks,
>>>>
>>>>  Ryan
>>>>
>>>>
>>>> _______________________________________________
>>>> rules-users mailing list
>>>> rules-users at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/rules-users
>>>>
>>>>
>>>
>>
>>
>>
>