It has to be the modify frequency that causes poor performance
with the first solution, since this is implemented as a retract
and insert, the first of which should cause a sequential search
or, at least, a shift in the List due to List.remove().
It would be interesting to compare this to an accumulate
that uses collectSet, provided that the A-collection can be
maintained as a set. This one should have O(1) access time.
And I suspect that your new Bucket doesn't use a List!
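To illustrate the List.remove() cost mentioned above, here is a minimal Java sketch. It is plain Java, not Drools code, and the class and method names are made up for illustration. It removes the same elements from an ArrayList and from a HashSet and prints the elapsed time for each. On an ArrayList every remove(Object) scans for the element and then shifts the tail, while a HashSet removal is a hash lookup with expected O(1) cost.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

/** Illustrative only: compares element removal on an ArrayList and a HashSet. */
final class ListVsSetRemoval {

    /**
     * Fills the collection with 0..n-1, then removes every element again
     * in insertion order. Returns the time spent in the removal loop (ns).
     */
    static long timeRemovals(Collection<Integer> c, int n) {
        for (int i = 0; i < n; i++) {
            c.add(i);
        }
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            // Collection.remove(Object): on an ArrayList this searches for
            // the element and shifts everything after it one slot left.
            c.remove(Integer.valueOf(i));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000; // arbitrary size chosen for the demo
        long listNanos = timeRemovals(new ArrayList<>(), n);
        long setNanos = timeRemovals(new HashSet<>(), n);
        System.out.println("ArrayList removals: " + listNanos / 1_000_000 + " ms");
        System.out.println("HashSet removals:   " + setNanos / 1_000_000 + " ms");
    }
}
```

The absolute numbers depend on the JVM and machine; the point is only that the list cost grows with the list size per removal while the set cost does not.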
-W
On 15/01/2013, Ryan Crumley <crumley(a)gmail.com> wrote:
The previous rule looked something like this (simplified version):
rule "Poor Performing Rule"
    agenda-group "Summarize"
when
    $a : A( …, $category : category )
    $all : List() from collect( A( category == $category ) )
then
    modify( …
In this case "collect" is used to generate a list of all A facts which
need to be used in "updateSummary". This performed poorly because A facts
are inserted and modified frequently in earlier agenda-groups. I now
appreciate that agenda-groups only affect when a rule fires, not when the
rule is evaluated! Since facts that match $a are changed frequently, the
collect( … ) was evaluated A LOT and caused poor performance.
The ruleset I am working on has a concept of grouping facts into "buckets"
of other facts with similar qualities. So I add a new rule to keep track of
A facts:
rule "Add A Facts into Bucket by Category"
    agenda-group "GroupThatComesBeforeSummarize"
    no-loop
when
    $a : A( …, $category : category )
    $bucket : Bucket( … )
then
    modify( $bucket ) { add( $category, $a ) }
end
Then I modified the original rule to use this Bucket( … ) instead of
collect( .. ) :
rule "Better Performing Rule"
    agenda-group "Summarize"
when
    $a : A( …, $category : category )
    $bucket : Bucket( … )
then
    modify( $a ) { updateSummary( …, $bucket.getAll( $category ) ) }
end
The two rules need to be in different agenda-groups to ensure all of the
A(…) objects are added to the bucket before the bucket is used to generate
summaries.
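For anyone wondering what such a Bucket might look like on the Java side, here is a minimal sketch. It is an assumption, not the actual class from my ruleset: only the names add and getAll come from the rules above, while the generic parameters, the remove method and the Map-of-Sets layout are hypothetical. Keeping one Set per category means adding, removing and looking up a fact does not require scanning a List.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the Bucket fact: groups facts by category. */
final class Bucket<C, F> {

    // One insertion-ordered set of facts per category.
    private final Map<C, Set<F>> byCategory = new HashMap<>();

    /** Adds a fact under its category; returns false if it was already there. */
    boolean add(C category, F fact) {
        return byCategory
                .computeIfAbsent(category, k -> new LinkedHashSet<>())
                .add(fact);
    }

    /** Removes a fact from its category, e.g. when the fact is retracted. */
    boolean remove(C category, F fact) {
        Set<F> facts = byCategory.get(category);
        return facts != null && facts.remove(fact);
    }

    /** Returns a read-only view of all facts in the category (empty if none). */
    Set<F> getAll(C category) {
        Set<F> facts = byCategory.get(category);
        return facts == null
                ? Collections.<F>emptySet()
                : Collections.unmodifiableSet(facts);
    }
}

/** Small usage demo with String categories and String facts. */
final class BucketDemo {
    public static void main(String[] args) {
        Bucket<String, String> bucket = new Bucket<>();
        bucket.add("fruit", "apple");
        bucket.add("fruit", "pear");
        bucket.add("tool", "hammer");
        System.out.println(bucket.getAll("fruit"));
    }
}
```

A set-based layout like this assumes the facts' equals/hashCode stay stable while they sit in the bucket; if a modify changes fields that feed hashCode, the fact would have to be removed before and re-added after the change.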
Hope this helps someone.
Ryan
On Tue, Jan 15, 2013 at 11:03 AM, Geoffrey De Smet
<ge0ffrey.spam(a)gmail.com> wrote:
> Interesting!
>
> Do you have a code example showing what your code looked like before and
> after these changes?
>
> On 15-01-13 16:27, Ryan Crumley wrote:
>
> Update:
>
> Turns out I was looking in the wrong place. All along I had been looking
> for a rule with accumulate, based on AccumulateNode showing up in the
> stack trace. By setting a conditional breakpoint in AccumulateNode that
> only breaks when the result is an AbstractList (I knew
> AbstractList.hashCode was causing the performance problem) I was able to
> find the rule that caused the problem...
>
> The problem rule actually used "collect", not "accumulate"! It appears
> that "collect" is implemented using "accumulate" under the covers. This
> explains why I had trouble narrowing down my search.
>
> In the problem rule "collect" was the last condition in WHEN.
> Additionally the first condition matches a fact that is not inserted
> until late in rule processing. It was clear from my investigation the
> work associated with "collect" was happening much more frequently than I
> had expected (maybe even as often as every fact inserted).
>
> Once I found the problem rule it was straightforward to accomplish the
> same effect without a "collect". In the heavy workload use case (hundreds
> of thousands of facts) this resulted in a 99% performance improvement.
>
> I have not had the chance to create a simple rule set to reproduce the
> problem so I can understand WHY this rule was so detrimental to
> performance. For now I am happy reporting to my team the performance
> issue is fixed and to be very careful when using collect and accumulate
> in the future.
>
> Thanks for the pointers.
>
>
>
> On Wed, Jan 9, 2013 at 10:38 AM, Ryan Crumley <crumley(a)gmail.com> wrote:
>
>> Thanks Geoffrey.
>>
>> I have a few rules that have two accumulates. Upgrading to 5.5
>> shouldn't be a problem so I will give that a try and see if it helps.
>> Thanks for the tip!
>>
>>
>> On Wed, Jan 9, 2013 at 8:37 AM, Geoffrey De Smet
>> <ge0ffrey.spam(a)gmail.com> wrote:
>>
>>> I've also seen that accumulate and stateful sessions don't always mix
>>> as well as they could.
>>> The new algorithm for 6.0 sounds promising to improve this (with "set
>>> based propagation").
>>>
>>> Do you have any rule that has 2 accumulates in 1 rule?
>>> That used to kill my stateful performance (3 times slower etc.), but a
>>> recent experiment with 5.5 showed that that's no longer the case IIRC.
>>>
>>> On 09-01-13 15:09, Ryan Crumley wrote:
>>>
>>> Hi,
>>>
>>> I am investigating performance of a Drools 5.4 stateful knowledge
>>> session. This session has about 200 rules, 200k facts and takes about 1
>>> hour to run to completion. Looking at the profile there is a hotspot
>>> that consumes almost 65% of the cpu time:
>>> java.util.AbstractList.hashCode().
>>>
>>> Here is the full stack:
>>>
>>> com.company.rules.engine.Rule_Set_weights_08b44ce519a74b58ab3f85735b2987cbDefaultConsequenceInvoker.evaluate(KnowledgeHelper, WorkingMemory)
>>> com.company.rules.engine.Rule_Set_weights_08b44ce519a74b58ab3f85735b2987cbDefaultConsequenceInvokerGenerated.evaluate(KnowledgeHelper, WorkingMemory)
>>> com.company.rules.engine.Rule_Set_weights_08b44ce519a74b58ab3f85735b2987cb.defaultConsequence(KnowledgeHelper, List, FactHandle, GradingFact, FactHandle, ReportNode, FactHandle, WeightsHolder, FactHandle, Logger)
>>> org.drools.base.DefaultKnowledgeHelper.update(FactHandle, long)
>>> org.drools.common.NamedEntryPoint.update(FactHandle, Object, long, Activation)
>>> org.drools.common.NamedEntryPoint.update(FactHandle, Object, long, Activation)
>>> org.drools.common.PropagationContextImpl.evaluateActionQueue(InternalWorkingMemory)
>>> org.drools.reteoo.ReteooWorkingMemory$EvaluateResultConstraints.execute(InternalWorkingMemory)
>>> org.drools.reteoo.AccumulateNode.evaluateResultConstraints(AccumulateNode$ActivitySource, LeftTuple, PropagationContext, InternalWorkingMemory, AccumulateNode$AccumulateMemory, AccumulateNode$AccumulateContext, boolean)
>>> org.drools.common.DefaultFactHandle.setObject(Object)
>>> java.util.AbstractList.hashCode()
>>>
>>> I believe the following clues can be extracted:
>>>
>>> - "Rule_Set_weights" was fired and a fact was modified (confirmed by
>>> examining the rule definition).
>>> - The fact modification caused the pre-conditions for other rules to be
>>> computed.
>>> - One of these rules has an accumulate condition that accumulates into
>>> an AbstractList.
>>> - This list is very, very large. So large that looping through the
>>> elements in the list and aggregating the hashCode of individual
>>> elements dominates execution time (the individual element hashCode
>>> doesn't even show up in the profile… either it's very fast or maybe
>>> it's the identity hashCode, which the profiler might filter?).
>>> - Accumulate is either working on a large set of data or the same
>>> accumulate is evaluated many, many times.
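The claim in the list above that hashing a list means visiting every element can be checked with a small Java sketch. This is plain Java, independent of Drools; the class and method names are hypothetical. It fills an ArrayList with elements that count their own hashCode() invocations, calls hashCode() on the list once, and reports how many element calls that single call triggered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: counts element hashCode() calls made by List.hashCode(). */
final class ListHashCodeCost {

    /** Element type that records how often its hashCode() is invoked. */
    static final class Counted {
        static final AtomicLong CALLS = new AtomicLong();

        @Override
        public int hashCode() {
            CALLS.incrementAndGet();
            return 1; // constant value; only the call count matters here
        }
    }

    /** Returns how many element hashCode() calls one list.hashCode() causes. */
    static long elementHashCallsFor(int size) {
        List<Counted> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            list.add(new Counted());
        }
        Counted.CALLS.set(0);  // ignore anything that happened while filling
        list.hashCode();       // the List contract hashes every element
        return Counted.CALLS.get();
    }

    public static void main(String[] args) {
        System.out.println("element hashCode() calls: " + elementHashCallsFor(100_000));
    }
}
```

Because the List contract defines the hash code in terms of every element, the count equals the list size, so each hashCode() on a large collected list costs time proportional to its length. If that happens on every fact update, the cost adds up quickly.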
>>>
>>> Is my analysis correct? Are there clues that I am missing?
>>>
>>> I have 15 rules that use accumulate… However none accumulate with a
>>> result of List. Most accumulate using sum() and count() (result of
>>> Number). A few use collectSet(). A few more aggregate into a result
>>> with a custom type.
>>>
>>> A few other notes:
>>> - All accumulate conditions are the last condition in the WHEN clause.
>>> - I use agenda groups to separate fact processing into phases. Rules
>>> that accumulate are in a separate agenda group from rules that
>>> modify/insert facts that are used in accumulation. I hope this prevents
>>> the accumulate condition from being evaluated until all the rules that
>>> modify the facts accumulate needs are done firing. I suspect this may
>>> not be working as I expect. I haven't put together an example to
>>> investigate.
>>> - When accumulating into a set, the rule condition looks like this:
>>> $factName : Set() from accumulate( FactMatch( $field : field ),
>>> collectSet( $field ) )
>>>
>>> How can I narrow down this further?
>>>
>>> Are there any general rules to follow to optimize use of accumulate in
>>> conditions?
>>>
>>> Thanks,
>>>
>>> Ryan
>>>
>>>
>>> _______________________________________________
>>> rules-users mailing list
>>> rules-users(a)lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/rules-users