[rules-users] Object size affect on session insertion performance

Wolfgang Laun wolfgang.laun at gmail.com
Sun Feb 23 10:03:57 EST 2014


Using the OP's rule:

5.5.0
5000 facts, hashCode computed 12507500 times

6.0.0
5000 facts, hashCode computed 10000 times

After 5000 insertions and before fireAllRules, there were 5000
computations of hashCode. After the Engine is started, another 5000
computations occur.
This might bear investigation, although O(n) is much better than O(n^2).
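
The 5.5.0 figure matches a simple closed form: 12507500 = 5000*5001/2 + 5000, which is what you get if the whole growing collection is rehashed after each insertion, plus one more call per fact. Splitting the count into those two terms is an assumption, not something read from the Drools source, and the class and method names below are made up for the check.

```java
// Sanity check of the counts reported above. The decomposition into
// "rehash after every insert" plus "one extra call per fact" is an assumption.
final class HashCountArithmetic {

    // Rehashing a collection of size k after the k-th insertion, for k = 1..n,
    // costs 1 + 2 + ... + n = n(n+1)/2 element hashCode calls.
    static long perInsertRehash(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        long n = 5000;
        // 5.5.0: quadratic term plus (assumed) one extra call per fact.
        System.out.println(perInsertRehash(n) + n); // 12507500
        // 6.0.0: one call per fact on insertion, one more once the engine runs.
        System.out.println(n + n);                  // 10000
    }
}
```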

-W


On 23/02/2014, Mark Proctor <mproctor at codehaus.org> wrote:
> Please do report back to the list if 6.0 alleviates this problem, without
> needing a blocking fact.
>
> Mark
> On 23 Feb 2014, at 13:30, Mark Proctor <mproctor at codehaus.org> wrote:
>
>> "I have seen a few related posts on what I have found, the #hashCode of
>> my ComplexClass is taking nearly all of the time.
>> Upon further investigation, I found that this is happening because every
>> insert to the session is causing the `result` portion of the accumulate
>> to "recalculate".  "
>>
>> There isn't much we can do about your hashCode performance time. But if
>> you upgrade to 6.0, the batch oriented propagation will reduce how often
>> this is done, and hopefully minimise the difference. You won't then need a
>> blocking fact.
>>
>> Mark
>> On 23 Feb 2014, at 03:38, mikerod <mjr4184 at gmail.com> wrote:
>>
>>> @laune
>>>
>>> You are correct that I actually put an incorrect time up before.  Thanks
>>> for pointing that out and sorry for the confusion.
>>> The behavioral difference I have found between the 2 classes,
>>> SimpleClass and ComplexClass, was actually much larger than I originally
>>> thought.
>>>
>>> The SimpleClass accumulate is very quick, around 300 ms.  The
>>> ComplexClass accumulate (with the exact same rule beyond the object type)
>>> spikes to around 140 *seconds*.  In both cases this is with 5K objects
>>> of a single type inserted.
>>>
>>> @Mark
>>>
>>> I am sure that it is not the object creation time.  I create all of the
>>> objects before the timers and they are not lazy in any initialization.
>>> However, you were right that I needed to run some profiling on this to
>>> dig into the real issue.
>>>
>>> To start off, the culprit for this issue is the accumulate.  A rule
>>> without it, like:
>>> ```
>>> rule "not collecting"
>>> 	when
>>> 	ComplexClass() // swap for SimpleClass on another run
>>> 	then
>>> 	System.out.println("Done");
>>> end
>>>
>>> ```
>>> runs about the same, no matter if it has a SimpleClass or ComplexClass.
>>>
>>> Also, I'd like to just clarify, a SimpleClass here is just a class with
>>> 2 Integer fields (for this example).
>>> However, the ComplexClass has around 15 fields and about half of these
>>> are Collection (aggregate) types with more nested classes underneath.
>>> This is the difference I mean between "simple" and "complex" in a class,
>>> if that wasn't clear before.
>>>
>>> Furthermore, there is only a single rule with only this very simple,
>>> contrived LHS logic in my example.  Drools is not needing
>>> to traverse any of the objects and no additional work is done.  This is
>>> purely just a single rule being evaluated during an insert.
>>> This is Drools v5.5.0.Final in this specific example (sorry for not
>>> mentioning that before).
>>>
>>> ---
>>>
>>> I have seen a few related posts on what I have found: the #hashCode of
>>> my ComplexClass is taking nearly all of the time.
>>> Upon further investigation, I found that this is happening because every
>>> insert to the session is causing the `result` portion of the accumulate
>>> to "recalculate".  During this step, the AccumulateContext `result`
>>> RightTuple is having its FactHandle reset to the newly calculated result.
>>> This calls the #hashCode of the Collection that is holding all of the
>>> current ComplexClass object instances, and the Collection calls the
>>> #hashCode of each of these (in j.u.Collection impls such as
>>> j.u.AbstractList and j.u.AbstractSet).
>>>
>>> So, I have a Collection that keeps growing with ComplexClass object
>>> instances, and each time it grows by one, the #hashCode of the entire
>>> Collection of ComplexClass objects is being calculated.
>>>
>>> The ComplexClass #hashCode is an aggregate of a recursive walk across
>>> the #hashCode of every object it reaches through its fields, just like
>>> many aggregate types.  I think I can see that this could be expensive if
>>> it is being calculated for nearly 5K objects as each of the final objects
>>> is inserted, causing the `result` recalculation.
>>>
>>> ---
>>>
>>> I do realize that one potential workaround would be to put a blocking
>>> constraint above the accumulate:
>>> ```
>>> rule "collecting"
>>> 	when
>>> 	BlockingFactClass()
>>> 	$res : Collection() from
>>> 			accumulate( $s : ComplexClass(),
>>> 				init( Collection c = new ArrayList(); ),
>>> 				action( c.add($s); ),
>>> 				result( c ))
>>> 				
>>> 	then
>>> 	System.out.println("Done");
>>> end
>>> ```
>>> where the BlockingFactClass is not inserted until *after* all of the
>>> ComplexClass objects.  This speeds up the performance significantly; the
>>> time is nearly the same as the SimpleClass run actually.
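
The difference the blocking fact makes can be modelled outside Drools. This is a minimal sketch, assuming the unblocked rule hashes the whole result collection after every insertion and the blocked rule hashes it once at the end. `Counted` is a hypothetical stand-in for ComplexClass, and no Drools API is involved.

```java
import java.util.ArrayList;
import java.util.List;

final class AccumulateHashModel {

    // Hypothetical stand-in for ComplexClass: counts hashCode invocations.
    static final class Counted {
        static long calls = 0;
        private final int id;

        Counted(int id) { this.id = id; }

        @Override
        public int hashCode() {
            calls++;
            return id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Counted && ((Counted) o).id == id;
        }
    }

    // Assumed model of the unblocked rule: hash the result after every insert.
    static long hashAfterEveryInsert(int n) {
        Counted.calls = 0;
        List<Counted> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(new Counted(i));
            result.hashCode(); // List.hashCode is defined over every element
        }
        return Counted.calls;
    }

    // Assumed model of the blocked rule: hash the result once, at the end.
    static long hashOnceAtEnd(int n) {
        Counted.calls = 0;
        List<Counted> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(new Counted(i));
        }
        result.hashCode();
        return Counted.calls;
    }

    public static void main(String[] args) {
        System.out.println(hashAfterEveryInsert(5000)); // 12502500
        System.out.println(hashOnceAtEnd(5000));        // 5000
    }
}
```

The call counts depend only on the number of elements, so a cheap `hashCode` and an expensive recursive one see the same multiplier. That is consistent with SimpleClass hiding the problem and ComplexClass exposing it.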
>>>
>>> ---
>>>
>>> I found that this was an interesting discovery and I did not expect this
>>> behavior.
>>>
>>> So @Mark it does seem (to me) that a deeply nested ComplexClass can hurt
>>> performance on an AccumulateNode when the `result` can be repeatedly
>>> calculated, even when the `result` is not "doing" anything besides
>>> returning what has been accumulated/collected.  I understand this is
>>> probably just a "gotcha" that I have to deal with.  This behavior is also
>>> the same for the Drools `collect` functionality, which I think just uses
>>> accumulate in the impl anyway (perhaps I'm incorrect).  Also, I note that
>>> this isn't necessarily a direct "Object size affecting session insertion
>>> performance", as I originally titled this thread.
>>>
>>> I also think that the new Phreak-based impl for Drools in v6.x may not
>>> behave like this anymore, since it is lazier and delays more work until
>>> firing rules (an assumption here; haven't tested that).
>>>
>>> With that said, I'm open to any more suggestions about how to avoid this
>>> issue in Drools versions before the Phreak-based v6.x
>>> (I am not sure how long until I am able to make that jump in version).
>>>
>>>
>>> Also, I'm open to be corrected if my findings are incorrect/incomplete.
>>> :)
>>>
>>> Thanks again for the feedback!  It is helpful.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://drools.46999.n3.nabble.com/Object-size-impact-on-session-insertion-performance-tp4028244p4028251.html
>>> Sent from the Drools: User forum mailing list archive at Nabble.com.
>>> _______________________________________________
>>> rules-users mailing list
>>> rules-users at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/rules-users
>>

