[rules-users] Object size affect on session insertion performance

Mark Proctor mproctor at codehaus.org
Sun Feb 23 08:47:16 EST 2014


Please do report back to the list if 6.0 alleviates this problem, without needing a blocking fact.

Mark
On 23 Feb 2014, at 13:30, Mark Proctor <mproctor at codehaus.org> wrote:

> "I have seen a few related posts on what I have found, the #hashCode of my
> ComplexClass is taking nearly all of the time.
> Upon further investigation, I found that this is happening because every
> insert to the session is causing the `result` portion of the accumulate to
> "recalculate"."
> 
> There isn’t much we can do about your hashCode performance. But if you upgrade to 6.0, the batch-oriented propagation will reduce how often this is done, and hopefully minimise the difference. You won’t then need a blocking fact.
> 
> Mark
> On 23 Feb 2014, at 03:38, mikerod <mjr4184 at gmail.com> wrote:
> 
>> @laune
>> 
>> You are correct that I actually put an incorrect time up before.  Thanks for
>> pointing that out and sorry for the confusion.
>> The behavioral difference I have found was actually much larger between the 2
>> classes, SimpleClass and ComplexClass, than I originally thought.
>> 
>> The SimpleClass accumulate is very quick, around 300 ms.  The ComplexClass
>> accumulate (with the exact same rule beyond the object type)
>> spikes to around 140 *seconds*.  In both cases this is with 5K
>> objects of one type inserted.
>> 
>> @Mark 
>> 
>> I am sure that it is not the object creation time.  I create all of the
>> objects before the timers and they are not lazy in any initialization.
>> However, you were right that I needed to run some profiling on this to dig
>> into the real issue.
>> 
>> To start off, the culprit for this issue is the accumulate.  A rule without
>> it like:
>> ```
>> rule "not collecting"
>> 	when
>> 	ComplexClass() // swap for SimpleClass on another run
>> 	then
>> 	System.out.println("Done");
>> end
>> 
>> ```
>> runs about the same, no matter if it has a SimpleClass or ComplexClass.
>> 
>> Also, I'd like to just clarify, a SimpleClass here is just a class with 2
>> Integer fields (for this example). 
>> However, the ComplexClass has around 15 fields and about half of these are
>> Collections (aggregate) types with more nested classes underneath.
>> This is the difference I mean between "simple" and "complex" in a class; if
>> that wasn't clear before.
>> 
>> Furthermore, there is only a single rule with only this very simple,
>> contrived LHS logic in my example.  Drools is not needing
>> to traverse any of the objects and no additional work is done.  This is
>> purely just a single rule being evaluated during an insert.
>> This is Drools v5.5.0.Final in this specific example (sorry for not
>> mentioning that before).
>> 
>> --- 
>> 
>> I have seen a few related posts on what I have found, the #hashCode of my
>> ComplexClass is taking nearly all of the time.
>> Upon further investigation, I found that this is happening because every
>> insert to the session is causing the `result` portion of the accumulate to
>> "recalculate".  During this step, the AccumulateContext `result` RightTuple
>> is having its FactHandle reset to the newly calculated result.
>> This calls the #hashCode of the Collection that is holding all of the
>> current ComplexClass object instances; and Collection calls the #hashCode of
>> each of these (in j.u.Collection impl's such as j.u.AbstractList and
>> j.u.AbstractSet).
>> 
>> So, I have a Collection, that is increasingly growing with ComplexClass
>> object instances, and each time it grows by one, the #hashCode of the entire
>> Collection of ComplexClass objects is being calculated.
>> 
>> The ComplexClass #hashCode is an aggregate computed by a recursive walk across
>> the #hashCode of every object it reaches through its fields, just like many
>> aggregate types.  I can see that this could be expensive if it is being
>> calculated for nearly 5K objects as each of the final objects is inserted,
>> causing the `result` recalculation.
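
To put a number on the growth described above, here is a minimal plain-Java sketch. It does not use Drools at all; it only models the pattern reported in the thread (a growing collection whose hashCode is recomputed after every add). `HashCostDemo` and `CountingFact` are hypothetical names made up for this illustration, and `CountingFact` is a stand-in for ComplexClass that simply counts its own #hashCode calls.

```java
import java.util.ArrayList;
import java.util.List;

public class HashCostDemo {

    /** Hypothetical stand-in for ComplexClass; counts hashCode() invocations. */
    static final class CountingFact {
        static long hashCalls = 0;
        private final int id;

        CountingFact(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            hashCalls++;
            return Integer.hashCode(id);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CountingFact && ((CountingFact) o).id == id;
        }
    }

    /**
     * Adds n elements to a list and hashes the whole list after each add,
     * which models the "recalculate on every insert" behaviour from the thread.
     * Returns the total number of element hashCode() calls.
     */
    static long elementHashCallsWhenRehashingEveryInsert(int n) {
        CountingFact.hashCalls = 0;
        List<CountingFact> collected = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            collected.add(new CountingFact(i));
            // List#hashCode visits every element currently in the list.
            collected.hashCode();
        }
        return CountingFact.hashCalls;
    }

    public static void main(String[] args) {
        // 5K objects, as in the timings reported in this thread.
        System.out.println(elementHashCallsWhenRehashingEveryInsert(5_000));
    }
}
```

With n elements the total is 1 + 2 + ... + n = n(n+1)/2 element hashes, so 5,000 inserts lead to 12,502,500 calls. If each call is itself a deep recursive walk, that multiplier would be consistent with the gap between the two timings, although the sketch says nothing about what Drools does internally.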
>> 
>> ---
>> 
>> I do realize that one potential workaround would be to put a blocking
>> constraint above the accumulate:
>> ```
>> rule "collecting"
>> 	when
>> 	BlockingFactClass()
>> 	$res : Collection() from
>> 			accumulate( $s : ComplexClass(),
>> 				init( Collection c = new ArrayList(); ),
>> 				action( c.add($s); ),
>> 				result( c ))
>> 				
>> 	then
>> 	System.out.println("Done");
>> end
>> ```
>> where the BlockingFactClass is not inserted until *after* all of the
>> ComplexClass objects.  This speeds up the performance significantly; the
>> time is nearly the same as the SimpleClass run actually.
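
As a rough illustration of why delaying the `result` step helps, the plain-Java sketch below compares hashing a collection after every add with hashing it once after all adds. This is only a model of the effect, not of Drools internals; `DeferredHashDemo` and `CountingFact` are hypothetical names, and the assumption is that the blocking fact causes the result to be computed a single time after all objects are in.

```java
import java.util.ArrayList;
import java.util.List;

public class DeferredHashDemo {

    /** Hypothetical stand-in for ComplexClass; counts hashCode() invocations. */
    static final class CountingFact {
        static long hashCalls = 0;
        private final int id;

        CountingFact(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            hashCalls++;
            return Integer.hashCode(id);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CountingFact && ((CountingFact) o).id == id;
        }
    }

    /** Models the rule without a blocking fact: rehash after every insert. */
    static long rehashOnEveryInsert(int n) {
        CountingFact.hashCalls = 0;
        List<CountingFact> collected = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            collected.add(new CountingFact(i));
            collected.hashCode();
        }
        return CountingFact.hashCalls;
    }

    /** Models the blocked rule: collect everything first, then hash once. */
    static long hashOnceAfterAllInserts(int n) {
        CountingFact.hashCalls = 0;
        List<CountingFact> collected = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            collected.add(new CountingFact(i));
        }
        collected.hashCode();
        return CountingFact.hashCalls;
    }

    public static void main(String[] args) {
        int n = 5_000;
        System.out.println("every insert: " + rehashOnEveryInsert(n));
        System.out.println("once at end:  " + hashOnceAfterAllInserts(n));
    }
}
```

In this model the deferred variant makes n element hashCode calls instead of n(n+1)/2, which is in line with the blocked ComplexClass run landing close to the SimpleClass timing.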
>> 
>> ---
>> 
>> I found that this was an interesting discovery and I did not expect this
>> behavior.  
>> 
>> So @Mark it does seem (to me) that a deeply nested ComplexClass can hurt
>> performance on an AccumulateNode when the `result` can be repeatedly
>> calculated; even when the `result` is not "doing" anything besides returning
>> what has been accumulated/collected.  I understand this is probably just a
>> "gotcha" that I have to deal with.  This behavior is also the same for the
>> Drools `collect` functionality, which I think just uses accumulate in the
>> impl anyways (perhaps I'm incorrect).  Also, I note that this isn't
>> necessarily a direct case of "Object size affecting session insertion
>> performance", as I originally titled this thread.
>> 
>> I also think that the new Phreak-based impl for Drools in v6.x may not
>> behave like this anymore, since it is more lazy and delays work more until
>> firing rules (an assumption here; haven't tested that).
>> 
>> With that said, I'm open to any more suggestions about how to avoid this
>> issue in pre-Phreak Drools (before v6.x).
>> (I am not sure how long it will be until I am able to make that jump in version.)
>> 
>> Also, I'm open to being corrected if my findings are incorrect/incomplete. :)
>> 
>> Thanks again for the feedback!  It is helpful.
>> 
>> 
>> 
>> --
>> View this message in context: http://drools.46999.n3.nabble.com/Object-size-impact-on-session-insertion-performance-tp4028244p4028251.html
> 