[rules-dev] BUG: [5.3.0.Final] CollectSetAccumulateFunction should probably use IdentityHashMap internally

Wolfgang Laun wolfgang.laun at gmail.com
Wed Feb 29 03:41:40 EST 2012


Here are some first results from comparing collectSet (as in 5.3.0) to
a similar accumulate function using an IdentitiyHashMap<Object,Void>
in its context and returns a true Set as a of the key set as its
result. The reported times are the elapsed times of firing a single
rule doing an accumulate into a Set from all String objects in WM.

(1) An extreme case: insert many different String facts and collect
them into a single set. Strings were composed by concatenating "s" and
i = 0(1)N-1

// different Strings
// collectSet
//  N=   10000   2.542s
//  N=   20000  11.118s
//  N= 100000 198.839s
// ----------------
// collectIdentity
//  N=  10000   3.944s
//  N=  20000  18.514s
//  N= 100000 832.093s
// ===============

(2) Another extreme case: insert many equal String facts as distinct
objects and collect them into a single set. String value "s12345".

// equal Strings
// collectSet
//  N=  10000   0.419s
//  N=  20000   0.558s
//  N= 100000   0.816s
// ---------------
// collectIdentity
//  N=  10000   2.940s
//  N=  20000  13.048s
//  N= 100000 522.286s

(It's faster due to not having to construct the string.)

It's obvious that the continuing growth of a HashMap with its calls to
resize for increasing capacities of 16, 32, 64,... and the rehashing
of all entries is costly.

If your final Set is just a small portion of a (large number) of
collected elements, maintaining the IdentityHashMap is costly whereas
the true set of objects remains at minimum size throughout.

-W



On 28 February 2012 20:39, Wolfgang Laun <wolfgang.laun at gmail.com> wrote:
>
> I have cloned it and running benchmarks on it. Might not be able to complete it tonight.
> -W
>
>
> On 28 February 2012 19:55, Edson Tirelli <ed.tirelli at gmail.com> wrote:
>>
>>
>>    I am sure he would not deny that... :) you are helping to make the world a better place... :)
>>
>>    In any case, please open a JIRA so that this does not get lost in the e-mails. It is a low hanging fruit, very isolated and simple to fix. So anyone looking for a starting work to contribute to the project can do it. Otherwise I will do it when I get the time.
>>
>>    Thanks,
>>        Edson
>>
>> On Tue, Feb 28, 2012 at 1:45 PM, SirMungus <Patrick_Rusk at ssga.com> wrote:
>>>
>>>
>>> Edson Tirelli-4 wrote
>>> >
>>> > I see what you are saying and I think it works. Although, it will be
>>> > heavier, it would cover scenarios not covered by the current
>>> > implementation.
>>> >
>>> >    My suggestion is if you can, please create a test case for the problem,
>>> > change the collectSet accumulate function to do what you propose and send
>>> > a
>>> > pull request. I will review it and apply. I think in this case, making
>>> > user's life easier by not requiring immutability might trump performance,
>>> > but we need to evaluate some tests just in case.
>>> >
>>> Edson, thanks for the offer.  I'll see what I can do.  Since I work at a
>>> financial services company, I practically have to get Obama's permission to
>>> do anything official on an open source project. :)
>>>
>>> --
>>> View this message in context: http://drools.46999.n3.nabble.com/BUG-5-3-0-Final-CollectSetAccumulateFunction-should-probably-use-IdentityHashMap-internally-tp3774079p3784963.html
>>> Sent from the Drools: Developer (committer) mailing list mailing list archive at Nabble.com.
>>> _______________________________________________
>>> rules-dev mailing list
>>> rules-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/rules-dev
>>
>>
>>
>>
>> --
>>   Edson Tirelli
>>   JBoss Drools Core Development
>>   JBoss by Red Hat @ www.jboss.com
>>
>> _______________________________________________
>> rules-dev mailing list
>> rules-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/rules-dev
>>
>



More information about the rules-dev mailing list