[rules-dev] BUG: [5.3.0.Final] CollectSetAccumulateFunction should probably use IdentityHashMap internally

Mark Proctor mproctor at codehaus.org
Tue Feb 28 02:39:25 EST 2012


Actually this can be done without two collections.

Just use a HashMap and mainta a List of the "clashed" identities. To 
operate on the actualy identity, such as remove. Lookup the List, via 
normal map.get( key ) then iterate the list to remove the identity. If 
the list is empty, remove the key. reverse removes the entity, makes it 
change, and then adds it back in. Adding is the process of looking for 
an existing key and adding to it's clash list, or creating a new key 
with an entry of one.

This would provide identity safe update/removal/reverse will providing a 
set view of the world. I suspect overhead would be < 15% if done right.

It should be done as a custom accumulate function for now, and we should 
profile the performance characteristics of the two - as there are some 
optimizations that can be made, such as only lazy storying the value as 
a list only when there is more than one entry. If you are willing to 
code this up, we can give you pointers on irc:
http://www.jboss.org/drools/irc
http://www.athico.com/Getting_Involved/gettinginvolved.html

Mark
On 28/02/2012 07:25, Mark Proctor wrote:
>  The getResults()
> would probably have to change from...
>
>     public Object getResult(Serializable context) throws Exception {
>         CollectListData data = (CollectListData) context;
>         return Collections.unmodifiableSet( data.map.keySet() );
>     }
>
> ...to...
>
>     public Object getResult(Serializable context) throws Exception {
>         CollectListData data = (CollectListData) context;
>         return Collections.unmodifiableSet( new 
> HashSet(data.map.keySet())
> );
>     }
>
> The proposed getResult() results in a full HashSet copy for each 
> change of Rete, that performance hit would be unbearable for most people.
>
> I do agree that there is some merit to what you are asking, but we 
> can't give that perf hit to everyone. It would be better to maintain 
> two collections, one identity and one set based. And some how only 
> change in set what was relevant to the change in identity, you might 
> need to build a custom hash table for this - see AbstractHashTable in 
> drools which can easily be extended for custom keys and custom 
> add/remove methods.
>
> Mark
>
>
> On 27/02/2012 16:53, SirMungus wrote:
>> Mark Proctor wrote
>>> accumulate supports custom functions, just add your own version of
>>> collectSet that uses IdentityMap.
>>>
>> I could do that, but please let me take one more stab, just in case I
>> haven't been clear enough.
>>
>> 1) I am not suggesting that collectSet() return an identity-based 
>> set. I am
>> content to have it return a normal set, though I had early confusion 
>> on that
>> point.
>>
>> 2) The discussion above about stated vs. justified facts turns out to 
>> be a
>> distraction to the main point of the bug that I believe to be in
>> CollectSetAccumulateFunction.  My failure to understand the 
>> distinction is
>> not germane to the bug.
>>
>> 3) I am not disagreeing with the value behind implementing equals() 
>> to only
>> take into account invariant fields, though I am stating that goes 
>> beyond the
>> explicity documentation of equals()/hashCode() and is explicitly *not*
>> followed by many classes in the JRE (most notably, collections).
>>
>> There is nothing about the purpose of collectSet() that logically 
>> suggests
>> that it should only work with objects that have invariant
>> equals()/hashCode() functions. If five objects match a rule, they 
>> should be
>> able to be collected into a set (which may end up with fewer than five
>> objects). If things change such that only three facts match the rule, 
>> they
>> should still be able to be collected into a set.  There's no compelling
>> reason why that should only apply to objects with invariant equals(). 
>> And it
>> seems to me that you rarely want to leave an NPE in code, rather than
>> catching it and throwing a more intelligible exception (e.g., "Object 
>> to be
>> removed from collectSet() was not found").
>>
>> If the internal maintenance of state in CollectSetAccumulateFunction 
>> used
>> identity semantics and applied the equality semantics only when you 
>> asked it
>> for its results, it would maintain the same external behavior 
>> (returning a
>> "normal" equality-based set) without failing to work for objects with
>> variant equals.
>>
>> If you Google around for the proper implementation of equals(), you will
>> find many serious sites that discuss it at length, yet provide examples
>> like:
>>
>> public boolean equals(Object aThat){
>>    if ( this == aThat ) return true;
>>    if ( !(aThat instanceof Car) ) return false;
>>    Car that = (Car)aThat;
>>    return
>>      EqualsUtil.areEqual(this.fName, that.fName)&&
>>      EqualsUtil.areEqual(this.fNumDoors, that.fNumDoors)&&
>>      EqualsUtil.areEqual(this.fGasMileage, that.fGasMileage)&&
>>      EqualsUtil.areEqual(this.fColor, that.fColor)&&
>>      Arrays.equals(this.fMaintenanceChecks, that.fMaintenanceChecks);
>> //array!
>> }
>>
>> Ironically, that is explicitly against Wolfgang's suggestion above!
>>
>>
>> -- 
>> View this message in context: 
>> http://drools.46999.n3.nabble.com/BUG-5-3-0-Final-CollectSetAccumulateFunction-should-probably-use-IdentityHashMap-internally-tp3774079p3781313.html
>> Sent from the Drools: Developer (committer) mailing list mailing list 
>> archive at Nabble.com.
>> _______________________________________________
>> rules-dev mailing list
>> rules-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/rules-dev
>



More information about the rules-dev mailing list