[rules-dev] BUG: [5.3.0.Final] CollectSetAccumulateFunction should probably use IdentityHashMap internally
Edson Tirelli
ed.tirelli at gmail.com
Tue Feb 28 10:22:38 EST 2012
Hi all,
Sorry for being late to reply to this thread. A few comments:
a) Please note that clashes objects was taken into account in the design of
the solution, but there is a bug in there.
b) If you look at [1] you will see that the value that is being collected
is already cached by the engine by fact handle ID and used on reverse. So
that takes care of expressions, for instance, where a new object is created
every time. For instance:
collectSet( $firstName + $lastName )
In that case, every time the expression is evaluated, it will create a new
immutable String object that we tract by fact ID, cache it, and properly
manage with equality semantics.
c) The bug seems to happen when the expression in the function returns a
mutable object, as observed by SirMungus. E.g.:
collectSet( $person )
In this case, even if the original object is cached and re-used in [1],
because it changed hashCodes(), it is not found in [2], as reported.
d) The problem is that we are collecting *sets* of objects, and because of
that, we have to use equality semantics to check for duplicates. I was
using a java Map to track that, and that works find for the use case (b)
above, but obviously breaks on use case (c). Moving to identity does not
seem correct to me as it would work for use case (c), but no longer work
for (b).
I don't have a solution yet, as I am involved in another large complex
task, but I will re-evaluate this as soon as possible. Meanwhile, if
someone has a suggestion that supports both cases above, please let us know.
Thank you,
Edson
[1]
https://github.com/droolsjbpm/drools/blob/master/drools-core/src/main/java/org/drools/base/accumulators/JavaAccumulatorFunctionExecutor.java#L129
[2]
https://github.com/droolsjbpm/drools/blob/master/drools-core/src/main/java/org/drools/base/accumulators/CollectSetAccumulateFunction.java#L119
On Tue, Feb 28, 2012 at 5:28 AM, Wolfgang Laun <wolfgang.laun at gmail.com>wrote:
> How do you find the List for removal by identity when the key has
> already been changed?
> -W
>
> On 28/02/2012, Mark Proctor <mproctor at codehaus.org> wrote:
> > Actually this can be done without two collections.
> >
> > Just use a HashMap and mainta a List of the "clashed" identities. To
> > operate on the actualy identity, such as remove. Lookup the List, via
> > normal map.get( key ) then iterate the list to remove the identity. If
> > the list is empty, remove the key. reverse removes the entity, makes it
> > change, and then adds it back in. Adding is the process of looking for
> > an existing key and adding to it's clash list, or creating a new key
> > with an entry of one.
> >
> > This would provide identity safe update/removal/reverse will providing a
> > set view of the world. I suspect overhead would be < 15% if done right.
> >
> > It should be done as a custom accumulate function for now, and we should
> > profile the performance characteristics of the two - as there are some
> > optimizations that can be made, such as only lazy storying the value as
> > a list only when there is more than one entry. If you are willing to
> > code this up, we can give you pointers on irc:
> > http://www.jboss.org/drools/irc
> > http://www.athico.com/Getting_Involved/gettinginvolved.html
> >
> > Mark
> > On 28/02/2012 07:25, Mark Proctor wrote:
> >> The getResults()
> >> would probably have to change from...
> >>
> >> public Object getResult(Serializable context) throws Exception {
> >> CollectListData data = (CollectListData) context;
> >> return Collections.unmodifiableSet( data.map.keySet() );
> >> }
> >>
> >> ...to...
> >>
> >> public Object getResult(Serializable context) throws Exception {
> >> CollectListData data = (CollectListData) context;
> >> return Collections.unmodifiableSet( new
> >> HashSet(data.map.keySet())
> >> );
> >> }
> >>
> >> The proposed getResult() results in a full HashSet copy for each
> >> change of Rete, that performance hit would be unbearable for most
> people.
> >>
> >> I do agree that there is some merit to what you are asking, but we
> >> can't give that perf hit to everyone. It would be better to maintain
> >> two collections, one identity and one set based. And some how only
> >> change in set what was relevant to the change in identity, you might
> >> need to build a custom hash table for this - see AbstractHashTable in
> >> drools which can easily be extended for custom keys and custom
> >> add/remove methods.
> >>
> >> Mark
> >>
> >>
> >> On 27/02/2012 16:53, SirMungus wrote:
> >>> Mark Proctor wrote
> >>>> accumulate supports custom functions, just add your own version of
> >>>> collectSet that uses IdentityMap.
> >>>>
> >>> I could do that, but please let me take one more stab, just in case I
> >>> haven't been clear enough.
> >>>
> >>> 1) I am not suggesting that collectSet() return an identity-based
> >>> set. I am
> >>> content to have it return a normal set, though I had early confusion
> >>> on that
> >>> point.
> >>>
> >>> 2) The discussion above about stated vs. justified facts turns out to
> >>> be a
> >>> distraction to the main point of the bug that I believe to be in
> >>> CollectSetAccumulateFunction. My failure to understand the
> >>> distinction is
> >>> not germane to the bug.
> >>>
> >>> 3) I am not disagreeing with the value behind implementing equals()
> >>> to only
> >>> take into account invariant fields, though I am stating that goes
> >>> beyond the
> >>> explicity documentation of equals()/hashCode() and is explicitly *not*
> >>> followed by many classes in the JRE (most notably, collections).
> >>>
> >>> There is nothing about the purpose of collectSet() that logically
> >>> suggests
> >>> that it should only work with objects that have invariant
> >>> equals()/hashCode() functions. If five objects match a rule, they
> >>> should be
> >>> able to be collected into a set (which may end up with fewer than five
> >>> objects). If things change such that only three facts match the rule,
> >>> they
> >>> should still be able to be collected into a set. There's no compelling
> >>> reason why that should only apply to objects with invariant equals().
> >>> And it
> >>> seems to me that you rarely want to leave an NPE in code, rather than
> >>> catching it and throwing a more intelligible exception (e.g., "Object
> >>> to be
> >>> removed from collectSet() was not found").
> >>>
> >>> If the internal maintenance of state in CollectSetAccumulateFunction
> >>> used
> >>> identity semantics and applied the equality semantics only when you
> >>> asked it
> >>> for its results, it would maintain the same external behavior
> >>> (returning a
> >>> "normal" equality-based set) without failing to work for objects with
> >>> variant equals.
> >>>
> >>> If you Google around for the proper implementation of equals(), you
> will
> >>> find many serious sites that discuss it at length, yet provide examples
> >>> like:
> >>>
> >>> public boolean equals(Object aThat){
> >>> if ( this == aThat ) return true;
> >>> if ( !(aThat instanceof Car) ) return false;
> >>> Car that = (Car)aThat;
> >>> return
> >>> EqualsUtil.areEqual(this.fName, that.fName)&&
> >>> EqualsUtil.areEqual(this.fNumDoors, that.fNumDoors)&&
> >>> EqualsUtil.areEqual(this.fGasMileage, that.fGasMileage)&&
> >>> EqualsUtil.areEqual(this.fColor, that.fColor)&&
> >>> Arrays.equals(this.fMaintenanceChecks, that.fMaintenanceChecks);
> >>> //array!
> >>> }
> >>>
> >>> Ironically, that is explicitly against Wolfgang's suggestion above!
> >>>
> >>>
> >>> --
> >>> View this message in context:
> >>>
> http://drools.46999.n3.nabble.com/BUG-5-3-0-Final-CollectSetAccumulateFunction-should-probably-use-IdentityHashMap-internally-tp3774079p3781313.html
> >>> Sent from the Drools: Developer (committer) mailing list mailing list
> >>> archive at Nabble.com.
> >>> _______________________________________________
> >>> rules-dev mailing list
> >>> rules-dev at lists.jboss.org
> >>> https://lists.jboss.org/mailman/listinfo/rules-dev
> >>
> >
> > _______________________________________________
> > rules-dev mailing list
> > rules-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/rules-dev
> >
> _______________________________________________
> rules-dev mailing list
> rules-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/rules-dev
>
--
Edson Tirelli
JBoss Drools Core Development
JBoss by Red Hat @ www.jboss.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/rules-dev/attachments/20120228/13c287e7/attachment.html
More information about the rules-dev
mailing list