[rules-users] Rule set execution performance and memory consumption issues

Sun Aug 4 09:10:02 EDT 2013

A simple test shows that, with the rules as shown, you can insert 100k
Assignments and an equal number of SourcingAssignments (with ~300
InstrumentList facts) in a matter of seconds, irrespective of the ratio
of hits to misses. I suspect that there is some other fly in the
ointment.
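For comparison, below is a minimal plain-Java sketch of the classification that R1110-R1112 perform. It is my own illustration, not the poster's code. It assumes each fact is identified only by the (instrumentId, instrumentListId) pair and ignores the InstrumentList check and the extra conditions of the real rules. The class AssignmentDiff and the record types are hypothetical stand-ins. The synthetic data in main() roughly mirrors the 100k/100k/~300-list test with a 50% hit rate. Treat it as a timing baseline, not as a description of how Drools matches facts internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AssignmentDiff {
    // Hypothetical stand-ins for the Assignment and SourcingAssignment facts;
    // only the two key fields used in the rule conditions are modelled.
    public record Assignment(long instrumentId, long instrumentListId) {}
    public record SourcingAssignment(long instrumentId, long instrumentListId) {}

    // Composite key mirroring the two equality constraints in R1110/R1111.
    private record Key(long instrumentId, long instrumentListId) {}

    public record Result(List<Assignment> matched,          // R1110: mapping already exists
                         List<SourcingAssignment> created,  // R1111: mapping must be created
                         List<Assignment> removed) {}       // R1112: mapping no longer delivered

    // Assumes keys are unique among existing assignments (a later duplicate overwrites an earlier one).
    public static Result diff(List<Assignment> existing, List<SourcingAssignment> incoming) {
        Map<Key, Assignment> byKey = new HashMap<>();
        for (Assignment a : existing) {
            byKey.put(new Key(a.instrumentId(), a.instrumentListId()), a);
        }
        List<Assignment> matched = new ArrayList<>();
        List<SourcingAssignment> created = new ArrayList<>();
        for (SourcingAssignment sa : incoming) {
            // remove() plays the role of retract(a) in R1110
            Assignment a = byKey.remove(new Key(sa.instrumentId(), sa.instrumentListId()));
            if (a != null) {
                matched.add(a);
            } else {
                created.add(sa);
            }
        }
        // Existing assignments not matched by any incoming row are what R1112 removes.
        return new Result(matched, created, new ArrayList<>(byKey.values()));
    }

    public static void main(String[] args) {
        // Synthetic data: 100k existing and 100k incoming assignments over 300 lists,
        // overlapping on instrument ids 50_000..99_999.
        List<Assignment> existing = new ArrayList<>();
        List<SourcingAssignment> incoming = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            existing.add(new Assignment(i, i % 300));
            int j = i + 50_000;
            incoming.add(new SourcingAssignment(j, j % 300));
        }
        long start = System.nanoTime();
        Result r = diff(existing, incoming);
        long ms = (System.nanoTime() - start) / 1_000_000;
        // prints "matched=50000 created=50000 removed=50000"
        System.out.println("matched=" + r.matched().size()
                + " created=" + r.created().size()
                + " removed=" + r.removed().size());
        System.out.println("elapsed ms: " + ms);
    }
}
```

A hash map keyed on the composite key keeps the expected cost of the matching roughly linear in the number of facts. If a rule-based run is orders of magnitude slower than this on the same data, the extra conditions or the way facts are inserted are the first places to look.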

There is no "Instrument" pattern in any of the rules, which makes your
post somewhat inconsistent.

-W

On 04/08/2013, Joe Ammann <joe at pyx.ch> wrote:
> Hi all (sorry for the lengthy post, I'm trying to describe a basic pattern)
>
> I have a set of similar rules which work OK but, due to the increasing
> number of facts, are starting to cause performance headaches. They all
> share a basic pattern, and I wanted to ask whether you think my approach
> is feasible, or whether I should choose a totally different approach.
>
> The basic pattern is that I have a set of facts of one type, where every
> fact "owns" a collection of facts of another type. The association is
> not a simple one; rather, it has some attributes of its own. The ruleset
> is triggered when a new file delivery from a financial data provider
> arrives, and we want to check the quality of the new data.
>
> Examples:
>
> FinancialInstrument -> Recommendation/Rating (where the recommendation
> has a date, a rating code, etc.)
> InstrumentList -> FinancialInstrumentAssignment (where the assignment
> has a priority, a validFrom date, etc.)
>
> So we have existing assignments in our database (say 100
> InstrumentLists, 20'000 Instruments, 200'000 individual list
> assignments), and a new file arrives. The data in this file is converted
> and loaded into staging tables, and ends up in special Java objects we
> call "SourcingXXX". So an "Instrument" always has a corresponding
> "SourcingInstrument" class.
>
> The rules must validate the incoming data and modify (insert, update,
> delete) the existing objects. My current basic rule pattern is that for
> every such "association", I have 3 or more rules like:
>
>     // the mappings that already exist
>     rule "R1110: MatchExistingAssignment"
>     when
>         sa : SourcingAssignment( )
>         a : Assignment( instrumentId == sa.instrumentId,
>                         instrumentListId == sa.instrumentListId )
>     then
>         // apply the logic
>         retract(a);
>         retract(sa);
>     end
>
>     // new mappings must be created
>     rule "R1111: MatchNewAssignment"
>     when
>         sa : SourcingAssignment( )
>         // no binding inside "not": the pattern only asserts absence
>         not Assignment( instrumentId == sa.instrumentId,
>                         instrumentListId == sa.instrumentListId )
>         il : InstrumentList( id == sa.instrumentListId )
>     then
>         // create new assignment
>         Assignment a = ...
>         retract(sa);
>     end
>
>     // missing mappings must be removed;
>     // low salience, so this fires after the rules above have had a chance
>     rule "R1112: RemoveAssignment"
>     salience -100
>     when
>         a : Assignment( )
>     then
>         // remove the assignment
>         retract(a);
>     end
>
> I shortened these rules to emphasize the basic pattern (hopefully
> without introducing typos). Normally the rules have more conditions than
> just the mapping conditions shown here, and typically there is more than
> one rule for the "new case" and the "already exists case".
>
> Typical fact numbers are
> - a few hundred to a thousand of the "owning" object (InstrumentList above)
> - a few tens of thousands of the "owned" object (Instrument above)
> - a few hundred thousand of the "assignment" object (Assignment above)
>
> I'm starting to see problematic runtimes (~20 minutes) and memory
> consumption (1-2 GB), mainly, of course, with the "assignment object"
> rules, where 100'000 Sourcing objects are matched against 100'000 or so
> existing objects. Most of that time is spent inserting the facts;
> fireAllRules() itself is fast (1-2 minutes).
>
> Is my approach totally flawed? Is there a fundamentally different
> approach? I see that some simple cases could be handled in SQL or Java
> without Drools, but IMHO the more complex cases would result in
> unmanageable Java/SQL code (we already have one example of a 200-line
> SQL statement that nobody dares to touch...)
>
> --
> CU, Joe
>
>