Hi all (sorry for the lengthy post, I'm trying to describe a basic
pattern).
I have a set of similar rules which work fine, but with an
increasing number of facts they are starting to cause performance
headaches. They all share a basic pattern, and I wanted to ask
whether you think my approach is feasible, or whether I should
choose a totally different one.
The basic pattern is that I have a set of facts of one type, where
every fact "owns" a collection of facts of another type. The
association is not a simple one; rather, it has some attributes of
its own. The ruleset is triggered when a new file delivery from a
financial data provider arrives, and we want to check the quality of
the new data.
Examples:
FinancialInstrument -> Recommendation/Rating (where the
recommendation has a date, a rating code, etc.)
InstrumentList -> FinancialInstrumentAssignment (where the
assignment has a priority, a validFrom date, etc.)
So we have existing assignments in our database (say 100
InstrumentLists, 20'000 Instruments, 200'000 individual list
assignments), and a new file arrives. The data in this file is
converted and loaded into staging tables, and ends up in special
Java objects we call "SourcingXXX", so an "Instrument" always has a
corresponding "SourcingInstrument" class.
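For illustration, the two fact classes involved in the rules below
look roughly like this (heavily simplified: only the join fields are
shown, the real classes carry the association attributes such as
priority and validFrom):

// existing association object, loaded from the database
public class Assignment {
    private long instrumentId;
    private long instrumentListId;

    public Assignment(long instrumentId, long instrumentListId) {
        this.instrumentId = instrumentId;
        this.instrumentListId = instrumentListId;
    }

    // Drools resolves "instrumentId == ..." through these getters
    public long getInstrumentId() { return instrumentId; }
    public long getInstrumentListId() { return instrumentListId; }
}

// staging counterpart built from the delivered file
public class SourcingAssignment {
    private long instrumentId;
    private long instrumentListId;

    public SourcingAssignment(long instrumentId, long instrumentListId) {
        this.instrumentId = instrumentId;
        this.instrumentListId = instrumentListId;
    }

    public long getInstrumentId() { return instrumentId; }
    public long getInstrumentListId() { return instrumentListId; }
}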
The rules must validate the incoming data and modify (insert,
update, delete) the existing objects. My current basic rule pattern
is that for every such "association", I have 3 or more rules like:
# the mappings that already exist
rule "R1110: MatchExistingAssignment"
when
    sa : SourcingAssignment( )
    a : Assignment( instrumentId == sa.instrumentId,
                    instrumentListId == sa.instrumentListId )
then
    // apply the logic
    retract(a);
    retract(sa);
end

# new mappings must be created
rule "R1111: MatchNewAssignment"
when
    sa : SourcingAssignment( )
    not Assignment( instrumentId == sa.instrumentId,
                    instrumentListId == sa.instrumentListId )
    il : InstrumentList( id == sa.instrumentListId )
then
    // create the new assignment
    Assignment a = ...
    retract(sa);
end

# missing mappings must be removed
# fires late, so the rules above have had a chance to run
rule "R1112: RemoveAssignment"
salience -100
when
    a : Assignment( )
then
    // remove the assignment
    retract(a);
end
I shortened these rules to emphasize the basic pattern (hopefully
without introducing typos). Normally the rules have more conditions
than just the mapping conditions shown here, and typically there is
more than one rule for the "new" case and the "already exists" case.
Typical fact counts are:
- a few 100-1000 of the "owning" object (InstrumentList above)
- a few 10'000 of the "owned" object (Instrument above)
- a few 100'000 of the "assignment" object (Assignment above)
I'm starting to see problematic runtimes (~20 minutes) and memory
consumption (1-2 GB), of course mainly with the "assignment object"
rules, where 100'000 Sourcing objects are matched against 100'000 or
so existing objects. Most of that time is spent inserting the facts
into the working memory; the fireAllRules() call itself is
comparatively fast (1-2 minutes).
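In case it matters, the driver around the session does nothing
fancy, essentially "insert everything, then fire". A sketch
(written here against the Drools 5 API; the .drl name and the two
fact collections are placeholders):

import java.util.Collection;

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

public class AssignmentCheckDriver {
    public void run(Collection<Object> existingFacts,
                    Collection<Object> sourcingFacts) {
        // compile the ruleset
        KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
        kbuilder.add(ResourceFactory.newClassPathResource("assignments.drl"),
                     ResourceType.DRL);
        if (kbuilder.hasErrors())
            throw new IllegalStateException(kbuilder.getErrors().toString());
        KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
        kbase.addKnowledgePackages(kbuilder.getKnowledgePackages());

        StatefulKnowledgeSession session = kbase.newStatefulKnowledgeSession();
        try {
            long t0 = System.currentTimeMillis();
            for (Object fact : existingFacts)  // ~100'000 Assignment objects
                session.insert(fact);
            for (Object fact : sourcingFacts)  // ~100'000 SourcingAssignment objects
                session.insert(fact);
            long t1 = System.currentTimeMillis();

            session.fireAllRules();
            long t2 = System.currentTimeMillis();

            // insertion dominates the runtime, firing is the small part
            System.out.println("insert: " + (t1 - t0) + " ms, "
                             + "fire: " + (t2 - t1) + " ms");
        } finally {
            session.dispose();
        }
    }
}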
Is my approach totally flawed? Is there a fundamentally different
approach? I see that some simple cases could be done without Drools
in SQL or Java, but IMHO the more complex cases result in
unmanageable Java/SQL code (we already have one example of a
200-line SQL statement that nobody dares to touch...).
--
CU, Joe