A simple test shows that, with the rules as shown, you can insert 100k
Assignments and an equal number of SourcingAssignments (with ~300
InstrumentList facts) in a matter of seconds, irrespective of the
number of hits and misses. I suspect that there is some other fly in
the ointment.
There is no "Instrument" pattern in any of the rules, which makes your
post slightly inconsistent in itself.
-W
On 04/08/2013, Joe Ammann <joe(a)pyx.ch> wrote:
Hi all (sorry for the lengthy post, I'm trying to describe a
basic pattern)
I have a set of similar rules which work OK but, due to an increasing
number of facts, are starting to create performance headaches. They all
share a basic pattern, and I wanted to ask if you think that my approach
is feasible - or if I should choose a totally different approach.
The basic pattern is that I have a set of facts of one type, where every
fact "owns" a collection of facts of another type. The association is
not a simple one; rather, it has some attributes of its own. The ruleset
is triggered when a new file delivery from a financial data provider
arrives, and we want to check the quality of the new data.
Examples:
FinancialInstrument -> Recommendation/Rating (where the recommendation
has a date, a rating code, etc.)
InstrumentList -> FinancialInstrumentAssignment (where the assignment
has a priority, a validFrom date, etc.)
So we have existing assignments in our database (say 100
InstrumentLists, 20'000 Instruments, 200'000 individual list
assignments), and a new file arrives. The data in this file is converted
and loaded into staging tables, and ends up in special Java objects we
call "SourcingXXX". So an "Instrument" always has a corresponding
"SourcingInstrument" class.
The rules must validate the incoming data and modify (insert, update,
delete) the existing objects. My current basic rule pattern is that for
every such "association", I have 3 or more rules like:
# the mappings that already exist
rule "R1110: MatchExistingAssignment"
when
    sa : SourcingAssignment( )
    a : Assignment( instrumentId == sa.instrumentId,
                    instrumentListId == sa.instrumentListId )
then
    # apply the logic
    retract(a);
    retract(sa);
end

# new mappings must be created
rule "R1111: MatchNewAssignment"
when
    sa : SourcingAssignment( )
    not a : Assignment( instrumentId == sa.instrumentId,
                        instrumentListId == sa.instrumentListId )
    il : InstrumentList( id == sa.instrumentListId )
then
    # create new assignment
    Assignment a = ...
    retract(sa);
end

# missing mappings must be removed
# execute late when the above have had a chance
rule "R1112: RemoveAssignment"
salience -100
when
    a : Assignment( )
then
    # remove the assignment
    retract(a);
end
I shortened these rules to emphasize the basic pattern (hopefully no
typos made..). Normally the rules have more conditions than just the
mapping conditions shown here, and typically there is more than one rule
for the "new case" and the "already exists case".
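Stripped of the extra conditions, the three rules above amount to a keyed three-way reconciliation between the existing and the incoming assignments. The following is a minimal plain-Java sketch of that idea, not the Drools implementation. It assumes that (instrumentId, instrumentListId) identifies an assignment; the class name AssignmentReconciler, the long[] pairs and the string key are hypothetical stand-ins for the real fact classes. The InstrumentList check from R1111 and any further conditions are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original rule base.
final class AssignmentReconciler {

    /** Buckets corresponding to the three rules R1110, R1111 and R1112. */
    static final class Result {
        final List<String> matched = new ArrayList<>(); // R1110: already exists
        final List<String> created = new ArrayList<>(); // R1111: new mapping
        final List<String> removed = new ArrayList<>(); // R1112: no longer delivered
    }

    /** Assumed composite key: instrumentId plus instrumentListId. */
    static String key(long instrumentId, long instrumentListId) {
        return instrumentId + ":" + instrumentListId;
    }

    /**
     * Each long[] holds {instrumentId, instrumentListId}.
     * One pass over each list, with hashed lookups in between.
     */
    static Result reconcile(List<long[]> existing, List<long[]> sourcing) {
        // Index the existing assignments by key.
        Set<String> remaining = new LinkedHashSet<>();
        for (long[] a : existing) {
            remaining.add(key(a[0], a[1]));
        }

        Result result = new Result();
        for (long[] sa : sourcing) {
            String k = key(sa[0], sa[1]);
            if (remaining.remove(k)) {
                result.matched.add(k);  // consumed, like retract(a) and retract(sa)
            } else {
                result.created.add(k);  // no existing assignment for this key
            }
        }

        // Whatever was never matched is what the low-salience rule would remove.
        result.removed.addAll(remaining);
        return result;
    }
}
```

With the existing pairs (1,10) and (2,10) and the incoming pairs (2,10) and (3,10), the sketch reports 2:10 as matched, 3:10 as created and 1:10 as removed. The hashed lookup keeps the work roughly proportional to the number of facts rather than to the product of the two sets.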
Typical fact numbers are:
- a few 100-1000 of the "owning" object (InstrumentList above)
- a few 10'000 of the "owned" object (Instrument above)
- a few 100'000 of the "assignment" object (Assignment above)
I'm starting to see problematic runtimes (~20 minutes) and memory
consumption (1-2 GB), mainly with the "assignment object" rules, where
100'000 Sourcing objects are matched against 100'000 or so existing
objects. Most of the runtime is spent inserting the facts;
fireAllRules() itself is fast (1-2 minutes).
Is my approach totally flawed? Is there a fundamentally different
approach? I see that some simple cases could be done without Drools in
SQL or Java, but IMHO the more complex cases result in unmanageable
Java/SQL code (we already have one example of a 200 line SQL statement
that nobody dares to touch...)
--
CU, Joe