Hi all (sorry for the lengthy post, I'm trying to describe a basic
pattern).
I have a set of similar rules which work fine, but with an
increasing number of facts they are starting to cause performance
headaches. They all share a basic pattern, and I wanted to ask
whether you think my approach is feasible, or whether I should
choose a totally different one.
The basic pattern is that I have a set of facts of one type, where
every fact "owns" a collection of facts of another type. The
association is not a simple one; rather, it has some attributes of
its own. The ruleset is triggered when a new file delivery from a
financial data provider arrives, and we want to check the quality of
the new data.
Examples:
FinancialInstrument -> Recommendation/Rating (where the
recommendation has a date, a rating code, etc.)
InstrumentList -> FinancialInstrumentAssignment (where the
assignment has a priority, a validFrom date, etc.)
So we have existing assignments in our database (say 100
InstrumentLists, 20'000 Instruments, 200'000 individual list
assignments), and a new file arrives. The data in this file is
converted and loaded into staging tables, and ends up in special
Java objects we call "SourcingXXX", so an "Instrument" always has a
corresponding "SourcingInstrument" class.
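For illustration, the two fact classes involved in the rules below
look roughly like this (heavily simplified: only the join fields are
shown, the real classes carry the association attributes such as
priority and validFrom):

// existing association object, loaded from the database
public class Assignment {
    private long instrumentId;
    private long instrumentListId;

    public Assignment(long instrumentId, long instrumentListId) {
        this.instrumentId = instrumentId;
        this.instrumentListId = instrumentListId;
    }

    // Drools resolves "instrumentId == ..." through these getters
    public long getInstrumentId() { return instrumentId; }
    public long getInstrumentListId() { return instrumentListId; }
}

// staging counterpart built from the delivered file
public class SourcingAssignment {
    private long instrumentId;
    private long instrumentListId;

    public SourcingAssignment(long instrumentId, long instrumentListId) {
        this.instrumentId = instrumentId;
        this.instrumentListId = instrumentListId;
    }

    public long getInstrumentId() { return instrumentId; }
    public long getInstrumentListId() { return instrumentListId; }
}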
The rules must validate the incoming data and modify (insert,
update, delete) the existing objects. My current basic rule pattern
is that for every such "association", I have 3 or more rules like:
# the mappings that already exist
rule "R1110: MatchExistingAssignment"
when
    sa : SourcingAssignment( )
    a : Assignment( instrumentId == sa.instrumentId,
                    instrumentListId == sa.instrumentListId )
then
    // apply the logic
    retract(a);
    retract(sa);
end

# new mappings must be created
rule "R1111: MatchNewAssignment"
when
    sa : SourcingAssignment( )
    not Assignment( instrumentId == sa.instrumentId,
                    instrumentListId == sa.instrumentListId )
    il : InstrumentList( id == sa.instrumentListId )
then
    // create the new assignment
    Assignment a = ...
    retract(sa);
end

# missing mappings must be removed
# fires late, so the rules above have had a chance to run
rule "R1112: RemoveAssignment"
salience -100
when
    a : Assignment( )
then
    // remove the assignment
    retract(a);
end
I shortened these rules to emphasize the basic pattern (hopefully
without introducing typos). Normally the rules have more conditions
than just the mapping conditions shown here, and typically there is
more than one rule for the "new" case and the "already exists" case.
Typical fact counts are:
- a few 100-1000 of the "owning" object (InstrumentList above)
- a few 10'000 of the "owned" object (Instrument above)
- a few 100'000 of the "assignment" object (Assignment above)
I'm starting to see problematic runtimes (~20 minutes) and memory
consumption (1-2 GB), of course mainly with the "assignment object"
rules, where 100'000 Sourcing objects are matched against 100'000 or
so existing objects. Most of that time is spent inserting the facts
into the working memory; the fireAllRules() call itself is
comparatively fast (1-2 minutes).
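In case it matters, the driver around the session does nothing
fancy, essentially "insert everything, then fire". A sketch
(written here against the Drools 5 API; the .drl name and the two
fact collections are placeholders):

import java.util.Collection;

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

public class AssignmentCheckDriver {
    public void run(Collection<Object> existingFacts,
                    Collection<Object> sourcingFacts) {
        // compile the ruleset
        KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
        kbuilder.add(ResourceFactory.newClassPathResource("assignments.drl"),
                     ResourceType.DRL);
        if (kbuilder.hasErrors())
            throw new IllegalStateException(kbuilder.getErrors().toString());
        KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
        kbase.addKnowledgePackages(kbuilder.getKnowledgePackages());

        StatefulKnowledgeSession session = kbase.newStatefulKnowledgeSession();
        try {
            long t0 = System.currentTimeMillis();
            for (Object fact : existingFacts)  // ~100'000 Assignment objects
                session.insert(fact);
            for (Object fact : sourcingFacts)  // ~100'000 SourcingAssignment objects
                session.insert(fact);
            long t1 = System.currentTimeMillis();

            session.fireAllRules();
            long t2 = System.currentTimeMillis();

            // insertion dominates the runtime, firing is the small part
            System.out.println("insert: " + (t1 - t0) + " ms, "
                             + "fire: " + (t2 - t1) + " ms");
        } finally {
            session.dispose();
        }
    }
}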
Is my approach totally flawed? Is there a fundamentally different
approach? I see that some simple cases could be done without Drools
in SQL or Java, but IMHO the more complex cases result in
unmanageable Java/SQL code (we already have one example of a
200-line SQL statement that nobody dares to touch...).
--
CU, Joe