[rules-users] advice is needed: rule based processing of interconnected facts

Edson Tirelli tirelli at post.com
Fri Feb 2 19:59:12 EST 2007


 
    Vlad,

    It is common to compare SQL engines and rule engines. In a certain 
way they do the same thing: select elements matching patterns. But the 
algorithms and technologies used are different. In terms of performance, 
forward chaining rule engines (Drools, for instance) benefit hugely 
from two restrictions that databases don't have.

1. Rule engines know all rules (queries) in advance.
   You are probably aware of the performance difference between sending 
a sequence of ad hoc queries to a database and creating a prepared 
statement and using it for the same sequence of queries, right? Why does 
that happen? Because the prepared statement is compiled (and optimized) 
in advance! That can be much faster than compiling (or interpreting) 
each ad hoc query again and again as it arrives. Rule engines not only 
know the rules in advance, they know ALL rules in advance. So, when you 
load (compile) a rulebase with your 400 rules, the engine optimizes the 
execution of all 400 at once.
   It means that if, for instance, 200 of your rules use Account( number < 
9000 ), the engine will evaluate that constraint for each account only 
once and use the result for all 200 rules (sharing the constraint between 
your rules). In the case of a database, each query is optimized on its 
own, but the 400 are not optimized together, which means you risk 
executing the same constraint 200 times. (We know that caching, indexing 
and the like help, but it is not the same.)
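
   A minimal sketch of the sharing idea, in plain Java. This is not how 
JBoss Rules is implemented internally; the names SharedConstraintSketch, 
Account, evaluateSeparately and evaluateShared are hypothetical and exist 
only for illustration. It just counts how often the test number < 9000 
runs when every rule evaluates it on its own, compared with evaluating it 
once per account and reusing the result for every rule.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: counts constraint evaluations, it is not a rule engine.
final class SharedConstraintSketch {

    /** Hypothetical fact class, mirroring Account( number ) in the rules. */
    static final class Account {
        final int number;
        Account(int number) { this.number = number; }
    }

    /** How many times the constraint has actually been evaluated. */
    static long evaluations = 0;

    /** The shared constraint: Account( number < 9000 ). */
    static boolean numberBelow9000(Account a) {
        evaluations++;
        return a.number < 9000;
    }

    /** Every rule filters the accounts by itself, like independent queries. */
    static int evaluateSeparately(List<Account> accounts, int ruleCount) {
        int matches = 0;
        for (int rule = 0; rule < ruleCount; rule++) {
            for (Account a : accounts) {
                if (numberBelow9000(a)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    /** The constraint runs once per account; all rules reuse the result. */
    static int evaluateShared(List<Account> accounts, int ruleCount) {
        List<Account> passed = new ArrayList<>();
        for (Account a : accounts) {
            if (numberBelow9000(a)) {
                passed.add(a);
            }
        }
        return passed.size() * ruleCount;
    }

    public static void main(String[] args) {
        List<Account> accounts = new ArrayList<>();
        for (int n = 8000; n < 10000; n += 100) {
            accounts.add(new Account(n));
        }

        evaluations = 0;
        int separate = evaluateSeparately(accounts, 200);
        System.out.println("separate: matches=" + separate
                + " evaluations=" + evaluations);

        evaluations = 0;
        int shared = evaluateShared(accounts, 200);
        System.out.println("shared:   matches=" + shared
                + " evaluations=" + evaluations);
    }
}
```

   Both methods report the same number of matches; only the number of 
constraint evaluations differs, and it grows with the rule count in the 
first case but not in the second.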

2. Forward chaining rule engines usually have all data in memory.
   There are ways to pull data as needed, but the general use case is 
that all the objects to which you want to apply rules are already 
loaded into memory. That means much faster execution (and usually a 
much smaller dataset) than what you have in a database. Rule engines use 
this restriction to implement optimizations that a database can't, since 
a database usually manages data that does not fit into memory at once.
   In this regard, what you usually see is databases and rule engines 
working together. Databases take care of the data while rule engines take 
care of the pattern matching and rule execution.
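
   As one small illustration of what having every fact in memory makes 
cheap, the Java sketch below evaluates the condition of the "missing 
accounts" rule quoted later in this thread with a hash set of account 
numbers, so the "not Account( number == $number + 1000 )" part becomes a 
single lookup per account. It is a hand-written approximation under that 
assumption, not the engine's actual matching algorithm, and the names 
InMemoryMatchSketch and missingMatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a hand-coded version of one rule's condition.
final class InMemoryMatchSketch {

    /**
     * Returns the account numbers that satisfy
     * number % 10 < 5 and number < 9000 but have no account
     * numbered (number + 1000) in the same in-memory collection.
     */
    static List<Integer> missingMatches(Collection<Integer> accountNumbers) {
        // All numbers are already in memory, so index them once.
        Set<Integer> index = new HashSet<>(accountNumbers);
        List<Integer> unmatched = new ArrayList<>();
        for (int number : accountNumbers) {
            boolean candidate = number % 10 < 5 && number < 9000;
            if (candidate && !index.contains(number + 1000)) {
                unmatched.add(number);
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        // Made-up numbers, not the data set from the thread.
        List<Integer> numbers = Arrays.asList(1, 1001, 2, 1007, 9500);
        System.out.println("unmatched: " + missingMatches(numbers));
    }
}
```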

   Having said that, I have no doubt that your 400 rules will run orders 
of magnitude faster than running 400 queries against the database. But 
remember that each technology excels in its own field. So, keep the 
rules in the engine and the data in the database. :)

   Regarding books, there is a good list at the JBoss Rules page: 
http://labs.jboss.com/portal/jbossrules/docs

   Regarding rules optimization, the same way it happens with SQL, 
there is some general advice that always applies, but the last mile 
always depends on the product you are using. The JBoss Rules 
documentation has some tips, and people are always helpful here on the 
list. But rest assured that if you do a proof of concept comparing manually 
implemented rules (be it SQL, XPath or any other similar technology) 
with a full rules engine implementation... well, there is no doubt what 
you will choose for performance.

   []s
   Edson


Olenin, Vladimir (MOH) wrote:

>Hi, Edson.
>
>Thanks A LOT for the explanations - that significantly cleared things up.
>
>So, would it be correct to say that DROOLS is a complete equivalent
>of SQL? How much more or less optimized is it in comparison with a relational
>DB implementation (eg, in the example of the embedded select statement you
>gave)? I know it might sound like trying to compare apples & oranges, but
>provided that I'd have to
>  - either run 400 similar SQL queries with different combinations of field
>constraints
>  - or process 400 rules in the rule engine (the same constraints would
>apply)
>
>how might that compare?
>
>Is there any good book on formulating / optimizing rules that you can
>recommend?
>
>Thanks.
>
>Vlad
>
>
>-----Original Message-----
>From: rules-users-bounces at lists.jboss.org
>[mailto:rules-users-bounces at lists.jboss.org] On Behalf Of Edson Tirelli
>Sent: 02 February 2007 17:09
>To: Rules Users List
>Subject: Re: [rules-users] advice is needed: rule based processing of
>interconnected facts
>
>
>   Vlad,
>
>   That's what the engine does... it's like SQL. Imagine you have an 
>"Account" table that has a "number" field. You could write a SQL like:
>
>select * from account a
>where (number % 10) < 5
>   and number < 9000
>   and 0 = (select count(*) from account b where b.number = 
>(a.number+1000) )
>
>   I'm writing it from memory, so there may be syntax errors... :) but 
>I think you get the idea.
>   You don't write an algorithm saying:
>
>"for each record in account table..."
>
>   The SQL engine iterates the table for you.
>
>   The same happens with Rule Engines. In the case of JBoss Rules, 
>instead of tables, you have classes (Account). Instead of columns, you 
>have class attributes (number).
>   If you write a rule like this:
>
>rule "missing accounts"
>when
>    $a : Account( $number : number -> ( $number % 10 < 5 ), number < 9000  )
>    not Account( number == ( $number + 1000 ) )
>then
>   // $a does not have a matching primary account
>end
>
>   You are telling the engine to iterate over all Account instances, and 
>for each of them to bind the variable and apply the constraints. When a 
>full match is found, the consequence is executed.
>
>   []s
>   Edson
>
> 
>
>Olenin, Vladimir (MOH) wrote:
>
>  
>
>>Hi, Edson,
>>
>>I was going through your rule snippets and there is one thing I couldn't
>>quite understand:
>>
>>-------------
>>when
>>   $a : Account( $number : number -> ( number % 10 < 5 ), number < 9000  )
>>   not Account( number == ( $number + 1000 ) )
>>then
>>-------------
>>
>>The 'number' variable refers to the 'fact' in the working memory, correct?
>>Basically it means I have only one particular number to compare ALL accounts
>>(from the data sheet) with?
>>
>>If so, it's not what I actually need to achieve. I need to compare all
>>accounts with 'each other', all of them coming from the same data sheet. So,
>>I guess it has to be an iteration through all the facts, comparing each fact
>>with every other one.
>>
>>Or does the snippet above do exactly that? Ie, iterate through all the
>>facts?
>>
>>In other words, I'd be initializing working memory ONLY with the facts
>>below:
>>
>> for (Iterator it = accountsFromDataSheet.iterator(); it.hasNext(); ) {
>>   Account account = (Account)it.next();
>>   workingMemory.assertObject(account);
>> }
>>
>>After which the rules must operate on the facts loaded...
>>
>>Thanks.
>>
>>Vlad
>>
>>
>>-----Original Message-----
>>From: rules-users-bounces at lists.jboss.org
>>[mailto:rules-users-bounces at lists.jboss.org] On Behalf Of Edson Tirelli
>>Sent: 02 February 2007 11:13
>>To: Rules Users List
>>Subject: Re: [rules-users] advice is needed: rule based processing of
>>interconnected facts
>>
>>  Hi Vlad,
>>
>>  This is a case where you can apply business rules with good results.
>>  In the end, it all depends on how you model your Business Objects, 
>>but let's look at some examples:
>>
>> 
>>
>>    
>>
>>>1) for all primary accounts 'zxxy' where y < 5, there should be a matching
>>>primary account '(z+1)xxy'
>>>   - [this one is true for the dataset above]
>>>   
>>>
>>>      
>>>
>>   My understanding is that you are validating your accounting plan, so 
>>you may have an Account object in your model. So, if you want to report 
>>inconsistencies, you can do something like:
>>
>>rule "missing accounts"
>>when
>>   $a : Account( $number : number -> ( number % 10 < 5 ), number < 9000  )
>>   not Account( number == ( $number + 1000 ) )
>>then
>>  // $a does not have a matching primary account
>>end
>>
>>  Please, note that the "formulas" I used above may not be the best way 
>>to do it... they are only a possible representation of what you said.
>>
>> 
>>
>>    
>>
>>>2) sumOfDebit(primary + matching_primary + secondary_account) -
>>>sumOfCredit(primary + matching_primary + secondary_account) must be = 0
>>>   - [this one is also true]
>>>   
>>>
>>>      
>>>
>>  Here, it seems you are referring to a set of transactions, so you 
>>might have a set of transaction objects to represent the transactions in 
>>your sample. So, a possible representation would be:
>>
>>rule "transaction consistency"
>>when
>>   Transaction( $id : id )
>>   $credits: Double( )
>>             from accumulate( TransactionEntry( id == $id, operation == "credit", $amount : amount ),
>>                              init( double balance = 0 ),
>>                              action( balance += $amount ),
>>                              result( new Double( balance ) ) );
>>   $debits: Double( )
>>             from accumulate( TransactionEntry( id == $id, operation == "debit", $amount : amount ),
>>                              init( double balance = 0 ),
>>                              action( balance += $amount ),
>>                              result( new Double( balance ) ) );
>>   eval( ! $credits.equals( $debits ) )
>>then
>>  // inconsistency for transaction $id
>>end
>>                                      
>>  Again, this is not the only way or the best way... it is just an example.
>>
>>  Also, for the above examples, I used syntax/features of the jbrules 
>>3.1 version.
>>
>>  Hope it helps.
>>
>>  []s
>>  Edson
>> 
>>
>>Olenin, Vladimir (MOH) wrote:
>>
>> 
>>
>>    
>>
>>>Ok, approx data set:
>>>
>>>Primary Account | Secondary Account | Operation | Amount | Type | Owner
>>>------------------------------------------------------------------------
>>>0001            |                   | debit     | 100    | A    | M
>>>1001            |                   | credit    | 80     | A    | F
>>>1001            |                   | credit    | 20     | X    | F
>>>0002            | 2002              | debit     | 50     | B    | M
>>>2002            |                   | debit     | 20     | B    | M
>>>1002            |                   | credit    | 70     | C    | M
>>>
>>>Rules:
>>>
>>>1) for all primary accounts 'zxxy' where y < 5, there should be a matching
>>>primary account '(z+1)xxy'
>>>   - [this one is true for the dataset above]
>>>2) sumOfDebit(primary + matching_primary + secondary_account) -
>>>sumOfCredit(primary + matching_primary + secondary_account) must be = 0
>>>   - [this one is also true]
>>>3) OwnerOf (primary_account, matching_primary, secondary_account) must be of
>>>the same gender
>>>   - [this one is false - 0001 owner must be 'F']
>>>
>>>.... The kind of the rules above... The dataset is more complex and the
>>>rules are a bit more involved, but this should give an idea.
>>>
>>>Thanks for all suggestions!
>>>
>>>Vlad
>>>
>>>-----Original Message-----
>>>From: rules-users-bounces at lists.jboss.org
>>>[mailto:rules-users-bounces at lists.jboss.org] On Behalf Of Michael Rhoden
>>>Sent: 01 February 2007 17:49
>>>To: 'Rules Users List'
>>>Subject: RE: [rules-users] advice is needed: rule based processing
>>>of interconnected facts
>>>
>>>Can you post a couple of example conditions with a dataset you want to
>>>check?
>>>
>>>-Michael
>>>
>>>-----Original Message-----
>>>From: rules-users-bounces at lists.jboss.org
>>>[mailto:rules-users-bounces at lists.jboss.org] On Behalf Of Olenin, Vladimir
>>>(MOH)
>>>Sent: Thursday, February 01, 2007 4:04 PM
>>>To: rules-users at lists.jboss.org
>>>Subject: [rules-users] advice is needed: rule based processing
>>>of interconnected facts
>>>
>>>Hi,
>>>
>>>
>>>
>>>I need some pointers as to where to start with the problem below.
>>>
>>>
>>>
>>>The system should be able to validate the balancing data based on around 400
>>>different rules. To simplify the task, the data facts are essentially the
>>>debit/credit transactions on different accounts. The rules describe the
>>>correlation between different facts:
>>>
>>>-          eg, all debit transactions minus all credit transactions must
>>>equal 0
>>>
>>>-          if one account got credited, there should be another account
>>>(within the same dataset) which was debited
>>>
>>>-          if there are accounts starting with some letter combination,
>>>there also should be
>>>
>>>-          etc
>>>
>>>
>>>
>>>In other words, each rule describes
>>>
>>>-          the subset of facts to be analyzed
>>>
>>>-          the rules to be checked against this subset
>>>
>>>
>>>
>>>It seems basically like each fact is a set of Account Number, Transaction
>>>Type, Transaction Amount information, Secondary Account Number (which
>>>sometimes needs to be validated against some other account number within the
>>>same data set). But I couldn't find a way to relate between multiple data
>>>facts.
>>>
>>>
>>>
>>>On one hand a rule engine seems to be a good solution here, but I'm not
>>>sure how to deal with 'dynamic selection' of the subset of facts. Can this
>>>kind of task be reformulated somehow?
>>>
>>>
>>>
>>>Any pointers into the DROOLS documentation or hints on a general approach
>>>would be greatly appreciated!
>>>
>>>
>>>
>>>Thanks.
>>>
>>>
>>>
>>>Vlad
>>>
>>>_______________________________________________
>>>rules-users mailing list
>>>rules-users at lists.jboss.org
>>>https://lists.jboss.org/mailman/listinfo/rules-users
>>>


-- 
 Edson Tirelli
 Software Engineer - JBoss Rules Core Developer
 Office: +55 11 3124-6000
 Mobile: +55 11 9218-4151
 JBoss, a division of Red Hat @ www.jboss.com




