Re: [rules-users] fact granularity, performance, and other questions

Wednesday, 27 May 2009

David,
...
- Does the order of the conditions in a rule affect performance, the
execution order, or the structure of the Rete network?
Yes, quite a lot has been written about this. The Drools documentation
has a few paragraphs with good heuristics...
A popular read seems to be this paper, which provides good insight
into Rete performance aspects:

"Production Matching for Large Learning Systems (Rete/UL)"
by Robert B. Doorenbos
PhD thesis, Carnegie Mellon University, January 31, 1995
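One commonly cited heuristic is to place the most selective patterns first. The toy Java model below shows why, assuming a simplified join (it is not how Drools is implemented). It joins patterns left to right over one fact list, with a plain cross product and no join constraints, and counts the partial matches built at each step. That count is a rough stand-in for what a Rete beta network keeps in memory. The `Fact` class and the data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model, not Drools: count the partial matches a left-to-right join
// creates for a given pattern order.
final class JoinOrderDemo {
    /**
     * Joins the patterns left to right over one fact list (a plain cross
     * product, no join constraints) and returns the total number of partial
     * matches built along the way, a rough proxy for beta-memory size.
     */
    static long partialMatches(List<Fact> facts, List<Predicate<Fact>> patterns) {
        long running = 1;
        long total = 0;
        for (Predicate<Fact> pattern : patterns) {
            long matching = facts.stream().filter(pattern).count();
            running *= matching; // tuples that survive this join step
            total += running;    // tuples stored at this step
        }
        return total;
    }

    public static void main(String[] args) {
        List<Fact> facts = new ArrayList<>();
        for (int i = 0; i < 100; i++) facts.add(new Fact("A1", String.valueOf(i)));
        facts.add(new Fact("A2", "JAMES"));
        facts.add(new Fact("A3", "BOND"));

        Predicate<Fact> anyA1 = f -> f.id.equals("A1");
        Predicate<Fact> james = f -> f.id.equals("A2") && f.value.equals("JAMES");
        Predicate<Fact> bond = f -> f.id.equals("A3") && f.value.equals("BOND");

        System.out.println(partialMatches(facts, List.of(anyA1, james, bond))); // 300
        System.out.println(partialMatches(facts, List.of(james, bond, anyA1))); // 102
    }
}

// A fact is just an (id, value) pair here.
final class Fact {
    final String id;
    final String value;
    Fact(String id, String value) { this.id = id; this.value = value; }
}
```

Both orders produce the same final matches. Only the intermediate work differs, and with more facts and more rules that difference adds up.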

...
- Does the order the facts are inserted into a stateless session (as
a list via the CommandFactory.newInsertElements) affect performance at
all?
All I can say is that it does have an impact for stateful
sessions. The deal here is how often the agenda may get changed while
inserting objects.
I have no experience with stateless sessions and I hope others do ;-)
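Here is a toy Java model of that agenda effect. It is not Drools internals. The single rule, `Order(id == x)` and `not Cancel(id == x)`, and the `kind:id` string encoding are hypothetical. Inserting the order first creates an activation that the later cancel removes again. Inserting the cancel first never touches the agenda.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy agenda for a single hypothetical rule (not the Drools engine):
//   when Order(id == x) and not Cancel(id == x) then ...
final class AgendaChurnDemo {
    /**
     * Inserts facts in the given order ("order:ID" or "cancel:ID") and returns
     * how many times the agenda changed (activation added or cancelled).
     */
    static int agendaChanges(List<String> inserts) {
        Set<String> cancelled = new HashSet<>();
        Set<String> activations = new HashSet<>();
        int changes = 0;
        for (String fact : inserts) {
            String[] parts = fact.split(":", 2);
            String kind = parts[0];
            String id = parts[1];
            if (kind.equals("order")) {
                // The "not Cancel" condition holds only if no cancel was seen yet.
                if (!cancelled.contains(id) && activations.add(id)) changes++;
            } else {
                cancelled.add(id);
                // A later Cancel invalidates an existing activation.
                if (activations.remove(id)) changes++;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        System.out.println(agendaChanges(List.of("order:1", "cancel:1"))); // 2
        System.out.println(agendaChanges(List.of("cancel:1", "order:1"))); // 0
    }
}
```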

...
- Should each field in a message be a fact? (more info on my message
below)  What fact granularity have you settled on in your usage and
why?
I would suggest starting with the model that is most natural to your
rules (the one with the lowest WTF factor) and taking it from there.

I assume that you need to correlate information from all segments of a  
message?
I would stay well clear of your option 3 (eval()), just because I
always do ;-)
eval() simply defies Rete optimizations, so it can be very sensitive to
both the number of facts and the number of rules using eval.

If you know the field IDs (which I assume you would, using an EDI
format), I would see whether you can avoid the field facts and turn
them into slots (attributes) instead.
But not to the extent you outline in #2; that seems too awkward.
If you have to, buy another server ;-)
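Here is a minimal Java sketch of the "slots" idea. It assumes the field IDs of an agent segment are known up front. Instead of one `Field` fact per field, each segment becomes a single fact whose attributes are its fields. `AgentSegment` and its `fromFields` factory are hypothetical names. With a class like this, the James Bond rule could be a single pattern on one fact instead of a join over three `Field` facts.

```java
import java.util.Map;

// Hypothetical fact type: one object per agent segment, fields as attributes ("slots").
final class AgentSegment {
    private String a1;       // agent number
    private final String a2; // first name
    private final String a3; // last name

    AgentSegment(String a1, String a2, String a3) {
        this.a1 = a1;
        this.a2 = a2;
        this.a3 = a3;
    }

    // Build from the generic id -> value fields of one parsed "agent" segment.
    static AgentSegment fromFields(Map<String, String> fields) {
        return new AgentSegment(fields.get("A1"), fields.get("A2"), fields.get("A3"));
    }

    // JavaBean getters/setters so rule patterns can refer to a1, a2, a3.
    public String getA1() { return a1; }
    public void setA1(String a1) { this.a1 = a1; }
    public String getA2() { return a2; }
    public String getA3() { return a3; }

    public static void main(String[] args) {
        AgentSegment bond = fromFields(Map.of("A1", "000", "A2", "JAMES", "A3", "BOND"));
        // Plain-Java equivalent of a pattern such as
        //   AgentSegment(a2 == "JAMES", a3 == "BOND", a1 != "007")
        if ("JAMES".equals(bond.getA2()) && "BOND".equals(bond.getA3()) && !"007".equals(bond.getA1())) {
            bond.setA1("007");
        }
        System.out.println(bond.getA1()); // 007
    }
}
```

A side benefit is that all three values come from the same segment by construction.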

...
what did you determine the best approach to be?
I am pretty sure that there is no generic "best approach".
I would proceed by trying to understand the nature of the rules (are they
regular, i.e. following the same scheme; can they be generated?) and the
data, and create a regression performance test suite ASAP to monitor
performance during development.
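Here is a bare-bones Java sketch of such a regression check. It times a batch of messages through a processing step and reports whether throughput meets a budget. `processMessage` is a hypothetical stand-in for whatever builds the facts and fires the stateless session. The 3000 messages/second budget comes from David's stated target. A real suite would use more warm-up and repeated runs, because single timings on the JVM are noisy.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

final class ThroughputCheck {
    /** Runs every message through {@code process} and returns messages per second. */
    static <M> double throughput(List<M> messages, Consumer<M> process) {
        long start = System.nanoTime();
        for (M m : messages) process.accept(m);
        long elapsed = Math.max(1, System.nanoTime() - start); // avoid division by zero
        return messages.size() * 1e9 / elapsed;
    }

    /** One warm-up pass (JIT, caches), then a measured pass compared to the budget. */
    static <M> boolean meetsBudget(List<M> messages, Consumer<M> process, double minPerSecond) {
        throughput(messages, process);
        return throughput(messages, process) >= minPerSecond;
    }

    public static void main(String[] args) {
        List<String> sample = Collections.nCopies(10_000, "message");
        // Hypothetical stand-in for: build facts, insert into a stateless session, fire.
        Consumer<String> processMessage = msg -> msg.hashCode();
        System.out.println(meetsBudget(sample, processMessage, 3000.0));
    }
}
```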

In our application, for example, we analyzed bottlenecks and found
that (no surprise) accessing properties like facta.factb.value is
slower than facta.factbValue (that's in Drools 4.0.7) due to the
different mechanism by which the value is obtained. From there we added
about three of these "shortcut accessors". However, we first
focused on optimizing the rules themselves to avoid excessive matching
in the first place, so there was no need for us to flatten our entire
model.
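Here is a minimal Java illustration of a "shortcut accessor". The class names are hypothetical. `FactA` exposes the nested value directly, so a rule can constrain `factbValue` instead of navigating `factb.value`. Whether this is actually faster depends on the Drools version and how it evaluates property access. The observation above applies to the version that was measured.

```java
// Hypothetical model classes illustrating a "shortcut accessor".
final class FactA {
    private final FactB factb;

    FactA(FactB factb) { this.factb = factb; }

    public FactB getFactb() { return factb; }

    // Shortcut accessor: rules can write FactA(factbValue == "x")
    // instead of the nested FactA(factb.value == "x").
    public String getFactbValue() {
        return factb == null ? null : factb.getValue(); // null-safe delegation
    }

    public static void main(String[] args) {
        FactA a = new FactA(new FactB("x"));
        System.out.println(a.getFactb().getValue().equals(a.getFactbValue())); // true
    }
}

final class FactB {
    private final String value;

    FactB(String value) { this.value = value; }

    public String getValue() { return value; }
}
```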

my 0.02EUR
  -- Ingomar

On 28.05.2009 at 02:17, David Zeigler wrote:

...
 Hi,
 I could use some experienced guidance.  I'm in the process of
 evaluating Drools for use in a real-time transactional
 environment to process about 3000 messages/second.  I realize a lot of
 this depends on the type and quantity of rules, hardware, etc.  I'm
 curious what steps others have taken to improve performance and if
 there are any recommendations for my case detailed below.

 A few specific questions I have are:
 - Should each field in a message be a fact? (more info on my message
 below)  What fact granularity have you settled on in your usage and
 why?
 - Does the order of the conditions in a rule affect performance, the
 execution order, or the structure of the Rete network?
 - Does the order the facts are inserted into a stateless session (as
 a list via the CommandFactory.newInsertElements) affect performance at
 all?

 The message is in an EDI format and will typically have anywhere from 80
 to 200 fields, potentially more.  The message is divided into
 transactions, then segments, then fields.  We have an object model
 that represents the message.  We're using a stateless session and most
 of the rules will modify fields or add fields and segments based on
 the values of other fields.  Currently, I flatten the object model
 into a List containing the message, transactions, segments, and each
 field, and then insert the list into the stateless session and fire.
 I'm avoiding sequential mode for now until we have a better idea of
 our requirements.

 Here's a simplified example of what I'm doing now (using json-esque
 syntax instead of the EDI format).
 message {
  segment {
    SG:header
    A0:agents
  }
  segment {
    SG:agent
    A1:000
    A2:JAMES
    A3:BOND
  }
  segment {
    SG:agent
    A1:86
    A2:MAXWELL
    A3:SMART
  }
 }

 Each field has an id and a value.  For this message, I would insert 14
 facts:  the message object, 3 segment objects, and 10 field objects.
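Here is a minimal Java sketch of the flattening step for the example message above. The `Message`/`Segment`/`Field` classes are hypothetical simplifications, since the real object model is not shown, and transactions are left out as in the example. It produces the 14 facts described: 1 message, 3 segments and 10 fields. In David's setup, a list like this is what gets handed to `CommandFactory.newInsertElements`.

```java
import java.util.ArrayList;
import java.util.List;

final class FlattenDemo {
    // Hypothetical model classes, simplified from the example above.
    static final class Field {
        final String id, value;
        Field(String id, String value) { this.id = id; this.value = value; }
    }

    static final class Segment {
        final List<Field> fields;
        Segment(List<Field> fields) { this.fields = fields; }
    }

    static final class Message {
        final List<Segment> segments;
        Message(List<Segment> segments) { this.segments = segments; }
    }

    /** The message, then each segment followed by its fields, as one flat list of facts. */
    static List<Object> flatten(Message msg) {
        List<Object> facts = new ArrayList<>();
        facts.add(msg);
        for (Segment s : msg.segments) {
            facts.add(s);
            facts.addAll(s.fields);
        }
        return facts;
    }

    /** Helper: builds a segment from alternating id, value arguments. */
    static Segment seg(String... idValuePairs) {
        List<Field> fields = new ArrayList<>();
        for (int i = 0; i < idValuePairs.length; i += 2) {
            fields.add(new Field(idValuePairs[i], idValuePairs[i + 1]));
        }
        return new Segment(fields);
    }

    public static void main(String[] args) {
        Message msg = new Message(List.of(
                seg("SG", "header", "A0", "agents"),
                seg("SG", "agent", "A1", "000", "A2", "JAMES", "A3", "BOND"),
                seg("SG", "agent", "A1", "86", "A2", "MAXWELL", "A3", "SMART")));
        System.out.println(flatten(msg).size()); // 14
    }
}
```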

 rule "set James Bond's A1 to 007"
 when
  $a1 : Field(id == "A1",  value != "007")
  Field(id == "A2",  value == "JAMES")
  Field(id == "A3",  value == "BOND")
 then
  $a1.setValue("007");
  update($a1);
 end
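As written, nothing in the rule above ties the three `Field` patterns to one segment. The rule therefore also matches MAXWELL SMART's A1 (`86`) together with JAMES BOND's A2 and A3. The plain-Java sketch below is not Drools, and its segment-aware `Field` class and segment numbers are hypothetical. It enumerates the combinations the three patterns accept, first unconstrained and then restricted to one segment. That restriction is the kind of extra join constraint the rule would need if each field knew its segment.

```java
import java.util.ArrayList;
import java.util.List;

final class SameSegmentDemo {
    // Hypothetical field fact that remembers which segment it came from.
    static final class Field {
        final int segment;
        final String id, value;
        Field(int segment, String id, String value) {
            this.segment = segment;
            this.id = id;
            this.value = value;
        }
    }

    /** A1 fields the rule would update; sameSegment adds the missing join constraint. */
    static List<Field> matchingA1(List<Field> facts, boolean sameSegment) {
        List<Field> hits = new ArrayList<>();
        for (Field a1 : facts) {
            if (!a1.id.equals("A1") || a1.value.equals("007")) continue;
            for (Field a2 : facts) {
                if (!a2.id.equals("A2") || !a2.value.equals("JAMES")) continue;
                for (Field a3 : facts) {
                    if (!a3.id.equals("A3") || !a3.value.equals("BOND")) continue;
                    boolean joined = a1.segment == a2.segment && a2.segment == a3.segment;
                    if (!sameSegment || joined) hits.add(a1);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Field> facts = List.of(
                new Field(2, "A1", "000"), new Field(2, "A2", "JAMES"), new Field(2, "A3", "BOND"),
                new Field(3, "A1", "86"), new Field(3, "A2", "MAXWELL"), new Field(3, "A3", "SMART"));
        System.out.println(matchingA1(facts, false).size()); // 2: both agents' A1 fields
        System.out.println(matchingA1(facts, true).size());  // 1: only JAMES BOND's A1
    }
}
```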

 A relatively more complicated, but typical rule would be "set James
 Bond's A1 to 007 iff he's the second agent in the message and one of
 the other agents' first names does not contain AUSTIN and an agency
 segment exists"

 The above example assumes each segment and field is a fact and I think
 it's a clean and flexible approach, but I'm concerned about the
 overhead of inserting potentially 250 facts for each message.  The
 only other alternatives I can think of seem to have their own set of
 problems:
 1. restrict the fields that rules can be written against to a fixed
 subset, which may not be feasible depending on how the requirements
 evolve, and only assert those as facts.  Doing this seems to double
 the number of transactions per second in my nonscientific benchmark.
 2. insert the Message object as a single fact and then write a slew of
 accessor methods in that object to get at all possible fields in the
 tree: getA1FromSecondAgentSegmentInFirstTransaction().  This seems
 like it might perform well, but could be very messy.
 3. provide an API in the message object model to find various
 occurrences of fields in the messages, then use eval() in the rule,
 like eval(msg.findSegment("agent", secondOccurrence).getField("A1")).
 I've read that this would be less efficient once the ruleset grows.
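To compare options 2 and 3 concretely, here is a minimal Java sketch of a generic finder in the message object model, with an option-2 style accessor built on top of it. `findSegment` follows the shape of the call in option 3. Its signature (a segment type plus a 1-based occurrence number) is an assumption, as are the classes. Transactions are left out, so the accessor is a shortened, hypothetical form of `getA1FromSecondAgentSegmentInFirstTransaction()`.

```java
import java.util.List;
import java.util.Map;

final class MessageLookupDemo {
    // Hypothetical, simplified model: a segment is a type plus its id -> value fields.
    static final class Segment {
        final String type;
        final Map<String, String> fields;
        Segment(String type, Map<String, String> fields) { this.type = type; this.fields = fields; }
        String getField(String id) { return fields.get(id); }
    }

    static final class Message {
        final List<Segment> segments;
        Message(List<Segment> segments) { this.segments = segments; }

        /** Option 3 style: the n-th (1-based) segment of the given type, or null if absent. */
        Segment findSegment(String type, int occurrence) {
            int seen = 0;
            for (Segment s : segments) {
                if (s.type.equals(type) && ++seen == occurrence) return s;
            }
            return null;
        }

        /** Option 2 style: one hard-coded accessor per field position, built on the finder. */
        public String getA1FromSecondAgentSegment() {
            Segment s = findSegment("agent", 2);
            return s == null ? null : s.getField("A1");
        }
    }

    public static void main(String[] args) {
        Message msg = new Message(List.of(
                new Segment("header", Map.of("A0", "agents")),
                new Segment("agent", Map.of("A1", "000", "A2", "JAMES", "A3", "BOND")),
                new Segment("agent", Map.of("A1", "86", "A2", "MAXWELL", "A3", "SMART"))));
        System.out.println(msg.findSegment("agent", 1).getField("A2")); // JAMES
        System.out.println(msg.getA1FromSecondAgentSegment());           // 86
    }
}
```

Exposing the result as a getter lets a rule constrain it like any other property rather than calling the finder from `eval()`. The cost is one accessor per field position you want to reach.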

 I'm sure many of you have dealt with this type of scenario before; what
 did you determine the best approach to be?

 Thanks,
 David
 _______________________________________________
 rules-users mailing list
 rules-users(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/rules-users 
