[rules-users] fact granularity, performance, and other questions

Wed May 27 22:04:21 EDT 2009

To address some points in order:

Yes, having each field in a message as a fact is the most straightforward.  The catch is that the rejoining conditions should be as fast as possible.  I'm thinking this structure would be best:

class Message {
  ...message level properties...
}

class Segment {
  public Message message;
  public int position;
  ...Segment level properties
}

Then the rules would be like this:

when
  m: Message(...foo...)
  s0: Segment(position == 0, message == m, ...bar...)
  s1: Segment(position == 1, message == m, ...bas...)
then
...
end

That's about as fast as you'll get with this setup, I figure.  And you're right, it is important to list the conditions in a certain order: with the "position == n" condition before the "message == m" condition, MANY Segments in working memory are eliminated from consideration before getting to the more expensive cross object join. (Even though the cross object condition is about as cheap as you can get, a straight == check.)
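For what it's worth, here is a minimal Java sketch of how the facts for that structure might be built before insertion. The FactFlattener helper, the "type" property, and the constructors and getters are my own assumptions for illustration; they are not part of your object model or of Drools.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message-level fact; "id" stands in for the real message properties.
class Message {
    private final String id;

    Message(String id) { this.id = id; }

    public String getId() { return id; }
}

// Hypothetical segment-level fact that points back at its owning message
// and records its position, so rules can test position first and then join.
class Segment {
    private final Message message;
    private final int position;
    private final String type;

    Segment(Message message, int position, String type) {
        this.message = message;
        this.position = position;
        this.type = type;
    }

    public Message getMessage() { return message; }
    public int getPosition() { return position; }
    public String getType() { return type; }
}

// Illustrative helper: builds the flat list of facts for one message.
class FactFlattener {
    static List<Object> flatten(Message message, List<String> segmentTypes) {
        List<Object> facts = new ArrayList<>();
        facts.add(message);
        // One Segment fact per segment, numbered from 0 in message order.
        for (int i = 0; i < segmentTypes.size(); i++) {
            facts.add(new Segment(message, i, segmentTypes.get(i)));
        }
        return facts;
    }
}
```

The resulting list is the kind of thing you would hand to the stateless session in one go, as you describe doing now.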

I'm not sure whether fact insertion order affects performance in a stateless session.  My guess is that it has no effect, but that's probably a question for the devs. (Or a bit o' experimentation.)

It would be nice if your item #2 could be accomplished with a class property that's a multidimensional String array, though it throws an array out of bounds exception if you try testing for an array position that's too high.  Another way (which I think is what you suggest below) would be to have a one-time generated class with the following structure:

class Fact {
  String[][][] facts;

  public String geta0s0t0() {
    if (facts.length >= 1 && facts[0].length >= 1 && facts[0][0].length >= 1) {
      return facts[0][0][0];
    } else {
      return "";
    }
  }
}

Initialize each one with a String[][][] and you're on your way.  This would absolutely be faster than the first method above, probably by several orders of magnitude.  
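Rather than hand-writing the bounds check in every generated getter, the generated class could delegate to one shared helper. This is only a sketch under my own assumptions: the class name SafeFacts and the get(a, s, t) method are made up for illustration, and the a/s/t index meaning just follows the geta0s0t0 naming above.

```java
// Hypothetical wrapper around the String[][][] described above.
class SafeFacts {
    private final String[][][] facts;

    SafeFacts(String[][][] facts) { this.facts = facts; }

    // Returns facts[a][s][t], or "" when any index is out of range
    // or any level of the array (or the value itself) is null.
    public String get(int a, int s, int t) {
        if (facts == null || a < 0 || a >= facts.length) {
            return "";
        }
        String[][] second = facts[a];
        if (second == null || s < 0 || s >= second.length) {
            return "";
        }
        String[] third = second[s];
        if (third == null || t < 0 || t >= third.length) {
            return "";
        }
        String value = third[t];
        return value == null ? "" : value;
    }

    // A generated accessor then becomes a one-liner.
    public String geta0s0t0() {
        return get(0, 0, 0);
    }
}
```

The generated getters stay trivial this way, and the out-of-bounds handling lives in one place.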

Another approach would be to see how hard it would be to alter MVEL so that it evaluates array[out_of_bounds_index] expressions to false instead of throwing an exception.  Maybe for the long term...

--- On Wed, 5/27/09, David Zeigler <dzeigler at gmail.com> wrote:

> From: David Zeigler <dzeigler at gmail.com>
> Subject: [rules-users] fact granularity, performance, and other questions
> To: "Rules Users List" <rules-users at lists.jboss.org>
> Date: Wednesday, May 27, 2009, 7:17 PM
> Hi,
> I could use some experienced guidance.  I'm in the
> process of
> evaluating Drools for use of using in a real-time
> transactional
> environment to process about 3000 messages/second.  I
> realize a lot of
> this depends on the type and quantity of rules, hardware,
> etc.  I'm
> curious what steps others have taken to improve performance
> and if
> there are any recommendations for my case detailed below.
> 
> A few specific questions I have are:
>  - Should each field in a message be a fact? (more info on
> my message
> below)  What fact granularity have you settled on in
> your usage and
> why?
>  - Does the order of the conditions in a rule affect
> performance, the
> execution order, or the structure of the Rete network?
>  - Does the order the facts are inserted into a stateless
> session (as
> a list via the CommandFactory.newInsertElements) affect
> performance at
> all?
> 
> The message is an EDI format and will typically have
> anywhere from 80
> to 200 fields, potentially more.  The message is
> divided into
> transactions, then segments, then fields.  We have an
> object model
> that represents the message.  We're using a stateless
> session and most
> of the rules will modify fields or add fields and segments
> based on
> the values of other fields.  Currently, I flatten the
> object model
> into a List containing the message, transactions, segments,
> and each
> field, and then insert the list into the stateless session
> and fire.
> I'm avoiding sequential mode for now until we have a better
> idea of
> our requirements.
> 
> Here's a simplified example of what I'm doing now (using
> json-esque
> syntax instead of the EDI format).
> message {
>   segment {
>     SG:header
>     A0:agents
>   }
>   segment {
>     SG:agent
>     A1:000
>     A2:JAMES
>     A3:BOND
>   }
>   segment {
>     SG:agent
>     A1:86
>     A2:MAXWELL
>     A3:SMART
>   }
> }
> 
> Each field has an id and a value.  For this message, I
> would insert 14
> facts:  the message object, 3 segments objects, and 10
> field objects.
> 
> rule "set James Bond's A1 to 007"
> when
>   $a1 : Field(id == "A1",  value != "007")
>   Field(id == "A2",  value == "JAMES")
>   Field(id == "A3",  value == "BOND")
> then
>   $a1.setValue("007");
>   update($a1);
> end
> 
> A relatively more complicated, but typical rule would be
> "set James
> Bond's A1 to 007 iff he's the second agent in the message
> and one of
> the other agents' first names does not contain AUSTIN and
> an agency
> segment exists"
> 
> The above example assumes each segment and field is a fact
> and I think
> it's a clean and flexible approach, but I'm concerned about
> the
> overhead of inserting potentially 250 facts for each
> message.  The
> only other alternatives I can think of seem to have their
> own set of
> problems:
> 1. limit the fields that rules can be written against to a
> limited
> subset, which may not be feasible depending on how the
> requirements
> evolve, and only assert those as facts.  Doing this
> seems to double
> the number of transactions per second in my nonscientific
> benchmark.
> 2. insert the Message object as a single fact and then
> write a slew of
> accessor methods in that object to get at all possible
> fields in the
> tree:
> getA1FromSecondAgentSegmentInFirstTransaction().  This
> seems
> like it might perform well, but could be very messy.
> 3. provide an api in the message object model to find
> various
> occurrences of fields in the messages, then use eval() in
> the rule.
> like eval(msg.findSegment("agent",
> secondOccurrence).getField("A1")).
> I've read that would be less efficient once the ruleset
> grows.
> 
> I'm sure many of you have dealt with this type scenario
> before, what
> did you determine the best approach to be?
> 
> Thanks,
> David