Re: [rules-users] fact granularity, performance, and other questions

Wednesday, 27 May 2009

To address some points in order:

Yes, making each field in a message its own fact is the most straightforward approach.
The catch is that the conditions that rejoin those facts to their message should be as
cheap as possible.  I'm thinking this structure would be best:

class Message {
  ...message level properties...
}

class Segment {
  public Message message;
  public int position;
  ...Segment level properties
}

Then the rules would be like this:

when
  m: Message(...foo...)
  s0: Segment(position == 0, message == m, ...bar...)
  s1: Segment(position == 1, message == m, ...bas...)
then
...
end

That's about as fast as you'll get with this setup, I figure.  And you're right, the
order of the conditions matters: with the "position == n" condition before the
"message == m" condition, MANY Segments in working memory are eliminated from
consideration before getting to the more expensive cross-object join. (Even though that
cross-object condition is about as cheap as a join gets: a straight == check.)
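
To make the filter-then-join idea concrete, here is a minimal plain-Java sketch.  It only
illustrates the ordering argument; it is not a model of how Drools evaluates rules
internally.  The constructors, the "id" and "type" fields, and the SegmentJoinSketch
helper class are hypothetical names added for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fact classes mirroring the Message/Segment structure above.
class Message {
    final String id;                 // assumed message-level property
    Message(String id) { this.id = id; }
}

class Segment {
    final Message message;           // back-reference used for the join
    final int position;              // index of the segment within its message
    final String type;               // assumed segment-level property
    Segment(Message message, int position, String type) {
        this.message = message;
        this.position = position;
        this.type = type;
    }
}

class SegmentJoinSketch {
    // Cheap single-object test first: keep only segments at the given position.
    static List<Segment> atPosition(List<Segment> all, int position) {
        List<Segment> out = new ArrayList<>();
        for (Segment s : all) {
            if (s.position == position) {
                out.add(s);
            }
        }
        return out;
    }

    // Cross-object test second: reference equality against the owning message,
    // applied only to the segments that survived the position filter.
    static Segment find(List<Segment> all, Message m, int position) {
        for (Segment s : atPosition(all, position)) {
            if (s.message == m) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Message m1 = new Message("m1");
        Message m2 = new Message("m2");
        List<Segment> all = new ArrayList<>();
        all.add(new Segment(m1, 0, "header"));
        all.add(new Segment(m1, 1, "agent"));
        all.add(new Segment(m2, 0, "header"));
        System.out.println(find(all, m1, 1).type);
    }
}
```

With many messages in working memory, the position test discards most Segments before
the message comparison runs at all, which is the effect described above.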

I'm not sure about the answer to the fact insertion order question as it relates to
performance in a stateless session.  My guess is no effect, but that's probably a
question for the devs. (Or a bit o' experimentation.)

It would be nice if your item #2 could be accomplished with a class property that's a
multidimensional String array, but a rule that tests an array position that's too high
throws an array out of bounds exception.  Another way (which I think is what you suggest
below) would be to have a one-time generated class with the following structure:

class Fact {
  String[][][] facts;

  public String geta0s0t0() {
    if (facts.length >= 1 && facts[0].length >= 1 && facts[0][0].length >= 1) {
      return facts[0][0][0];
    } else {
      return "";
    }
  }
}

Initialize each one with a String[][][] and you're on your way.  This would absolutely
be faster than the first method above, probably by several orders of magnitude.  
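
As a rough sketch of how such a generated class could avoid repeating the bounds checks
in every accessor, each generated getter can delegate to one bounds-checked helper.  The
class name SafeFact, the constructor, and the get(a, s, t) helper are hypothetical;
only the geta0s0t0() accessor and the empty-string fallback come from the example above.

```java
// Hypothetical variant of the Fact class above with a single bounds-checked helper.
class SafeFact {
    private final String[][][] facts;

    SafeFact(String[][][] facts) {
        this.facts = facts;
    }

    // Returns facts[a][s][t], or "" if any index is out of range or an entry is null.
    String get(int a, int s, int t) {
        if (facts == null || a < 0 || a >= facts.length) {
            return "";
        }
        String[][] level1 = facts[a];
        if (level1 == null || s < 0 || s >= level1.length) {
            return "";
        }
        String[] level2 = level1[s];
        if (level2 == null || t < 0 || t >= level2.length) {
            return "";
        }
        String value = level2[t];
        return value == null ? "" : value;
    }

    // One generated accessor per position, in the style of geta0s0t0() above.
    public String geta0s0t0() {
        return get(0, 0, 0);
    }

    public String geta0s0t1() {
        return get(0, 0, 1);
    }
}
```

A code generator would then only need to emit the one-line accessors, and a missing
position yields "" rather than an exception.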

Another approach would be to see how hard it would be to alter MVEL so that an
array[out_of_bounds_index] expression evaluates to false instead of throwing an
exception.  Maybe for the long term...

--- On Wed, 5/27/09, David Zeigler <dzeigler(a)gmail.com> wrote:

...
 From: David Zeigler <dzeigler(a)gmail.com>
 Subject: [rules-users] fact granularity, performance, and other questions
 To: "Rules Users List" <rules-users(a)lists.jboss.org>
 Date: Wednesday, May 27, 2009, 7:17 PM
 Hi,
 I could use some experienced guidance.  I'm in the process of evaluating
 Drools for use in a real-time transactional environment to process about
 3000 messages/second.  I realize a lot of this depends on the type and
 quantity of rules, hardware, etc.  I'm curious what steps others have taken
 to improve performance and if there are any recommendations for my case
 detailed below.

 A few specific questions I have are:
  - Should each field in a message be a fact? (more info on my message
    below)  What fact granularity have you settled on in your usage and why?
  - Does the order of the conditions in a rule affect performance, the
    execution order, or the structure of the Rete network?
  - Does the order the facts are inserted into a stateless session (as a
    list via the CommandFactory.newInsertElements) affect performance at all?

 The message is an EDI format and will typically have anywhere from 80 to
 200 fields, potentially more.  The message is divided into transactions,
 then segments, then fields.  We have an object model that represents the
 message.  We're using a stateless session and most of the rules will modify
 fields or add fields and segments based on the values of other fields.
 Currently, I flatten the object model into a List containing the message,
 transactions, segments, and each field, and then insert the list into the
 stateless session and fire.  I'm avoiding sequential mode for now until we
 have a better idea of our requirements.

 Here's a simplified example of what I'm doing now (using json-esque syntax
 instead of the EDI format).
 message {
   segment {
     SG:header
     A0:agents
   }
   segment {
     SG:agent
     A1:000
     A2:JAMES
     A3:BOND
   }
   segment {
     SG:agent
     A1:86
     A2:MAXWELL
     A3:SMART
   }
 }

 Each field has an id and a value.  For this message, I would insert 14
 facts:  the message object, 3 segment objects, and 10 field objects.

 rule "set James Bond's A1 to 007"
 when
   $a1 : Field(id == "A1",  value != "007")
   Field(id == "A2",  value == "JAMES")
   Field(id == "A3",  value == "BOND")
 then
   $a1.setValue("007");
   update($a1);
 end

 A relatively more complicated, but typical rule would be "set James Bond's
 A1 to 007 iff he's the second agent in the message and one of the other
 agents' first names does not contain AUSTIN and an agency segment exists"

 The above example assumes each segment and field is a fact and I think it's
 a clean and flexible approach, but I'm concerned about the overhead of
 inserting potentially 250 facts for each message.  The only other
 alternatives I can think of seem to have their own set of problems:
 1. limit the fields that rules can be written against to a limited subset,
    which may not be feasible depending on how the requirements evolve, and
    only assert those as facts.  Doing this seems to double the number of
    transactions per second in my nonscientific benchmark.
 2. insert the Message object as a single fact and then write a slew of
    accessor methods in that object to get at all possible fields in the
    tree: getA1FromSecondAgentSegmentInFirstTransaction().  This seems like
    it might perform well, but could be very messy.
 3. provide an api in the message object model to find various occurrences
    of fields in the messages, then use eval() in the rule, like
    eval(msg.findSegment("agent", secondOccurrence).getField("A1")).
    I've read that would be less efficient once the ruleset grows.

 I'm sure many of you have dealt with this type of scenario before. What
 did you determine the best approach to be?

 Thanks,
 David
 _______________________________________________
 rules-users mailing list
 rules-users(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/rules-users
