[rules-users] fact granularity, performance, and other questions

Wed May 27 20:17:24 EDT 2009

Hi,
I could use some experienced guidance.  I'm in the process of
evaluating Drools for use in a real-time transactional environment
that processes about 3000 messages/second.  I realize a lot of
this depends on the type and quantity of rules, hardware, etc.  I'm
curious what steps others have taken to improve performance and
whether there are any recommendations for my case, detailed below.

A few specific questions I have are:
 - Should each field in a message be a fact? (more info on my message
below)  What fact granularity have you settled on in your usage and
why?
 - Does the order of the conditions in a rule affect performance, the
execution order, or the structure of the Rete network?
 - Does the order the facts are inserted into a stateless session (as
a list via the CommandFactory.newInsertElements) affect performance at
all?

The message is an EDI format and will typically have anywhere from 80
to 200 fields, potentially more.  The message is divided into
transactions, then segments, then fields.  We have an object model
that represents the message.  We're using a stateless session and most
of the rules will modify fields or add fields and segments based on
the values of other fields.  Currently, I flatten the object model
into a List containing the message, transactions, segments, and each
field, and then insert the list into the stateless session and fire.
I'm avoiding sequential mode for now until we have a better idea of
our requirements.

Here's a simplified example of what I'm doing now (using JSON-esque
syntax instead of the EDI format, and leaving out the transaction
level):
message {
  segment {
    SG:header
    A0:agents
  }
  segment {
    SG:agent
    A1:000
    A2:JAMES
    A3:BOND
  }
  segment {
    SG:agent
    A1:86
    A2:MAXWELL
    A3:SMART
  }
}

Each field has an id and a value.  For this message, I would insert 14
facts:  the message object, 3 segment objects, and 10 field objects.
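
To make the flattening step concrete, here is a minimal sketch in plain Java.  The class names (Message, Segment, Field) and the flatten() and sample() helpers are hypothetical stand-ins for the real object model, and the Drools call appears only in a comment, so treat this as an illustration of the idea rather than the actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the real message object model.
class Field {
    final String id;
    String value;
    Field(String id, String value) { this.id = id; this.value = value; }
}

class Segment {
    final List<Field> fields = new ArrayList<>();
    Segment add(String id, String value) {
        fields.add(new Field(id, value));
        return this;
    }
}

class Message {
    final List<Segment> segments = new ArrayList<>();
}

class FlattenSketch {
    // Collects the message, every segment, and every field into one list.
    // The resulting list is what would be handed to
    // CommandFactory.newInsertElements(...) for the stateless session.
    static List<Object> flatten(Message msg) {
        List<Object> facts = new ArrayList<>();
        facts.add(msg);
        for (Segment s : msg.segments) {
            facts.add(s);
            facts.addAll(s.fields);
        }
        return facts;
    }

    // Builds the three-segment sample message shown above.
    static Message sample() {
        Message m = new Message();
        m.segments.add(new Segment().add("SG", "header").add("A0", "agents"));
        m.segments.add(new Segment().add("SG", "agent").add("A1", "000")
                                    .add("A2", "JAMES").add("A3", "BOND"));
        m.segments.add(new Segment().add("SG", "agent").add("A1", "86")
                                    .add("A2", "MAXWELL").add("A3", "SMART"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(flatten(sample()).size() + " facts");
    }
}
```

The fact count grows with the number of fields, which is where the concern about 250 facts per message comes from.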

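rule "set James Bond's A1 to 007"
when
  // Assumes Field exposes its parent segment as a "segment" property;
  // without this join the A2/A3 checks could match a different segment.
  $a1 : Field(id == "A1",  value != "007", $s : segment)
  Field(segment == $s,  id == "A2",  value == "JAMES")
  Field(segment == $s,  id == "A3",  value == "BOND")
then
  $a1.setValue("007");
  update($a1);
end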

A more complicated, but still typical, rule would be "set James
Bond's A1 to 007 iff he's the second agent in the message and one of
the other agents' first names does not contain AUSTIN and an agency
segment exists".

The above example assumes each segment and field is a fact and I think
it's a clean and flexible approach, but I'm concerned about the
overhead of inserting potentially 250 facts for each message.  The
only other alternatives I can think of seem to have their own set of
problems:
1. restrict the fields that rules can be written against to a fixed
subset, which may not be feasible depending on how the requirements
evolve, and assert only those as facts.  Doing this seems to double
the number of transactions per second in my unscientific benchmark.
2. insert the Message object as a single fact and then write a slew of
accessor methods in that object to get at all possible fields in the
tree: getA1FromSecondAgentSegmentInFirstTransaction().  This seems
like it might perform well, but could be very messy.
3. provide an API in the message object model to find particular
occurrences of fields in the message, then use eval() in the rule,
e.g. eval(msg.findSegment("agent", secondOccurrence).getField("A1")).
I've read that this becomes less efficient as the ruleset grows.
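
For option 3, the lookup API could be sketched roughly as follows.  IndexedMessage, NamedSegment, and the 1-based occurrence argument are assumptions made for illustration; only the findSegment(...).getField(...) call shape comes from the description above:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical segment: a name plus an ordered id -> value map of fields.
class NamedSegment {
    final String name;
    final Map<String, String> fields = new LinkedHashMap<>();
    NamedSegment(String name) { this.name = name; }
    NamedSegment put(String id, String value) { fields.put(id, value); return this; }
    String getField(String id) { return fields.get(id); }
}

// Hypothetical message exposing an occurrence-based finder.
class IndexedMessage {
    final List<NamedSegment> segments = new ArrayList<>();

    // Returns the n-th (1-based) segment with the given name, or null if
    // the message has fewer than n such segments.
    NamedSegment findSegment(String name, int occurrence) {
        int seen = 0;
        for (NamedSegment s : segments) {
            if (s.name.equals(name) && ++seen == occurrence) {
                return s;
            }
        }
        return null;
    }

    // Builds the same sample message used earlier in this post.
    static IndexedMessage sample() {
        IndexedMessage m = new IndexedMessage();
        m.segments.add(new NamedSegment("header").put("A0", "agents"));
        m.segments.add(new NamedSegment("agent").put("A1", "000")
                .put("A2", "JAMES").put("A3", "BOND"));
        m.segments.add(new NamedSegment("agent").put("A1", "86")
                .put("A2", "MAXWELL").put("A3", "SMART"));
        return m;
    }
}
```

With this shape only the message itself is inserted as a fact, and a rule condition would compare the looked-up value inside eval(), guarding against the null returned when the occurrence does not exist.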

I'm sure many of you have dealt with this type of scenario before.
What did you determine the best approach to be?

Thanks,
David