To address some points in order:
Yes, having each field in a message as a fact is the most straightforward. The catch is
that the rejoining conditions should be as fast as possible. I'm thinking this
structure would be best:
class Message {
    ...message level properties...
}
class Segment {
    public Message message;
    public int position;
    ...Segment level properties...
}
Then the rules would be like this:
when
    m: Message(...foo...)
    s0: Segment(position == 0, message == m, ...bar...)
    s1: Segment(position == 1, message == m, ...bas...)
then
    ...
end
That's about as fast as you'll get with this setup, I figure. And you're
right, it is important to list the conditions in a certain order: with the "position
== n" condition before the "message == m" condition, MANY Segments in
working memory are eliminated from consideration before getting to the more expensive
cross object join. (Even though the cross object condition is about as cheap as you can
get, a straight == check.)
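To make the structure concrete, here is a minimal plain-Java sketch of how the Message and
Segment facts might be built and matched. It only mimics the constraint order by hand (the
literal position test first, then the identity check against the message); it says nothing
about how the engine evaluates the rule internally. The names SegmentFacts, toSegments,
find, and the id and type fields are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentFacts {
    // Hypothetical stand-in for the Message class sketched above.
    static class Message {
        final String id;
        Message(String id) { this.id = id; }
    }

    // Hypothetical stand-in for the Segment class: a back-reference plus a position.
    static class Segment {
        final Message message;
        final int position;
        final String type;
        Segment(Message message, int position, String type) {
            this.message = message;
            this.position = position;
            this.type = type;
        }
    }

    // Build one Segment fact per segment, recording its position in the message.
    static List<Segment> toSegments(Message message, List<String> types) {
        List<Segment> out = new ArrayList<>();
        for (int i = 0; i < types.size(); i++) {
            out.add(new Segment(message, i, types.get(i)));
        }
        return out;
    }

    // Same order as the rule conditions: cheap literal test on position first,
    // then the cross-object identity check against the message.
    static Segment find(List<Segment> workingMemory, Message m, int position) {
        for (Segment s : workingMemory) {
            if (s.position == position && s.message == m) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Message m1 = new Message("m1");
        Message m2 = new Message("m2");
        List<Segment> workingMemory = new ArrayList<>();
        workingMemory.addAll(toSegments(m1, Arrays.asList("header", "agent", "agent")));
        workingMemory.addAll(toSegments(m2, Arrays.asList("header", "agent")));

        Segment s1 = find(workingMemory, m2, 1);
        System.out.println(s1.type + " at position " + s1.position); // prints "agent at position 1"
    }
}
```

The facts handed to the session would then be each Message plus its Segment list.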
I'm not sure about the answer to the fact insertion order question vis-a-vis
performance in a stateless session. My guess is no effect, but that's probably a question
for the devs. (Or a bit o' experimentation.)
It would be nice if your item #2 could be accomplished with a class property that's a
multidimensional String array, but the rule throws an array index out of bounds exception
if it tests an array position that's too high. Another way (which I think is what
you suggest below) would be a one-time generated class with the following
structure:
class Fact {
    String[][][] facts;

    // Returns facts[0][0][0], or "" if any dimension is too short.
    public String geta0s0t0() {
        if (facts.length >= 1 && facts[0].length >= 1 && facts[0][0].length >= 1) {
            return facts[0][0][0];
        } else {
            return "";
        }
    }
}
Initialize each one with a String[][][] and you're on your way. This would absolutely
be faster than the first method above, probably by several orders of magnitude.
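As a rough sketch of the same idea, the bounds checking could live in one place, with the
generated getters delegating to it. SafeFact and get(int, int, int) are hypothetical names,
and I haven't checked how well a rule condition can use such an accessor; this only shows
the out-of-range handling.

```java
public class SafeFact {
    private final String[][][] facts;

    public SafeFact(String[][][] facts) {
        this.facts = facts;
    }

    // Returns facts[a][s][t], or "" when any index is out of range
    // or any level of the array (or the value itself) is null.
    public String get(int a, int s, int t) {
        if (facts == null || a < 0 || a >= facts.length) {
            return "";
        }
        String[][] level1 = facts[a];
        if (level1 == null || s < 0 || s >= level1.length) {
            return "";
        }
        String[] level2 = level1[s];
        if (level2 == null || t < 0 || t >= level2.length) {
            return "";
        }
        String value = level2[t];
        return value == null ? "" : value;
    }

    // A generated-style getter that simply delegates to the checked accessor.
    public String getA0s0t0() {
        return get(0, 0, 0);
    }

    public static void main(String[] args) {
        String[][][] data = { { { "agent", "000", "JAMES", "BOND" } } };
        SafeFact f = new SafeFact(data);
        System.out.println(f.get(0, 0, 2));           // prints "JAMES"
        System.out.println(f.get(0, 5, 0).isEmpty()); // prints "true"
    }
}
```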
Another approach would be to see how hard it would be to alter MVEL so it evaluates
array[out_of_bounds_index] expressions to false instead of throwing an exception. Maybe
for the long term...
--- On Wed, 5/27/09, David Zeigler <dzeigler(a)gmail.com> wrote:
From: David Zeigler <dzeigler(a)gmail.com>
Subject: [rules-users] fact granularity, performance, and other questions
To: "Rules Users List" <rules-users(a)lists.jboss.org>
Date: Wednesday, May 27, 2009, 7:17 PM
Hi,
I could use some experienced guidance. I'm in the process of evaluating Drools for use
in a real-time transactional environment to process about 3000 messages/second. I
realize a lot of this depends on the type and quantity of rules, hardware, etc. I'm
curious what steps others have taken to improve performance and if there are any
recommendations for my case detailed below.
A few specific questions I have are:
- Should each field in a message be a fact? (more info on my message below) What fact
granularity have you settled on in your usage and why?
- Does the order of the conditions in a rule affect performance, the execution order,
or the structure of the Rete network?
- Does the order the facts are inserted into a stateless session (as a list via the
CommandFactory.newInsertElements) affect performance at all?
The message is an EDI format and will typically have anywhere from 80 to 200 fields,
potentially more. The message is divided into transactions, then segments, then fields.
We have an object model that represents the message. We're using a stateless session and
most of the rules will modify fields or add fields and segments based on the values of
other fields. Currently, I flatten the object model into a List containing the message,
transactions, segments, and each field, and then insert the list into the stateless
session and fire. I'm avoiding sequential mode for now until we have a better idea of
our requirements.
Here's a simplified example of what I'm doing now (using json-esque syntax instead of
the EDI format).
message {
    segment {
        SG:header
        A0:agents
    }
    segment {
        SG:agent
        A1:000
        A2:JAMES
        A3:BOND
    }
    segment {
        SG:agent
        A1:86
        A2:MAXWELL
        A3:SMART
    }
}
Each field has an id and a value. For this message, I would insert 14 facts: the message
object, 3 segment objects, and 10 field objects.
rule "set James Bond's A1 to 007"
when
    $a1 : Field(id == "A1", value != "007")
    Field(id == "A2", value == "JAMES")
    Field(id == "A3", value == "BOND")
then
    $a1.setValue("007");
    update($a1);
end
A relatively more complicated, but typical rule would be "set James Bond's A1 to 007 iff
he's the second agent in the message and one of the other agents' first names does not
contain AUSTIN and an agency segment exists"
The above example assumes each segment and field is a fact and I think it's a clean and
flexible approach, but I'm concerned about the overhead of inserting potentially 250
facts for each message. The only other alternatives I can think of seem to have their
own set of problems:
1. limit the fields that rules can be written against to a limited subset, which may
not be feasible depending on how the requirements evolve, and only assert those as
facts. Doing this seems to double the number of transactions per second in my
nonscientific benchmark.
2. insert the Message object as a single fact and then write a slew of accessor methods
in that object to get at all possible fields in the tree:
getA1FromSecondAgentSegmentInFirstTransaction(). This seems like it might perform well,
but could be very messy.
3. provide an api in the message object model to find various occurrences of fields in
the messages, then use eval() in the rule, like
eval(msg.findSegment("agent", secondOccurrence).getField("A1")).
I've read that would be less efficient once the ruleset grows.
I'm sure many of you have dealt with this type of scenario before. What did you
determine the best approach to be?
Thanks,
David