]
Edson Tirelli closed JBRULES-255.
---------------------------------
Resolution: Rejected
New codebase makes this issue obsolete.
Dynamic index priorization
--------------------------
Key: JBRULES-255
URL:
http://jira.jboss.com/jira/browse/JBRULES-255
Project: JBoss Rules
Issue Type: Feature Request
Security Level: Public(Everyone can see)
Components: Reteoo
Reporter: Edson Tirelli
Assigned To: Edson Tirelli
Priority: Minor
Fix For: 3.1-m3
All,
Follows some thoughts about the current and future solutions for beta node memory
indexing. Suggestions and opinions are welcome. Sorry for the long e-mail.
I think a good way to discuss that is making an analogy to databases systems. As we all
know, indexing is a great mechanism for performance improvements on database queries, but
also represent a general overhead on other operations like insert, updates and deletes.
Also, there is a memory consumption cost involved. A well planned set of indexes is
essential for most enterprise applications and the responsibles for defining them are
usually the DBAs. Once indexes are defined, when a query is executed against that
database, a query planner component is used by database systems to estimate the best plan
to run the query with the best performance, sometimes using the index, sometimes not.
Well, working memory has the same issues and same thoughts are valid here. Drools 3
implemented an automatic indexing strategy to index beta node memories in order to boost
performance. Just to have some data to understand the consequences of it, lets use Manners
64 benchmark test results on a Pentium IV 3Ghz HT machine with 1,0 Gb memory. This is not
really a detailed benchmark test, but simply some rough numbers in order to every one
understand what happened, what is happening, and possibly suggest some paths for the
future. (times in milliseconds)
Manners 64 without indexes: 135000 millisec to run
Manners 64 with beta node indexes: 10500 millisec to run
So, I think it is obviouse that indexes overall benefits pays off the overhead to keep
them, at least in terms of performance. I'm not discussing limited memory enviroments
here.
During the last few days I started to play with indexing again and find some odd
results. To understand them, let me first explain that a beta node memory might be indexed
by one or more indexes. It all depends on how many BoundVariableConstraints that beta node
will have. For instance:
MyClass1( $myVal1 : attr1, $myVal2 : attr2 )
MyClass2( attr1 == $myVal1, attr2 != $myVal2, attr3 == "bla", attr4 == 5 )
The above rule would have a join node with two indexed constraints: attr1 == $myVal1,
attr2 != $myVal2.
What happens is that the engine is capable of indexing none of them, one of them, or
both of them. The way it is today, it will always index by both of them.
Just to investigate what are the indexing effects, I tried to make a code change to
turn off multi-index for right beta memories. So, if a beta node has more than one
constraint, I would elect one (arbitrarily) and index only that column. The result:
Manners 64 with indexing without multi-index on RIGHT memories: 12000 millisec to run
So, multi-index on right memories are good for manners. Then I turned on multi-index
for right memories and turned off multi-index for left memories:
Manners 64 with indexing without multi-index on LEFT memories: 9500 millisec to run
So, basically, keeping multi-indexes for left memory on manners does not pay off the
overhead.
As you can realize by now, this is TOTALLY dependent on rules AND data asserted into
memory. For a different ruleset, the results could be totally different.
Besides that, somedays ago, a drools user reported an odd problem, that some may
remember. He wrote a rule with the following column:
1. alarmdefine(alarmreason==reason, alarmlevel!=level)
The rule was running very slowly for an average dataset (about 20 seconds). I sugested
inverting the constraints to:
2. alarmdefine(alarmlevel!=level, alarmreason==reason)
And the rule ran very fast (about 0.6 seconds). And the reason for this is because
Drools has a fixed index ordering from last to first. So, for the first case above, where
two indexes will be used (alarmreason and alarmlevel), drools will index the join node
memory by [alarmlevel + alarmreason]. Nothing wrong so far, except that the data asserted
into memory had all the same alarmlevel, so the index was adding overhead, but was
completelly useless, as it was indexing the same value in all objects. (for more details
on this subject, please look into list archive).
Thinking about that, I devised a strategy to do dynamic indexing priorization. The
idea is that at run time, when a new object or tuple reaches a beta node and looks for
matches, the engine will dynamically elect and order indexes to that specific match. So in
the above case, it would always elect alarmreason as the primary index, but if someone
asserts a fact where the alarmlevel becomes the most restrictive index, for that instance
only, the primary index would be alarmlevel.
The result is that the rule runs in about 0.8 seconds for all cases, does not matter
the way it is written.
Manners 64 with indexing priorization: 12000 millisec
As you can see, this dynamic indexing priorization has an overhead cost that allows
one to not worry about how it is writing the rule, as long as he can accept a slowdown on
performance.
Thinking in all that, I see only 2 ways for drools to follow:
1. The first way is to allow users to fine tune their rules for best performance. This
can be achieved making indexes as fast as possible and allowing the user to
enable/disable/prioritize individual indexes on their rules. This is easier to implement,
as only some syntax sugar needs to be supported by the parser, and the index factory needs
access to the user definitions.
2. The second way would be to develop a so smart optimizer that it can decide when to use
and when not to use indexes as well as which index to use in a similar way some databases
do.The index priorization strategy I developed might be a rough start, but it is miles
away from being the solution.
Well, this is how things are on this area of development. My suggestion is, based on
the results, not commit the index priorization stuff. I will just make a backup of it for
future reference, but right now I think the overhead does not pay off. Better to educate
users on how to write better rules for now.
If anyone has or knows about good material on this area of knowledge, I would really
appreciate links or suggestions.
Let me know what you think.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: