[jboss-jira] [JBoss JIRA] Reopened: (JBRULES-255) Dynamic index priorization

Tue May 8 09:27:52 EDT 2007

     [ http://jira.jboss.com/jira/browse/JBRULES-255?page=all ]

Edson Tirelli reopened JBRULES-255:
-----------------------------------

             

> Dynamic index priorization
> --------------------------
>
>                 Key: JBRULES-255
>                 URL: http://jira.jboss.com/jira/browse/JBRULES-255
>             Project: JBoss Rules
>          Issue Type: Feature Request
>      Security Level: Public(Everyone can see) 
>          Components: Reteoo
>            Reporter: Edson Tirelli
>         Assigned To: Edson Tirelli
>            Priority: Minor
>
>   All,
>   Follows some thoughts about the current and future solutions for beta node memory indexing. Suggestions and opinions are welcome. Sorry for the long e-mail.
>   I think a good way to discuss that is making an analogy to databases systems. As we all know, indexing is a great mechanism for performance improvements on database queries, but also represent a general overhead on other operations like insert, updates and deletes. Also, there is a memory consumption cost involved. A well planned set of indexes is essential for most enterprise applications and the responsibles for defining them are usually the DBAs. Once indexes are defined, when a query is executed against that database, a query planner component is used by database systems to estimate the best plan to run the query with the best performance, sometimes using the index, sometimes not.
>   Well, working memory has the same issues and same thoughts are valid here. Drools 3 implemented an automatic indexing strategy to index beta node memories in order to boost performance. Just to have some data to understand the consequences of it, lets use Manners 64 benchmark test results on a Pentium IV 3Ghz HT machine with 1,0 Gb memory. This is not really a detailed benchmark test, but simply some rough numbers in order to every one understand what happened, what is happening, and possibly suggest some paths for the future. (times in milliseconds)
> Manners 64 without indexes: 135000 millisec to run
> Manners 64 with beta node indexes: 10500 millisec to run
>    So, I think it is obviouse that indexes overall benefits pays off the overhead to keep them, at least in terms of performance. I'm not discussing limited memory enviroments here.
>    During the last few days I started to play with indexing again and find some odd results. To understand them, let me first explain that a beta node memory might be indexed by one or more indexes. It all depends on how many BoundVariableConstraints that beta node will have. For instance:
> MyClass1( $myVal1 : attr1, $myVal2 : attr2 )
> MyClass2( attr1 == $myVal1, attr2 != $myVal2, attr3 == "bla", attr4 == 5 )
>    The above rule would have a join node with two indexed constraints: attr1 == $myVal1, attr2 != $myVal2.
>    What happens is that the engine is capable of indexing none of them, one of them, or both of them. The way it is today, it will always index by both of them.
>    Just to investigate what are the indexing effects, I tried to make a code change to turn off multi-index for right beta memories. So, if a beta node has more than one constraint, I would elect one (arbitrarily) and index only that column. The result:
> Manners 64 with indexing without multi-index on RIGHT memories: 12000 millisec to run
>     So, multi-index on right memories are good for manners. Then I turned on multi-index for right memories and turned off multi-index for left memories:
> Manners 64 with indexing without multi-index on LEFT memories: 9500 millisec to run
>    So, basically, keeping multi-indexes for left memory on manners does not pay off the overhead.
>    As you can realize by now, this is TOTALLY dependent on rules AND data asserted into memory. For a different ruleset, the results could be totally different.
>    Besides that, somedays ago, a drools user reported an odd problem, that some may remember. He wrote a rule with the following column:
> 1. alarmdefine(alarmreason==reason, alarmlevel!=level)
>    The rule was running very slowly for an average dataset (about 20 seconds). I sugested inverting the constraints to:
> 2. alarmdefine(alarmlevel!=level, alarmreason==reason)
>    And the rule ran very fast (about 0.6 seconds). And the reason for this is because Drools has a fixed index ordering from last to first. So, for the first case above, where two indexes will be used (alarmreason and alarmlevel), drools will index the join node memory by [alarmlevel + alarmreason]. Nothing wrong so far, except that the data asserted into memory had all the same alarmlevel, so the index was adding overhead, but was completelly useless, as it was indexing the same value in all objects. (for more details on this subject, please look into list archive).
>    Thinking about that, I devised a strategy to do dynamic indexing priorization. The idea is that at run time, when a new object or tuple reaches a beta node and looks for matches, the engine will dynamically elect and order indexes to that specific match. So in the above case, it would always elect alarmreason as the primary index, but if someone asserts a fact where the alarmlevel becomes the most restrictive index, for that instance only, the primary index would be alarmlevel.
>    The result is that the rule runs in about 0.8 seconds for all cases, does not matter the way it is written.
> Manners 64 with indexing priorization: 12000 millisec
>    As you can see, this dynamic indexing priorization has an overhead cost that allows one to not worry about how it is writing the rule, as long as he can accept a slowdown on performance.
>    Thinking in all that, I see only 2 ways for drools to follow:
> 1. The first way is to allow users to fine tune their rules for best performance. This can be achieved making indexes as fast as possible and allowing the user to enable/disable/prioritize individual indexes on their rules. This is easier to implement, as only some syntax sugar needs to be supported by the parser, and the index factory needs access to the user definitions.
> 2. The second way would be to develop a so smart optimizer that it can decide when to use and when not to use indexes as well as which index to use in a similar way some databases do.The index priorization strategy I developed might be a rough start, but it is miles away from being the solution.
>    Well, this is how things are on this area of development. My suggestion is, based on the results, not commit the index priorization stuff. I will just make a backup of it for future reference, but right now I think the overhead does not pay off. Better to educate users on how to write better rules for now.
>    If anyone has or knows about good material on this area of knowledge, I would really appreciate links or suggestions.
>    Let me know what you think.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira