[rules-users] Improving Drools Memory Performance

Mark Proctor mproctor at codehaus.org
Wed Jul 21 23:49:26 EDT 2010


  On 22/07/2010 04:05, Jevon Wright wrote:
> Hi Mark,
>
>         IEventTrigger ( text.onInput == onInput ) - is the text here
>     the bound variable text, or a field on EventTrigger? the logic
>     isn't very clear. What you have written in java would look like
>         eventTrigger.getText().getOnInput().equals( eventTrigger)
>
>         Is that what you were expecting? That's why we often use the
>     $ prefix to differentiate fields from variables.
>
>
> EventTrigger doesn't have a field "text", so I think it uses the bound
> variable instead and is actually evaluated as:
>
> text.getOnInput().equals( eventTrigger );
>
> I use variables very sparingly in my rules, but I didn't realise that 
> using '$' as a prefix was optional. So using '$onInput' rather than 
> 'onInput' would make no difference, except for readability?
I'm surprised that parses; it's almost certainly wrong and should
raise an error. Please open a JIRA for that so it can be fixed later.
You should wrap it with an eval:
onInput : IEventTrigger ( eval( text.onInput == onInput ) )
Or better still write it like this:
IEventTrigger( this == text.onInput )
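
A minimal plain-Java sketch of what that last form selects. The classes
EventTrigger and InputTextField and the helper matching() are hypothetical
stand-ins, not the real EMF-generated types or any Drools API: only the
trigger referenced by text.onInput matches, while triggers that merely sit
in text.events do not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OnInputMatchSketch {
    // Hypothetical stand-in for the EMF EventTrigger type.
    static class EventTrigger {
        final String name;
        EventTrigger(String name) { this.name = name; }
    }

    // Hypothetical stand-in for InputTextField: one dedicated onInput
    // reference plus a general list of events.
    static class InputTextField {
        EventTrigger onInput;
        final List<EventTrigger> events = new ArrayList<>();
    }

    // Plain-Java reading of: IEventTrigger( this == text.onInput )
    // Each candidate fact is compared with the trigger held by text.onInput.
    static List<EventTrigger> matching(InputTextField text, List<EventTrigger> facts) {
        List<EventTrigger> result = new ArrayList<>();
        for (EventTrigger candidate : facts) {
            if (candidate.equals(text.onInput)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InputTextField text = new InputTextField();
        EventTrigger onInput = new EventTrigger("onInput");
        EventTrigger click = new EventTrigger("click");
        text.onInput = onInput;
        text.events.add(click);

        // Both triggers are facts, but only the onInput one is selected.
        List<EventTrigger> matched = matching(text, Arrays.asList(onInput, click));
        System.out.println(matched.size() + " match: " + matched.get(0).name);
    }
}
```

Writing the constraint against "this" makes it explicit that the fact being
matched is the left-hand operand, so there is no question of whether "text"
is a field or a bound variable.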

Mark
>
> Jevon
>
> 2010/7/22 Mark Proctor <mproctor at codehaus.org>
>
>     On 22/07/2010 03:28, Jevon Wright wrote:
>>     Hi Mark and Wolfgang,
>>
>>     Thank you for your replies! Comments below.
>>
>>     A bit of background: I am using Drools to take a given EMF model
>>     instance, and insert new EObjects into the instance, according to
>>     the given rules. I try to perform inference top-down, so there is
>>     more than one iteration of insertion - as objects are inserted,
>>     the rules need to be re-evaluated. If I understand correctly,
>>     this means that I can't use a stateless session or the sequential
>>     option, because the working memory is changing with inserted facts.
>>
>>     The rules don't appear to insert directly, because I insert new
>>     objects into a queue instead (queue.add(object, drools)) - once
>>     rule evaluation is complete, I insert the contents of the queue
>>     into the existing working memory and fire all the rules again. I
>>     try to prevent the rules modifying the working memory directly.
>>     This is also why all the rules are of the format (x, ..., y, not
>>     z => insert z).
>>
>>     This approach has a number of benefits. It finds inconsistencies
>>     in the rules and means rules have no order, because inserted
>>     facts don't affect the working memory immediately. It also allows
>>     me to detect infinite loops, without restricting the number of
>>     times a rule can fire. This was described in our 2010 paper [1].
>>
>>     I don't think my implementation of this approach is causing the
>>     memory problem, but I could be wrong.
>>
>>         detail : DetailWire ( (from == source && to == target) ||
>>         (from == target && to == source) )
>>         The above is effectively turned into an MVEL statement; you
>>         might get better performance with a ConditionalElement 'or'
>>         as long as the two are mutually exclusive:
>>
>>          ( DetailWire (from == source, to == target ) or
>>            DetailWire (from == target, to == source) )
>>
>>
>>     I thought this was the case. However in this case, you can't bind
>>     the variable "detail" (the Drools compiler won't accept the
>>     syntax), is this correct? I think one solution is to split the
>>     rule into two separate rules for each "or" part (thus a DSL) - I
>>     don't want to have to expand these rules by hand.
>     ( $d : DetailWire (from == source, to == target ) or
>        $d : DetailWire (from == target, to == source) )
>
>     is valid
>
>>
>>         And then I'm not sure what it is you are doing in the second
>>         two rules, but it looks wrong.
>>         text : InputTextField ( eContainer == form, eval
>>         (functions.getAutocompleteInputName(attribute).equals(name)) )
>>         onInput : EventTrigger ( text.onInput == onInput )
>>         currentInput : Property ( text.currentInput == currentInput )
>>
>>
>>     The point of this rule is to select something like the following
>>     (from an EMF instance):
>>
>>     <child name="form">
>>     <child xsi:type="InputTextField" name="...">
>>     <onInput xsi:type="EventTrigger" ... />
>>     <currentInput xsi:type="Property" ... />
>>     <events xsi:type="EventTrigger" ... />
>>     <properties xsi:type="Property" ... />
>>     </child>
>>     </child>
>     IEventTrigger ( text.onInput == onInput ) - is the text here the
>     bound variable text, or a field on EventTrigger? the logic isn't
>     very clear. What you have written in java would look like
>     eventTrigger.getText().getOnInput().equals( eventTrigger)
>
>     Is that what you were expecting? That's why we often use the $
>     prefix to differentiate fields from variables.
>
>>
>>     I can't use 'eContainer', because 'text' can also contain
>>     EventTriggers in 'text.events'. These bound variables are then
>>     supposed to be used later within the rule, either to select other
>>     variables, or as part of the created element.
>>
>>     I am going to try to remove unused bound variables, though. I
>>     think I will write a script that analyses the exported XML for
>>     the rules automatically (I have 264 rules written by hand).
>>
>>     Thanks
>>     Jevon
>>
>>     [1]: J. Wright and J. Dietrich, "Non-Monotonic Model Completion in
>>     Web Application Engineering," in Proceedings of the 21st
>>     Australian Software Engineering Conference (ASWEC 2010)
>>     <http://aswec2010.massey.ac.nz/>, Auckland, New Zealand, 2010.
>>     http://openiaml.org/#completion
>>
>>     2010/7/16 Mark Proctor <mproctor at codehaus.org>
>>
>>         detail : DetailWire ( (from == source && to == target) || (from == target && to == source) )
>>         The above is effectively turned into an MVEL statement; you might get better performance with a ConditionalElement 'or' as long as the
>>         two are mutually exclusive:
>>
>>           ( DetailWire (from == source, to == target ) or
>>             DetailWire (from == target, to == source) )
>>
>>         I saw you did this:
>>         not ( form : InputForm ( eContainer == container, name == iterator.name ) )
>>
>>         The 'form' is not accessible outside the 'not', and that rule does not need it.
>>
>>         Is this not a bug? You bind "text", and then I'm not sure what it is you are doing in the second two rules, but it looks wrong.
>>         text : InputTextField ( eContainer == form, eval (functions.getAutocompleteInputName(attribute).equals(name)) )
>>         onInput : EventTrigger ( text.onInput == onInput )
>>         currentInput : Property ( text.currentInput == currentInput )
>>
>>         It doesn't look like you are updating the session with facts, i.e. it's a stateless session. See if this helps:
>>
>>         KnowledgeBaseConfiguration kconf = KnowledgeBaseFactory.newKnowledgeBaseConfiguration();
>>         kconf.setOption( SequentialOption.YES );
>>
>>         KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase( kconf );
>>         final StatelessKnowledgeSession ksession = kbase.newStatelessKnowledgeSession();
>>         ksession.execute(....);
>>
>>         In the execute you can provide it with a batch of commands to execute, or just a list of objects, up to you. See stateless session for
>>         more details.
>>
>>         The SequentialOption may help memory, a small amount, if you aren't doing any working memory modifications (insert/modify/update/retract).
>>
>>         Mark
>>
>>
>>         On 16/07/2010 04:16, Jevon Wright wrote:
>>>         Hi again,
>>>
>>>         By removing all of the simple eval()s from my rules, I have
>>>         cut heap usage by at least an order of magnitude. However
>>>         this still isn't enough.
>>>
>>>         Since I am trying to reduce the cross-product size (as in
>>>         SQL), I recall that most SQL implementations have a
>>>         "DESCRIBE SELECT" query which provides real-time information
>>>         about the complexity of a given SQL query - i.e. the size of
>>>         the tables, indexes used, and so on. Is there any such tool
>>>         available for Drools? Are there any tools which can provide
>>>         clues as to which rules are using the most memory?
>>>
>>>         Alternatively, I am wondering what kind of benefit I could
>>>         expect from using materialized views to create summary
>>>         tables; that is, deriving and inserting additional facts.
>>>         This would allow Drools to rewrite queries that currently
>>>         use eval(), but would increase the size of working memory,
>>>         so would this actually save heap size?
>>>
>>>         To what extent does Drools rewrite queries? Is there any
>>>         documentation describing the approaches used?
>>>
>>>         Any other ideas on how to reduce heap memory usage? I'd
>>>         appreciate any ideas :)
>>>
>>>         Thanks
>>>         Jevon
>>>
>>>
>>>         On Mon, Jul 12, 2010 at 5:56 PM, Jevon Wright
>>>         <jevon at jevon.org> wrote:
>>>
>>>             Hi Wolfgang and Mark,
>>>
>>>             Thank you for your replies! You were correct: my eval()
>>>             functions
>>>             could generally be rewritten into Drools directly.
>>>
>>>             I had one function "connectsDetail" that was constraining
>>>             unidirectional edges, and could be rewritten from:
>>>              detail : DetailWire ( )
>>>              eval ( functions.connectsDetail(detail, source, target) )
>>>
>>>             to:
>>>              detail : DetailWire ( from == source, to == target )
>>>
>>>             Another function, "connects", was constraining
>>>             bidirectional edges,
>>>             and could be rewritten from:
>>>              sync : SyncWire( )
>>>              eval ( functions.connects(sync, source, target) )
>>>
>>>             to:
>>>              sync : SyncWire( (from == source && to == target) ||
>>>             (from == target
>>>             && to == source) )
>>>
>>>             Finally, the "veto" function could be rewritten from:
>>>              detail : DetailWire ( )
>>>              eval ( handler.veto(detail) )
>>>
>>>             to:
>>>              detail : DetailWire ( overridden == false )
>>>
>>>             I took each of these three changes, and evaluated them
>>>             separately [1].
>>>             I found that:
>>>
>>>             1. Inlining 'connectsDetail' made a huge difference -
>>>             10-30% faster
>>>             execution and 50-60% less allocated heap.
>>>             2. Inlining 'connects' made very little difference -
>>>             10-30% faster
>>>             execution, but 0-20% more allocated heap.
>>>             3. Inlining 'veto' made no difference - no significant
>>>             change in
>>>             execution speed or allocated heap.
>>>
>>>             I think I understand why inlining 'connects' would
>>>             improve heap usage
>>>             - because the rules essentially have more conditionals?
>>>
>>>             I also understand why 'veto' made no difference - for
>>>             most of my test
>>>             models, "overridden" was never true, so adding this
>>>             conditional was
>>>             not making the cross product set any smaller.
>>>
>>>             Finally, I also tested simply joining all of the rules
>>>             together into
>>>             one file. This happily made no difference at all
>>>             (although made it
>>>             more difficult to edit).
>>>
>>>             So I think I can safely conclude that eval() should be
>>>             used as little
>>>             as possible - however, this means that the final rules
>>>             are made more
>>>             complicated and less human-readable, so a DSL may be
>>>             best for my
>>>             common rule patterns in the future.
>>>
>>>             Thanks again!
>>>             Jevon
>>>
>>>             [1]:
>>>             http://www.jevon.org/wiki/Improving_Drools_Memory_Performance
>>>
>>>             On Sat, Jul 10, 2010 at 12:28 AM, Wolfgang Laun
>>>             <wolfgang.laun at gmail.com> wrote:
>>>             > On 9 July 2010 14:14, Mark Proctor
>>>             > <mproctor at codehaus.org> wrote:
>>>             >>  You have many objects there that are not constrained;
>>>             >
>>>             > I have an inkling that the functions.*() are hiding
>>>             just these constraints.
>>>             > It's certainly the wrong way, starting with oodles of
>>>             node pairs, just to
>>>             > pick out connected ones by fishing for the connecting
>>>             edge. And this
>>>             > is worsened by trying to find two such pairs which
>>>             meet at some
>>>             > DomainSource
>>>             >
>>>             > Guesswork, hopefully educated ;-)
>>>             >
>>>             > -W
>>>             >
>>>             >
>>>             >> if there are
>>>             >> multiple versions of those objects you are going to
>>>             get massive amounts
>>>             >> of cross products. Think in terms of SQL, each
>>>             pattern you add is like
>>>             >> an SQL join.
>>>             >>
>>>             >> Mark
>>>             >> On 09/07/2010 09:20, Jevon Wright wrote:
>>>             >>> Hi everyone,
>>>             >>>
>>>             >>> I am working on what appears to be a fairly complex
>>>             rule base based on
>>>             >>> EMF. The rules aren't operating over a huge number
>>>             of facts (less than
>>>             >>> 10,000 EObjects) and there aren't too many rules
>>>             (less than 300), but
>>>             >>> I am having a problem with running out of Java heap
>>>             space (set at ~400
>>>             >>> MB).
>>>             >>>
>>>             >>> Through investigation, I came to the conclusion that
>>>             this is due to
>>>             >>> the design of the rules, rather than the number of
>>>             facts. The engine
>>>             >>> uses less memory inserting many facts that use
>>>             simple rules, compared
>>>             >>> with inserting few facts that use many rules.
>>>             >>>
>>>             >>> Can anybody suggest some tips for reducing heap
>>>             memory usage in
>>>             >>> Drools? I don't have a time constraint, only a
>>>             heap/memory constraint.
>>>             >>> A sample rule in my project looks like this:
>>>             >>>
>>>             >>>    rule "Create QueryParameter for target container
>>>             of DetailWire"
>>>             >>>      when
>>>             >>>        container : Frame( )
>>>             >>>        schema : DomainSchema ( )
>>>             >>>        domainSource : DomainSource ( )
>>>             >>>        instance : DomainIterator( )
>>>             >>>        selectEdge : SelectEdge ( eval (
>>>             >>> functions.connectsSelect(selectEdge, instance,
>>>             domainSource )) )
>>>             >>>        schemaEdge : SchemaEdge ( eval (
>>>             >>> functions.connectsSchema(schemaEdge, domainSource,
>>>             schema )) )
>>>             >>>        source : VisibleThing ( eContainer == container )
>>>             >>>        target : Frame ( )
>>>             >>>        instanceSet : SetWire (
>>>             eval(functions.connectsSet(instanceSet,
>>>             >>> instance, source )) )
>>>             >>>        detail : DetailWire ( )
>>>             >>>        eval ( functions.connectsDetail(detail,
>>>             source, target ))
>>>             >>>        pk : DomainAttribute ( eContainer == schema,
>>>             primaryKey == true )
>>>             >>>        not ( queryPk : QueryParameter ( eContainer
>>>             == target, name == pk.name ) )
>>>             >>>        eval ( handler.veto( detail ))
>>>             >>>
>>>             >>>      then
>>>             >>>        QueryParameter qp =
>>>             handler.generatedQueryParameter(detail, target);
>>>             >>>        handler.setName(qp, pk.getName());
>>>             >>>        queue.add(qp, drools); // wraps insert(...)
>>>             >>>
>>>             >>>    end
>>>             >>>
>>>             >>> I try to order the select statements in an order
>>>             that will reduce the
>>>             >>> size of the cross-product (in theory), but I also
>>>             try and keep the
>>>             >>> rules fairly human readable. I try to avoid
>>>             comparison operators like
>>>             >>> < and >. Analysing a heap dump shows that most of
>>>             the memory is being
>>>             >>> used in StatefulSession.nodeMemories > PrimitiveLongMap.
>>>             >>>
>>>             >>> I am using a StatefulSession; if I understand
>>>             correctly, I can't use a
>>>             >>> StatelessSession with sequential mode since I am
>>>             inserting facts as
>>>             >>> part of the rules. If I also understand correctly,
>>>             I'd like the Rete
>>>             >>> graph to be tall, rather than wide.
>>>             >>>
>>>             >>> Some ideas I have thought of include the following:
>>>             >>> 1. Creating a separate intermediary meta-model to
>>>             split up the sizes
>>>             >>> of the rules. e.g. instead of (if A and B and C then
>>>             insert D), using
>>>             >>> (if A and B then insert E; if E and C then insert D).
>>>             >>> 2. Moving eval() statements directly into the
>>>             Type(...) selectors.
>>>             >>> 3. Removing eval() statements. Would this allow for
>>>             better indexing by
>>>             >>> the Rete algorithm?
>>>             >>> 4. Reducing the height, or the width, of the class
>>>             hierarchy of the
>>>             >>> facts. e.g. Removing interfaces or abstract classes
>>>             to reduce the
>>>             >>> possible matches. Would this make a difference?
>>>             >>> 5. Conversely, increasing the height, or the width,
>>>             of the class
>>>             >>> hierarchy. e.g. Adding interfaces or abstract
>>>             classes to reduce field
>>>             >>> accessors.
>>>             >>> 6. Instead of using EObject.eContainer, creating an
>>>             explicit
>>>             >>> containment property in all of my EObjects.
>>>             >>> 7. Creating a DSL that is human-readable, but allows
>>>             for the
>>>             >>> automation of some of these approaches.
>>>             >>> 8. Moving all rules into one rule file, or splitting
>>>             up rules into
>>>             >>> smaller files.
>>>             >>>
>>>             >>> Is there any kind of profiler for Drools that will let
>>>             me see the size (or
>>>             >>> the memory usage) of particular rules, or of the
>>>             memory used after
>>>             >>> inference? Ideally I'd use this to profile any changes.
>>>             >>>
>>>             >>> Thanks for any thoughts or tips! :-)
>>>             >>>
>>>             >>> Jevon
>>>             >>> _______________________________________________
>>>             >>> rules-users mailing list
>>>             >>> rules-users at lists.jboss.org
>>>             <mailto:rules-users at lists.jboss.org>
>>>             >>> https://lists.jboss.org/mailman/listinfo/rules-users
