[rules-users] Improving Drools Memory Performance
Mark Proctor
mproctor at codehaus.org
Wed Jul 21 23:49:26 EDT 2010
On 22/07/2010 04:05, Jevon Wright wrote:
> Hi Mark,
>
> IEventTrigger ( text.onInput == onInput ) - is the text here
> the bound variable text, or a field on EventTrigger? The logic
> isn't very clear. What you have written would look like this in Java:
> eventTrigger.getText().getOnInput().equals( eventTrigger )
>
> Is that what you were expecting? That's why we often use the
> $ prefix to differentiate fields from variables.
>
>
> EventTrigger doesn't have a field "text", so I think it uses the bound
> variable instead, and is actually evaluated as:
>
> text.getOnInput().equals( eventTrigger );
>
> I use variables very sparingly in my rules, but I didn't realise that
> using '$' as a prefix was optional. So using '$onInput' rather than
> 'onInput' would make no difference, except for readability?
I'm surprised that parses; it's almost certainly wrong and should
raise an error. Please open a JIRA for that so it can be fixed later.
You should wrap it with an eval:
onInput : IEventTrigger ( eval( text.onInput == onInput ) )
Or better still write it like this:
IEventTrigger( this == text.onInput )
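To make the intended reading concrete, here is a minimal plain-Java sketch of what that last constraint selects. The classes EventTrigger and InputTextField and the helper matchOnInput are hypothetical stand-ins invented for illustration, not the real EMF types or anything the engine exposes; the sketch only mimics which facts would match, not how the engine evaluates the pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OnInputMatchSketch {
    // Hypothetical stand-ins for the EMF model classes discussed above.
    static class EventTrigger {
        final String name;
        EventTrigger(String name) { this.name = name; }
    }

    static class InputTextField {
        final EventTrigger onInput;
        final List<EventTrigger> events = new ArrayList<>();
        InputTextField(EventTrigger onInput) { this.onInput = onInput; }
    }

    // Mimics IEventTrigger( this == text.onInput ): of all trigger facts, keep
    // only the one that equals the bound text field's onInput. equals() is
    // used because the Java paraphrase earlier in the thread uses equals().
    static List<EventTrigger> matchOnInput(InputTextField text, List<EventTrigger> facts) {
        List<EventTrigger> matches = new ArrayList<>();
        for (EventTrigger candidate : facts) {
            if (candidate.equals(text.onInput)) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        EventTrigger onInput = new EventTrigger("onInput");
        EventTrigger other = new EventTrigger("onChange");
        InputTextField text = new InputTextField(onInput);
        text.events.add(other); // a trigger held in text.events, not text.onInput

        List<EventTrigger> matches = matchOnInput(text, Arrays.asList(onInput, other));
        System.out.println(matches.size() + " match: " + matches.get(0).name);
    }
}
```

The trigger held only in text.events is not selected, which is the distinction the eContainer-based constraint could not express.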
Mark
>
> Jevon
>
> 2010/7/22 Mark Proctor <mproctor at codehaus.org>
>
> On 22/07/2010 03:28, Jevon Wright wrote:
>> Hi Mark and Wolfgang,
>>
>> Thank you for your replies! Comments below.
>>
>> A bit of background: I am using Drools to take a given EMF model
>> instance, and insert new EObjects into the instance, according to
>> the given rules. I try to perform inference top-down, so there is
>> more than one iteration of insertion - as objects are inserted,
>> the rules need to be re-evaluated. If I understand correctly,
>> this means that I can't use a stateless session or the sequential
>> option, because the working memory is changing with inserted facts.
>>
>> The rules don't appear to insert directly, because I insert new
>> objects into a queue instead (queue.add(object, drools)) - once
>> rule evaluation is complete, I insert the contents of the queue
>> into the existing working memory and fire all the rules again. I
>> try to prevent the rules modifying the working memory directly.
>> This is also why all the rules are of the format (x, ..., y, not
>> z => insert z).
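The queue-then-refire cycle described above can be sketched in plain Java without any Drools classes. Everything here is an assumption made for illustration: facts are plain strings, a "rule" is a function that proposes additions, and the names runToFixedPoint, rule and maxRounds are invented. In particular the maxRounds guard is only a stand-in, since the thread does not say how the real implementation detects infinite loops.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class DeferredInsertSketch {
    // A rule of the form (premise, not conclusion => insert conclusion).
    // It only proposes additions; it never modifies the facts it reads.
    static Function<Set<String>, Set<String>> rule(String premise, String conclusion) {
        return facts -> {
            Set<String> additions = new LinkedHashSet<>();
            if (facts.contains(premise) && !facts.contains(conclusion)) {
                additions.add(conclusion);
            }
            return additions;
        };
    }

    // Evaluate all rules against a read-only view, collect proposals in a
    // queue, then insert the queue and repeat until nothing new is proposed.
    // Returns the number of rounds that actually inserted facts.
    static int runToFixedPoint(Set<String> workingMemory,
                               List<Function<Set<String>, Set<String>>> rules,
                               int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            Set<String> queue = new LinkedHashSet<>();
            Set<String> readOnly = Collections.unmodifiableSet(workingMemory);
            for (Function<Set<String>, Set<String>> r : rules) {
                queue.addAll(r.apply(readOnly));
            }
            queue.removeAll(workingMemory);
            if (queue.isEmpty()) {
                return round - 1;
            }
            workingMemory.addAll(queue); // deferred insertion happens here only
        }
        // Hypothetical guard, not the approach from the paper.
        throw new IllegalStateException("no fixed point after " + maxRounds + " rounds");
    }

    public static void main(String[] args) {
        Set<String> workingMemory = new LinkedHashSet<>();
        workingMemory.add("Form");
        List<Function<Set<String>, Set<String>>> rules = new ArrayList<>();
        rules.add(rule("Form", "Field"));
        rules.add(rule("Field", "Trigger"));
        int rounds = runToFixedPoint(workingMemory, rules, 10);
        System.out.println(rounds + " rounds, facts: " + workingMemory);
    }
}
```

Because additions only become visible between rounds, the order in which the rules are evaluated within a round does not change the result.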
>>
>> This approach has a number of benefits. It finds inconsistencies
>> in the rules and means rules have no order, because inserted
>> facts don't affect the working memory immediately. It also allows
>> me to detect infinite loops, without restricting the number of
>> times a rule can fire. This was described in our 2010 paper [1].
>>
>> I don't think my implementation of this approach is causing the
>> memory problem, but I could be wrong.
>>
>> detail : DetailWire ( (from == source && to == target) ||
>> (from == target && to == source) )
>> The above is turned effectively into an MVEL statement, you
>> might get better performance with a ConditionalElement 'or'
>> as long as the two are mutually exclusive:
>>
>> ( DetailWire (from == source, to == target ) or
>> DetailWire (from == target, to == source) )
>>
>>
>> I thought this was the case. However in this case, you can't bind
>> the variable "detail" (the Drools compiler won't accept the
>> syntax), is this correct? I think one solution is to split the
>> rule into two separate rules for each "or" part (thus a DSL) - I
>> don't want to have to expand these rules by hand.
> ( $d : DetailWire (from == source, to == target ) or
> $d : DetailWire (from == target, to == source) )
>
> is valid
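The "mutually exclusive" caveat can be illustrated with a small plain-Java sketch. It assumes, as I understand the 'or' conditional element, that each branch is matched independently, so a fact satisfying both branches produces two matches where the single-pattern || form produces one. The Wire class and both counting helpers are hypothetical and only model that counting, not the engine itself.

```java
import java.util.Arrays;
import java.util.List;

public class OrBranchSketch {
    // Hypothetical stand-in for DetailWire; endpoints are plain strings here.
    static class Wire {
        final String from;
        final String to;
        Wire(String from, String to) { this.from = from; this.to = to; }
    }

    // One pattern with an infix ||: each wire matches at most once.
    static int matchesWithInfixOr(List<Wire> wires, String source, String target) {
        int matches = 0;
        for (Wire w : wires) {
            boolean forward = w.from.equals(source) && w.to.equals(target);
            boolean backward = w.from.equals(target) && w.to.equals(source);
            if (forward || backward) {
                matches++;
            }
        }
        return matches;
    }

    // Two branches matched independently: a wire satisfying both counts twice.
    static int matchesWithOrBranches(List<Wire> wires, String source, String target) {
        int matches = 0;
        for (Wire w : wires) {
            if (w.from.equals(source) && w.to.equals(target)) {
                matches++;
            }
            if (w.from.equals(target) && w.to.equals(source)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Wire> wires = Arrays.asList(
                new Wire("A", "B"), new Wire("B", "A"), new Wire("A", "A"));
        // Distinct endpoints: the branches never overlap, so the counts agree.
        System.out.println(matchesWithInfixOr(wires, "A", "B")
                + " vs " + matchesWithOrBranches(wires, "A", "B"));
        // source == target: the self-loop satisfies both branches.
        System.out.println(matchesWithInfixOr(wires, "A", "A")
                + " vs " + matchesWithOrBranches(wires, "A", "A"));
    }
}
```

With rules of the form (x, ..., y, not z => insert z) a double match may be harmless, but it is worth checking whether source and target can ever be the same object.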
>
>>
>> And then I'm not sure what it is you are doing in the second
>> two rules, but it looks wrong.
>> text : InputTextField ( eContainer == form, eval
>> (functions.getAutocompleteInputName(attribute).equals(name)) )
>> onInput : EventTrigger ( text.onInput == onInput )
>> currentInput : Property ( text.currentInput == currentInput )
>>
>>
>> The point of this rule is to select something like the following
>> (from an EMF instance):
>>
>> <child name="form">
>> <child xsi:type="InputTextField" name="...">
>> <onInput xsi:type="EventTrigger" ... />
>> <currentInput xsi:type="Property" ... />
>> <events xsi:type="EventTrigger" ... />
>> <properties xsi:type="Property" ... />
>> </child>
>> </child>
> IEventTrigger ( text.onInput == onInput ) - is the text here the
> bound variable text, or a field on EventTrigger? The logic isn't
> very clear. What you have written would look like this in Java:
> eventTrigger.getText().getOnInput().equals( eventTrigger )
>
> Is that what you were expecting? That's why we often use the $
> prefix to differentiate fields from variables.
>
>>
>> I can't use 'eContainer', because 'text' can also contain
>> EventTriggers in 'text.events'. These bound variables are then
>> supposed to be used later within the rule, either to select other
>> variables, or as part of the created element.
>>
>> I am going to try to remove unused bound variables, though. I
>> think I will write a script that analyses the exported XML for
>> the rules automatically (I have 264 rules written by hand).
>>
>> Thanks
>> Jevon
>>
>> [1]: J. Wright and J. Dietrich, "Non-Monotonic Model Completion in
>> Web Application Engineering," in Proceedings of the 21st
>> Australian Software Engineering Conference (ASWEC 2010)
>> <http://aswec2010.massey.ac.nz/>, Auckland, New Zealand, 2010.
>> http://openiaml.org/#completion
>>
>> 2010/7/16 Mark Proctor <mproctor at codehaus.org>
>>
>> detail : DetailWire ( (from == source && to == target) || (from == target && to == source) )
>> The above is turned effectively into an MVEL statement, you might get better performance with a ConditionalElement 'or' as long as the
>> two are mutually exclusive:
>>
>> ( DetailWire (from == source, to == target ) or
>> DetailWire (from == target, to == source) )
>>
>> I saw you did this:
>> not ( form : InputForm ( eContainer == container, name == iterator.name ) )
>>
>> The 'form' is not accessible outside the 'not', and that rule does not need it.
>>
>> Is this not a bug? You bind "text", and then I'm not sure what it is you are doing in the second two rules, but it looks wrong.
>> text : InputTextField ( eContainer == form, eval (functions.getAutocompleteInputName(attribute).equals(name)) )
>> onInput : EventTrigger ( text.onInput == onInput )
>> currentInput : Property ( text.currentInput == currentInput )
>>
>> It doesn't look like you are updating the session with facts, i.e. it's a stateless session. See if this helps:
>>
>> KnowledgeBaseConfiguration kconf = KnowledgeBaseFactory.newKnowledgeBaseConfiguration();
>> kconf.setOption( SequentialOption.YES );
>>
>> KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase( kconf );
>> final StatelessKnowledgeSession ksession = kbase.newStatelessKnowledgeSession();
>> ksession.execute(....);
>>
>> In the execute you can provide it with a batch of commands to execute, or just a list of objects, up to you. See the stateless session
>> documentation for more details.
>>
>> The SequentialOption may help memory, by a small amount, if you aren't doing any working memory modifications (insert/modify/update/retract).
>>
>> Mark
>>
>>
>> On 16/07/2010 04:16, Jevon Wright wrote:
>>> Hi again,
>>>
>>> By removing all of the simple eval()s from my rules, I have
>>> cut heap usage by at least an order of magnitude. However
>>> this still isn't enough.
>>>
>>> Since I am trying to reduce the cross-product size (as in
>>> SQL), I recall that most SQL implementations have a
>>> "DESCRIBE SELECT" query which provides real-time information
>>> about the complexity of a given SQL query - i.e. the size of
>>> the tables, indexes used, and so on. Is there any such tool
>>> available for Drools? Are there any tools which can provide
>>> clues as to which rules are using the most memory?
>>>
>>> Alternatively, I am wondering what kind of benefit I could
>>> expect from using materialized views to create summary
>>> tables; that is, deriving and inserting additional facts.
>>> This would allow Drools to rewrite queries that currently
>>> use eval(), but would increase the size of working memory,
>>> so would this actually save heap size?
>>>
>>> To what extent does Drools rewrite queries? Is there any
>>> documentation describing the approaches used?
>>>
>>> Any other ideas on how to reduce heap memory usage? I'd
>>> appreciate any ideas :)
>>>
>>> Thanks
>>> Jevon
>>>
>>>
>>> On Mon, Jul 12, 2010 at 5:56 PM, Jevon Wright
>>> <jevon at jevon.org> wrote:
>>>
>>> Hi Wolfgang and Mark,
>>>
>>> Thank you for your replies! You were correct: my eval() functions
>>> could generally be rewritten into Drools directly.
>>>
>>> I had one function "connectsDetail" that was constraining
>>> unidirectional edges, and could be rewritten from:
>>> detail : DetailWire ( )
>>> eval ( functions.connectsDetail(detail, source, target) )
>>>
>>> to:
>>> detail : DetailWire ( from == source, to == target )
>>>
>>> Another function, "connects", was constraining bidirectional edges,
>>> and could be rewritten from:
>>> sync : SyncWire( )
>>> eval ( functions.connects(sync, source, target) )
>>>
>>> to:
>>> sync : SyncWire( (from == source && to == target) ||
>>> (from == target && to == source) )
>>>
>>> Finally, the "veto" function could be rewritten from:
>>> detail : DetailWire ( )
>>> eval ( handler.veto(detail) )
>>>
>>> to:
>>> detail : DetailWire ( overridden == false )
>>>
>>> I took each of these three changes, and evaluated them separately [1].
>>> I found that:
>>>
>>> 1. Inlining 'connectsDetail' made a huge difference - 10-30% faster
>>> execution and 50-60% less allocated heap.
>>> 2. Inlining 'connects' made very little difference - 10-30% faster
>>> execution, but 0-20% more allocated heap.
>>> 3. Inlining 'veto' made no difference - no significant change in
>>> execution speed or allocated heap.
>>>
>>> I think I understand why inlining 'connects' would improve heap usage
>>> - because the rules essentially have more conditionals?
>>>
>>> I also understand why 'veto' made no difference - for most of my test
>>> models, "overridden" was never true, so adding this conditional was
>>> not making the cross product set any smaller.
>>>
>>> Finally, I also tested simply joining all of the rules together into
>>> one file. This happily made no difference at all (although made it
>>> more difficult to edit).
>>>
>>> So I think I can safely conclude that eval() should be used as little
>>> as possible - however, this means that the final rules are made more
>>> complicated and less human-readable, so a DSL may be best for my
>>> common rule patterns in the future.
>>>
>>> Thanks again!
>>> Jevon
>>>
>>> [1]: http://www.jevon.org/wiki/Improving_Drools_Memory_Performance
>>>
>>> On Sat, Jul 10, 2010 at 12:28 AM, Wolfgang Laun
>>> <wolfgang.laun at gmail.com> wrote:
>>> > On 9 July 2010 14:14, Mark Proctor <mproctor at codehaus.org> wrote:
>>> >> You have many objects there that are not constrained;
>>> >
>>> > I have an inkling that the functions.*() are hiding just these
>>> > constraints. It's certainly the wrong way, starting with oodles of
>>> > node pairs, just to pick out connected ones by fishing for the
>>> > connecting edge. And this is worsened by trying to find two such
>>> > pairs which meet at some DomainSource.
>>> >
>>> > Guesswork, hopefully educated ;-)
>>> >
>>> > -W
>>> >
>>> >
>>> >> if there are multiple versions of those objects you are going to
>>> >> get massive amounts of cross products. Think in terms of SQL: each
>>> >> pattern you add is like an SQL join.
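A rough plain-Java illustration of the join analogy, under stated assumptions: Frame and VisibleThing below are hypothetical stand-ins that carry only an id, and countTuples simply counts the pairs that survive a two-pattern join with or without a container constraint. It models the size of the cross product, not how the engine stores partial matches.

```java
import java.util.ArrayList;
import java.util.List;

public class CrossProductSketch {
    // Hypothetical stand-ins for two fact types from the sample rule.
    static class Frame {
        final int id;
        Frame(int id) { this.id = id; }
    }

    static class VisibleThing {
        final int containerId; // plays the role of eContainer
        VisibleThing(int containerId) { this.containerId = containerId; }
    }

    static List<Frame> frames(int n) {
        List<Frame> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(new Frame(i));
        }
        return result;
    }

    // Each thing is contained in exactly one of the frameCount frames.
    static List<VisibleThing> things(int n, int frameCount) {
        List<VisibleThing> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(new VisibleThing(i % frameCount));
        }
        return result;
    }

    // Counts the (frame, thing) pairs kept by the join. Without a constraint
    // every pair is kept; with it, only pairs where the thing is in the frame.
    static long countTuples(List<Frame> frames, List<VisibleThing> things, boolean constrained) {
        long tuples = 0;
        for (Frame f : frames) {
            for (VisibleThing t : things) {
                if (!constrained || t.containerId == f.id) {
                    tuples++;
                }
            }
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<Frame> frames = frames(100);
        List<VisibleThing> things = things(1000, 100);
        System.out.println("unconstrained: " + countTuples(frames, things, false));
        System.out.println("constrained:   " + countTuples(frames, things, true));
    }
}
```

With 100 frames and 1,000 things the unconstrained join keeps 100 x 1,000 = 100,000 pairs, while the constrained one keeps 1,000, and every further unconstrained pattern multiplies the first figure again.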
>>> >>
>>> >> Mark
>>> >> On 09/07/2010 09:20, Jevon Wright wrote:
>>> >>> Hi everyone,
>>> >>>
>>> >>> I am working on what appears to be a fairly complex rule base based on
>>> >>> EMF. The rules aren't operating over a huge number of facts (less than
>>> >>> 10,000 EObjects) and there aren't too many rules (less than 300), but
>>> >>> I am having a problem with running out of Java heap space (set at ~400
>>> >>> MB).
>>> >>>
>>> >>> Through investigation, I came to the conclusion that this is due to
>>> >>> the design of the rules, rather than the number of facts. The engine
>>> >>> uses less memory inserting many facts that use simple rules, compared
>>> >>> with inserting few facts that use many rules.
>>> >>>
>>> >>> Can anybody suggest some tips for reducing heap memory usage in
>>> >>> Drools? I don't have a time constraint, only a heap/memory constraint.
>>> >>> A sample rule in my project looks like this:
>>> >>>
>>> >>> rule "Create QueryParameter for target container of DetailWire"
>>> >>> when
>>> >>>   container : Frame( )
>>> >>>   schema : DomainSchema ( )
>>> >>>   domainSource : DomainSource ( )
>>> >>>   instance : DomainIterator( )
>>> >>>   selectEdge : SelectEdge ( eval ( functions.connectsSelect(selectEdge, instance, domainSource )) )
>>> >>>   schemaEdge : SchemaEdge ( eval ( functions.connectsSchema(schemaEdge, domainSource, schema )) )
>>> >>>   source : VisibleThing ( eContainer == container )
>>> >>>   target : Frame ( )
>>> >>>   instanceSet : SetWire ( eval(functions.connectsSet(instanceSet, instance, source )) )
>>> >>>   detail : DetailWire ( )
>>> >>>   eval ( functions.connectsDetail(detail, source, target ))
>>> >>>   pk : DomainAttribute ( eContainer == schema, primaryKey == true )
>>> >>>   not ( queryPk : QueryParameter ( eContainer == target, name == pk.name ) )
>>> >>>   eval ( handler.veto( detail ))
>>> >>>
>>> >>> then
>>> >>>   QueryParameter qp = handler.generatedQueryParameter(detail, target);
>>> >>>   handler.setName(qp, pk.getName());
>>> >>>   queue.add(qp, drools); // wraps insert(...)
>>> >>>
>>> >>> end
>>> >>>
>>> >>> I try to order the select statements in an order that will reduce the
>>> >>> size of the cross-product (in theory), but I also try and keep the
>>> >>> rules fairly human readable. I try to avoid comparison operators like
>>> >>> < and >. Analysing a heap dump shows that most of the memory is being
>>> >>> used in StatefulSession.nodeMemories > PrimitiveLongMap.
>>> >>>
>>> >>> I am using a StatefulSession; if I understand correctly, I can't use a
>>> >>> StatelessSession with sequential mode since I am inserting facts as
>>> >>> part of the rules. If I also understand correctly, I'd like the Rete
>>> >>> graph to be tall, rather than wide.
>>> >>>
>>> >>> Some ideas I have thought of include the following:
>>> >>> 1. Creating a separate intermediary meta-model to split up the sizes
>>> >>> of the rules. e.g. instead of (if A and B and C then insert D), using
>>> >>> (if A and B then insert E; if E and C then insert D).
>>> >>> 2. Moving eval() statements directly into the Type(...) selectors.
>>> >>> 3. Removing eval() statements. Would this allow for better indexing by
>>> >>> the Rete algorithm?
>>> >>> 4. Reducing the height, or the width, of the class hierarchy of the
>>> >>> facts. e.g. Removing interfaces or abstract classes to reduce the
>>> >>> possible matches. Would this make a difference?
>>> >>> 5. Conversely, increasing the height, or the width, of the class
>>> >>> hierarchy. e.g. Adding interfaces or abstract classes to reduce field
>>> >>> accessors.
>>> >>> 6. Instead of using EObject.eContainer, creating an explicit
>>> >>> containment property in all of my EObjects.
>>> >>> 7. Creating a DSL that is human-readable, but allows for the
>>> >>> automation of some of these approaches.
>>> >>> 8. Moving all rules into one rule file, or splitting up rules into
>>> >>> smaller files.
>>> >>>
>>> >>> Is there any kind of profiler for Drools that will let me see the size
>>> >>> (or the memory usage) of particular rules, or of the memory used after
>>> >>> inference? Ideally I'd use this to profile any changes.
>>> >>>
>>> >>> Thanks for any thoughts or tips! :-)
>>> >>>
>>> >>> Jevon
>>> >>> _______________________________________________
>>> >>> rules-users mailing list
>>> >>> rules-users at lists.jboss.org
>>> >>> https://lists.jboss.org/mailman/listinfo/rules-users