[rules-dev] [rules-users] 14GB of NotNodeLeftTuples produced by one rule?

Mark Proctor mproctor at codehaus.org
Fri Jan 11 10:23:47 EST 2013


You need to create a unit test that shows this happening, so people can run it.
http://docs.jboss.org/drools/release/5.4.0.Final/droolsjbpm-introduction-docs/html/gettingstarted.html

Mark
On 11 Jan 2013, at 14:28, Wolfgang Laun <wolfgang.laun at gmail.com> wrote:

> Any takers?
> 
> -W
> 
> ---------- Forwarded message ----------
> From: Svenja Brunstein <svenja.brunstein at gmail.com>
> Date: Fri, 11 Jan 2013 15:00:07 +0100
> Subject: Re: [rules-users] 14GB of NotNodeLeftTuples produced by one rule?
> To: Rules Users List <rules-users at lists.jboss.org>
> 
>> 
>> As you can see, I insert 300,000 facts, and there wasn't a problem
>> with memory, so I didn't check the node count. Are there any other
>> rules in the rule base?
> 
> Thanks! We have six other rules in the rule base, but none of them has a
> NOT in it. So I hope all tuples belong to that rule, and as I tested
> without that specific rule there were no NotNodeLeftTuples.
> In the case where the 14GB of tuples were produced, our data was
> actually *not* randomly distributed, but even with a random
> distribution we have a lot of NOT node tuples, e.g., 6 million NOT node
> tuples for 128*128=16,384 possible nodes.
> 
>> Indeed, I'd expect the
>> NotNodeLeftTuples to have the same order of magnitude.
> 
> Can you explain when a NotNodeLeftTuple is created exactly, and when it is
> removed/not created? Unfortunately, I did not find any information or
> documentation on this. We definitely want to understand in which case so
> many tuples are created to be able to avoid this behavior or construct
> rules in other ways.
> 
> 2013/1/7 Wolfgang Laun <wolfgang.laun at gmail.com>
> 
>> On 07/01/2013, Svenja Brunstein <svenja.brunstein at gmail.com> wrote:
>>>> 
>>>> The system will create network nodes even when only one pattern matches.
>>>> 150,000/50,000 = 3 exactly, or average?
>>> 
>>> 3 exactly.
>> 
>> I was running your rule with this data using Drools 5.2/3/4/5.0:
>> 
>>        WorkingMemoryEntryPoint ep =
>>            kSession.getWorkingMemoryEntryPoint("internalstream");
>>        Event e;
>>        Random rand = new Random();
>>        for( int id = 1; id <= 100000; id++ ){
>>            int iName = rand.nextInt( 1000 ) + 1;
>>            e = new Event( "a", "Joe" + iName );
>>            e.put( "id", "id"+id );
>>            ep.insert( e );
>>            e = new Event( "a", "Jack" + iName );
>>            e.put( "id", "id"+id );
>>            ep.insert( e );
>>            e = new Event( "a", "Jill" + iName );
>>            e.put( "id", "id"+id );
>>            ep.insert( e );
>>        }
>> As you can see, I insert 300,000 facts, and there wasn't a problem
>> with memory, so I didn't check the node count. Are there any other
>> rules in the rule base?
>> 
>> As you have written: you'll get 50,000 x 6 = 300,000 fact reference
>> pairs to represent matching EventA pairs. Indeed, I'd expect the
>> NotNodeLeftTuples to have the same order of magnitude.
>> 
>>>> Is the distribution of id/user combinations realistic?
>>> 
>>> What do you mean by realistic?
>> 
>> Is this as it will be in production?
>> 
>>> 
>>> 
>>>> What else do
>>>> you need to do with Event type "a"? Similar? Completely different? -
>>>> There would be a simple solution to significantly reduce the memory
>>>> requirements, but it may not be feasible due to these answers.
>>> 
>>> At the moment we are just designing a generic solution, which might be
>>> extended by rules afterwards, so that "old" events might need to be
>>> reused.
>>> In a real environment, of course, we would retract some events not needed
>>> any longer. But for now we are doing some performance testing and were
>>> surprised that we could "crash" the system with one single rule. Of
>>> course, with a lot of events ;-)
>> 
>> You can crash with even simpler rules :-)
>> 
>> -W
>> 
>>> 
>>> 2013/1/7 Wolfgang Laun <wolfgang.laun at gmail.com>
>>> 
>>>> On 07/01/2013, Svenja Brunstein <svenja.brunstein at gmail.com> wrote:
>>>>> Thanks for the input. For 150,000 type "a" events we had about 50,000
>>>>> different ids and 1,000 user values.
>>>>> After all, combinations possible for type "b" were only 1,000,000
>>>>> (1,000 users * 1,000 users), which is why I am surprised to have
>>>>> 88 million instances.
>>>> 
>>>> The system will create network nodes even when only one pattern matches.
>>>> 150,000/50,000 = 3 exactly, or average?
>>>> 
>>>> If you have 3 events A, B, C with identical ids and different users,
>>>> you'll get the following candidates for an activation: (A,B), (B,A),
>>>> (A,C), (C,A), (B,C), (C,B)
>>>> and this increases O(n^2). - Since you know the exact distribution of
>>>> your data, you might compute this precisely.
>>>> 
>>>> Is the distribution of id/user combinations realistic? What else do
>>>> you need to do with Event type "a"? Similar? Completely different? -
>>>> There would be a simple solution to significantly reduce the memory
>>>> requirements, but it may not be feasible due to these answers.
>>>> 
>>>>> 
>>>>> Yes, it is intentional to have the rule fire twice for each
>>>>> combination :-)
>>>>> Unfortunately, retracting events is not an option right now.
>>>> 
>>>> Then, at least, generate both in a single rule.
>>>> 
>>>>> 
>>>>> I started another round, in which I made sure to insert a lot more "b"
>>>>> events:
>>>>> The memory used by NotNodeLeftTuples is a lot less, even though these
>>>>> nodes still use most of the memory.
>>>>> Concluding from all that, I guess it is possible that the nodes take
>>>>> that much space (up to many GB), and the more events are inserted which
>>>>> invalidate the NOT nodes, the less memory is used by them?
>>>> 
>>>> Well, you don't need the NOT node, and their number depends on the
>>>> distribution of your data.
>>>> 
>>>> -W
>>>> 
>>>>> 
>>>>> 2013/1/7 Wolfgang Laun <wolfgang.laun at gmail.com>
>>>>> 
>>>>>> The amount of memory required for 150K type "a" depends on the actual
>>>>>> distribution of this data w.r.t. fields id and user, and other
>>>>>> circumstances; it is not only the rule that is to blame.
>>>>>> 
>>>>>> There is one flaw, though: The rule would fire twice for a matching
>>>>>> pair of events of type "a". It's possible that you do want to have a
>>>>>> type "b" for both combinations of user and friendid, but you could
>>>>>> create both in a single rule, which should halve your memory
>>>>>> requirements. If there is no ordered attribute, use the timestamp to
>>>>>> restrict a pair to only one combination (hint: "after").
>>>>>> 
>>>>>> This will still generate a lot of network nodes.
>>>>>> 
>>>>>> Other ideas for reduction may have to take the entire application
>>>>>> scenario into account, e.g., can you retract events after they have
>>>>>> been paired, or how do you do inserts and calls to fireAllRules, etc.
>>>>>> Most important, however, is the actual frequency of id and user
>>>>>> values in relation to type "a" events.
>>>>>> 
>>>>>> -W
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 07/01/2013, Svenja Brunstein <svenja.brunstein at gmail.com> wrote:
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> we observe a strange behavior with one of our rules. After
>>>>>>> deployment and sending lots of events (~150,000 of type "a"), the
>>>>>>> server slows down rapidly until it runs out of memory.
>>>>>>> We checked with VisualVM which objects are filling the memory: At
>>>>>>> one point there were almost 14GB of NotNodeLeftTuples (88,933,186
>>>>>>> instances)!
>>>>>>> 
>>>>>>> This is our rule:
>>>>>>> 
>>>>>>> rule "example"
>>>>>>> when
>>>>>>> $evt1:EventObject(type=='a', $id:data['id'], $user:user)
>>>>>>>     from entry-point internalstream
>>>>>>> $evt2:EventObject(type=='a', data['id']==$id, user!=$user, $user2:user)
>>>>>>>     from entry-point internalstream
>>>>>>> not(EventObject(type=='b', user==$user, data['friendid']==$user2)
>>>>>>>     from entry-point internalstream)
>>>>>>> then
>>>>>>> EventObject evt = new EventObject();
>>>>>>> evt.setType('b');
>>>>>>> evt.setUser($evt1.getUser());
>>>>>>> evt.put('friendid', $evt2.getUser());
>>>>>>> entryPoints['internalstream'].insert(evt);
>>>>>>> end
>>>>>>> 
>>>>>>> Is that behavior correct for such a size of event combinations when
>>>>>>> using a NOT in the rule?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Svenja
>>>>>>> 
>>>>>> _______________________________________________
>>>>>> rules-users mailing list
>>>>>> rules-users at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/rules-users
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> _______________________________________________
> rules-dev mailing list
> rules-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/rules-dev
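
A note on the pair counts quoted in this thread, as a minimal Java sketch. It assumes that every event sharing an id has a different user, so each of the n events with that id pairs with each of the other n - 1. The class and method names (PairCandidates, orderedPairs, unorderedPairs, enumerate) are made up for illustration and are not part of Drools. The sketch only counts join candidates; it says nothing about how the engine actually allocates NotNodeLeftTuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairCandidates {

    /** Ordered pairs (x, y) with x != y among n events sharing one id: n * (n - 1). */
    public static long orderedPairs(long n) {
        return n * (n - 1);
    }

    /** Pairs when each combination is kept only once (e.g. by an ordering constraint). */
    public static long unorderedPairs(long n) {
        return n * (n - 1) / 2;
    }

    /** Lists the ordered pairs explicitly, like the (A,B), (B,A), ... example in the thread. */
    public static List<String> enumerate(List<String> events) {
        List<String> pairs = new ArrayList<String>();
        for (String left : events) {
            for (String right : events) {
                if (!left.equals(right)) {
                    pairs.add("(" + left + "," + right + ")");
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Three events with the same id and different users.
        System.out.println(enumerate(Arrays.asList("A", "B", "C")));

        // Assumed distribution from the thread: 50,000 ids, exactly 3 events each.
        long ids = 50000L;
        long eventsPerId = 3L;
        System.out.println("ordered pairs:   " + ids * orderedPairs(eventsPerId));
        System.out.println("unordered pairs: " + ids * unorderedPairs(eventsPerId));
    }
}
```

With 50,000 ids and exactly 3 events per id this gives 50,000 x 6 = 300,000 ordered pairs, the figure mentioned in the thread, and 150,000 if each pair is restricted to a single combination. A skewed distribution, where some ids carry many more events, grows quadratically in the size of the largest groups.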



