[rules-users] 14GB of NotNodeLeftTuples produced by one rule?

Wolfgang Laun wolfgang.laun at gmail.com
Mon Jan 7 11:15:39 EST 2013


On 07/01/2013, Svenja Brunstein <svenja.brunstein at gmail.com> wrote:
>>
>> The system will create network nodes even when only one pattern matches.
>> 150,000/50,000 = 3 exactly, or average?
>
>  3 exactly.

I was running your rule with this data using Drools 5.2/3/4/5.0:

        // 100,000 ids, each shared by three type "a" events whose users
        // (JoeN, JackN, JillN) carry the same random suffix N: 300,000 facts.
        WorkingMemoryEntryPoint ep =
            kSession.getWorkingMemoryEntryPoint("internalstream");
        Random rand = new Random();
        for (int id = 1; id <= 100000; id++) {
            int iName = rand.nextInt(1000) + 1;   // random suffix in 1..1000
            for (String name : new String[] { "Joe", "Jack", "Jill" }) {
                Event e = new Event("a", name + iName);
                e.put("id", "id" + id);
                ep.insert(e);
            }
        }
As you can see, I insert 300,000 facts, and there wasn't a problem
with memory, so I didn't check the node count. Are there any other
rules in the rule base?

Given what you have written, you'll get 50,000 x 6 = 300,000 fact reference
pairs to represent matching pairs of type "a" events. Indeed, I'd expect the
number of NotNodeLeftTuples to be of the same order of magnitude.
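Here is a quick way to check such numbers for a given id distribution: a
standalone Java sketch (plain arithmetic, not Drools code; the class PairCount
and its methods are made up for illustration). It assumes that all type "a"
events sharing an id have pairwise different users, so a group of k events
yields k*(k-1) ordered (evt1, evt2) candidates, as with A, B, C in the quoted
example.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: counts the (evt1, evt2) candidate pairs that the
 * "example" rule joins before its not(...) check.
 * Assumption: within one id group, all events have different users.
 */
public class PairCount {

    /** Ordered pairs, as the rule is written: both (A,B) and (B,A). */
    public static long orderedPairs(int[] groupSizes) {
        long total = 0;
        for (int k : groupSizes) {
            total += (long) k * (k - 1);   // k choices for evt1, k-1 for evt2
        }
        return total;
    }

    /** Unordered pairs: each pair matched in only one order. */
    public static long unorderedPairs(int[] groupSizes) {
        // every k*(k-1) term is even, so this division is exact
        return orderedPairs(groupSizes) / 2;
    }

    public static void main(String[] args) {
        // 50,000 distinct ids, each carried by exactly 3 type "a" events
        int[] groups = new int[50_000];
        Arrays.fill(groups, 3);
        System.out.println("ordered pairs:   " + orderedPairs(groups));   // 300000
        System.out.println("unordered pairs: " + unorderedPairs(groups)); // 150000
    }
}
```

The unordered count is what remains if each pair is matched in only one
order, e.g. via the "after" restriction suggested further down; neither
figure comes anywhere near the 88 million instances you observed.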

>> Is the distribution of id/user combinations realistic?
>
> What do you mean by realistic?

Is this as it will be in production?

>
>
>> What else do
>> you need to do with Event type "a"? Similar? Completely different? -
>> There would be a simple solution to significantly reduce the memory
>> requirements, but whether it is feasible depends on these answers.
>
> At the moment we are just designing a generic solution, which might be
> extended by rules afterwards, so that "old" events might need to be reused.
> In a real environment, of course, we would retract some events that are no
> longer needed. But for now we are doing some performance testing and were
> surprised that we could "crash" the system with one single rule. Of course,
> with a lot of events ;-)

You can crash it with even simpler rules :-)

-W

>
> 2013/1/7 Wolfgang Laun <wolfgang.laun at gmail.com>
>
>> On 07/01/2013, Svenja Brunstein <svenja.brunstein at gmail.com> wrote:
>> > Thanks for the input. For 150,000 type "a" events we had about 50,000
>> > different ids and 1,000 user values.
>> > After all, only 1,000,000 combinations were possible for type "b"
>> > (1,000 users * 1,000 users), which is why I am surprised to see 88
>> > million instances.
>>
>> The system will create network nodes even when only one pattern matches.
>> 150,000/50,000 = 3 exactly, or average?
>>
>> If you have 3 events A, B, C with identical ids and different users,
>> you'll get the following candidates for an activation: (A,B), (B,A),
>> (A,C), (C,A), (B,C), (C,B)
>> and this grows as O(n^2). - Since you know the exact distribution of
>> your data, you might compute this precisely.
>>
>> Is the distribution of id/user combinations realistic? What else do
>> you need to do with Event type "a"? Similar? Completely different? -
>> There would be a simple solution to significantly reduce the memory
>> requirements, but whether it is feasible depends on these answers.
>>
>> >
>> > Yes, it is intentional that the rule fires twice for each combination :-)
>> > Unfortunately, retracting events is not an option right now.
>>
>> Then, at least, generate both in a single rule.
>>
>> >
>> > I ran another round in which I made sure to insert a lot more "b"
>> > events: the memory used by NotNodeLeftTuples is much lower, although
>> > these nodes still use most of the memory.
>> > Concluding from all that, I guess it is possible for the nodes to take
>> > that much space (up to many GB), and the more events are inserted that
>> > invalidate the NOT nodes, the less memory they use?
>>
>> Well, you don't need the NOT node, and the number of tuples depends on the
>> distribution of your data.
>>
>> -W
>>
>> >
>> > 2013/1/7 Wolfgang Laun <wolfgang.laun at gmail.com>
>> >
>> >> The amount of memory required for 150K type "a" depends on the actual
>> >> distribution of this data w.r.t. fields id and user, and other
>> >> circumstances; it is not only the rule that is to blame.
>> >>
>> >> There is one flaw, though: The rule would fire twice for a matching
>> >> pair of events of type "a". It's possible that you do want to have a
>> >> type "b" for both combinations of user and friendid, but you could
>> >> create both in a single rule, which should halve your memory
>> >> requirements. If there is no ordered attribute, use the timestamp to
>> >> restrict a pair to only one combination (hint: "after").
>> >>
>> >> This will still generate a lot of network nodes.
>> >>
>> >> Other ideas for reduction may have to take the entire application
>> >> scenario into account, e.g., can you retract events after they have
>> >> been paired, or how do you do inserts and calls to fireAllRules, etc.
>> >> Most important, however, is the actual frequency of id and user
>> >> values in relation to type "a" events.
>> >>
>> >> -W
>> >>
>> >>
>> >>
>> >> On 07/01/2013, Svenja Brunstein <svenja.brunstein at gmail.com> wrote:
>> >> > Hi all,
>> >> >
>> >> > we are observing strange behavior with one of our rules. After
>> >> > deployment and sending lots of events (~150,000 of type "a"), the
>> >> > server slows down rapidly until it runs out of memory.
>> >> > We checked with VisualVM which objects are filling the memory: at
>> >> > one point there were almost 14GB of NotNodeLeftTuples (88,933,186
>> >> > instances)!
>> >> >
>> >> > This is our rule:
>> >> >
>> >> > rule "example"
>> >> > when
>> >> >     $evt1 : EventObject(type == 'a', $id : data['id'], $user : user)
>> >> >         from entry-point internalstream
>> >> >     $evt2 : EventObject(type == 'a', data['id'] == $id, user != $user,
>> >> >                         $user2 : user)
>> >> >         from entry-point internalstream
>> >> >     not(EventObject(type == 'b', user == $user, data['friendid'] == $user2)
>> >> >         from entry-point internalstream)
>> >> > then
>> >> >     EventObject evt = new EventObject();
>> >> >     evt.setType('b');
>> >> >     evt.setUser($evt1.getUser());
>> >> >     evt.put('friendid', $evt2.getUser());
>> >> >     entryPoints['internalstream'].insert(evt);
>> >> > end
>> >> >
>> >> > Is this behavior expected for this number of event combinations
>> >> > when the rule uses a NOT?
>> >> >
>> >> > Thanks,
>> >> > Svenja
>> >> >
>> >> _______________________________________________
>> >> rules-users mailing list
>> >> rules-users at lists.jboss.org
>> >> https://lists.jboss.org/mailman/listinfo/rules-users
>> >>
>> >
>>
>

