> As you can see, I insert 300,000 facts, and there wasn't a problem
> with memory, so I didn't check the node count. Are there any other
> rules in the rule base?

Thanks! We have six other rules in the rule base, but none of them has a NOT in it. So I assume all tuples belong to that rule; when I tested without that specific rule, there were no NotNodeLeftTuples.
In the case where the 14GB of tuples were produced, our data was actually not randomly distributed. But even with a random distribution we get a lot of NOT node tuples, e.g., 6 million NOT node tuples for 128*128=16,384 possible nodes.

> Indeed, I'd expect the
> NotNodeLeftTuples to have the same order of magnitude.

Can you explain exactly when a NotNodeLeftTuple is created, and when it is removed or not created at all? Unfortunately, I did not find any documentation on this. We definitely want to understand in which cases so many tuples are created, so that we can avoid this behavior or construct our rules differently.

2013/1/7 Wolfgang Laun <wolfgang.laun@gmail.com>
On 07/01/2013, Svenja Brunstein <svenja.brunstein@gmail.com> wrote:
>>
>> The system will create network nodes even when only one pattern matches.
>> 150,000/50,000 = 3 exactly, or average?
>
>  3 exactly.

I ran your rule with this data using Drools 5.2.0, 5.3.0, 5.4.0 and 5.5.0:

        // Event is the test fact class used for this run.
        WorkingMemoryEntryPoint ep =
            kSession.getWorkingMemoryEntryPoint("internalstream");
        Event e;
        Random rand = new Random();
        // 100,000 ids, three type "a" events per id = 300,000 facts.
        for( int id = 1; id <= 100000; id++ ){
            // One random suffix per id, so the three users of an id differ.
            int iName = rand.nextInt( 1000 ) + 1;
            e = new Event( "a", "Joe" + iName );
            e.put( "id", "id"+id );
            ep.insert( e );
            e = new Event( "a", "Jack" + iName );
            e.put( "id", "id"+id );
            ep.insert( e );
            e = new Event( "a", "Jill" + iName );
            e.put( "id", "id"+id );
            ep.insert( e );
        }
As you can see, I insert 300,000 facts, and there wasn't a problem
with memory, so I didn't check the node count. Are there any other
rules in the rule base?

As you have written: you'll get 50,000 x 6 = 300,000 fact reference
pairs to represent matching EventA pairs. Indeed, I'd expect the
NotNodeLeftTuples to have the same order of magnitude.
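
To check this kind of estimate for a known distribution without running
the engine, the pair count can be computed directly. The following is
only a sketch in plain Java, not Drools code: the class PairEstimate, its
methods and the user names are made up for illustration, and it counts
the join candidates (same id, different user), not the tuples Drools
actually allocates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PairEstimate {

    /** Counts ordered pairs (e1, e2) with equal id and different user.
     *  Each event is a String[] { id, user }. */
    static long orderedPairs(List<String[]> events) {
        // id -> (user -> number of events with that id and user)
        Map<String, Map<String, Long>> byId = new HashMap<>();
        for (String[] e : events) {
            byId.computeIfAbsent(e[0], k -> new HashMap<>())
                .merge(e[1], 1L, Long::sum);
        }
        long pairs = 0;
        for (Map<String, Long> users : byId.values()) {
            long total = 0;
            long sameUser = 0;
            for (long n : users.values()) {
                total += n;
                sameUser += n * n;
            }
            // All ordered pairs within the id, minus those with the same user.
            pairs += total * total - sameUser;
        }
        return pairs;
    }

    /** Hypothetical test data: exactly three events with different users per id. */
    static List<String[]> sample(int ids) {
        List<String[]> events = new ArrayList<>();
        for (int id = 1; id <= ids; id++) {
            events.add(new String[] { "id" + id, "Joe" });
            events.add(new String[] { "id" + id, "Jack" });
            events.add(new String[] { "id" + id, "Jill" });
        }
        return events;
    }

    public static void main(String[] args) {
        long ordered = orderedPairs(sample(50_000));
        System.out.println("ordered pairs:   " + ordered);       // 300000
        System.out.println("unordered pairs: " + ordered / 2);   // 150000
    }
}
```

The second figure is what remains if each pair is matched in only one
order, as suggested earlier in this thread.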

>> Is the distribution of id/user combinations realistic?
>
> What do you mean by realistic?

Is this as it will be in production?

>
>
>> What else do
>> you need to do with Event type "a"? Similar? Completely different? -
>> There would be a simple solution to significantly reduce the memory
>> requirements, but it may not be feasible due to these answers.
>
> At the moment we are just designing a generic solution, which might be
> extended by rules afterwards, so that "old" events might need to be reused.
> In a real environment, of course, we would retract some events not needed
> any longer. But for now we are doing some performance testing and were
> surprised that we could "crash" the system with one single rule. Of course,
> with a lot of events ;-)

You can crash with even simpler rules :-)

-W

>
> 2013/1/7 Wolfgang Laun <wolfgang.laun@gmail.com>
>
>> On 07/01/2013, Svenja Brunstein <svenja.brunstein@gmail.com> wrote:
>> > Thanks for the input. For 150,000 type "a" events we had about 50,000
>> > different ids and 1,000 user values.
>> > After all, combinations possible for type "b" were only 1,000,000
>> > (1,000 users * 1,000 users), which is why I am surprised to have 88
>> > million instances.
>>
>> The system will create network nodes even when only one pattern matches.
>> 150,000/50,000 = 3 exactly, or average?
>>
>> If you have 3 events A, B, C with identical ids and different users,
>> you'll get the following candidates for an activation: (A,B), (B,A),
>> (A,C), (C,A), (B,C), (C,B)
>> and this increases O(n^2). - Since you know the exact distribution of
>> your data, you might compute this precisely.
>>
>> Is the distribution of id/user combinations realistic? What else do
>> you need to do with Event type "a"? Similar? Completely different? -
>> There would be a simple solution to significantly reduce the memory
>> requirements, but it may not be feasible due to these answers.
>>
>> >
>> > Yes, it is intentional to have the rule fire twice for each
>> > combination :-)
>> > Unfortunately, retracting events is not an option right now.
>>
>> Then, at least, generate both in a single rule.
>>
>> >
>> > I started another round, where I ensured to insert a lot more "b"
>> > events:
>> > The memory used by NotNodeLeftTuples is a lot less, even though these
>> > nodes still use most of the memory.
>> > Concluding from all that, I guess it is possible that the nodes take
>> > that much space (up to many GB), and the more events are inserted
>> > which invalidate the NOT nodes, the less memory is used by them?
>>
>> Well, you don't need the NOT node, and their number depends on the
>> distribution of your data.
>>
>> -W
>>
>> >
>> > 2013/1/7 Wolfgang Laun <wolfgang.laun@gmail.com>
>> >
>> >> The amount of memory required for 150K type "a" depends on the actual
>> >> distribution of this data w.r.t. fields id and user, and other
>> >> circumstances; it is not only the rule that is to blame.
>> >>
>> >> There is one flaw, though: The rule would fire twice for a matching
>> >> pair of events of type "a". It's possible that you do want to have a
>> >> type "b" for both combinations of user and friendid, but you could
>> >> create both in a single rule, which should halve your memory
>> >> requirements. If there is no ordered attribute, use the timestamp to
>> >> restrict a pair to only one combination (hint: "after").
>> >>
>> >> This will still generate a lot of network nodes.
>> >>
>> >> Other ideas for reduction may have to take the entire application
>> >> scenario into account, e.g., can you retract events after they have
>> >> been paired, or how do you do inserts and calls to fireAllRules, etc.
>> >> Most importantly, however, is the actual frequency of id and user
>> >> values in relation to type "a" events.
>> >>
>> >> -W
>> >>
>> >>
>> >>
>> >> On 07/01/2013, Svenja Brunstein <svenja.brunstein@gmail.com> wrote:
>> >> > Hi all,
>> >> >
>> >> > we observe a strange behavior with one of our rules. After
>> >> > deployment and sending lots of events (~150,000 of type "a"), the
>> >> > server slows down rapidly until it runs out of memory.
>> >> > We checked with VisualVM which objects are filling the memory: In
>> >> > one moment there were almost 14GB of NotNodeLeftTuples (88,933,186
>> >> > Instances)!
>> >> >
>> >> > This is our rule:
>> >> >
>> >> > rule "example"
>> >> > when
>> >> >     $evt1:EventObject(type=='a', $id:data['id'], $user:user) from entry-point internalstream
>> >> >     $evt2:EventObject(type=='a', data['id']==$id, user!=$user, $user2:user) from entry-point internalstream
>> >> >     not(EventObject(type=='b', user==$user, data['friendid']==$user2) from entry-point internalstream)
>> >> > then
>> >> >     EventObject evt = new EventObject();
>> >> >     evt.setType('b');
>> >> >     evt.setUser($evt1.getUser());
>> >> >     evt.put('friendid', $evt2.getUser());
>> >> >     entryPoints['internalstream'].insert(evt);
>> >> > end
>> >> >
>> >> > Is that behavior correct for such a size of event combinations when
>> >> > using a NOT in the rule?
>> >> >
>> >> > Thanks,
>> >> > Svenja
>> >> >
>> >> _______________________________________________
>> >> rules-users mailing list
>> >> rules-users@lists.jboss.org
>> >> https://lists.jboss.org/mailman/listinfo/rules-users