[rules-dev] Fwd: [rules-users] 14GB of NotNodeLeftTuples produced by one rule?
Wolfgang Laun
wolfgang.laun at gmail.com
Fri Jan 11 09:28:34 EST 2013
Any takers?
-W
---------- Forwarded message ----------
From: Svenja Brunstein <svenja.brunstein at gmail.com>
Date: Fri, 11 Jan 2013 15:00:07 +0100
Subject: Re: [rules-users] 14GB of NotNodeLeftTuples produced by one rule?
To: Rules Users List <rules-users at lists.jboss.org>
>
> As you can see, I insert 300,000 facts, and there wasn't a problem
> with memory, so I didn't check the node count. Are there any other
> rules in the rule base?
Thanks! We have six other rules in the rule base, but none of them has a
NOT in it. So I assume all tuples belong to that rule; when I tested
without that specific rule, there were no NotNodeLeftTuples.
In the case where the 14GB of tuples were produced, our data was
actually *not* randomly distributed, but even with a random
distribution we get a lot of NOT node tuples, e.g., 6 million NOT node
tuples for 128*128=16,384 possible nodes.
> Indeed, I'd expect the
> NotNodeLeftTuples to have the same order of magnitude.
Can you explain exactly when a NotNodeLeftTuple is created, and when it is
removed or not created at all? Unfortunately, I did not find any
documentation on this. We want to understand in which cases so many
tuples are created, so that we can avoid this behavior or construct our
rules in other ways.
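
For reference, the pair estimate quoted below (50,000 x 6 = 300,000) can be reproduced with a small Java sketch. It only counts ordered pairs of type "a" events that share an id, assuming all users within one id are different; it says nothing about how Drools allocates tuples internally. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class PairCount {
    // Ordered pairs (x, y) with x != y among n events sharing one id: n * (n - 1).
    static long orderedPairs(long n) {
        return n * (n - 1);
    }

    // Sums the ordered pairs over all ids, given the id of each inserted event.
    // Assumption: events with the same id all have different users.
    static long totalOrderedPairs(String[] ids) {
        Map<String, Long> perId = new HashMap<>();
        for (String id : ids) {
            perId.merge(id, 1L, Long::sum);
        }
        long total = 0;
        for (long n : perId.values()) {
            total += orderedPairs(n);
        }
        return total;
    }

    public static void main(String[] args) {
        // 150,000 events spread over 50,000 ids, exactly 3 events per id.
        String[] ids = new String[150_000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = "id" + (i / 3);
        }
        System.out.println(totalOrderedPairs(ids)); // prints 300000
    }
}
```

With a skewed distribution the n * (n - 1) term dominates: a single id shared by 1,000 events already yields 999,000 ordered pairs.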
2013/1/7 Wolfgang Laun <wolfgang.laun at gmail.com>
> On 07/01/2013, Svenja Brunstein <svenja.brunstein at gmail.com> wrote:
> >>
> >> The system will create network nodes even when only one pattern matches.
> >> 150,000/50,000 = 3 exactly, or average?
> >
> > 3 exactly.
>
> I was running your rule with this data using Drools 5.2/3/4/5.0:
>
> WorkingMemoryEntryPoint ep =
>     kSession.getWorkingMemoryEntryPoint("internalstream");
> Event e;
> Random rand = new Random();
> for( int id = 1; id <= 100000; id++ ){
>     int iName = rand.nextInt( 1000 ) + 1;
>     e = new Event( "a", "Joe" + iName );
>     e.put( "id", "id"+id );
>     ep.insert( e );
>     e = new Event( "a", "Jack" + iName );
>     e.put( "id", "id"+id );
>     ep.insert( e );
>     e = new Event( "a", "Jill" + iName );
>     e.put( "id", "id"+id );
>     ep.insert( e );
> }
> As you can see, I insert 300,000 facts, and there wasn't a problem
> with memory, so I didn't check the node count. Are there any other
> rules in the rule base?
>
> As you have written: you'll get 50,000 x 6 = 300,000 fact reference
> pairs to represent matching EventA pairs. Indeed, I'd expect the
> NotNodeLeftTuples to have the same order of magnitude.
>
> >> Is the distribution of id/user combinations realistic?
> >
> > What do you mean by realistic?
>
> Is this as it will be in production?
>
> >
> >
> >> What else do
> >> you need to do with Event type "a"? Similar? Completely different? -
> >> There would be a simple solution to significantly reduce the memory
> >> requirements, but it may not be feasible due to these answers.
> >
> > At the moment we are just designing a generic solution, which might be
> > extended by rules afterwards, so that "old" events might need to be
> > reused.
> > In a real environment, of course, we would retract some events not needed
> > any longer. But for now we are doing some performance testing and were
> > surprised that we could "crash" the system with one single rule. Of
> > course, with a lot of events ;-)
>
> You can crash with even simpler rules :-)
>
> -W
>
> >
> > 2013/1/7 Wolfgang Laun <wolfgang.laun at gmail.com>
> >
> >> On 07/01/2013, Svenja Brunstein <svenja.brunstein at gmail.com> wrote:
> >> > Thanks for the input. For 150,000 type "a" events we had about 50,000
> >> > different ids and 1,000 user values.
> >> > After all, combinations possible for type "b" were only 1,000,000
> >> > (1,000
> >> > users * 1,000 users), which is why I am surprised to have 88 million
> >> > instances.
> >>
> >> The system will create network nodes even when only one pattern matches.
> >> 150,000/50,000 = 3 exactly, or average?
> >>
> >> If you have 3 events A, B, C with identical ids and different users,
> >> you'll get the following candidates for an activation: (A,B), (B,A),
> >> (A,C), (C,A), (B,C), (C,B)
> >> and this increases O(n^2). - Since you know the exact distribution of
> >> your data, you might compute this precisely.
> >>
> >> Is the distribution of id/user combinations realistic? What else do
> >> you need to do with Event type "a"? Similar? Completely different? -
> >> There would be a simple solution to significantly reduce the memory
> >> requirements, but it may not be feasible due to these answers.
> >>
> >> >
> >> > Yes, it is intentional to have the rule fire twice for each
> >> > combination :-)
> >> > Unfortunately, retracting events is not an option right now.
> >>
> >> Then, at least, generate both in a single rule.
> >>
> >> >
> >> > I started another round, where I ensured to insert a lot more "b"
> >> > events:
> >> > The memory used by NotNodeLeftTuples is a lot less, even though these
> >> > nodes still use most of the memory.
> >> > Concluding from all that, I guess it is possible that the nodes take
> >> > that much space (up to many GB), and the more events are inserted which
> >> > invalidate the NOT nodes, the less memory is used by them?
> >>
> >> Well, you don't need the NOT node, and their number depends on the
> >> distribution of your data.
> >>
> >> -W
> >>
> >> >
> >> > 2013/1/7 Wolfgang Laun <wolfgang.laun at gmail.com>
> >> >
> >> >> The amount of memory required for 150K type "a" depends on the actual
> >> >> distribution of this data w.r.t. fields id and user, and other
> >> >> circumstances; it is not only the rule that is to blame.
> >> >>
> >> >> There is one flaw, though: The rule would fire twice for a matching
> >> >> pair of events of type "a". It's possible that you do want to have a
> >> >> type "b" for both combinations of user and friendid, but you could
> >> >> create both in a single rule, which should halve your memory
> >> >> requirements. If there is no ordered attribute, use the timestamp to
> >> >> restrict a pair to only one combination (hint: "after").
> >> >>
> >> >> This will still generate a lot of network nodes.
> >> >>
> >> >> Other ideas for reduction may have to take the entire application
> >> >> scenario into account, e.g., can you retract events after they have
> >> >> been paired, or how do you do inserts and calls to fireAllRules, etc.
> >> >> Most important, however, is the actual frequency of id and user
> >> >> values in relation to type "a" events.
> >> >>
> >> >> -W
> >> >>
> >> >>
> >> >>
> >> >> On 07/01/2013, Svenja Brunstein <svenja.brunstein at gmail.com> wrote:
> >> >> > Hi all,
> >> >> >
> >> >> > we observe a strange behavior with one of our rules. After
> >> >> > deployment
> >> >> > and sending lots of events (~150,000 of type "a"), the server slows
> >> >> > down
> >> >> > rapidly until it runs out of memory.
> >> >> > We checked with VisualVM which objects are filling the memory: In
> >> >> > one moment there were almost 14GB of NotNodeLeftTuples (88,933,186
> >> >> > Instances)!
> >> >> >
> >> >> > This is our rule:
> >> >> >
> >> >> > rule "example"
> >> >> > when
> >> >> >     $evt1:EventObject(type=='a', $id:data['id'], $user:user)
> >> >> >         from entry-point internalstream
> >> >> >     $evt2:EventObject(type=='a', data['id']==$id, user!=$user, $user2:user)
> >> >> >         from entry-point internalstream
> >> >> >     not(EventObject(type=='b', user==$user, data['friendid']==$user2)
> >> >> >         from entry-point internalstream)
> >> >> > then
> >> >> >     EventObject evt = new EventObject();
> >> >> >     evt.setType('b');
> >> >> >     evt.setUser($evt1.getUser());
> >> >> >     evt.put('friendid', $evt2.getUser());
> >> >> >     entryPoints['internalstream'].insert(evt);
> >> >> > end
> >> >> >
> >> >> > Is that behavior correct for such a size of event combinations when
> >> >> > using a NOT in the rule?
> >> >> >
> >> >> > Thanks,
> >> >> > Svenja
> >> >> >
> >> >> _______________________________________________
> >> >> rules-users mailing list
> >> >> rules-users at lists.jboss.org
> >> >> https://lists.jboss.org/mailman/listinfo/rules-users
> >> >>
> >> >