[rules-dev] Left and Right Unlinking - Community Project Proposal

Wed Oct 27 17:46:19 EDT 2010

Anyone made progress on this?

Mark
On 20/08/2010 17:02, Mark Proctor wrote:
> In an effort to help encourage those thinking of learning more about 
> the internals of rule engines. I have made a document on 
> implementating left and right unlinking. I describe the initial paper 
> in terms relevant to Drools users, and then how that can be 
> implemented in Drools and a series of enhancements over the original 
> paper. The task is actually surprisingly simple and you only need to 
> learn a very small part of the Drools implementation to do it, as such 
> it's a great getting started task. For really large stateful systems 
> of hundreds or even thousands of rules and hundreds of thousands of 
> facts it should save significant amounts of memory.
> http://blog.athico.com/2010/08/left-and-right-unlinking-community.html
>
> Any takers?
>
> Mark
>
> Introduction
> The following paper describes Left and Right unlinking enhancements 
> for Rete based networks:
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.6246 
> <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.6246>
>
> A rete based rule engine consists of two parts of the network, the 
> alpha nodes and the beta nodes. When an object is first inserted into 
> the engine it is discriminated against by the object type Node, this 
> is a one input and one output node. From there it may be further 
> discriminated against by alpha nodes that constrain on literal values 
> before reaching the right input of a join node in the beta part of the 
> network. Join nodes have two inputs, left and right. The right input 
> receives propagations consisting of a single object from the alpha 
> part of the network. The left input receives propagations consisting 
> of 1 or more objects, from the parent beta node. We refer to these 
> propagating objects as LeftTuple and RightTuple, other engines also 
> use the terms tokens or partial matches. When a tuple propagation 
> reaches a left or right input it's stored in that inputs memory and it 
> attempts to join with all possible tuples on the opposite side. If 
> there are no tuples on the opposite side then no join can happen and 
> the tuple just waits in the node's memory until a propagation from the 
> opposite side attempts to join with it. If a given. It would be better 
> if the engine could avoid populating that node's memory until both 
> sides have tuples. Left and right unlinking are solutions to this problem.
>
> The paper proposes that a node can either be left unlinked or right 
> unlinked, but not both, as then the rule would be completely 
> disconnected from the network. Unlinking an input means that it will 
> not receive any propagations and that the node's memory for that input 
> is not populated, saving memory space. When the opposite side, which 
> is still linked, receives a propagation the unlinked side is linked 
> back in and receives all the none propagated tuples. As both sides 
> cannot be unlinked, the paper describes a simple heuristic for 
> choosing which side to unlink. Which ever side becomes empty first, 
> then unlink the other. It says that on start up just arbitrarily chose 
> to unlink one side as default. The initial hit from choosing the wrong 
> side will be negligible, as the heuristic corrects this after the 
> first set of propagations.
>
> If the left input becomes empty the right input is unlink, thus 
> clearing the right input's memory too. The moment the left input 
> receives a propagation it re-attaches the right input fully populating 
> it's memory. The node can then attempt joins as normal. Vice-versa if 
> the right input becomes empty it unlinks the left input. The moment 
> the right input receives a propagation it re-attaches the left input 
> fully populating it's memory so that the node can attempt to join as 
> normal.
>
> Implementing Left and Right Unlinking for shared Knowledge Bases
>
> The description of unlinking in the paper won't work for Drools or for 
> other rule engines that share the knowledge base between multiple 
> sessions. In Drools the session data is decoupled from the main 
> knowledge base and multiple sessions can share the same knowledge 
> base. The paper above describes systems where the session data is 
> tightly coupled to the knowledge base and the knowledge base has only 
> a single session. In shared systems a node input that is empty for one 
> session might not be empty for another. Instead of physically 
> unlinking the nodes, as described in the paper, an integer value can 
> be used on the session's node memory that indicates if the node is 
> unlinked for left, right or both inputs. When the propagating node 
> attempts to propagate instead of just creating a left or right tuple 
> and pushing it into the node. It'll first retrieve the node's memory 
> and only create the tuple and propagate if it's linked.
>
> This is great as it also avoids creating tuple objects that would just 
> be discarded afterwards as there would be nothing to join with, making 
> things lighter on the GC. However it means the engine looks up the 
> node memory twice, once before propagating to the node and also inside 
> of the node as it attempt to do joins. Instead the node memory should 
> be looked up once, prior to propagating and then passed as an 
> argument, avoiding the double lookup.
>
> Traditional Rete has memory per alpha node, for each literal 
> constraint, in the network. Drools does not have alpha memory, instead 
> facts are pulled from the object type node. This means that facts may 
> needlessly evaluate in the alpha part of the network, only to be 
> refused addition to the node memory afterwards. Rete supports 
> something called "node sharing", where multiple rules with similar 
> constructs use the same nodes in the network. For this reason shared 
> nodes cannot easily be unlinked. As a compromise when the alpha node 
> is no longer shared, the network can do a node memory lookup, prior to 
> doing the evaluation and check if that section of the network is 
> unlinked and avoid attempting the evaluation if it is. This allows for 
> left and right unlinking to be used in a engine such as Drools.
>
> Using Left and Right Unlinking at the Same Time
>
> The original paper describes an implantation in which a node cannot 
> have both the left and right inputs unlinked for the same node. 
> Building on the extension above to allow unlinking to work with a 
> shared knowledge base the initial linking status value can be set to 
> both left and right being unlinked. However in this initial state, 
> where both sides are unlinked, the leaf node's right input isn't just 
> waiting for a left propagation so the right can re-link itself (which 
> it can't as the left is unlinked too). It's also waiting to receive 
> it's first propagation, when it does it will link the left input back 
> in. This will then tell it's parent node's right input to also do the 
> same, i.e. wait for it's first right input propagation and link in the 
> left when it happens. If it already has a right propagation it'll just 
> link in the left anyway. This will trickle up until the root is 
> finally linked in and propagations can happen as normally, and the 
> rule's nodes return to the above heuristics for when to link and 
> unlink the nodes.
>
> Avoid Unnecessary Eager Propagations
>
> A rule always eagerly propagates all joins, regardless of whether the 
> child node can undertake joins too, for instance of there is no 
> propagates for the leaf node then no rules can fire, and the eager 
> propagations are wasted work. Unlinking can be extended to try to 
> prevent some level of eager propagations. Should the leaf node become 
> right unlinked and that right input also become empty it will unlink 
> the left too (so both sides are unlinked) and go back to waiting for 
> the first right propagation, at which point it'll re-link the left. If 
> the parent node also has it's right input unlinked at the point that 
> it's child node unlinks the left it will do this too. It will repeat 
> this up the chain until it reaches a node that has both left and right 
> linked in. This stops any further eager matching from occurring that 
> we know can't result in an activation until the leaf node has at least 
> one right input.
>
> Heuristics to Avoid Churn from Excessive and Unnecessary Unlinking
>
> The only case where left and right linking would be a bad idea is in 
> situations that would cause a "churn". Churn is when a node with have 
> a large amount of right input memory is continually caused to be 
> linked in and linked out, forcing those nodes to be repeatedly 
> populated which causes a slow down. However heuristics can be used 
> here too, to avoid unnecessary unlinking. The first time an input 
> becomes empty unlink the opposite and store a time stamp (integer 
> counter for fact handles from the WM). Then have a minimum delta 
> number, say 100. The next time it attempts to unlink, calculate the 
> delta of the current time stamp (integer counter on fact handle) and 
> the time stamp of the node which last unlinked (which was recorded at 
> the point of unlinking) if it's less than 100 then do nothing and 
> don't unlink until it's 100 or more. If it's 100 or more then unlink 
> and as well as storing the unlink time stamp, then take the delta of 
> 100 or more and apply a multiple (2, 3, 4 etc depending on how steep 
> you want it to rise, 3 is a good starting number) and store it. Such 
> as if the delta is 100 then store 300. The next time the node links 
> and attempts to unlink it must be a delta of 300 or more, the time 
> after that 900 the time after that 2700.
>
>
> _______________________________________________
> rules-dev mailing list
> rules-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/rules-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/rules-dev/attachments/20101027/f0f584ca/attachment-0001.html