Left and Right Unlinking - Community Project Proposal - rules-dev

Friday, 20 August 2010

  In an effort to help encourage those thinking of learning more about 
the internals of rule engines. I have made a document on implementating 
left and right unlinking. I describe the initial paper in terms relevant 
to Drools users, and then how that can be implemented in Drools and a 
series of enhancements over the original paper. The task is actually 
surprisingly simple and you only need to learn a very small part of the 
Drools implementation to do it, as such it's a great getting started 
task. For really large stateful systems of hundreds or even thousands of 
rules and hundreds of thousands of facts it should save significant 
amounts of memory.
http://blog.athico.com/2010/08/left-and-right-unlinking-community.html

Any takers?

Mark

Introduction
The following paper describes Left and Right unlinking enhancements for 
Rete based networks:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.6246 
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.6246>

A rete based rule engine consists of two parts of the network, the alpha 
nodes and the beta nodes. When an object is first inserted into the 
engine it is discriminated against by the object type Node, this is a 
one input and one output node. From there it may be further 
discriminated against by alpha nodes that constrain on literal values 
before reaching the right input of a join node in the beta part of the 
network. Join nodes have two inputs, left and right. The right input 
receives propagations consisting of a single object from the alpha part 
of the network. The left input receives propagations consisting of 1 or 
more objects, from the parent beta node. We refer to these propagating 
objects as LeftTuple and RightTuple, other engines also use the terms 
tokens or partial matches. When a tuple propagation reaches a left or 
right input it's stored in that inputs memory and it attempts to join 
with all possible tuples on the opposite side. If there are no tuples on 
the opposite side then no join can happen and the tuple just waits in 
the node's memory until a propagation from the opposite side attempts to 
join with it. If a given. It would be better if the engine could avoid 
populating that node's memory until both sides have tuples. Left and 
right unlinking are solutions to this problem.

The paper proposes that a node can either be left unlinked or right 
unlinked, but not both, as then the rule would be completely 
disconnected from the network. Unlinking an input means that it will not 
receive any propagations and that the node's memory for that input is 
not populated, saving memory space. When the opposite side, which is 
still linked, receives a propagation the unlinked side is linked back in 
and receives all the none propagated tuples. As both sides cannot be 
unlinked, the paper describes a simple heuristic for choosing which side 
to unlink. Which ever side becomes empty first, then unlink the other. 
It says that on start up just arbitrarily chose to unlink one side as 
default. The initial hit from choosing the wrong side will be 
negligible, as the heuristic corrects this after the first set of 
propagations.

If the left input becomes empty the right input is unlink, thus clearing 
the right input's memory too. The moment the left input receives a 
propagation it re-attaches the right input fully populating it's memory. 
The node can then attempt joins as normal. Vice-versa if the right input 
becomes empty it unlinks the left input. The moment the right input 
receives a propagation it re-attaches the left input fully populating 
it's memory so that the node can attempt to join as normal.

Implementing Left and Right Unlinking for shared Knowledge Bases

The description of unlinking in the paper won't work for Drools or for 
other rule engines that share the knowledge base between multiple 
sessions. In Drools the session data is decoupled from the main 
knowledge base and multiple sessions can share the same knowledge base. 
The paper above describes systems where the session data is tightly 
coupled to the knowledge base and the knowledge base has only a single 
session. In shared systems a node input that is empty for one session 
might not be empty for another. Instead of physically unlinking the 
nodes, as described in the paper, an integer value can be used on the 
session's node memory that indicates if the node is unlinked for left, 
right or both inputs. When the propagating node attempts to propagate 
instead of just creating a left or right tuple and pushing it into the 
node. It'll first retrieve the node's memory and only create the tuple 
and propagate if it's linked.

This is great as it also avoids creating tuple objects that would just 
be discarded afterwards as there would be nothing to join with, making 
things lighter on the GC. However it means the engine looks up the node 
memory twice, once before propagating to the node and also inside of the 
node as it attempt to do joins. Instead the node memory should be looked 
up once, prior to propagating and then passed as an argument, avoiding 
the double lookup.

Traditional Rete has memory per alpha node, for each literal constraint, 
in the network. Drools does not have alpha memory, instead facts are 
pulled from the object type node. This means that facts may needlessly 
evaluate in the alpha part of the network, only to be refused addition 
to the node memory afterwards. Rete supports something called "node 
sharing", where multiple rules with similar constructs use the same 
nodes in the network. For this reason shared nodes cannot easily be 
unlinked. As a compromise when the alpha node is no longer shared, the 
network can do a node memory lookup, prior to doing the evaluation and 
check if that section of the network is unlinked and avoid attempting 
the evaluation if it is. This allows for left and right unlinking to be 
used in a engine such as Drools.

Using Left and Right Unlinking at the Same Time

The original paper describes an implantation in which a node cannot have 
both the left and right inputs unlinked for the same node. Building on 
the extension above to allow unlinking to work with a shared knowledge 
base the initial linking status value can be set to both left and right 
being unlinked. However in this initial state, where both sides are 
unlinked, the leaf node's right input isn't just waiting for a left 
propagation so the right can re-link itself (which it can't as the left 
is unlinked too). It's also waiting to receive it's first propagation, 
when it does it will link the left input back in. This will then tell 
it's parent node's right input to also do the same, i.e. wait for it's 
first right input propagation and link in the left when it happens. If 
it already has a right propagation it'll just link in the left anyway. 
This will trickle up until the root is finally linked in and 
propagations can happen as normally, and the rule's nodes return to the 
above heuristics for when to link and unlink the nodes.

Avoid Unnecessary Eager Propagations

A rule always eagerly propagates all joins, regardless of whether the 
child node can undertake joins too, for instance of there is no 
propagates for the leaf node then no rules can fire, and the eager 
propagations are wasted work. Unlinking can be extended to try to 
prevent some level of eager propagations. Should the leaf node become 
right unlinked and that right input also become empty it will unlink the 
left too (so both sides are unlinked) and go back to waiting for the 
first right propagation, at which point it'll re-link the left. If the 
parent node also has it's right input unlinked at the point that it's 
child node unlinks the left it will do this too. It will repeat this up 
the chain until it reaches a node that has both left and right linked 
in. This stops any further eager matching from occurring that we know 
can't result in an activation until the leaf node has at least one right 
input.

Heuristics to Avoid Churn from Excessive and Unnecessary Unlinking

The only case where left and right linking would be a bad idea is in 
situations that would cause a "churn". Churn is when a node with have a 
large amount of right input memory is continually caused to be linked in 
and linked out, forcing those nodes to be repeatedly populated which 
causes a slow down. However heuristics can be used here too, to avoid 
unnecessary unlinking. The first time an input becomes empty unlink the 
opposite and store a time stamp (integer counter for fact handles from 
the WM). Then have a minimum delta number, say 100. The next time it 
attempts to unlink, calculate the delta of the current time stamp 
(integer counter on fact handle) and the time stamp of the node which 
last unlinked (which was recorded at the point of unlinking) if it's 
less than 100 then do nothing and don't unlink until it's 100 or more. 
If it's 100 or more then unlink and as well as storing the unlink time 
stamp, then take the delta of 100 or more and apply a multiple (2, 3, 4 
etc depending on how steep you want it to rise, 3 is a good starting 
number) and store it. Such as if the delta is 100 then store 300. The 
next time the node links and attempts to unlink it must be a delta of 
300 or more, the time after that 900 the time after that 2700.

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

Left and Right Unlinking - Community Project Proposal