Hello All,
We are planning to use a rules engine (specifically JBoss Rules) for our
application, which could have up to 1,000,000 facts to crunch through and
so consume a lot of memory. I am interested in any techniques that could
reduce the memory footprint while holding a large number of facts in
working memory. Also, our requirements dictate that we maintain a stateful
working memory.
One thing that comes to mind is to demand-fetch field values for Java
objects when get*() methods on the fact objects are invoked, but if the
rules touch most of the fields of the facts, memory would fill up sooner
or later anyway.
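To make the idea concrete, here is a rough sketch of what I mean; the
FieldLoader and Customer names are made up, and the loader would be backed
by whatever store holds the full records:

    // Hypothetical demand-fetch fact: a field is loaded on first access.
    interface FieldLoader {
        String fetchString(long factId, String fieldName);
    }

    public class Customer {
        private final long id;
        private final FieldLoader loader; // e.g. backed by a database
        private String name;              // stays null until first get

        public Customer(long id, FieldLoader loader) {
            this.id = id;
            this.loader = loader;
        }

        public String getName() {
            if (name == null) {
                name = loader.fetchString(id, "name"); // demand-fetch
            }
            return name;
        }
    }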
I know the question is not directly related to JBoss Rules, so I
apologize if this is not the appropriate forum to pose this question...
Thanks
Jeevan
If you have to persist the WorkingMemory and keep facts around from run to
run, you may want to customize your serialization. At the very least,
define transient fields where you can, for values that can be restored at
runtime from other knowledge. Once you're talking millions of objects,
every little bit helps. I'll let the JBoss guys comment on the specifics of
whether customizing writeObject/readObject on your objects is a good idea
(I've had issues in the past with a persistent WorkingMemory, although those
seemed to be bugs that have since been fixed in the trunk code).
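For what it's worth, the transient-plus-readObject pattern looks roughly
like this in plain Java (the class and fields here are invented for
illustration):

    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.Serializable;

    public class NetworkSample implements Serializable {
        private static final long serialVersionUID = 1L;

        private long timestamp;
        private int byteCount;

        // Derived from the other fields at runtime, so there's no need
        // to pay for it in every serialized WorkingMemory snapshot.
        private transient String cachedKey;

        public String getKey() {
            return cachedKey;
        }

        // Rebuild the transient field after default deserialization.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            cachedKey = timestamp + ":" + byteCount; // restore, don't store
        }
    }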
In this sort of case, however, I'd first try to deconstruct the problem. Do
you really need 1,000,000 facts *at the same time* in the engine? Are there
possible matches from every fact object to every other fact object that
can only be resolved by having them all resident? One technique is to
instead inject smaller numbers of facts, and use helper objects to
accumulate the larger state, as in the sketch below.
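A helper can be as simple as this (the port-scan theme and the threshold
are made-up examples, not what I actually run):

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical accumulator: it outlives the short-lived event facts
    // and records just the data of interest from each one.
    public class PortScanTracker {
        private static final int THRESHOLD = 100; // made-up terminal condition
        private final Set<Integer> portsSeen = new HashSet<Integer>();

        public void note(int port) {
            portsSeen.add(port);
        }

        // When true, a rule can harvest the conclusion and retract the tracker.
        public boolean isTerminal() {
            return portsSeen.size() >= THRESHOLD;
        }
    }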
For instance, I have many rules that try to detect patterns in (wait for it)
20-100 million pieces of IP network data per day. Obviously, I can't just
inject all those events and hope something comes out. Instead, I linearly
order the events (which I actually receive in chunks throughout a given day)
and inject them into the engine one event at a time (I could insert an
entire chunk; it just makes some of my rules easier to know that only one
Event object exists at a time). I fire all the rules, then I retract the
event. Accumulators and other "tracking" objects note data of interest in
the event, until they possibly reach a terminal state, indicating a
conclusion has been reached. I then "harvest" the data from the helper
object, and retract *that*.
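In code, the per-event loop is roughly the following. This is written
against the JBoss Rules 4.x stateful-session API (older releases spell
insert/retract as assertObject/retractObject), and Event is a stand-in for
your own fact class:

    import java.util.List;
    import org.drools.FactHandle;
    import org.drools.RuleBase;
    import org.drools.StatefulSession;

    public class EventPump {
        // One event resident at a time: insert, fire, retract. Trackers
        // stay in working memory; rules retract them once terminal.
        public void pump(RuleBase ruleBase, List<Event> orderedEvents) {
            StatefulSession session = ruleBase.newStatefulSession();
            try {
                for (Event event : orderedEvents) {
                    FactHandle handle = session.insert(event);
                    session.fireAllRules();
                    session.retract(handle);
                }
            } finally {
                session.dispose(); // release the session's resources
            }
        }
    }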
By doing this, I really only have a very small number of things in working
memory at a time. I also managed to segregate my working memory and my
engines -- I run one engine per device of interest, and any higher-level
correlations between devices get captured as meta-facts that I feed into
*another* network of engines (to correlate global data).
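Structurally, that partitioning can look something like this sketch
(MetaFact and the device-keyed map are invented for illustration, and the
class names are again the 4.x API):

    import java.util.HashMap;
    import java.util.Map;
    import org.drools.RuleBase;
    import org.drools.StatefulSession;

    public class EnginePartition {
        private final RuleBase deviceRuleBase;        // rules for one device
        private final StatefulSession globalSession;  // cross-device correlation
        private final Map<String, StatefulSession> byDevice =
                new HashMap<String, StatefulSession>();

        public EnginePartition(RuleBase deviceRuleBase,
                               StatefulSession globalSession) {
            this.deviceRuleBase = deviceRuleBase;
            this.globalSession = globalSession;
        }

        // Each device of interest gets its own small working memory.
        public StatefulSession sessionFor(String deviceId) {
            StatefulSession s = byDevice.get(deviceId);
            if (s == null) {
                s = deviceRuleBase.newStatefulSession();
                byDevice.put(deviceId, s);
            }
            return s;
        }

        // Per-device rules call this when they reach a conclusion worth
        // correlating globally; only small MetaFact summaries cross over.
        public void publish(MetaFact conclusion) {
            globalSession.insert(conclusion);
            globalSession.fireAllRules();
        }
    }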
Just some ideas!
--- Michael