If you have to persist the WorkingMemory and keep facts around from run to
run, you may want to customize your serialization. At the very least,
mark fields as transient where you can, i.e. any field that can be rebuilt
at runtime from other knowledge. Once you're talking millions of objects,
every little bit helps. I'll let the JBoss guys comment on the specifics of
whether customizing writeObject/readObject on your objects is a good idea
(I've had issues in the past with a persistent WorkingMemory, although those
seemed to be bugs that have since been fixed in the trunk code).
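
To make the transient suggestion concrete, here is a minimal sketch using plain Java serialization only. FlowFact, its fields, and SerializationSketch are hypothetical names invented for illustration, and nothing here says how the engine itself persists a WorkingMemory. The derived key is skipped on write and rebuilt on read:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical fact class; the field names are illustrative only.
class FlowFact implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String sourceIp;
    private final String destIp;

    // Derived from the two fields above, so it is not written to the stream.
    private transient String flowKey;

    FlowFact(String sourceIp, String destIp) {
        this.sourceIp = sourceIp;
        this.destIp = destIp;
        this.flowKey = buildKey();
    }

    private String buildKey() {
        return sourceIp + "->" + destIp;
    }

    String getFlowKey() {
        return flowKey;
    }

    // Writes only the non-transient fields.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
    }

    // Reads the non-transient fields, then rebuilds the derived one.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        flowKey = buildKey();
    }
}

// Hypothetical helper that serializes a fact to memory and reads it back.
class SerializationSketch {
    static FlowFact roundTrip(FlowFact fact) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fact);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FlowFact) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

The saving per object is small, but it is paid on every fact, which is where it starts to matter at millions of objects.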

In this sort of case, though, I'd first try to break the problem down. Do
you really need 1,000,000 facts *at the same time* in the engine? Are there
possible matches from every fact object to every other fact object that
can only be resolved by having them all resident? One technique is to
inject smaller numbers of facts instead, and use helper objects to
accumulate the larger state.

For instance, I have many rules that try to detect patterns in (wait for it)
20-100 million pieces of IP network data per day. Obviously, I can't just
inject all those events and hope something comes out. Instead, I linearly
order the events (which I actually get in chunks throughout a given day),
and inject them into the engine one event at a time. (I could inject an entire
chunk; it just makes some of my rules easier to write when only one Event
object exists at a time.) I fire all the rules, then I retract the event.
Accumulators and other "tracking" objects note data of interest in the
event until they possibly reach a terminal state, indicating that a conclusion
has been reached. I then "harvest" the data from the helper object, and
retract *that*.
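
The insert / fire / retract / harvest cycle described above can be sketched roughly as follows. This is a toy, not the Drools API: ToyMemory is a hypothetical stand-in for a working memory, its fireAllRules() is one hard-coded "rule" written in Java, and Event, PortTracker, and the distinct-port threshold are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event: one piece of network data.
final class Event {
    final String sourceIp;
    final int destPort;

    Event(String sourceIp, int destPort) {
        this.sourceIp = sourceIp;
        this.destPort = destPort;
    }
}

// Hypothetical helper fact: accumulates state across many short-lived events.
final class PortTracker {
    private final String sourceIp;
    private final int threshold;
    private final Set<Integer> ports = new HashSet<>();

    PortTracker(String sourceIp, int threshold) {
        this.sourceIp = sourceIp;
        this.threshold = threshold;
    }

    // Records the event's port if the event comes from the tracked source.
    void note(Event e) {
        if (e.sourceIp.equals(sourceIp)) {
            ports.add(e.destPort);
        }
    }

    // Terminal state: enough distinct ports seen to draw a conclusion.
    boolean isTerminal() {
        return ports.size() >= threshold;
    }

    Set<Integer> harvest() {
        return new HashSet<>(ports);
    }
}

// Toy stand-in for a working memory. NOT the Drools API.
final class ToyMemory {
    private final List<Object> facts = new ArrayList<>();

    void insert(Object fact) { facts.add(fact); }
    void retract(Object fact) { facts.remove(fact); }
    int size() { return facts.size(); }

    // Stand-in for rule firing: every tracker notes every resident event.
    void fireAllRules() {
        for (Object a : facts) {
            if (!(a instanceof PortTracker)) continue;
            for (Object b : facts) {
                if (b instanceof Event) ((PortTracker) a).note((Event) b);
            }
        }
    }

    // Returns a copy, so callers may retract while iterating over it.
    List<PortTracker> terminalTrackers() {
        List<PortTracker> done = new ArrayList<>();
        for (Object f : facts) {
            if (f instanceof PortTracker && ((PortTracker) f).isTerminal()) {
                done.add((PortTracker) f);
            }
        }
        return done;
    }
}

final class StreamingSketch {
    // Feeds ordered events through one at a time; returns harvested results.
    static List<Set<Integer>> run(List<Event> orderedEvents, ToyMemory memory) {
        List<Set<Integer>> conclusions = new ArrayList<>();
        for (Event event : orderedEvents) {
            memory.insert(event);    // only one Event is resident at a time
            memory.fireAllRules();
            memory.retract(event);
            for (PortTracker t : memory.terminalTrackers()) {
                conclusions.add(t.harvest());
                memory.retract(t);   // the helper leaves once harvested
            }
        }
        return conclusions;
    }
}
```

With a real engine, the tracker updates would live in rule consequences rather than in fireAllRules() itself; the point of the sketch is only that the resident set stays at one event plus a handful of helpers.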

By doing this, I really only have a very small number of things in working
memory at a time. I also managed to segregate my working memory and my
engines: I run one engine per device of interest, and any higher-level
correlations between devices get captured as meta-facts that I feed into
*another* network of engines (to correlate global data).
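
As a rough illustration of that layout, the sketch below routes events to one small engine per device and forwards any conclusions, as meta-facts, to a separate correlating engine. Every name here (MetaFact, DeviceEngine, CorrelationEngine, EngineRouter, the "three login failures" rule) is hypothetical, and the "engines" are plain Java stand-ins rather than real rule engines.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical meta-fact: a conclusion reached by one device's engine.
final class MetaFact {
    final String deviceId;
    final String conclusion;

    MetaFact(String deviceId, String conclusion) {
        this.deviceId = deviceId;
        this.conclusion = conclusion;
    }
}

// Toy stand-in for a per-device engine with its own small state.
final class DeviceEngine {
    private final String deviceId;
    private int failures;

    DeviceEngine(String deviceId) {
        this.deviceId = deviceId;
    }

    // Returns a meta-fact on every third login failure, otherwise null.
    MetaFact process(String eventType) {
        if ("LOGIN_FAILURE".equals(eventType) && ++failures == 3) {
            failures = 0;
            return new MetaFact(deviceId, "REPEATED_LOGIN_FAILURE");
        }
        return null;
    }
}

// Toy stand-in for the higher-level engine that sees only meta-facts.
final class CorrelationEngine {
    private final Map<String, Set<String>> devicesByConclusion = new HashMap<>();

    void accept(MetaFact fact) {
        devicesByConclusion
                .computeIfAbsent(fact.conclusion, k -> new HashSet<>())
                .add(fact.deviceId);
    }

    // Number of distinct devices that have reported the given conclusion.
    int devicesReporting(String conclusion) {
        Set<String> devices = devicesByConclusion.get(conclusion);
        return devices == null ? 0 : devices.size();
    }
}

// Routes each event to its device's engine, creating engines on demand.
final class EngineRouter {
    private final Map<String, DeviceEngine> engines = new HashMap<>();
    private final CorrelationEngine global = new CorrelationEngine();

    void route(String deviceId, String eventType) {
        DeviceEngine engine = engines.computeIfAbsent(deviceId, DeviceEngine::new);
        MetaFact meta = engine.process(eventType);
        if (meta != null) {
            global.accept(meta);
        }
    }

    CorrelationEngine global() {
        return global;
    }
}
```

The global engine never sees raw events, only the much smaller stream of per-device conclusions.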

Just some ideas!

--- Michael