If you have to persist the WorkingMemory and keep facts around from run to run, you may want to customize your serialization. At the very least, mark fields as transient where you can, for values that can be restored at runtime from other knowledge. Once you're talking millions of objects, every little bit helps. I'll let the JBoss guys comment on whether customizing writeObject/readObject on your objects is a good idea (I've had issues in the past with a persistent WorkingMemory, although those seemed to be bugs that have since been fixed in the trunk code).
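As a rough sketch of the transient idea in plain Java serialization (nothing here is specific to the rule engine): the class HostFact, its fields, and the roundTrip helper are all made up for illustration. The derived field is skipped on write and rebuilt in readObject on restore.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical fact class; names are illustrative only.
class HostFact implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String address; // persisted
    private final int port;       // persisted

    // Derivable from address and port, so it is not written to the stream.
    private transient String endpoint;

    HostFact(String address, int port) {
        this.address = address;
        this.port = port;
        this.endpoint = address + ":" + port;
    }

    String getEndpoint() {
        return endpoint;
    }

    // Invoked by Java serialization on restore; rebuilds the transient field.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        this.endpoint = address + ":" + port;
    }

    // Demo helper: serializes a fact to bytes and reads it back.
    static HostFact roundTrip(HostFact fact) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fact);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HostFact) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

The saving per object is small, but it is multiplied by the number of facts being persisted.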
In a case like this, though, I'd first try to deconstruct the problem. Do you really need 1,000,000 facts *at the same time* in the engine? Are there possible matches from every fact object to every other fact object that can only be resolved by having them all resident? One technique is to inject smaller numbers of facts instead, and use helper objects to accumulate some larger state.
For instance, I have many rules that try to detect patterns in (wait for it) 20-100 million pieces of IP network data per day. Obviously, I can't just inject all those events and hope something comes out. Instead, I linearly order the events (which I actually get in chunks throughout a given day), and inject them into the engine one event at a time (I could inject the entire chunk; knowing that only one Event object exists at a time just makes some of my rules easier to write). I fire all the rules, then I retract the event. Accumulators and other "tracking" objects note data of interest in the event, until they possibly reach a terminal state, indicating a conclusion has been reached. I then "harvest" the data from the helper object, and retract *that*.
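The loop described above can be sketched in plain Java. This is not the rule engine's API: the map stands in for working memory, the `if` stands in for a rule, and Event, FailureTracker, StreamingDetector, and the failed-login threshold are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event: one piece of network data.
final class Event {
    final String sourceIp;
    final boolean failedLogin;

    Event(String sourceIp, boolean failedLogin) {
        this.sourceIp = sourceIp;
        this.failedLogin = failedLogin;
    }
}

// Helper ("tracking") object that accumulates state across events.
final class FailureTracker {
    final String sourceIp;
    int failures;

    FailureTracker(String sourceIp) {
        this.sourceIp = sourceIp;
    }

    // Terminal state: enough failures seen to draw a conclusion.
    boolean isTerminal(int threshold) {
        return failures >= threshold;
    }
}

final class StreamingDetector {
    private final int threshold;
    // Stand-in for working memory: holds only the live trackers.
    private final Map<String, FailureTracker> trackers = new HashMap<>();

    StreamingDetector(int threshold) {
        this.threshold = threshold;
    }

    // Handles events one at a time; returns the source IPs that were harvested.
    List<String> process(List<Event> orderedEvents) {
        List<String> conclusions = new ArrayList<>();
        for (Event event : orderedEvents) {
            // "Insert" the event and "fire the rules" against it.
            if (event.failedLogin) {
                FailureTracker tracker =
                        trackers.computeIfAbsent(event.sourceIp, FailureTracker::new);
                tracker.failures++;
                // Harvest the tracker's data, then retract the tracker itself.
                if (tracker.isTerminal(threshold)) {
                    conclusions.add(tracker.sourceIp);
                    trackers.remove(tracker.sourceIp);
                }
            }
            // The event is dropped here, which plays the role of retraction.
        }
        return conclusions;
    }

    int liveTrackers() {
        return trackers.size();
    }
}
```

However many events pass through, the only state kept is one small tracker per source that has not yet reached a conclusion.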
By doing this, I really only have a very small number of things in working memory at a time. I also managed to segregate my working memory and my engines -- I run 1 engine per device of interest, and any higher level correlations between devices get captured as meta-facts that I feed into *another* network of engines (to correlate global data).
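A minimal sketch of that two-tier layout, again in plain Java rather than the rule engine's API. DeviceEngine, GlobalCorrelator, EngineRouter, and the simple counting "rule" are assumptions made up for the example; a real setup would put a rule session behind each of them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Stand-in for one per-device engine: emits a meta-fact after enough events.
final class DeviceEngine {
    private final String deviceId;
    private final int threshold;
    private final Consumer<String> metaFactSink;
    private int count;

    DeviceEngine(String deviceId, int threshold, Consumer<String> metaFactSink) {
        this.deviceId = deviceId;
        this.threshold = threshold;
        this.metaFactSink = metaFactSink;
    }

    void accept() {
        count++;
        if (count == threshold) {
            metaFactSink.accept(deviceId); // pass the conclusion up one level
            count = 0;                     // local state is cleared after harvesting
        }
    }
}

// Stand-in for the higher-level engine that correlates across devices.
final class GlobalCorrelator {
    private final Set<String> reportingDevices = new HashSet<>();

    void onMetaFact(String deviceId) {
        reportingDevices.add(deviceId);
    }

    boolean isWidespread(int minDevices) {
        return reportingDevices.size() >= minDevices;
    }
}

// Routes each event to the engine for its device, creating engines on demand.
final class EngineRouter {
    private final Map<String, DeviceEngine> engines = new HashMap<>();
    private final GlobalCorrelator correlator;
    private final int threshold;

    EngineRouter(GlobalCorrelator correlator, int threshold) {
        this.correlator = correlator;
        this.threshold = threshold;
    }

    void route(String deviceId) {
        engines.computeIfAbsent(deviceId,
                id -> new DeviceEngine(id, threshold, correlator::onMetaFact))
               .accept();
    }

    int engineCount() {
        return engines.size();
    }
}
```

Each per-device engine only ever sees its own device's events, and the global tier only sees the much smaller stream of meta-facts.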
Just some ideas!
--- Michael