Ok, so I reviewed my code again and found a spot where I was queuing up tasks to process data that hadn't been loaded yet. My application shouldn't spawn rule sessions in that case. I have also moved to a server with less RAM and will begin profiling as soon as possible. Here is where I stand now:
1) I am still seeing about 250 million JoinNodeLeftTuples, which take up about 30GB of RAM. They outnumber every other object type in the system by a factor of about 100.
2) Creating a test case will be very difficult because I cannot pinpoint the root cause of this issue. I am not sure whether it comes from running too many rule engines, the size of my ruleset, or simply the scale at which I am trying to use Drools. If I do pinpoint an actual memory leak, I will create a test case and submit it.
3) I cannot use a StatelessKnowledgeSession because it locks up my process. It seems the code in StatelessKnowledgeSessionImpl's constructor acquires a lock that is never released. I have no idea why yet:
    this.ruleBase.lock();
    try {
        if ( ruleBase.getConfiguration().isSequential() ) {
            this.ruleBase.getReteooBuilder().order();
        }
    } finally {
        this.ruleBase.unlock(); // <--- This line is never reached for some reason
    }
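On item 1, a quick back-of-the-envelope check: dividing the heap used by the tuple count gives the average footprint per tuple. The sketch below reads "30GB" as 30 x 10^9 bytes (an assumption; with 30 GiB the figure comes out at about 129 bytes). Around 120 bytes is an ordinary object size, which suggests the problem is the sheer number of partial matches held in the join network rather than a few oversized objects. The class and method names are illustrative only.

```java
public class TupleFootprint {
    // Average bytes per object = total heap attributed to the type / number of instances.
    static double bytesPerObject(long totalBytes, long instanceCount) {
        return (double) totalBytes / instanceCount;
    }

    public static void main(String[] args) {
        long tuples = 250_000_000L;            // JoinNodeLeftTuple count from the profile
        long heapBytes = 30L * 1_000_000_000L; // "about 30GB", read as decimal GB (assumption)
        System.out.printf("~%.0f bytes per JoinNodeLeftTuple%n",
                bytesPerObject(heapBytes, tuples));
    }
}
```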
I will keep looking for a root cause or a path forward and report back. If anything here suggests I am doing something wrong, please let me know. I don't fully understand sequential mode, as the documentation is hard to approach from my uninformed perspective. Could something in my rules be preventing this lock from being released?
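On the lock question: in Java a finally block runs whenever its try block finishes, whether normally or by throwing, so an unlock() in finally that is never reached usually means the thread never gets past lock() or order() at all. A thread dump (jstack <pid>) taken while the process hangs would show where it is parked. One classic way to hang like this is a read-to-write lock upgrade, sketched below with plain java.util.concurrent rather than Drools code. Whether ruleBase.lock() is backed by a ReentrantReadWriteLock, and whether the calling thread already holds its read lock, are assumptions not confirmed here. The demo uses a timed tryLock so it reports the failure instead of blocking forever the way a plain lock() would.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDemo {
    // Attempts the write lock with a timeout; returns whether it was acquired.
    private static boolean tryWrite(ReentrantReadWriteLock lock, long timeoutMs) {
        try {
            boolean acquired = lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    // Holds the read lock, then asks for the write lock on the same thread.
    // ReentrantReadWriteLock does not support read-to-write upgrades, so this times out;
    // with writeLock().lock() instead of tryLock it would block forever.
    static boolean upgradeWhileHoldingRead(ReentrantReadWriteLock lock, long timeoutMs) {
        lock.readLock().lock();
        try {
            return tryWrite(lock, timeoutMs);
        } finally {
            lock.readLock().unlock(); // runs once the try block completes
        }
    }

    // With no read lock held, the write lock is available immediately.
    static boolean writeWithNoReaders(ReentrantReadWriteLock lock, long timeoutMs) {
        return tryWrite(lock, timeoutMs);
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        System.out.println("upgrade while holding read lock: " + upgradeWhileHoldingRead(lock, 500));
        System.out.println("write lock with no readers: " + writeWithNoReaders(lock, 500));
    }
}
```

If a thread dump shows the session-creating thread parked inside lock(), checking which other thread (or the same one) holds that lock would be the next step.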
Thanks again,
Julian