Hi!
We're running Drools in a multi-server setup, with a pretty basic scenario:
1. When the first session is needed, we load the rulebase once. The rulebase is never changed after this (no rule removal or addition).
2. Each call spawns a new StatefulKnowledgeSession, which is used to fire the rules once and is then disposed. These calls are heavily multi-threaded, but each session is only used in a single method call from a single thread.
3. We use the resource scanner to check for updated packages. When an updated knowledge package is available, the knowledgebase is discarded and a new one is constructed from the updated package.
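To make the lifecycle in steps 1 and 3 concrete, here is a minimal sketch of the pattern, not our actual code. KnowledgeBaseHolder and its loader are hypothetical names made up for this post; in the real setup K would be the Drools knowledgebase type and the loader would build it from the current package.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder (name invented for this post). K stands in for the
// knowledgebase type; the loader builds one from the current package.
final class KnowledgeBaseHolder<K> {
    private final Supplier<K> loader;
    private final AtomicReference<K> current = new AtomicReference<>();

    KnowledgeBaseHolder(Supplier<K> loader) {
        this.loader = loader;
    }

    // Step 1: build lazily on first use; the lock ensures the (large)
    // package is only loaded once even when many threads arrive together.
    K get() {
        K kb = current.get();
        if (kb == null) {
            synchronized (this) {
                kb = current.get();
                if (kb == null) {
                    kb = loader.get();
                    current.set(kb);
                }
            }
        }
        return kb;
    }

    // Step 3: on an updated package, build a fresh knowledgebase and swap it in.
    // The old instance is never modified; it is simply dropped once in-flight
    // calls that still hold a reference to it have finished.
    synchronized void reload() {
        current.set(loader.get());
    }
}
```

Each call (step 2) then takes whatever holder.get() returns, creates its own session from it, fires the rules, and disposes the session, so sessions are never shared between threads.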
The knowledge package affected is somewhat large - about 25 MB in size. (There are other, smaller knowledge packages which do not appear to have this issue.)
This setup has been working fine, if somewhat slowly. After upgrading to 5.5.0.Final and adding some speedups in the surrounding code, we get an intermittent NullPointerException. The sequence of events is:
1. After startup, all servers work fine.
2. After anywhere from a couple of hours to a couple of days, the NullPointerException occurs on one or more servers (but generally not all of them).
3. Once the NullPointerException occurs, it happens on all further calls using that knowledgebase. No other knowledgebases are affected. (You can load the same package in a new knowledgebase, and sessions using the new, identical knowledgebase work just fine.)
4. Restarting the Java process makes the server work fine again.
Calls always fail with the same stack trace:
org.drools.RuntimeDroolsException: Unexpected exception executing action org.drools.reteoo.ReteooWorkingMemory$WorkingMemoryReteAssertAction@5db53f24
at org.drools.common.AbstractWorkingMemory.executeQueuedActions(AbstractWorkingMemory.java:995)
at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:335)
at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:311)
at org.drools.common.AbstractWorkingMemory.insert(AbstractWorkingMemory.java:903)
at org.drools.common.AbstractWorkingMemory.insert(AbstractWorkingMemory.java:847)
at org.drools.impl.StatefulKnowledgeSessionImpl.insert(StatefulKnowledgeSessionImpl.java:269)
<stacktrace from our calling code omitted>
Caused by: java.lang.NullPointerException
at org.drools.rule.EvalCondition.createContext(EvalCondition.java:107)
at org.drools.reteoo.EvalConditionNode.createMemory(EvalConditionNode.java:261)
at org.drools.common.ConcurrentNodeMemories.createNodeMemory(ConcurrentNodeMemories.java:90)
at org.drools.common.ConcurrentNodeMemories.getNodeMemory(ConcurrentNodeMemories.java:69)
at org.drools.common.AbstractWorkingMemory.getNodeMemory(AbstractWorkingMemory.java:1034)
at org.drools.reteoo.EvalConditionNode.assertLeftTuple(EvalConditionNode.java:174)
at org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(SingleLeftTupleSinkAdapter.java:196)
at org.drools.reteoo.SingleLeftTupleSinkAdapter.createAndPropagateAssertLeftTuple(SingleLeftTupleSinkAdapter.java:145)
at org.drools.reteoo.LeftInputAdapterNode.assertObject(LeftInputAdapterNode.java:154)
at org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(SingleObjectSinkAdapter.java:59)
at org.drools.reteoo.ObjectTypeNode.assertObject(ObjectTypeNode.java:235)
at org.drools.reteoo.EntryPointNode.assertObject(EntryPointNode.java:240)
at org.drools.reteoo.Rete.assertObject(Rete.java:109)
at org.drools.reteoo.ReteooRuleBase.assertObject(ReteooRuleBase.java:286)
at org.drools.reteoo.ReteooWorkingMemory$WorkingMemoryReteAssertAction.execute(ReteooWorkingMemory.java:434)
at org.drools.common.AbstractWorkingMemory.executeQueuedActions(AbstractWorkingMemory.java:993)
... 93 more
Unfortunately, we don't have the remaining lines of the wrapped exception.
From this trace, it looks like the problem is in EvalCondition:
    public Object createContext() {
        return this.expression.createContext();
    }
which should only be able to throw a NullPointerException if expression is null. Looking at the constructors for this class, having a null expression seems like a valid use case - but if so, why doesn't createContext() handle it?
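For illustration, the kind of guard I would have expected might look like the sketch below. The Expression interface and the GuardedCondition class are hypothetical stand-ins written for this post, not the Drools source, and the choice of exception and its message are my own assumptions.

```java
// Hypothetical stand-in for the expression type; NOT the Drools interface.
interface Expression {
    Object createContext();
}

// Hypothetical stand-in for the condition class, just enough to show the guard.
final class GuardedCondition {
    private final Expression expression; // may be null, as the constructors seem to allow

    GuardedCondition(Expression expression) {
        this.expression = expression;
    }

    // Mirrors the one-liner quoted above: throws a bare
    // NullPointerException when expression is null.
    Object createContextUnguarded() {
        return this.expression.createContext();
    }

    // Guarded variant: fails with a descriptive message instead, which would
    // at least make the broken state obvious in the stack trace.
    Object createContext() {
        if (this.expression == null) {
            throw new IllegalStateException("eval expression has not been set");
        }
        return this.expression.createContext();
    }
}
```

A guard like this would not explain how the expression ends up null in the first place, but it would separate "null is a legal state" from "something went wrong earlier".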
Considering that running Drools with the same package and the same inputs can either fail with the above error (if the knowledgebase has been in use long enough) or work just as intended (using a newly minted knowledgebase), it feels like something is corrupting the knowledgebase. Does this make sense?
(I noticed in the issue tracker ( https://issues.jboss.org/browse/DROOLS-156 ) that memory for eval nodes is 'created "too lazily"' - might this be a related issue?)
Best regards,
Anders