Hi!
We're running Drools in a multi-server setup, with a pretty basic scenario:
1. When the first session is needed, we load the rulebase once. The rulebase is never
changed after this (no rule removal or addition).
2. Each call spawns a new StatefulKnowledgeSession which is used for firing the rules once
and then disposed. These calls are heavily multi-threaded, but each session is only used
in a single method call from a single thread.
3. We use the resource scanner to check for updated packages. When an updated knowledge
package is available, the knowledgebase is discarded and a new one constructed from the
updated package.
The knowledge package affected is somewhat large - about 25 MB in size. (There are other,
smaller knowledge packages which do not appear to have this issue.)
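The setup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: RuleBase, RuleSession and RuleBaseHolder are made-up stand-ins for the Drools knowledge base and StatefulKnowledgeSession, not actual Drools types, and the loader stands for whatever builds a knowledge base from the package.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical stand-in for a compiled knowledge base; not a Drools type. */
interface RuleBase {
    RuleSession newSession();
}

/** Hypothetical stand-in for a stateful session; not a Drools type. */
interface RuleSession {
    void insert(Object fact);
    int fireAllRules();
    void dispose();
}

/** Holds the current rule base and hands out one short-lived session per call. */
class RuleBaseHolder {
    private final AtomicReference<RuleBase> current = new AtomicReference<>();
    private final Supplier<RuleBase> loader; // assumed: builds a rule base from the package

    RuleBaseHolder(Supplier<RuleBase> loader) {
        this.loader = loader;
    }

    /** Step 1: load the rule base when the first session is needed. */
    RuleBase get() {
        RuleBase base = current.get();
        if (base == null) {
            // Concurrent first calls may each run the loader, but only one result is kept.
            current.compareAndSet(null, loader.get());
            base = current.get();
        }
        return base;
    }

    /** Step 3: discard the old rule base and build a new one from the updated package. */
    void reload() {
        current.set(loader.get());
    }

    /** Step 2: a new session per call, used by a single thread, fired once and disposed. */
    int fireOnce(Object fact) {
        RuleSession session = get().newSession();
        try {
            session.insert(fact);
            return session.fireAllRules();
        } finally {
            session.dispose();
        }
    }
}
```

The point of the sketch is that sessions never outlive a call, and the rule base itself is only ever replaced wholesale, never modified in place.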
This setup has been working fine, if somewhat slowly. After upgrading to 5.5.0.Final and
adding some speedups in the surrounding code, we get a sporadic NullPointerException. The
sequence of events is:
1. After startup, all servers work fine.
2. After anywhere from a couple of hours to a couple of days, the null pointer error occurs
on one or more servers (but generally not all of them).
3. Once the null pointer occurs, it happens on all further calls using that knowledgebase.
No other knowledgebases are affected. (You can load the same package in a new
knowledgebase, and sessions using the new, identical, knowledgebase work just fine.)
4. Restarting the java process makes the server work fine again.
Calls always fail with the same stack trace:
org.drools.RuntimeDroolsException: Unexpected exception executing action org.drools.reteoo.ReteooWorkingMemory$WorkingMemoryReteAssertAction@5db53f24
at org.drools.common.AbstractWorkingMemory.executeQueuedActions(AbstractWorkingMemory.java:995)
at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:335)
at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:311)
at org.drools.common.AbstractWorkingMemory.insert(AbstractWorkingMemory.java:903)
at org.drools.common.AbstractWorkingMemory.insert(AbstractWorkingMemory.java:847)
at org.drools.impl.StatefulKnowledgeSessionImpl.insert(StatefulKnowledgeSessionImpl.java:269)
<stacktrace from our calling code omitted>
Caused by: java.lang.NullPointerException
at org.drools.rule.EvalCondition.createContext(EvalCondition.java:107)
at org.drools.reteoo.EvalConditionNode.createMemory(EvalConditionNode.java:261)
at org.drools.common.ConcurrentNodeMemories.createNodeMemory(ConcurrentNodeMemories.java:90)
at org.drools.common.ConcurrentNodeMemories.getNodeMemory(ConcurrentNodeMemories.java:69)
at org.drools.common.AbstractWorkingMemory.getNodeMemory(AbstractWorkingMemory.java:1034)
at org.drools.reteoo.EvalConditionNode.assertLeftTuple(EvalConditionNode.java:174)
at org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(SingleLeftTupleSinkAdapter.java:196)
at org.drools.reteoo.SingleLeftTupleSinkAdapter.createAndPropagateAssertLeftTuple(SingleLeftTupleSinkAdapter.java:145)
at org.drools.reteoo.LeftInputAdapterNode.assertObject(LeftInputAdapterNode.java:154)
at org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(SingleObjectSinkAdapter.java:59)
at org.drools.reteoo.ObjectTypeNode.assertObject(ObjectTypeNode.java:235)
at org.drools.reteoo.EntryPointNode.assertObject(EntryPointNode.java:240)
at org.drools.reteoo.Rete.assertObject(Rete.java:109)
at org.drools.reteoo.ReteooRuleBase.assertObject(ReteooRuleBase.java:286)
at org.drools.reteoo.ReteooWorkingMemory$WorkingMemoryReteAssertAction.execute(ReteooWorkingMemory.java:434)
at org.drools.common.AbstractWorkingMemory.executeQueuedActions(AbstractWorkingMemory.java:993)
... 93 more
Unfortunately we don't have the remaining lines of the wrapped exception.
From this trace, it looks like the problem is in EvalCondition:
public Object createContext() {
    return this.expression.createContext();
}
which should only be able to throw a NullPointerException if expression is null. Looking
at the constructors for this class, having a null expression seems like a valid use case,
but if it is, why doesn't createContext() handle it?
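To illustrate the failure mode in isolation, here is a minimal sketch using simplified stand-in classes. Expression and EvalConditionSketch are hypothetical names made up for this example; this is not the actual Drools source, and the guarded variant is only one possible way such a case could be handled, not a proposed fix.

```java
/** Hypothetical stand-in for an eval expression; not the Drools interface. */
interface Expression {
    Object createContext();
}

/** Simplified stand-in for the condition class discussed above. */
class EvalConditionSketch {
    // May be null, which some constructors appear to allow.
    private final Expression expression;

    EvalConditionSketch(Expression expression) {
        this.expression = expression;
    }

    /** Mirrors the quoted method: throws NullPointerException when expression is null. */
    Object createContext() {
        return this.expression.createContext();
    }

    /** Guarded variant: fails with a descriptive message instead of a bare NPE. */
    Object createContextChecked() {
        if (this.expression == null) {
            throw new IllegalStateException("eval expression was never set");
        }
        return this.expression.createContext();
    }
}
```

With a null expression, createContext() fails exactly like the bottom frame of the trace, while the guarded variant at least reports what is missing.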
Considering that running Drools with the same package and the same inputs can either fail
with the above error (if the knowledgebase has been in use long enough) or work just as
intended (using a newly created knowledgebase), it feels like something is corrupting the
knowledgebase. Does this make sense?
(I noticed in the issue tracker ( https://issues.jboss.org/browse/DROOLS-156 ) that memory
for eval nodes is 'created "too lazily"'. Might this be a related issue?)
Best regards,
Anders