Hi!

We're running Drools in a multi-server setup with a pretty basic scenario:
1. When the first session is needed, we load the rulebase once. The rulebase is never changed after this (no rules are removed or added).
2. Each call spawns a new StatefulKnowledgeSession, which is used to fire the rules once and is then disposed. These calls are heavily multi-threaded, but each session is used only within a single method call from a single thread.
3. We use the resource scanner to check for updated packages. When an updated knowledge package is available, the knowledgebase is discarded and a new one is constructed from the updated package.
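For illustration, the three steps above can be modelled in plain Java. This is a hypothetical, simplified sketch, not the actual code and not the Drools API: RuleBase, Session, RuleBaseHolder and RuleRunner are invented stand-ins for the knowledgebase, a StatefulKnowledgeSession and the surrounding code. It only shows the intended invariants: the rulebase is loaded once and afterwards only read, each call gets its own single-threaded session that is always disposed, and an update replaces the whole rulebase instead of modifying it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for an immutable, compiled rulebase (not a Drools class).
final class RuleBase {
    final String version;

    RuleBase(String version) { this.version = version; }

    // Each call gets its own session; the RuleBase itself is only read.
    Session newSession() { return new Session(this); }
}

// Hypothetical stand-in for a short-lived session used by a single thread.
final class Session implements AutoCloseable {
    private final RuleBase base;
    private final List<Object> facts = new ArrayList<>();
    private boolean disposed = false;

    Session(RuleBase base) { this.base = base; }

    void insert(Object fact) {
        if (disposed) throw new IllegalStateException("session already disposed");
        facts.add(fact);
    }

    // Placeholder for firing rules: just reports how many facts were inserted.
    int fireAllRules() { return facts.size(); }

    String baseVersion() { return base.version; }

    @Override
    public void close() {
        disposed = true;
        facts.clear();
    }
}

// Holds the current rulebase: loaded lazily on first use, replaced wholesale on update.
final class RuleBaseHolder {
    private final AtomicReference<RuleBase> current = new AtomicReference<>();
    private final Supplier<RuleBase> loader;

    RuleBaseHolder(Supplier<RuleBase> loader) { this.loader = loader; }

    // Step 1: load on first use. If two threads race, the loader may run twice,
    // but only one result is published and kept.
    RuleBase get() {
        RuleBase rb = current.get();
        if (rb == null) {
            RuleBase fresh = loader.get();
            rb = current.compareAndSet(null, fresh) ? fresh : current.get();
        }
        return rb;
    }

    // Step 3: an updated package was found; publish a brand-new rulebase.
    void replace(RuleBase updated) { current.set(updated); }
}

final class RuleRunner {
    // Step 2: one session per call, used by one thread, always disposed.
    static int evaluate(RuleBaseHolder holder, List<?> facts) {
        try (Session session = holder.get().newSession()) {
            for (Object fact : facts) {
                session.insert(fact);
            }
            return session.fireAllRules();
        }
    }
}
```

Because the holder swaps references atomically, a call that is already running keeps using the rulebase it started with, while later calls pick up the replacement.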

The affected knowledge package is fairly large, about 25 MB. (There are other, smaller knowledge packages that do not appear to have this issue.)

This setup has been working fine, if somewhat slowly. After upgrading to 5.5.0.Final and adding some speedups to the surrounding code, we get an intermittent NullPointerException. The sequence of events is:
1. After startup, all servers work fine.
2. Somewhere between a couple of hours and a couple of days later, the NullPointerException occurs on one or more servers (but generally not all of them).
3. Once the NullPointerException occurs, it happens on every subsequent call using that knowledgebase. No other knowledgebases are affected. (You can load the same package into a new knowledgebase, and sessions using the new, identical knowledgebase work just fine.)
4. Restarting the Java process makes the server work fine again.

Calls always fail with the same stack trace:
org.drools.RuntimeDroolsException: Unexpected exception executing action org.drools.reteoo.ReteooWorkingMemory$WorkingMemoryReteAssertAction@5db53f24
    at org.drools.common.AbstractWorkingMemory.executeQueuedActions(AbstractWorkingMemory.java:995)
    at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:335)
    at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:311)
    at org.drools.common.AbstractWorkingMemory.insert(AbstractWorkingMemory.java:903)
    at org.drools.common.AbstractWorkingMemory.insert(AbstractWorkingMemory.java:847)
    at org.drools.impl.StatefulKnowledgeSessionImpl.insert(StatefulKnowledgeSessionImpl.java:269)
    <stack trace from our calling code omitted>
Caused by: java.lang.NullPointerException
    at org.drools.rule.EvalCondition.createContext(EvalCondition.java:107)
    at org.drools.reteoo.EvalConditionNode.createMemory(EvalConditionNode.java:261)
    at org.drools.common.ConcurrentNodeMemories.createNodeMemory(ConcurrentNodeMemories.java:90)
    at org.drools.common.ConcurrentNodeMemories.getNodeMemory(ConcurrentNodeMemories.java:69)
    at org.drools.common.AbstractWorkingMemory.getNodeMemory(AbstractWorkingMemory.java:1034)
    at org.drools.reteoo.EvalConditionNode.assertLeftTuple(EvalConditionNode.java:174)
    at org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(SingleLeftTupleSinkAdapter.java:196)
    at org.drools.reteoo.SingleLeftTupleSinkAdapter.createAndPropagateAssertLeftTuple(SingleLeftTupleSinkAdapter.java:145)
    at org.drools.reteoo.LeftInputAdapterNode.assertObject(LeftInputAdapterNode.java:154)
    at org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(SingleObjectSinkAdapter.java:59)
    at org.drools.reteoo.ObjectTypeNode.assertObject(ObjectTypeNode.java:235)
    at org.drools.reteoo.EntryPointNode.assertObject(EntryPointNode.java:240)
    at org.drools.reteoo.Rete.assertObject(Rete.java:109)
    at org.drools.reteoo.ReteooRuleBase.assertObject(ReteooRuleBase.java:286)
    at org.drools.reteoo.ReteooWorkingMemory$WorkingMemoryReteAssertAction.execute(ReteooWorkingMemory.java:434)
    at org.drools.common.AbstractWorkingMemory.executeQueuedActions(AbstractWorkingMemory.java:993)
    ... 93 more

Unfortunately, we don't have the remaining lines of the wrapped exception.

From this trace, it looks like the problem is in EvalCondition:
    public Object createContext() {
        return this.expression.createContext();
    }
which should only be able to throw a NullPointerException if expression is null. Looking at the constructors for this class, a null expression seems to be a valid use case, but if it is, why doesn't createContext() handle it?
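The reasoning above can be checked with a small self-contained model. Expression and EvalConditionModel below are hypothetical stand-ins, not the real Drools classes; they only reproduce the one-line delegation quoted above. Since the topmost frame of the NullPointerException is EvalCondition.createContext itself rather than a method it calls, the dereference of expression in that line is the likely source. The null-tolerant variant is shown only for comparison, not as a proposed fix, since what the callers would do with a null context is unknown.

```java
// Hypothetical stand-in for the expression object EvalCondition delegates to.
interface Expression {
    Object createContext();
}

// Simplified model of the delegation in EvalCondition.createContext() (not the Drools class).
final class EvalConditionModel {
    private final Expression expression; // may be null, as some constructors appear to allow

    EvalConditionModel(Expression expression) {
        this.expression = expression;
    }

    // Mirrors the quoted method: throws NullPointerException when expression is null.
    Object createContext() {
        return this.expression.createContext();
    }

    // Null-tolerant variant, for comparison only.
    Object createContextOrNull() {
        return expression == null ? null : expression.createContext();
    }
}
```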

Since running Drools with the same package and the same inputs either fails with the above error (once the knowledgebase has been used enough) or works just as intended (with a newly minted knowledgebase), it feels like something is corrupting the knowledgebase. Does this make sense?
Best regards,
Anders