Hello everyone,

I am encountering weird issues with the DefaultWorkItemManager: it seems to
somehow "lose" WorkItems occasionally. I haven't completely figured out
what exactly is happening, which also means I can't provide a test case that
reliably reproduces the error.

Let me give you a short description of my usage scenario: I'm using Drools
5.1.1 on a Win7-x64 PC with a JDK1.6.0_22-x64 VM. I'm trying to run lots of
process instances at the same time, let's say 100 for now. These are either
started one right after another (a for-loop calling startProcess() each time,
always in a new thread) or with a random delay of up to 2000 ms. The ksession
runs in a separate thread with fireUntilHalt().
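For anyone trying to reproduce this, the threading setup above can be sketched with plain java.util.concurrent. This is a minimal sketch under assumptions: FakeSession is a hypothetical stand-in for the ksession (its fireUntilHalt() just blocks until halt() is called, and startProcess() only counts calls), and the process ID is made up. None of it is Drools code.

```java
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the ksession; NOT the Drools API.
class FakeSession {
    private final CountDownLatch halted = new CountDownLatch(1);
    final AtomicInteger started = new AtomicInteger();

    // Blocks until halt() is called, mimicking a thread running fireUntilHalt().
    void fireUntilHalt() throws InterruptedException {
        halted.await();
    }

    void halt() {
        halted.countDown();
    }

    // Only counts calls; a real session would start a process instance here.
    void startProcess(String processId) {
        started.incrementAndGet();
    }
}

final class LoadScenario {
    // Starts `count` process instances, each from its own thread, after a random
    // delay of 0..maxDelayMs (0 = back to back), then halts the engine thread.
    static void run(final FakeSession session, int count, int maxDelayMs)
            throws InterruptedException {
        Thread engine = new Thread(new Runnable() {
            public void run() {
                try {
                    session.fireUntilHalt();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "engine");
        engine.start();

        Random random = new Random();
        final CountDownLatch done = new CountDownLatch(count);
        for (int i = 0; i < count; i++) {
            final long delay = maxDelayMs > 0 ? random.nextInt(maxDelayMs + 1) : 0;
            new Thread(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(delay);
                        session.startProcess("com.example.process"); // hypothetical process id
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }
            }, "starter-" + i).start();
        }
        done.await();   // every startProcess() call has returned
        session.halt(); // lets fireUntilHalt() return
        engine.join();
    }
}
```

Calling LoadScenario.run(session, 100, 2000) corresponds to the random-delay variant described above; passing 0 as the delay gives the back-to-back variant.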

In my processes, I try to implement rule-based runtime process
modifications. The idea is that the process stops at a certain node and
inserts a special event containing process instance information into the
WorkingMemory; the rule engine then evaluates the event and decides which
modifications to execute. Once those modifications are finished, the process
instance is notified and continues.

To realise this functionality, I created a class that implements the
ProcessEventListener, AgendaEventListener and WorkItemHandler interfaces at
the same time. A single instance of that class is registered for all three
roles at startup of my Drools application.
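Structurally, this amounts to one object playing three roles and being handed to three separate registries. The sketch below is illustrative only: NodeListener, RuleListener and ItemHandler are hypothetical single-method interfaces, and Registry is a toy replacement for the session's registration calls; the real Drools interfaces have more methods and different signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins for the three Drools roles (NOT the Drools API).
interface NodeListener { void beforeNodeTriggered(long processInstanceId, String nodeName); }
interface RuleListener { void afterActivationFired(String ruleName); }
interface ItemHandler  { void executeWorkItem(long workItemId, long processInstanceId); }

// One class implementing all three roles.
class ModificationCoordinator implements NodeListener, RuleListener, ItemHandler {
    // Shared by every thread that calls into this instance, hence synchronized.
    final List<String> log = Collections.synchronizedList(new ArrayList<String>());

    public void beforeNodeTriggered(long processInstanceId, String nodeName) {
        log.add("node " + nodeName + " in instance " + processInstanceId);
    }

    public void afterActivationFired(String ruleName) {
        log.add("rule " + ruleName);
    }

    public void executeWorkItem(long workItemId, long processInstanceId) {
        log.add("work item " + workItemId + " for instance " + processInstanceId);
    }
}

// Toy registry standing in for listener/handler registration on the session.
class Registry {
    final List<NodeListener> nodeListeners = new ArrayList<NodeListener>();
    final List<RuleListener> ruleListeners = new ArrayList<RuleListener>();
    final List<ItemHandler> itemHandlers = new ArrayList<ItemHandler>();

    // The same object is registered in all three roles.
    void registerAll(ModificationCoordinator c) {
        nodeListeners.add(c);
        ruleListeners.add(c);
        itemHandlers.add(c);
    }
}
```

Since one instance is registered once and then shared by every process instance and every thread, any state it keeps has to be safe for concurrent access, which is why the log in this sketch is a synchronized list.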

I use the beforeNodeTriggered() event to create my own special event. In
executeWorkItem(), I save the WorkItem and the corresponding WorkItemManager
and insert my special event into the WorkingMemory, which triggers the Drools
rule engine to evaluate it and execute the modifications accordingly.
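Because the one handler instance serves all concurrently running process instances, whatever executeWorkItem() saves has to be kept per process instance; if it were held in a single field, a second instance reaching the node could overwrite the first one's entry. A minimal sketch, assuming a ConcurrentHashMap keyed by process instance ID; SpecialEvent, PendingItem and PendingItems are hypothetical names, and the list of inserted events only stands in for the working memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical event type carrying process instance information.
final class SpecialEvent {
    final long processInstanceId;
    SpecialEvent(long processInstanceId) { this.processInstanceId = processInstanceId; }
}

// What executeWorkItem() saves: the work item id and its manager
// (the manager is an opaque Object here, since this is not the Drools API).
final class PendingItem {
    final long workItemId;
    final Object manager;
    PendingItem(long workItemId, Object manager) {
        this.workItemId = workItemId;
        this.manager = manager;
    }
}

// Hypothetical per-instance store used by the shared handler instance.
final class PendingItems {
    private final ConcurrentMap<Long, PendingItem> byInstance =
            new ConcurrentHashMap<Long, PendingItem>();
    // Stand-in for the session's working memory.
    final List<SpecialEvent> inserted =
            Collections.synchronizedList(new ArrayList<SpecialEvent>());

    void onExecuteWorkItem(long processInstanceId, long workItemId, Object manager) {
        // Save first, insert second: with fireUntilHalt() running on another
        // thread, rules may fire as soon as the event is inserted.
        PendingItem previous = byInstance.putIfAbsent(processInstanceId,
                new PendingItem(workItemId, manager));
        if (previous != null) {
            throw new IllegalStateException("instance " + processInstanceId
                    + " is already waiting on work item " + previous.workItemId);
        }
        inserted.add(new SpecialEvent(processInstanceId));
    }

    // Removes and returns the saved item, or null if nothing is pending.
    PendingItem take(long processInstanceId) {
        return byInstance.remove(processInstanceId);
    }
}
```

Rejecting a second entry for the same instance is a deliberate choice in this sketch: it turns a silent overwrite into an exception that shows up in the logs.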

I use afterActivationFired() to be notified when the rule engine has finished
the modifications (I listen for a certain rule being fired, because that
rule is always fired last, whether or not any modifications were made). In
afterActivationFired(), I complete the WorkItem I saved before, which lets
the process instance continue.
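The completion side then has to map the marker rule's firing back to the right saved work item. A sketch, assuming the marker rule is named "modifications done" (hypothetical) and that the listener can obtain both the rule name and the process instance ID from the event it receives; Completer is a hypothetical interface standing in for WorkItemManager.completeWorkItem().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for WorkItemManager.completeWorkItem(id, results).
interface Completer {
    void completeWorkItem(long workItemId, Map<String, Object> results);
}

final class CompletionListener {
    static final String MARKER_RULE = "modifications done"; // hypothetical rule name

    // processInstanceId -> workItemId saved in executeWorkItem()
    private final Map<Long, Long> pending = new ConcurrentHashMap<Long, Long>();
    private final Completer completer;

    CompletionListener(Completer completer) { this.completer = completer; }

    void saved(long processInstanceId, long workItemId) {
        pending.put(processInstanceId, workItemId);
    }

    // Called after every rule firing; only the marker rule completes anything.
    // Returns true if a saved work item was completed.
    boolean afterActivationFired(String ruleName, long processInstanceId) {
        if (!MARKER_RULE.equals(ruleName)) {
            return false;
        }
        Long workItemId = pending.remove(processInstanceId);
        if (workItemId == null) {
            // Nothing saved for this instance: make it visible instead of silent.
            System.err.println("no pending work item for process instance " + processInstanceId);
            return false;
        }
        completer.completeWorkItem(workItemId, new HashMap<String, Object>());
        return true;
    }
}
```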

This is where something occasionally goes wrong: every now and then, the
call to completeWorkItem() has no effect. Stepping through with the
debugger, I found that right before the call to completeWorkItem(), the
WorkItem in question is no longer registered with the WorkItemManager, so
the call can't do anything. As a result, the process instance that created
the "lost" WorkItem hangs at the WorkItemNode with no way to continue
execution. The "lost" WorkItem still has status 0 even after
completeWorkItem() has been called.
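The symptom matches a manager that looks the work item up by ID and silently does nothing when it is absent. The following is a simplified model written for illustration, not the actual DefaultWorkItemManager source: ToyWorkItem and ToyWorkItemManager are hypothetical, status 0 is used for "pending" to match the status reported above, the COMPLETED value is just a model value, and forget() stands in for whatever removes the item.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model for illustration; NOT the real DefaultWorkItemManager.
final class ToyWorkItem {
    static final int PENDING = 0;   // the status the "lost" item still shows
    static final int COMPLETED = 2; // model value only
    final long id;
    int state = PENDING;
    ToyWorkItem(long id) { this.id = id; }
}

final class ToyWorkItemManager {
    private final Map<Long, ToyWorkItem> workItems = new HashMap<Long, ToyWorkItem>();

    synchronized void register(ToyWorkItem item) { workItems.put(item.id, item); }

    // Hypothetical removal path that leaves the item's state untouched.
    synchronized void forget(long id) { workItems.remove(id); }

    // Looks the item up by id; an unknown id is silently ignored.
    // (Returns a flag only so the model can be checked.)
    synchronized boolean completeWorkItem(long id) {
        ToyWorkItem item = workItems.remove(id);
        if (item == null) {
            return false; // no exception, no log: the caller cannot tell
        }
        item.state = ToyWorkItem.COMPLETED;
        return true;
    }
}
```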

I tried tracing the path of the WorkItem, but so far all I can say is that
WorkItems are always registered with the WorkItemManager in
executeWorkItem(), but sometimes are no longer registered by the time
afterActivationFired() runs. So it has to happen somewhere between those two
points in the code. However, I could not find any way to remove a WorkItem
from the WorkItemManager without changing its status, so I have no idea what
could cause this.
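To narrow down that window, it may help to record every step per work item together with the thread that performed it, and afterwards list the items that were registered but never completed. A minimal sketch; WorkItemTrace and the event labels are hypothetical and not part of Drools.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

// Hypothetical trace recorder: one queue of events per work item id.
final class WorkItemTrace {
    private final ConcurrentMap<Long, ConcurrentLinkedQueue<String>> events =
            new ConcurrentHashMap<Long, ConcurrentLinkedQueue<String>>();

    // Appends "<nanoTime> [<thread>] <what>" to the item's history.
    void record(long workItemId, String what) {
        ConcurrentLinkedQueue<String> q = events.get(workItemId);
        if (q == null) {
            ConcurrentLinkedQueue<String> fresh = new ConcurrentLinkedQueue<String>();
            q = events.putIfAbsent(workItemId, fresh);
            if (q == null) {
                q = fresh;
            }
        }
        q.add(System.nanoTime() + " [" + Thread.currentThread().getName() + "] " + what);
    }

    // Ids that have a "registered" entry but no "completed" entry.
    Set<Long> unfinished() {
        Set<Long> result = new TreeSet<Long>();
        for (Map.Entry<Long, ConcurrentLinkedQueue<String>> e : events.entrySet()) {
            boolean registered = false;
            boolean completed = false;
            for (String line : e.getValue()) {
                if (line.endsWith("] registered")) registered = true;
                if (line.endsWith("] completed")) completed = true;
            }
            if (registered && !completed) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    List<String> history(long workItemId) {
        ConcurrentLinkedQueue<String> q = events.get(workItemId);
        return q == null ? new ArrayList<String>() : new ArrayList<String>(q);
    }
}
```

In the real handler, record(id, "registered") would go at the end of executeWorkItem() and record(id, "completed") right after completeWorkItem(); the histories of unfinished items then show which threads touched them and in what order.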

Initially I suspected a concurrency issue, because I was also getting
occasional ConcurrentModificationExceptions. Since then, I have synchronized
all crucial parts (i.e. all code that uses WorkItems or changes the process in
any way) on the RuleFlowProcess object and on the ProcessInstance object. This
fixed the CMEs, but the "lost" WorkItems still occur from time to time.
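Synchronizing on both objects amounts to nested synchronized blocks. A sketch with plain Objects standing in for the RuleFlowProcess and ProcessInstance; ProcessGuard is a hypothetical helper. Taking the two monitors in the same order everywhere avoids lock-ordering deadlocks.

```java
// Hypothetical guard: always lock the process definition first, then the instance.
final class ProcessGuard {
    private final Object processLock; // stands in for the RuleFlowProcess object

    ProcessGuard(Object processLock) {
        this.processLock = processLock;
    }

    // Runs `action` while holding both monitors, always in the same order.
    void withLocks(Object instanceLock, Runnable action) {
        synchronized (processLock) {
            synchronized (instanceLock) {
                action.run();
            }
        }
    }
}
```

A Java monitor only excludes code that synchronizes on the same object, so these locks serialize the application's own code paths; code that does not take them, including code inside the engine, is not affected.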

I've really run out of ideas by now. Since this bug occurs only rarely, it
is really hard to reconstruct what's going on. I've tried to strip away as
much of my code as possible to see whether this also happens in standard
Drools, but so far I couldn't reproduce it. I suspect it may have something
to do with the amount of work I perform in the rule engine for my
modifications, because "dummy" WorkItemHandlers that just complete a WorkItem
right away don't seem to cause errors like the one I'm encountering.
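For comparison, a "dummy" handler of the kind mentioned above completes the item inside executeWorkItem() itself. So that the sketch compiles without Drools on the classpath, the three interfaces below are local stand-ins that mirror what I believe the Drools 5 shapes to be (org.drools.runtime.process); treat those shapes as an assumption and check them against the 5.1.1 jars.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-ins mirroring the (assumed) shape of the Drools 5 interfaces.
interface WorkItem { long getId(); }
interface WorkItemManager { void completeWorkItem(long id, Map<String, Object> results); }
interface WorkItemHandler {
    void executeWorkItem(WorkItem workItem, WorkItemManager manager);
    void abortWorkItem(WorkItem workItem, WorkItemManager manager);
}

// "Dummy" handler: completes the item synchronously, in the same call
// (and therefore on the same thread) that delivered it.
class ImmediateCompletionHandler implements WorkItemHandler {
    public void executeWorkItem(WorkItem workItem, WorkItemManager manager) {
        manager.completeWorkItem(workItem.getId(), new HashMap<String, Object>());
    }

    public void abortWorkItem(WorkItem workItem, WorkItemManager manager) {
        // nothing to clean up in this sketch
    }
}
```

The difference from the deferred handler is that completion happens before executeWorkItem() returns and on the calling thread, so there is no window in which other threads can act on the pending item.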

If anyone has any idea what could cause this, any help would be hugely
appreciated. Also, feel free to ask questions if you need further
information. I can't share my code publicly for various reasons, but I
will try to extract the relevant parts if necessary.
Regards
Henning Losert
--
View this message in context:
http://drools-java-rules-engine.46999.n3.nabble.com/Weird-Problem-WorkIte...
Sent from the Drools - User mailing list archive at Nabble.com.