[rules-users] Work items don't get persisted when completing a task after rehydrating a knowledge session in some circumstances

Alberto R. Galdo argaldo at gmail.com
Fri Jun 22 06:34:42 EDT 2012


Hi,

    We have a fairly large BPMN process running inside a JPA-persisted
StatefulKnowledgeSession using Drools 5.4 and jBPM 5.3. Our process involves
timers, automated tasks and human tasks, and most of them are long-running
processes, so fault tolerance is a must.

    We've found what seems to be a really weird bug in jBPM/Drools
regarding the execution of BPMN processes. This is my best attempt to
summarize the problem:

     "We are unable to complete a human task after rehydrating a Drools
knowledge session because, in some circumstances, the work items Drools
generates after a previous task is completed are not persisted in the
database."

    So, since the work item is not in the database, when a human task client
completes a task that refers to that non-existent work item, the process
doesn't get resumed, and the process fails.

    Why does this happen? Let's see:

     When the process is executed, different work items get created,
updated and eventually deleted, up until a human task is created (in our
process). In a persistent knowledge session, the transaction associated
with the Drools thread is committed right after the human task is created
in the human task server, since that is a "safe point". Nothing strange so
far. Everything is consistent: if you look at the database you will see
your session instance, your process instance, and the final human task
work item, which is the only work item that survives the execution. Any
handler-managed automated tasks executed before the human task have been
deleted, and the human task work item has to survive because its
completion depends on asynchronous client interaction.

     Now, if you connect to the human task server and complete that human
task, a message is sent to the Drools session to update the state of the
work item. The work item gets updated, the process resumes and the flow
continues, possibly generating a new human task (which is our case). At
this point, if you look at the database, there are no work items for the
automated, handler-managed tasks (as expected), but there is no work item
for the new human task either. Even worse, the task on the human task
server is created, persisted, and holds a reference to the non-existent
work item.

    Days of debugging led us to what we think is the source of the problem:

    We found that, after a task is completed, the process continues
executing in the same thread that receives the MINA message the human task
server sends whenever a task is completed. This thread is not the thread
that executes the knowledge session (where the ReteOO session lives), so it
doesn't have a transaction. By the way, we also found that, for work item
persistence, the JPAWorkitemManager never joins an active transaction. :(

    That's why persisting a work item as a consequence of resuming a
process inside the thread that receives the MINA messages leaves the
database inconsistent, which defeats the whole point of making the Drools
session persistent in order to make jBPM fault tolerant.
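To make the threading argument above concrete, here is a deliberately simplified model in plain Java. It is not the Drools/jBPM API: the class, `persistWorkItem`, `runInTransaction` and the in-memory "database" are all hypothetical stand-ins. It only assumes, as described above, that the transaction is bound to the thread that opened it (modelled with a `ThreadLocal`) and that a work item persisted outside any transaction never reaches the database.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the problem: NOT the Drools/jBPM API.
// A "transaction" is bound to the thread that opened it via a ThreadLocal.
public class ThreadBoundTxDemo {
    // Pending writes of the transaction bound to the current thread (null = none).
    static final ThreadLocal<List<String>> CURRENT_TX = new ThreadLocal<>();
    // Stand-in for the database: only committed writes end up here.
    static final List<String> database = new ArrayList<>();

    // Hypothetical persist call: it only takes effect through the calling
    // thread's transaction; with no transaction the write is silently lost.
    static void persistWorkItem(String id) {
        List<String> tx = CURRENT_TX.get();
        if (tx != null) {
            tx.add(id);
        }
    }

    // Runs work inside a transaction bound to the current thread, then commits.
    static void runInTransaction(Runnable work) {
        List<String> tx = new ArrayList<>();
        CURRENT_TX.set(tx);
        try {
            work.run();
            synchronized (database) {
                database.addAll(tx); // "commit"
            }
        } finally {
            CURRENT_TX.remove();
        }
    }

    public static List<String> simulate() {
        synchronized (database) {
            database.clear();
        }
        // Session thread: first human task's work item, persisted in a transaction.
        runInTransaction(() -> persistWorkItem("humanTask-1"));
        // Callback thread (like the MINA receiver): no transaction is bound to it,
        // so the work item of the next human task never reaches the "database".
        Thread callback = new Thread(() -> persistWorkItem("humanTask-2"));
        callback.start();
        try {
            callback.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (database) {
            return new ArrayList<>(database);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [humanTask-1]
    }
}
```

Running `main` prints only the first work item; the one written from the callback thread is lost, which mirrors the symptom described above.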

    We found a way to circumvent this problem: making every human task node
in our process be followed by an event timer. That way, when the timer
fires, the execution of the process is forced onto the same thread as the
ReteOO session, where a transaction is available, and things get back to
normal. But this is really dirty and wrong.
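The timer trick works because it moves the rest of the execution back onto the session's thread. The same idea, expressed without a timer, is to hand every completion callback to the single thread that owns the session and run it there inside a transaction. The sketch below reuses the same kind of hypothetical in-memory model (none of these names are Drools/jBPM APIs; `onTaskCompleted` and `SESSION_THREAD` are made up for illustration) and only shows the dispatch pattern, not how to wire it into the human task client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, NOT the Drools/jBPM API: completion callbacks are handed
// to the single thread that owns the session, and the transaction is opened there.
public class SessionThreadDispatch {
    static final ThreadLocal<List<String>> CURRENT_TX = new ThreadLocal<>();
    static final List<String> database = new ArrayList<>();

    // The one thread allowed to touch the session (daemon so the JVM can exit).
    static final ExecutorService SESSION_THREAD = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "session-thread");
        t.setDaemon(true);
        return t;
    });

    // Only takes effect through the calling thread's transaction.
    static void persistWorkItem(String id) {
        List<String> tx = CURRENT_TX.get();
        if (tx != null) {
            tx.add(id);
        }
    }

    // Called from the messaging thread: instead of resuming the process here,
    // submit the work to the session thread, run it in a transaction, and wait.
    static void onTaskCompleted(String nextWorkItemId) {
        Future<?> done = SESSION_THREAD.submit(() -> {
            List<String> tx = new ArrayList<>();
            CURRENT_TX.set(tx);
            try {
                persistWorkItem(nextWorkItemId);
                synchronized (database) {
                    database.addAll(tx); // "commit"
                }
            } finally {
                CURRENT_TX.remove();
            }
        });
        try {
            done.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static List<String> simulate() {
        synchronized (database) {
            database.clear();
        }
        // The callback still arrives on a thread without a transaction.
        Thread callback = new Thread(() -> onTaskCompleted("humanTask-2"));
        callback.start();
        try {
            callback.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (database) {
            return new ArrayList<>(database);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [humanTask-2]
    }
}
```

Here the work item written on behalf of the callback thread does reach the "database", because the write happens on the session thread while its transaction is open.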

    Any thoughts?

    We would really love to be wrong about this. :'(

Greets,



Alberto R. Galdo
argaldo at gmail.com