[jboss-jira] [JBoss JIRA] Commented: (JBPM-983) concurrent process execution fails

Tue Jul 3 09:21:58 EDT 2007

    [ http://jira.jboss.com/jira/browse/JBPM-983?page=comments#action_12367534 ] 

Edward Staub commented on JBPM-983:
-----------------------------------

Tom,

I tested with default database and isolation, in order to avoid any confusion about db-related issues.

>> * StaleObjectStateException are to be expected and don't mean it's not working 

As I explained in JBPM-995, Hibernate itself logs this exception at ERROR level. AFAIK, there is no way in jBPM to keep it out of the log. To my mind, emitting spurious error messages into the log is a bug: anyone who doesn't know which errors to ignore is going to waste a lot of time chasing phantoms. In any event, that particular problem has an easy fix, as described in that report.

But that's all irrelevant here. The StaleObjectStateException problems in this JIRA report are not at the job level and are not handled in any way. All of the stale objects I've seen have been tokens, due to the problems described below (and earlier in this thread).

>> I would need help in setting up the right environment to reproduce this: what database ? what isolation level you use for the driver ?

jBPM defaults: HSQLDB, isolation level 0. FWIW, we'll be using Oracle in production.
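Assuming "isolation level 0" uses the JDBC numbering (the same int values Hibernate's hibernate.connection.isolation property accepts), 0 corresponds to java.sql.Connection.TRANSACTION_NONE. The small lookup below is only a reference for decoding these numbers; the class name IsolationLevels is illustrative, not part of jBPM or Hibernate.

```java
import java.sql.Connection;

// Illustrative helper: maps the int isolation constants defined on
// java.sql.Connection to their names, so a configured number can be decoded.
class IsolationLevels {
    static String name(int level) {
        switch (level) {
            case Connection.TRANSACTION_NONE:             return "TRANSACTION_NONE";
            case Connection.TRANSACTION_READ_UNCOMMITTED: return "TRANSACTION_READ_UNCOMMITTED";
            case Connection.TRANSACTION_READ_COMMITTED:   return "TRANSACTION_READ_COMMITTED";
            case Connection.TRANSACTION_REPEATABLE_READ:  return "TRANSACTION_REPEATABLE_READ";
            case Connection.TRANSACTION_SERIALIZABLE:     return "TRANSACTION_SERIALIZABLE";
            default:                                      return "UNKNOWN(" + level + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(name(0));
    }
}
```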

The three problems I've nailed down are:

-- if a job is saved while other jobs are in progress for the same ProcessInstance, the saved job may run immediately, ignoring the "exclusive" flag

-- when a ProcessState is completed by a job that started in the subprocess, a collision can occur with another thread concurrently running in the parent process, resulting in a StaleObjectStateException

-- when a ProcessState launches a subprocess, and the subprocess immediately reaches a node with "async=exclusive", the node job may be picked up and started before the ProcessInstance object has been committed to the database. This has been noted before on the forum, but I can't find a JIRA item regarding it.

The following may also be a problem, but I found it before the others, and I think it may be a second-order effect:

-- two tokens can collide on the parent token at a join, resulting in a Hibernate StaleObjectStateException. I'm not sure how this could happen if the two tokens are "exclusive", so it may only occur as a side effect of the problems above...
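For context on why the join collision surfaces as a StaleObjectStateException: Hibernate raises it when an optimistic version check fails, i.e. two transactions loaded the same versioned row and the second one to commit finds the version already changed. The sketch below simulates that check in memory, assuming the parent Token is mapped with a version column as the exception implies. VersionedStore, StaleStateException and JoinCollisionDemo are hypothetical names; this is not jBPM or Hibernate code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory table of token rows with a version column, mimicking
// the optimistic check a versioned update performs:
//   UPDATE token SET ..., version = version + 1 WHERE id = ? AND version = ?
class VersionedStore {
    static final class StaleStateException extends RuntimeException {
        StaleStateException(String msg) { super(msg); }
    }

    private final Map<Long, Integer> versions = new HashMap<>();

    synchronized void insert(long id) { versions.put(id, 0); }

    // The version a transaction sees when it loads the row.
    synchronized int load(long id) { return versions.get(id); }

    // Succeeds only if no other transaction committed since loadedVersion was read.
    synchronized int update(long id, int loadedVersion) {
        int current = versions.get(id);
        if (current != loadedVersion) {
            throw new StaleStateException("row " + id + " was updated by another transaction");
        }
        versions.put(id, current + 1);
        return current + 1;
    }
}

class JoinCollisionDemo {
    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        long parentToken = 1L;
        store.insert(parentToken);

        // Two child tokens arrive at the join; each transaction loads the parent token.
        int seenByA = store.load(parentToken);
        int seenByB = store.load(parentToken);

        store.update(parentToken, seenByA); // first commit wins; version becomes 1
        try {
            store.update(parentToken, seenByB); // second commit still expects version 0
        } catch (VersionedStore.StaleStateException e) {
            System.out.println("stale: " + e.getMessage());
        }
    }
}
```

Note that the check only detects the collision after the fact; it does not prevent it, which is why some form of synchronization on the ProcessInstance or Token is still needed.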

AFAIK, none of these is prevented in any way by the current JobExecutorThread design, either as you've described it or as I've read it. With one exception, the concurrency issues are due to collisions at the ProcessInstance or Token level. The exception is a job that is both queued and started while another job for the same ProcessInstance is already running.

All of these were reproduced using Alex's test case. As I noted above, to make them reproducible it's important to widen the critical sections; I did that by turning up logging and logging to a console.

I found that sometimes I'd be trying to chase one of these bugs and suddenly one of the other ones would start happening first, preventing me from following the first one.  I learned to be flexible; I just tackled whichever one showed its ugly head first.

There are still more bugs in here, but I don't have any more time right now to find them.  

Let me know if more info is useful.  See my first 28-June comment above for some design thoughts, and the 27-June note for issues I can see with the designs.

To fix these problems, we need a way to synchronize on ProcessInstances and Tokens across threads and servers.
You may want to factor out a "Concurrency Service" or similar that handles each of these three configurations as efficiently as possible:

 - single-threaded
 - multi-threaded (jobs/JMS/messaging; in-memory synchronization)
 - clustered (jobs/JMS/messaging; in-database synchronization)
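A minimal sketch of what such a service could look like for the first two configurations, assuming work is serialized per ProcessInstance id. ConcurrencyService, runExclusively and the implementation classes are hypothetical names, not existing jBPM APIs. The clustered variant is only described in a comment, since it needs a real database.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical service: runs a unit of work while holding exclusive access
// to one ProcessInstance, identified here by its id.
interface ConcurrencyService {
    void runExclusively(long processInstanceId, Runnable work);
}

// Single-threaded configuration: nothing runs concurrently, so no locking is needed.
class SingleThreadedConcurrencyService implements ConcurrencyService {
    public void runExclusively(long processInstanceId, Runnable work) {
        work.run();
    }
}

// Multi-threaded configuration: one in-memory lock per ProcessInstance.
// This only serializes threads within one JVM. Locks are never evicted in this
// sketch; a real service would drop them when the ProcessInstance ends.
class InMemoryConcurrencyService implements ConcurrencyService {
    private final ConcurrentMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void runExclusively(long processInstanceId, Runnable work) {
        ReentrantLock lock = locks.computeIfAbsent(processInstanceId, id -> new ReentrantLock());
        lock.lock();
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }
}

// A clustered configuration could instead hold a database lock on the
// ProcessInstance row (for example via SELECT ... FOR UPDATE) for the duration
// of the transaction; it is omitted here because it needs a real database.

class ConcurrencyServiceDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrencyService service = new InMemoryConcurrencyService();
        // Two threads advancing tokens of the same ProcessInstance run one at a time.
        Thread t1 = new Thread(() -> service.runExclusively(1L, () -> System.out.println("token A")));
        Thread t2 = new Thread(() -> service.runExclusively(1L, () -> System.out.println("token B")));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Locking the whole ProcessInstance rather than individual Tokens is the simpler choice here; it also covers the join case, where two child tokens both update their parent.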

In the clustered case, it would usually be desirable to implement server affinity for ProcessInstances: that way all token synchronization could be in-memory, because all of an instance's tokens are known to be on the same server. Of course, if a server goes away, it should give up all its ProcessInstances automatically. Database row-locking might be one way to implement this.
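One common way to make ownership lapse automatically when a server disappears is a time-limited lease that the owning server keeps renewing. The sketch below simulates such a lease table in memory; in a database it might be a row per ProcessInstance with owner and expiry columns, claimed under a row lock. The lease approach itself, the table layout, and the names AffinityLeases and claim are assumptions for illustration, not part of jBPM.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lease table recording which server currently owns each ProcessInstance.
class AffinityLeases {
    private static final class Lease {
        final String owner;
        final long expiresAtMillis;
        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<Long, Lease> leases = new HashMap<>();
    private final long leaseMillis;

    AffinityLeases(long leaseMillis) { this.leaseMillis = leaseMillis; }

    // Claims or renews ownership; returns true if 'server' owns the instance afterwards.
    // The current time is passed in so the behaviour can be tested deterministically.
    synchronized boolean claim(long processInstanceId, String server, long nowMillis) {
        Lease current = leases.get(processInstanceId);
        boolean free = current == null || current.expiresAtMillis <= nowMillis;
        if (free || current.owner.equals(server)) {
            leases.put(processInstanceId, new Lease(server, nowMillis + leaseMillis));
            return true;
        }
        return false;
    }

    // Voluntary release, e.g. on orderly shutdown. A crashed server simply stops
    // renewing, and its leases lapse after leaseMillis.
    synchronized void releaseAll(String server) {
        leases.values().removeIf(l -> l.owner.equals(server));
    }
}

class AffinityDemo {
    public static void main(String[] args) {
        AffinityLeases leases = new AffinityLeases(30_000);
        long now = System.currentTimeMillis();
        System.out.println(leases.claim(7L, "server-1", now));          // true
        System.out.println(leases.claim(7L, "server-2", now + 1_000));  // false: server-1 holds it
        System.out.println(leases.claim(7L, "server-2", now + 31_000)); // true: the lease lapsed
    }
}
```

The lease length trades failover speed against renewal traffic: a shorter lease frees a dead server's ProcessInstances sooner but requires more frequent renewals.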

-Ed Staub

> concurrent process execution fails
> ----------------------------------
>
>                 Key: JBPM-983
>                 URL: http://jira.jboss.com/jira/browse/JBPM-983
>             Project: JBoss jBPM
>          Issue Type: Bug
>          Components: Core Engine
>    Affects Versions: jBPM jPDL 3.2
>         Environment: Hypersonic in-memory database, JobExecutor configured with 5 threads
>            Reporter: Alexander Schlett
>         Assigned To: Tom Baeyens
>            Priority: Critical
>             Fix For: jBPM 3.3
>
>         Attachments: SimpleTest.java, SimpleTest.java, SimpleTest.java, SimpleTest.java, SimpleTest.java
>
>
> Concurrent execution of async nodes with multiple JobExecutor threads fails. The effects are:
> 1) job sync within JobExecutor fails due to org.hibernate.StaleObjectStateException
> 2) process gets stuck in join node and never ends
> A JUnit test for this is attached; it's a simple process with just a fork and a join and some scripts in between.
