[esb-issues] [JBoss JIRA] Created: (JBESB-2204) jBPM Persistence race condition in EsbActionHandler under high load

Damon Brown (JIRA) jira-events at lists.jboss.org
Fri Nov 21 21:02:36 EST 2008


jBPM Persistence race condition in EsbActionHandler under high load
-------------------------------------------------------------------

                 Key: JBESB-2204
                 URL: https://jira.jboss.org/jira/browse/JBESB-2204
             Project: JBoss ESB
          Issue Type: Bug
      Security Level: Public (Everyone can see)
          Components: Process flow
    Affects Versions: 4.4
         Environment: AS 4.2.3.GA, jBPM 3.2.3.GA + ehcache, RHEL 5, x86_64, 1.6.0 Sun JVM, Oracle 10g
            Reporter: Damon Brown
            Priority: Minor
         Attachments: CallbackCommand.java.patch

Under extremely high load, org.jboss.soa.esb.services.jbpm.cmd.CallbackCommand (line 90) cannot always locate the Token to signal for the target process.  The exception is not propagated to the CallbackQueue, so the message is not retried.  The target process is left in a suspended state even though the ESB service has completed its action.  The failure rate depends on machine load and ranges from 1% to 5%.

The missing token cannot be attributed directly to the jBPM Hibernate session.  I extended org.jboss.soa.esb.services.jbpm.actionhandlers.EsbActionHandler to forcibly flush the current state of the token to the database.  Within the same action, I then queried for the state of the object, which was returned successfully.

There appears to be a race condition among ESB service invocation (EsbActionHandler), token persistence (Hibernate flush), and command callback (CallbackCommand).  The particular ESB service I am invoking has a short suspense.  Other ESB services, which take longer to execute, do not exhibit this behavior.  For the short-suspense service, the failure was intermittent but reproducible.

The patch file attached to this issue attempts to resolve the problem by retrying the token lookup up to three times, with a sleep between attempts.  Within my environment, under high load conditions, the token failed to resolve on the first try 1% - 5% of the time (98% average success on first try), with a 100% success rate on the second try.  With the patch applied, the target process instance is successfully signaled.
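
For readers without the attachment, the retry-with-sleep idea can be sketched roughly as below.  This is not the code in CallbackCommand.java.patch; the class name TokenLookupRetry, the method lookupWithRetry, and its parameters are hypothetical, and the lookup is passed in as a plain Supplier so the sketch does not depend on any jBPM API.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper illustrating retry-with-sleep for a lookup that may
 * briefly return null. Not the contents of CallbackCommand.java.patch.
 */
final class TokenLookupRetry {

    private TokenLookupRetry() {
    }

    /**
     * Calls lookup up to maxAttempts times, sleeping sleepMillis between
     * attempts, and returns the first non-null result. Returns null if every
     * attempt fails or if the thread is interrupted while sleeping.
     */
    static <T> T lookupWithRetry(Supplier<T> lookup, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T found = lookup.get();
            if (found != null) {
                return found;
            }
            if (attempt < maxAttempts) {
                try {
                    // Give the concurrent writer time to make the row visible.
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }
}
```

A caller would wrap the existing token query in the Supplier, for example with maxAttempts set to 3 as in the patch description.  If the result is still null after the last attempt, throwing an exception rather than returning quietly would at least make the failure visible to the caller, which is the gap described at the top of this report.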

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira